Showing posts with label java. Show all posts

Wednesday, August 19, 2009

Domain-Driven Design, Services, and Data Access

The term Domain-Driven Design, or DDD, has long become a buzz phrase in the software community – after it was coined and popularized by Eric Evans in his book “Domain-Driven Design: Tackling Complexity in the Heart of Software”. The book does an excellent job summarizing the essential principles of object-oriented software design based on domain modeling, stressing the importance of strategic design, refactoring, etc., as well as documenting various practical design patterns (some of which I like better than the others, by the way.) It is fair to say that long before the book first came out in 2004 and the DDD term was coined, the importance and necessity of proper modeling of software systems based on distinct clearly defined functional domains was well understood among the better software engineers. However, the well-written and intelligently organized book on the subject was immediately embraced as a consolidated reference and effective tool in evangelizing the concepts of good design.

To represent a system in a clear and effective way, a model should draw a clean distinction between its artifacts based on their roles and the functional domains they relate to, as well as accurately define the relationships between such artifacts and between the domains themselves. It is important and extremely useful not only to distinguish the model artifacts by their roles and purposes, but also to group them by functional domain.

Today, few people dispute the benefits of the DDD principles. As DDD gained wide acceptance and recognition, it inevitably became a buzz phrase. With that came some unwanted side effects.

Like any sophisticated and effective design methodology, DDD was not born out of nothing. It has its prerequisites. Understanding DDD requires familiarity with, and a deep understanding of, programming fundamentals, structured design, object-oriented principles, modularity, refactoring, etc. More than anything, to appreciate and benefit from DDD the programmer must truly understand the dangers of excessive complexity in software, and feel a sincere need to avoid such complexity by applying intelligent design and creative thinking. In other words, mechanically following the patterns listed in a book will lead you nowhere. Your heart and mind must be in it!

Lately, I have been reading or hearing comments similar to this: “Oh, we don't use services; we use DDD.” That usually comes with an implication that “services” are so last season...

Hmmm... Why? Is such a point of view based on only a superficial familiarity with DDD (perhaps limited to a second-hand familiarity with one or two of the design patterns from the Evans book)? Or is it, perhaps, based on a misinterpretation of Martin Fowler's article in which Fowler talks about the flaws of the anemic domain model? In the anemic model, domain objects are reduced to little more than data holders with getters and setters, while procedural services implement all the behavior applicable to the domain objects. Fowler, rightfully so, dismisses such an approach as an anti-pattern and a quite incompetent [mis]use of objects. I whole-heartedly agree with him, and with anyone who thinks that objects in a well-designed model should encapsulate both data and behavior. In the context of a Domain Model, this means that domain entities must implement the domain-entity-specific logic that belongs on those entities: the logic essential to the definition of the nature of the entity, the logic without which no instance of the entity would make complete sense. Such logic must include the implementation of the ways the entity manages its own data as well as the relationships with other domain entities that the entity instances directly rely on.

The logic inside an entity class should not, however, include anything that is completely foreign to the domain model in question, e.g. the business logic specific to a particular application. That includes any data access logic. Such logic, according to Eric Evans, should live in the Application Layer, or... services.

In other words, domain entities should contain as much logic as possible, but no more than that!
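As a minimal sketch of that balance, here is a hypothetical `BankAccount` entity (the class and its method names are my own illustration, not taken from the Evans book or Fowler's article). It owns the rules essential to what an account is, such as refusing an overdraft, and knows nothing about persistence or any particular application.

```java
import java.math.BigDecimal;

/** Hypothetical entity: holds its own data and the behavior that defines it. */
final class BankAccount {
    private BigDecimal balance = BigDecimal.ZERO;

    /** Domain rule: only positive amounts may be deposited. */
    void deposit(BigDecimal amount) {
        requirePositive(amount);
        balance = balance.add(amount);
    }

    /** Domain rule: an account cannot be overdrawn. */
    void withdraw(BigDecimal amount) {
        requirePositive(amount);
        if (balance.compareTo(amount) < 0) {
            throw new IllegalStateException("Insufficient funds");
        }
        balance = balance.subtract(amount);
    }

    BigDecimal balance() {
        return balance;
    }

    private static void requirePositive(BigDecimal amount) {
        if (amount == null || amount.signum() <= 0) {
            throw new IllegalArgumentException("Amount must be positive");
        }
    }
    // Deliberately absent: save(), load(), repository references, and
    // application-specific workflows such as "send monthly statement".
}
```

What is left out matters as much as what is included: anything that a specific application does with an account would live outside this class.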

However, let's take a look at how some programmers (and architects!) interpret this seemingly simple and obvious concept. It is not uncommon to see application logic and data access logic wired directly into the domain entity classes. Needless to say, this makes the domain objects rigid, hard-wired for a specific application, and often all but useless when new business scenarios come up – even within the same enterprise. Have these folks, perhaps, skipped the “Introduction to Structured Programming” and “Object-Oriented Programming, Part I” classes on their way to becoming Sr. Software Engineers?

Embedding any notion of persistence in a domain entity (be it self-persisting methods or a reference to some object repository) is not my idea of a good design pattern, by any means. (This is where I might disagree with Eric Evans, and many others, but I am sticking to my guns on this subject, and please don't crucify me for this.)

Just the fact that an instance of A may under certain circumstances participate in scenario B, does not mean that the logic for B must be forever burnt into A making the two inseparable. The basic rules of software design and common sense suggest that we must always do our best to separate and de-couple things that do not need to be molded together. Not the other way around!

Naturally, one of the most essential steps in object modeling is to properly identify which data and which behavior actually belong on each given object/class. That is not always a trivial task by any means. I may sound as if I am talking to my 3-year-old when I repeat the same thing over and over: not all functionality that may be applied to a class must live on that class. It seems such an obvious and easy to understand point. But why do programmers continue to stuff application logic into their domain entities? And, may I note, very proudly so. I have lost count of the self-proclaimed DDD nouveau experts whose idea of good OO design is consolidating all (thinkable and unthinkable) business logic inside a domain entity.

When asked, they always point at the Evans DDD book. As if the book really promotes such nonsense.

Have they really read and understood the book? Doesn't Eric Evans specifically state that any application logic should live in the Application Layer, or … er... application services? Amazingly, this simple idea is overlooked by the zealots who proudly state that they "use DDD, and not services."

All this only proves to me how dangerous any tool, methodology, or teaching can be in the hands of those who have no patience or desire to actually learn it and understand the principles behind it. Instead, many people prefer to skim the surface and stop as soon as they come across something that seems like a quick solution to their immediate problem. Unfortunately, such an attitude is very common in the software industry today. It is common for programmers to search for easy, ready-to-use solutions on the internet, grab the first one that looks like it does the job, then cut and paste it into their applications without understanding how it works. And it may not work at all. I have seen that too often...

De-coupling the application logic from domain logic provides possibilities to use the same domain models in various contexts and applications, without having to produce duplicates, without creating more work and hurdles. Any functionality that does not conceptually belong on the domain entity should live on a different type of object.

Generally speaking, use-case-specific operations must live inside the object that implements the use case. It does not matter what you call such objects. Personally, I see no problem with calling such objects “services.” If you don't like the word or think it has been stigmatized, feel free to call it something else. Just don't put your specific application's logic into a true domain entity class.

Use cases are normally specific to a particular application, not a generic domain.

Domain models - generally - should be designed to be useful for more than one particular application. At the very least, they should be re-usable by potentially multiple applications within the enterprise, if appropriate, or multiple portlets within a portal, etc. One generic thing that a Domain may helpfully provide for the applications to implement is the API for domain service objects - whenever appropriate, of course.


Functional Domains as Reusable Components

It is convenient and very effective to consolidate each distinctive functional domain inside a dedicated component. This means that all software artifacts for the given domain are physically grouped together. In terms of Java packages, this suggests that all classes for the given domain are packaged under the same root package. Applications may use these domain packages as JAR dependencies and, if necessary, provide their own application-specific implementations of the domain service APIs (if the domain model provides such an API), including the application-specific data access details, in their “application layers.”

People like to say that they don't believe in re-use, that any new project requires writing brand new classes from scratch anyway, etc. I categorically disagree with such a philosophy. I can't tell you how many times I was able to benefit from re-using generic domain components I had written by using them in more than one application. The key here is thoughtful design and getting things done well the first time around. There is no point in trying to re-use a poorly written, rigid, and unadaptable piece of code... That's why I believe that getting things done well really pays off in the long run. Of course, as time goes by, I find myself making adjustments and improvements to such components, but that is just normal refactoring.

A Domain Model normally consists of the following types of domain artifacts:

  • Domain entities: the subjects and actors in the given domain. Domain entity objects must have no knowledge of any application-specific functionality whatsoever. They should not define any persistence logic, nor should they store any direct references to any type of objects that implement data access. For example, some UsZipCode class may implement such behavior as ZIP validation, parsing of the input data, splitting the ZIP into a 5-digit code and a 4-digit extension, or representing itself in several different ways: as a 5-, 9-, or 11-digit code, etc. All of that functionality is essential to defining the very concept of a US ZIP Code. However, it has nothing to do with the world outside the class itself. Nothing in such a ZIP Code class depends on, or makes assumptions about, how the objects of the class will be used in applications. Entity definitions must be de-coupled from any external operations that may be performed on the entity instances. The need for such operations (use cases) may come and go, but the domain entity objects will remain what they are, regardless of how they are used. It is absolutely valid to expose domain entity objects to the presentation tier (e.g. controllers, form beans, action forms, etc.) and DAOs. If necessary, instances of domain entities may and should be passed between the presentation tier classes and application services in the middle tier.

  • Domain Services (APIs): define the operations that do not conceptually belong on domain entities. These may be generic domain-specific use cases/scenarios that are applicable to the domain entities but cannot be considered an integral part of the essential behavior of any particular domain entity. These use cases define the context in which some instances of the domain entities may be used and any operations applicable to these entities; the actual implementations of these service APIs may be application specific. The clients of the services may be application presentation tiers, external applications (e.g. web applications, batch processes, web services, etc.) or other services. A service module exposes only the use cases the service implements, via its public API/Domain Model. Everything else, including any data access logic and technology, should be considered the implementation details of the service that are not directly exposed to the clients. Use cases implemented by services must reflect logical operations and never expose a notion of any particular data store. For example, a client that submits a product order should only be aware of the mere placing of the order, with no indication of what the underlying system actually does with the order, whether the data is stored in the database, etc. As Eric Evans and Martin Fowler both point out, application services would normally be fairly light-weight, abstracting the minimum of application-specific logic and persistence, if necessary. It is absolutely fine for such services to do little more than forward to the underlying DAOs. That is a very small price to pay for flexibility, the ability to swap implementations, and easier future maintenance.

  • Domain Factories that produce instances of domain objects;

  • Domain Value Objects, if necessary; I can't say I often find much use for pure value objects, however.
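To make the ZIP code example from the list above more concrete, here is a minimal sketch of such a class. Only the name UsZipCode comes from the text; the method names (isValid, parse, fiveDigit, plusFour, formatted) and the accepted input formats are my assumptions, and the 11-digit representation is left out for brevity.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of a self-contained domain class: it validates, parses, and formats itself. */
final class UsZipCode {
    // Assumed input formats: "12345", "12345-6789", or "123456789".
    private static final Pattern FORMAT = Pattern.compile("(\\d{5})(?:-?(\\d{4}))?");

    private final String base;       // the 5-digit code
    private final String extension;  // the 4-digit extension, or null if absent

    private UsZipCode(String base, String extension) {
        this.base = base;
        this.extension = extension;
    }

    static boolean isValid(String input) {
        return input != null && FORMAT.matcher(input.trim()).matches();
    }

    static UsZipCode parse(String input) {
        if (input == null) {
            throw new IllegalArgumentException("ZIP code is required");
        }
        Matcher m = FORMAT.matcher(input.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a US ZIP code: " + input);
        }
        return new UsZipCode(m.group(1), m.group(2)); // group(2) is null without an extension
    }

    String fiveDigit() {
        return base;
    }

    /** Returns the 4-digit extension, or null when the code has none. */
    String plusFour() {
        return extension;
    }

    /** "12345" or "12345-6789", depending on whether an extension is present. */
    String formatted() {
        return extension == null ? base : base + "-" + extension;
    }
}
```

Nothing here refers to a database, a form, or a use case. A shipping application and a billing application could both depend on this class without either one leaving its fingerprints on it.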

A domain-specific component abstracts any implementation details of the functionality it provides while exposing the operations via its public interfaces. In some cases, it may not even have to provide the implementations at all, leaving that task up to its clients. In terms of a programming language such as Java, the Application Programming Interfaces (APIs) of such components are defined as methods that represent the business operations of the given functional domain. These methods may live on the entities themselves, if appropriate, or on domain services if such services are applicable to the given domain. The domain entity objects may be returned by and/or accepted as the arguments of such interface methods.

Domain components may use other components and 3rd party libraries as their dependencies. Each component is developed, maintained, and distributed independently of any client applications or other components that may rely on their functionality.


Data Access

Some use cases implemented by a service or component may require access to data resources such as databases, in-memory data, file systems, web services, or other types of remote or local data sources.

I whole-heartedly believe that a data access operation should not be considered a use case in its own right outside the context of the business operation that requires that data access in order to complete. It should always be abstracted by a service that implements the use cases.

For example, some Order service may expose the "Place Order" Use Case API that reveals nothing about how and where, or whether, the order should be saved. It only tells the client that the order will be processed and the client can expect the result promised by the API. Once the client calls this generic API, the actual service implementation may (or may not) need to ask another service (e.g. some Product Service) to check whether the given product is available, etc., and then invoke its data access module to save the order in the data store defined by the service configuration. The Order service is agnostic of the Product service’s data access implementation and is only aware of the Product service’s public interface, e.g. the API that implements the “Check Product Availability” use case. The client, in this particular case, is not aware of any communication between the Order service and the Product service. The Order service API, in this example, serves as a single-entry façade for the clients who need to place orders. Simple.
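The Order and Product services described above can be sketched roughly as follows. All type and method names here (OrderService, ProductService, OrderDao, placeOrder, isAvailable, save) are hypothetical names I picked for illustration; the point is only the shape of the dependencies.

```java
/** Domain object passed between the client and the service. */
final class Order {
    final String productId;
    final int quantity;

    Order(String productId, int quantity) {
        this.productId = productId;
        this.quantity = quantity;
    }
}

/** Public API of the Product service: the "Check Product Availability" use case. */
interface ProductService {
    boolean isAvailable(String productId, int quantity);
}

/** Public API of the Order service: says nothing about how or where orders are stored. */
interface OrderService {
    /** Places the order and returns a confirmation number. */
    String placeOrder(Order order);
}

/** Data access detail, visible only to the Order service implementation. */
interface OrderDao {
    String save(Order order);
}

/** One possible implementation; clients only ever see OrderService. */
final class DefaultOrderService implements OrderService {
    private final ProductService products;
    private final OrderDao dao;

    DefaultOrderService(ProductService products, OrderDao dao) {
        this.products = products;
        this.dao = dao;
    }

    @Override
    public String placeOrder(Order order) {
        // Talks to the Product service through its public API only.
        if (!products.isAvailable(order.productId, order.quantity)) {
            throw new IllegalStateException("Product not available: " + order.productId);
        }
        // Storage is delegated to whatever DAO this service was configured with.
        return dao.save(order);
    }
}
```

A client holding an OrderService reference cannot tell whether the order ended up in a relational database, a file, or a remote system, and it never learns that the Product service was consulted.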

A data access object (DAO) is a Java class (POJO, of course) that abstracts any data access logic and data source for the given service. DAOs are not limited to implementing access to databases. They may abstract any kind of data sources including in-memory data, files, access to data via web services, etc.

I do not share the preference of some architects that DAOs should be designed on a one-per-entity or one-per-table basis. Such an approach, in my view, has at least two major flaws. First, it produces extra complexity due to a large number of finely grained DAO/repository classes. Second, it usually exposes the notion of persistence to the entities. A service, on the other hand, provides a single logical entry per use case while abstracting any DAOs/repositories inside and normally grouping data access functionality by use-case relevance, not by entity type. This way, one may get by with only one or two DAOs per service vs. one for each entity type. Since data sources and the data access details are usually specific to a particular application, I lean towards providing application-specific implementations of DAOs (one or several) for each domain/application service.

Generally, a service may abstract a single DAO or a set of dedicated DAOs. Any given DAO should be used by only one service and not exposed to multiple services. Since data access operations are nothing more than implementation details of the more generic use cases represented by the service, no other service or application should ever need to access the DAOs directly. Instead, they must talk to the APIs of the service that wraps the DAO(s).
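Here is a small sketch of a service that owns its DAO. The guestbook domain and every name in it (GuestbookService, GuestbookDao, InMemoryGuestbookDao, DefaultGuestbookService) are invented for this illustration. The in-memory DAO stands in for whatever data source a real application would use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Service API: the only thing other services and applications talk to. */
interface GuestbookService {
    void sign(String visitor, String message);
    List<String> recentEntries(int max);
}

/** DAO: an implementation detail that could be backed by a database, a file, or a web service. */
interface GuestbookDao {
    void append(String line);
    List<String> readAll();
}

/** Application-specific DAO implementation; here, plain memory. */
final class InMemoryGuestbookDao implements GuestbookDao {
    private final List<String> lines = new ArrayList<>();

    @Override
    public void append(String line) {
        lines.add(line);
    }

    @Override
    public List<String> readAll() {
        return Collections.unmodifiableList(lines);
    }
}

/** The service wraps its DAO and exposes use cases, not storage operations. */
final class DefaultGuestbookService implements GuestbookService {
    private final GuestbookDao dao; // never handed out to callers

    DefaultGuestbookService(GuestbookDao dao) {
        this.dao = dao;
    }

    @Override
    public void sign(String visitor, String message) {
        if (visitor == null || visitor.trim().isEmpty()) {
            throw new IllegalArgumentException("visitor is required");
        }
        dao.append(visitor.trim() + ": " + message);
    }

    @Override
    public List<String> recentEntries(int max) {
        List<String> all = dao.readAll();
        int from = Math.max(0, all.size() - Math.max(0, max));
        return new ArrayList<>(all.subList(from, all.size()));
    }
}
```

Swapping InMemoryGuestbookDao for a file-backed or JDBC-backed implementation would change one constructor argument and nothing on the client side.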


The above are my personal views and recommendations that I hope the reader finds helpful and comprehensive. I don't have a goal of converting everyone to my point of view. These approaches to multi-tier architecture and domain modeling, however, are shared by many architects, and have proven quite effective. There is no single good way to solve every problem. Needless to say, variations and alternatives to the approaches described here are relevant and often desirable. The general goal however should never change: keep things simple and clean, as much as possible.

Tuesday, August 11, 2009

Error Handling and Exceptions in Java



Intended Audience

    There are countless books, articles, blogs, and software forum discussions on the subject of error handling. So, why am I writing another article that stands little chance of even being picked up by Google - in the sea of all other Internet posts and publications? First and foremost, based on the real-world projects I see every day, I strongly believe that error handling (or the lack thereof) in Java applications today is a huge problem that, on a large scale, is not being solved - or even seriously considered. Clumsy, improper, often thoughtless attempts at error handling cripple thousands of Java projects, making them unjustifiably complex and... error-prone! Second, I am all but convinced by now that this sad state of affairs exists not despite the fact that there are so many books and articles written on the subject, but rather because the vast majority of those publications forcefully promote (unintentionally, of course) bad practices and fundamentally flawed concepts. On the other hand, those who have long figured out the better ways of doing things keep relatively quiet, as if they want to keep the secrets to themselves or, most likely, do not want to stir up controversy or lose some of their followers. They often simply say: "do it this way", but they won't go to great lengths to explain why. They know why, they know it works very well, and it is very simple. So they just make a well-meaning suggestion in a forum, and walk away - leaving it up to the more-than-enthusiastic "old-school" zealots to pour dirt all over their comment, dismissing all or most of it - often without even reading the whole thing.

    Most of today's cutting-edge Java frameworks and latest specifications seem to be doing things exactly right. The Spring Framework and even the EJB 3.x specification are just a couple of the most obvious examples. However, I have not yet seen a book or article written by a well-respected Java authority that would openly and forcefully explain the true essence of error handling, why some approaches are better and safer than others, and why projects like Spring, for example, are choosing their way of handling errors while shying away from one of the most hyped-up features of the Java language. Most publications, in fact, continue to promote the same principles, theories, and stereotypes that have proven to fail so miserably. Many egos are at stake: software engineers generally don't like to admit that they were wrong or that their well-intended solution didn't work very well after all. And I honestly think that it seriously hurts our industry.

    If you have been frustrated or dissatisfied with the way your projects handle errors, if you have witnessed how teams struggle chasing subtle untraceable exceptions that provide no answers, and if you are genuinely looking for a better way of doing things, read on. I hope this article answers many of your questions and helps you get on the right track with your next project.

I welcome comments and critical opinions. I only have one favor to ask of anyone who wishes to dismiss my point of view: please read the whole article first and make sure your arguments are not already addressed. It took a great deal of effort to put together this article. I have really tried to deliver the information in the most comprehensive and logical way - based on hundreds of typical questions and arguments I have heard in the past. One of the frustrating experiences has been hearing or reading people bringing up arguments whose flaws have been, I believe, very meticulously exposed in this article.

    I'd also like to note that I do not intend to convert anyone who is perfectly happy with the way they are doing things now, if they believe that their approaches, however different from mine, work well for them.


Introduction


Popular False Premise

    One fundamental flaw of most solutions to error handling in software is that error handling is approached from the wrong end, to begin with. Here's a typical statement taken from a randomly selected online tutorial on exception handling: "...when you write a method that may fail, you must indicate that it may fail and how it may fail." Please note the italicized text. Although the exact wording may vary, this is, generally, the fundamental premise of the vast majority of books, tutorials, blogs, or online articles on error handling you can find today. This is the axiom, the taken for granted fact that makes the basis for all error-handling methodologies and "best" practices promoted in all those publications. Unfortunately, the very premise that some methods may fail and some may not is just plain wrong!

    Amazingly, it is a commonly accepted and hardly disputed assumption that if a module may(!) produce an error, then an indication of that possibility must be somehow built into the module to let programmers know whether or not they should provide an error-handling strategy for that module. In other words, the common thinking is: "Let's see if the function tells us that an error may occur inside; if yes, we will decide how to handle the error." If you have just read this and did not immediately see the mind-boggling ridiculousness of that assumption, you may be one of the many who have fallen into the trap. It is this philosophy that makes software systems vulnerable by not at all "expecting" large sets of error conditions, and turns error-handling solutions into unmanageable, cumbersome, inefficient, and unreliable behemoths. I will try to do my best to explain.

    It takes a 180-degree change of the view angle to see how simple and elegant error handling may - and must - be.

Always Know: Anything May Fail!

    As I have just pointed out, the common premise is based on the assumption that all functions in a software program may be divided in two categories: the ones that may fail (according to the programmer who designs the operation), and the ones for which the programmer does not expect any failures. That assumption is incorrect and dangerous in its core. If you understand the simple concept of a function or method in a programming language, you will easily see why.

    A subroutine (a.k.a. "method", "function", etc.) in a programming language is the most basic form of abstraction. The main purpose of abstraction in software is to hide the implementation details of a particular functional unit, and provide simplicity of usage by exposing only a minimalistic interface via which the clients/callers may effectively invoke the abstracted functionality. Among many benefits, the concept allows the implementation details to be modified later without affecting any code that depends on them. The very definition of abstraction implies that no entity that uses the abstracted functionality is in a position to make any assumptions about the implementation details of that unit; it may only be aware of the exposed interface. Bingo! (Oh, and before someone even thinks about stating that silly, childish argument... No, it absolutely does not matter if the same programmer writes the client code and the function the code calls!) Once something is abstracted, you are not supposed to make any assumptions about the abstracted details! Period.

When it comes to error handling in software, the only safe and correct assumption that may ever be made is that a failure may occur in absolutely every subroutine or module that exists! There are no functions for which a programmer has the right and responsibility to safely assume no possibility of failure. (Again, I don't want to hear any lame demagogy about functions that do nothing, contain no code, or simply return a constant, etc. That does not build up to a valid argument in favor of ridiculously complicating things by distinguishing between methods that may fail and those that may not.)

    Even if the designer of a particular method thinks that the method is very simple and solid, it is not responsible (and not smart) to state or imply that it may never fail. A failure may happen for a wide variety of reasons. The method may have bugs in it. The implementation of the method, or of any underlying libraries it depends on, may change in the future. Some low-level code that the method depends on may have subtle bugs or may not be designed to work as expected by a client in certain environments. And so on. Of course, there are two different types of failures: system/software errors, and purposefully designed/signaled exceptional conditions whose intent is to reflect different types of legitimate business situations that disallow the successful execution of the "main" workflow. We will talk about these differences later. Some people would argue that the "possible failure" premise we mentioned in the beginning usually implies the latter type ("business" conditions). It doesn't matter. The only safe and responsible way to program is to always KNOW that any method may potentially fail to perform its main task (if not due to a business condition then due to some other type of error) and design for both successful and negative outcomes. Assuming that some methods may fail and some may not will put you on the wrong track from the start.

Two Groups of Exceptional Conditions

    Since anything may fail, it is irresponsible and incompetent of a programmer to leave some operations out of the error handling strategy simply because they do not explicitly indicate that an error might occur inside. In other words, it makes no sense at all to specify on an individual basis whether a method or component may result in an error or not. The question, therefore, is not WHETHER but WHEN and HOW different types/classes of error conditions should be handled in the system. A properly designed system must ultimately handle ALL errors, one way or the other. Therefore, the application designer faces the fairly simple task of dividing all potential exceptional conditions that may occur within the system into two very basic categories:

  • the few that actually matter; these are the conditions that must be handled individually due to the fact that the system is expected to (can and must) do something specific in case of such conditions - as defined by the actual needs of the application (requirements), and nothing else; such conditions may be further divided into classes that require (by design!) more specific individual handling; the application designer/programmer may identify and add new error types or exceptional conditions to this group at any stage of development;
  • everything else, i.e. all other exceptional conditions that the system cannot and/or does not care to provide any form of recovery other than, perhaps, logging and graceful termination.
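The two categories above can be illustrated with a short sketch. OutOfStockException and CheckoutHandler are hypothetical names, and the use of unchecked exceptions is my assumption for the example. One condition is handled individually because the requirements say so; everything else falls through to a single catch-all that logs and ends the operation gracefully.

```java
/** Hypothetical condition that the application requirements single out. */
final class OutOfStockException extends RuntimeException {
    OutOfStockException(String message) {
        super(message);
    }
}

final class CheckoutHandler {
    /** Runs the operation and maps every possible outcome to a user-facing message. */
    static String run(Runnable checkout) {
        try {
            checkout.run();
            return "Order placed";
        } catch (OutOfStockException e) {
            // Group 1: the few conditions that matter get a specific response.
            return "Sorry, that item is out of stock";
        } catch (RuntimeException e) {
            // Group 2: everything else. Log it and finish gracefully.
            System.err.println("Unexpected failure during checkout: " + e);
            return "Something went wrong; please try again later";
        }
    }
}
```

Note that the handler does not need to know which methods "may fail". Adding a new specific condition later means adding one more catch clause here, not touching every method signature along the call path.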

What I Mean by "Handling"?

    For the sake of clarity, I want to note that the term "error handling" in this article means more than just "recovering" from errors and implementing an alternative action. When I say "handling" I absolutely mean any type of action - by the program or programmer - to process the error, including managed program termination. This means that programmers' bugs must be handled too - in a special way, since you can't guarantee that one of those bugs won't make its way into Production, and nothing should simply "blow up" in Production. That holds regardless of whether your software runs on a space shuttle where the lives of the astronauts are at stake, or you are developing a shopping cart web application where the last thing the end users should be exposed to is some bug in your code. The answers to the "when" and "how" questions are provided by the system requirements, and nothing else. Period. Of course, it may be fine to allow a little test application to spit an exception stack trace in your face when it blows up. However, every finished professional application that is intended to run in a production environment should always be designed to direct the most complete error logs to a safe logging destination and exit gracefully even in cases of unrecoverable errors.

    The term "handler", as used in this article, refers to a dedicated software module that implements an action to manage a specific class or group of exceptional conditions.

Ultimate Objectives of Error Handling

    When an error occurs, the module must, of course, broadcast that error information somehow. So, the job of proper error handling is really to provide a mechanism that ensures that the error signals (the tokens that carry the information about the error and its origin) are fired and freely propagate from the sources of errors to the strategically placed handlers without affecting the business logic.
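A minimal sketch of that idea follows, with made-up class names (PriceLookup, Invoice, InvoiceEntryPoint) and a hard-coded price. The error originates at the bottom, the business logic in the middle contains no error-handling code at all, and a single handler sits at the entry point.

```java
/** Lowest layer: the source of the error signal. */
final class PriceLookup {
    static int priceInCents(String sku) {
        if (!sku.startsWith("SKU-")) {
            throw new IllegalArgumentException("Unknown SKU: " + sku);
        }
        return 1999; // fixed price, for the sake of the example
    }
}

/** Business logic: no try/catch and no error codes threaded through. */
final class Invoice {
    static int total(String sku, int quantity) {
        return PriceLookup.priceInCents(sku) * quantity;
    }
}

/** Entry point: the strategically placed handler. */
final class InvoiceEntryPoint {
    static String render(String sku, int quantity) {
        try {
            return "Total: " + Invoice.total(sku, quantity) + " cents";
        } catch (RuntimeException e) {
            // Any failure from any depth arrives here with its origin intact.
            System.err.println("Invoice failed: " + e);
            return "Unable to compute the invoice";
        }
    }
}
```

Passing null as the SKU also ends up in the same handler, even though nobody declared that priceInCents "may fail" that way.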

Logging of the most complete error information available should be one of the mandatory steps performed by an error handler. Each type of error should be handled (and, therefore, logged) no more than once within the sub-system where it has occurred. Sometimes it is helpful to intercept and log the error at the boundary of the component where the error occurred - before propagating it further to the client. That may be necessary if the component may be potentially deployed on a different server - to ensure that the error logs are available on that server, as well as to the client applications that would do their own logging.
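One way to log at a component boundary without logging the same error twice is sketched below. This is only an illustration under my own assumptions: ComponentBoundary and LoggedException are hypothetical names, and java.util.logging is used simply because it ships with the JDK.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

final class ComponentBoundary {
    private static final Logger LOG = Logger.getLogger("inventory-component");

    /** Marks an error as already logged, so outer layers of the same sub-system skip it. */
    static final class LoggedException extends RuntimeException {
        LoggedException(Throwable cause) {
            super(cause.getMessage(), cause);
        }
    }

    /** Runs an operation; on failure, logs once with full detail, then propagates. */
    static <T> T call(Supplier<T> operation) {
        try {
            return operation.get();
        } catch (LoggedException e) {
            throw e; // logged further down already; just pass it along
        } catch (RuntimeException e) {
            LOG.log(Level.SEVERE, "Operation failed inside component", e);
            throw new LoggedException(e);
        }
    }
}
```

The client still receives the failure, with the original exception as its cause, and is free to do its own logging on its own server.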

Thoughtful Design and Planning vs. an Afterthought

    The biggest mistake a developer or architect can make with regards to error handling in a software system is treat it as a secondary issue, something that is less important than the "main" functionality, something that may be added at the last moment.

    In a proper design and analysis of any software module or system equal consideration must be given to the desired behavior of the system for both - successful execution and any types of failure. Depending on the system, the latter may be a simple single use case, or a set of quite elaborate use cases whose complexity may measure up to or even outweigh the complexity of the successful work flow.

    Therefore, error handling strategies must be considered an essential part of the system requirements and design. Error handling may never be an afterthought, something that programmers may (or may forget to) add after the "main" work flow is implemented. Regardless of error types (e.g. critical unrecoverable system failures or predictable business conditions treated as errors), the system must be designed to accommodate graceful and efficient handling of each such class of conditions.

    As I have stated in the Introduction to this article, when approaching error handling in software development, perhaps, the most important thing to keep in mind is the fact that everything can potentially result in an error, and a failure must be always considered as a possible outcome of an operation. This means that any given method call may potentially produce an error of some sort. There are no operations whose successful execution may be unconditionally guaranteed at any time.

    The purpose of error-handling design is not to determine what may fail (since anything may) but to determine how to appropriately divide the complete error set into the sub-sets relevant to the application's desired behavior, and where in the system each such class of failures must be treated. Fortunately, even though errors may occur in thousands of places, a typical application only needs to distinguish between a few types/classes of such conditions. This means that only a few error handling strategies normally need to be implemented per application, and all errors must simply be sorted out and properly directed to the appropriate handler.


Consolidated Error Handling

    Centralized and planned error handling is key to building efficient error-proof software systems. Regardless of the philosophy or technology used, the fact remains that error handling may never be a thoughtless knee-jerk reaction to some alerts automatically generated by a compiler or any other tool. The decision when and how to handle errors - specific or any - must always ultimately be made by the programmer based on a thoughtful consideration of the application requirements and each particular use case.

    Errors may not be reliably handled or controlled in a sporadic decentralized manner. Instead of being sprinkled all over the application, error handling must be consolidated to ensure clarity, reliability, coherence, and prevent a loss of any valuable information about the cause of the error.

Error handling is the job of a few dedicated modules that are designed to treat failures based solely on the application/business requirements.

    In the agile development environment, each new requirement and exceptional case must be analyzed to determine which part of the system should be responsible for handling it. In a case of an exceptional error condition, the call stack must immediately unwind all the way to the dedicated handler that possesses the contextual knowledge and ability to handle the error properly. The error data must reach such handlers in its original form, unaltered, with its complete stack trace. The original error information may be supplemented with additional helpful data or message - but never shortened, replaced, or otherwise altered. This is the purpose of exceptions.
  

Exceptions Fundamentals

    Exceptions in a programming language essentially serve the following purpose. They trigger an immediate unwinding of the call stack (abort the thread execution) and signal the fact and nature of the error - to anyone along the call stack who is willing to listen and take action. This, obviously, provides many significant advantages over the old-fashioned error codes. Methods may throw more than one type of exception, and each such condition may be caught/handled separately and in different methods - without cumbersome conditional logic throughout the call stack. Most importantly, exceptions allow implementing error-handling code only in those modules where it is actually appropriate - instead of doing it in every module in the call stack starting immediately at the origin of the exceptional condition. The latter is inevitable with error codes that must be checked for by each link in the call chain and propagated to the one that actually knows what to do with the error. On the other hand, with exceptions, any methods that can't really contribute to handling the error can remain completely free of any error-handling logic whatsoever while focusing on their business logic instead. All this can lead to much cleaner code and consolidated error-handling functionality.
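
    For contrast, here is a minimal sketch of the error-code style described above. All names (CodeRepository, CodeService, the status constants) are hypothetical and exist only for illustration. Note how the intermediate layer must check and relay a status it can do nothing about.

```java
// Hypothetical error-code style, shown only for contrast with exceptions.
class CodeRepository {
    static final int OK = 0;
    static final int NOT_FOUND = 1;

    // Origin of the error: reports failure through the return value.
    int load(String key, StringBuilder out) {
        if (key.isEmpty()) {
            return NOT_FOUND;
        }
        out.append("record:").append(key);
        return OK;
    }
}

class CodeService {
    private final CodeRepository repository = new CodeRepository();

    // This layer cannot act on the error, yet it must check and relay it.
    int buildReport(String key, StringBuilder out) {
        StringBuilder raw = new StringBuilder();
        int status = repository.load(key, raw);
        if (status != CodeRepository.OK) {
            return status;
        }
        out.append(raw.toString().toUpperCase());
        return CodeRepository.OK;
    }
}
```

    Every additional layer between the origin and the real handler would need the same check-and-return boilerplate.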

    When the exception mechanism triggers an exception, the instance of that exception class is broadcast to all the methods on the chain of method calls that eventually resulted in the exception. This means that any of those methods can potentially be coded to catch and handle the exception, if necessary, while others can ignore it. Naturally, there is always one such method in the call chain that is better suited to handle the error than the rest. Which one? The correct answer can only be provided by the application designer, the programmer that is. No tool - a compiler or IDE - can accurately, with 100% certainty determine where in the call stack the given error must be caught and handled. The decision usually depends on the circumstances, such as the application's or use case requirements, etc. So, it is the programmer who has the responsibility to carefully consider all those factors. The error handling constructs must be implemented in the module that contains the sufficient contextual knowledge and ability to take the appropriate corrective action in response to the condition.

    One thing that can be stated with certainty is that in the vast majority of cases, the most appropriate place to handle an error is NOT immediately at the origin of the error. The immediate caller of the method that triggers an exception rarely can and should do anything about the error - simply because it does not have enough contextual knowledge of the circumstances in which it itself was called. So, in the majority of cases, the role of the exception should be to efficiently and transparently carry the error signal through the call stack to the one method that has the intelligence to handle that exceptional condition. 
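
    The idea can be sketched roughly as follows. The class names (StorageException, Repository, ReportService, RequestHandler) are assumptions made up for this illustration. The intermediate layer contains no error-handling code at all, and the exception travels untouched to the one method that knows what a failed request should look like.

```java
// Hypothetical names throughout; a sketch of propagation, not a real API.
class StorageException extends RuntimeException {
    StorageException(String message) {
        super(message);
    }
}

class Repository {
    // Origin of the error: it only signals the failure.
    String load(String key) {
        if (key.isEmpty()) {
            throw new StorageException("cannot load record: empty key");
        }
        return "record:" + key;
    }
}

class ReportService {
    private final Repository repository = new Repository();

    // Intermediate layer: pure business logic, no try/catch at all.
    String buildReport(String key) {
        return repository.load(key).toUpperCase();
    }
}

class RequestHandler {
    private final ReportService service = new ReportService();

    // The one place with enough context to decide what to do.
    String handle(String key) {
        try {
            return service.buildReport(key);
        } catch (StorageException e) {
            // A real handler would log e with its full stack trace here.
            return "report unavailable";
        }
    }
}
```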


Java Exceptions

Noble Idea

    In addition to the "traditional" approach to exception handling described in the previous section, Java has introduced a new type of exceptions - checked exceptions. Such exceptions, once thrown by a method, must be declared in the method signature (for unchecked exceptions that is optional), and the compiler forces any immediate caller of the method to take action. To satisfy the compiler, the immediate caller must either catch and handle the exception itself, or add it to its own signature, which, in turn, subjects each of its own callers to the same restrictions.
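
    A minimal sketch of the two choices the compiler leaves to the immediate caller. ConfigReader and ConfigService are hypothetical classes invented for this example; IOException is the standard checked exception from java.io.

```java
import java.io.IOException;

class ConfigReader {
    // Checked: IOException must appear in the signature.
    String read(String path) throws IOException {
        if (path.isEmpty()) {
            throw new IOException("no path given");
        }
        return "contents of " + path;
    }
}

class ConfigService {
    private final ConfigReader reader = new ConfigReader();

    // Option 1: declare it too, which pushes the same obligation onto every caller.
    String loadDeclaring(String path) throws IOException {
        return reader.read(path);
    }

    // Option 2: catch it right here, whether or not this layer can do anything useful.
    String loadCatching(String path) {
        try {
            return reader.read(path);
        } catch (IOException e) {
            return "default";
        }
    }
}
```

    Removing both the throws clause and the try/catch from either method makes the code fail to compile.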

    The noble idea behind checked exceptions was to introduce a harness mechanism that would remind/force developers to handle errors before they bubble up to the surface and break the application. This concept may be used with benefits in small tight sub-systems where it is, indeed, necessary to force the immediate caller to handle the error at its origin. However, as noted above, such cases, while valid and existent, make up only a small minority of all real-life situations. So, in a vast majority of cases, for proper handling, an exception absolutely must be propagated further up the call stack - to the only appropriate module that was designated to handle the exception. And this is where checked exceptions introduce more problems than they solve.

    Read the interview with Anders Hejlsberg (the Chief Architect of the C# project) where the subject of checked exceptions is discussed in great detail. In that interview given back in 2003, Hejlsberg explains the most obvious pitfalls of Java's implementation of checked exceptions, and why the C# committee had unanimously rejected the idea. He specifically talks about such issues as broken encapsulation, inappropriately forced contracts, and the negative effect on scalability and "versionability" of systems built with checked exceptions.

Myth: Checked Exceptions Add Safety

    In addition to the issues highlighted by Hejlsberg, perhaps, the most essential - and often completely overlooked - is the fact that, in practice, checked exceptions have failed their most hyped promise: increased safety. Actually, the myth of added safety is arguably the most outrageous and dangerous misconception promoted by the advocates of checked exceptions. In fact, if I doubted the sincerity and good intentions of the advocates of checked exceptions, I would call it an outright lie.

Whether the authors of checked exceptions meant this or not, relying on the compiler to indicate which methods can throw exceptions (i.e. fail to do what is expected of them) has led so many programmers to disregard the fact that anything may fail under certain circumstances and to worry only about a small subset of all potential errors represented by checked exceptions - leaving huge holes in their applications. (I am not theorizing here: it is merely a fact of life, a pathetic reality that can be observed on thousands of Java projects.)

    Many people - perhaps, even subconsciously - interpret the mere presence of checked exceptions in the language as a suggestion that some modules/methods may result in errors while others may not! So, they forget to handle anything other than what the compiler forces them to handle... Need I say more?
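
    A tiny illustration of the point, using a hypothetical Ratio class. The method declares nothing, so the compiler asks nothing of its callers, and yet it can fail.

```java
class Ratio {
    // No throws clause, so the compiler demands nothing of callers...
    static int percent(int part, int whole) {
        // ...yet whole == 0 raises an unchecked ArithmeticException.
        return part * 100 / whole;
    }
}
```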

Myth: Recoverable vs. Unrecoverable

    Another very common and dangerously misleading guideline - widely promoted by various books, articles, and, most importantly, by Sun Microsystems in their tutorial - is that checked exceptions are meant to indicate "recoverable" conditions, while unchecked exceptions represent only errors (programming errors, i.e. bugs.) Some less experienced (or less competent) programmers even conclude from the latter that since runtime exceptions are "unrecoverable", applications should not even do anything about them! (I only hope those guys are not employed by companies that build software for systems where human lives are at stake!) The "Recoverable vs. Unrecoverable" thesis is also suggested in one of my favorite and most respected books on Java - Joshua Bloch's "Effective Java", which is quite understandable considering that the book was written by one of the authors of the JDK. In all fairness, Bloch also stresses that checked exceptions should be used with caution and not overused. He suggests exercising the best judgement when deciding what should be considered recoverable, and when in doubt, opting for an unchecked exception. And that is the approach I use. However, the only conclusion I usually arrive at is that I am in no position to decide for the client how they would want and need to act upon the specific exception. An API defines how a client accesses the unit of functionality, and what results it should expect. No API can dictate how the client should use the result after the result has been obtained. Therefore, only the caller can and should decide what it wants to recover from, and what it does not care to recover from - based on its own needs and not on the implementation detail of the module it calls. On the other hand, any exceptional condition must be handled by a correctly designed system - at some point. The question is where and when. So, it is also very, very wrong to suggest that runtime exceptions are meant to be ignored.

Myth: Unchecked Exceptions are Bad and Dangerous

    Inevitably, it has become typical to handle checked exceptions and completely disregard unchecked exceptions, only to act surprised when the application suddenly crashes. This, in turn, has generated a phobia of unchecked/run-time exceptions among inexperienced programmers. Those who mistakenly believe that run-time exceptions should not be handled blame them for application crashes. As a result, a large number of Java programmers today do not even realize that it is their responsibility to design for error handling. A runtime exception is a very useful carrier of error information that is openly available for anyone in the call chain who is interested. Unfortunately, many developers show no interest at all in listening for what's out there for them to use. Instead, they all but thoughtlessly react to a compiler alert (often feeling annoyed) and do something just to make the compiler error go away. Needless to say, such an approach cannot lead to reliable error handling.

It is absolutely incorrect to assume that runtime exceptions should never be caught and should all be allowed to propagate to the very "top" of the application. As I have stressed at the beginning of this article, for every exceptional condition that is required to be handled distinctly - by the system/business requirements - programmers must decide where to catch it and what to do once the condition is caught. This must be done strictly according to the actual needs of the application, not based on a compiler alert. All other errors must be allowed to freely propagate to the topmost handler where they will be logged and a graceful (perhaps, termination) action will be taken.

Reality: Correctly Handling Checked Exceptions is a Lot of Work

    In large systems, a significant amount of cumbersome, repetitive boilerplate code is necessary to handle checked exceptions properly - to ensure that the error information indeed propagates to the dedicated handler instead of being mishandled. The try/catch constructs and throws clauses litter multiple methods and layers throughout the application. They unjustifiably pollute and bloat methods that have neither the knowledge nor the ability to do anything about the errors they catch and re-throw.

To properly utilize checked exceptions and ensure their propagation to the correct designated handlers in a typical real-world application, a great deal of developer awareness and competence is absolutely required. When checked exceptions are used, even the slightest sloppiness of the programmer inevitably results in chaos, unimaginable code clutter, redundancies and unnecessary complexity. This is at odds with the original intent of checked exceptions, which is to simplify error handling and make it easier for developers of all levels of expertise to know what and when to handle.

    In reality, checked exceptions end up confusing programmers, prompting them to handle or swallow exceptions before they reach the proper handler, and ultimately introducing more problems than they solve.
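
    Here is a sketch of the swallowing anti-pattern this section describes. AmountParser and InvoiceCalculator are hypothetical names; ParseException is the standard checked exception from java.text. The catch block exists only to satisfy the compiler, so the failure never reaches any handler and a wrong total is returned silently.

```java
import java.text.ParseException;

class AmountParser {
    // Hypothetical low-level method that declares a checked exception.
    static int parse(String text) throws ParseException {
        if (!text.matches("\\d+")) {
            throw new ParseException("not a number: " + text, 0);
        }
        return Integer.parseInt(text);
    }
}

class InvoiceCalculator {
    // Anti-pattern: this method has no idea what to do about a bad amount,
    // but the compiler insists on a reaction, so the error is swallowed.
    static int totalSwallowing(String... amounts) {
        int total = 0;
        for (String amount : amounts) {
            try {
                total += AmountParser.parse(amount);
            } catch (ParseException e) {
                // ignored: no log, no rethrow, the handler upstream never sees it
            }
        }
        return total;
    }
}
```

    With the inputs "10", "oops", "5" the method quietly returns 15, and nothing in the logs explains why the invoice is short.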

Business Exceptions

    If there is one thing that both camps (checked vs. unchecked) seemingly agree on, it is that exceptions should be primarily used to indicate exceptional conditions and should not be used to implement business logic.  This guideline has been widely promoted by most publications, starting with the very popular and well-respected book by Joshua Bloch, "Effective Java".  In reality, however, both camps tend to use exceptions to signal business conditions and direct business logic based on that. Personally, I think that occasionally using an exception to cut through the layers and quickly trigger an alternative business flow is perfectly acceptable, and may be very convenient. However, this should be done with caution and after evaluating all other options.

   For example, here is a very typical use case. The client calls some User service to create a new user entry.  If a new user is successfully created, a new object of some sort is returned.  That may be a user ID, user entity, or some kind of result object that is appropriate according to the use case specification.  Naturally, one well-expected legitimate scenario of this use case would be a situation when a user with the provided credentials already exists.  Many API designers prefer to throw some sort of a UserAlreadyExistsException in such cases, and many argue that it should be a checked exception - in order to explicitly inform the caller of this potential outcome. Any other type of failure to create a new user would indicate a more serious system failure (or even programming error.)  

    Here's my decision-making process when designing APIs for use cases like the one described above. First, I identify any "legitimate" situations in which the service will not create a new user, e.g. "the user already exists", or, perhaps, "the max allowed participants exceeded/enrollment temporarily suspended", etc.  I know that all other conditions would be represented with a generic service-specific runtime exception, e.g. "UserServiceException" (extends RuntimeException) whose only purpose is to signal the service's failure that is not related to a legitimate business situation, and to provide the most complete error info possible for the diagnostic purposes.  Then, I look at the other - legitimate - cases of failure to create a new user and see if my service might have more information than just the binary "yes/no" to return to the client.  If the service, upon its failure to create the user with the given data, has a whole lot of meaningful information, then it may be a good idea to have the API return a result object that the client would examine to see whether a user was created or whether the status of the returned object indicates one of the multiple meaningful conditions.  In our example, "the user exists" condition may only mean one thing: an entry with the given primary key already exists.  So, there is hardly anything else useful that can be returned to the client, and, perhaps, throwing an exception is just fine. (Another option might be returning null instead of a valid newly created object. But that may or may not be acceptable depending on the API specifications or specific development guidelines. If the null result may be ambiguous, such a solution would not be good. Also, many guidelines discourage returning null for the sake of programming safety.) The bottom line is that throwing an exception in a case of a business condition is only one of several options, and, perhaps, more often than not, it is not the most preferred way.
In case the analysis indicates that throwing a business exception is indeed the most convenient solution (and it may be), I would create a separate UserExistsException (that extends my generic UserServiceException), document it in the API's Javadoc, and place it into the method signature. Since it is a runtime exception, it will not force the immediate caller method on the client to mess with it but will give the client programmer the freedom to decide where to place the try/catch construct for that particular exception without polluting any methods on the call stack in between.  The client can simply implement a listener for this particular type of exception and not worry about it being a nuisance anywhere else. As I have said, many developers would prefer to make such a business exception a checked exception, which is also a valid option; however, I would vote for less burden on the client and providing more information via the method signature and Javadoc.  These days, most competent developers would catch a checked business exception on the client first thing - as soon as they call the API that enforces the contract - and convert it into a runtime exception so it can freely propagate to the dedicated listener elsewhere on the client.
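
    A minimal sketch of that design follows. UserServiceException, UserExistsException, and createUser come from the discussion above; the in-memory map, the integer id, and the login parameter are assumptions made purely so the example is self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Generic runtime exception for failures of the service itself.
class UserServiceException extends RuntimeException {
    UserServiceException(String message) {
        super(message);
    }

    UserServiceException(String message, Throwable cause) {
        super(message, cause);
    }
}

// The single legitimate business condition that gets its own class.
class UserExistsException extends UserServiceException {
    UserExistsException(String login) {
        super("user already exists: " + login);
    }
}

class UserService {
    // Assumed in-memory storage; a real service would talk to a data store.
    private final Map<String, Integer> idsByLogin = new HashMap<>();
    private int nextId = 1;

    /**
     * Creates a user and returns the new id.
     *
     * @throws UserExistsException  if the login is already taken
     * @throws UserServiceException on any other failure of the service
     */
    int createUser(String login) throws UserExistsException {
        if (idsByLogin.containsKey(login)) {
            throw new UserExistsException(login);
        }
        int id = nextId++;
        idsByLogin.put(login, id);
        return id;
    }
}
```

    Because UserExistsException is unchecked, the throws clause here is documentation only: callers are told about the condition but are free to catch it wherever it makes sense, or nowhere at all.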

    It is reasonable to expect (and require) each developer to understand the nature of any API they use. So, if the developer designs an application that has a need to implement the workflows for a successful user creation as well as for the situation when a user already exists, it is the full responsibility of that developer to understand how to use the "createUser" API.

Avoid Using Exceptions to Communicate Business Data

     If an exception does not represent a legitimate business condition, one should never put meaningful business data into the instance of the exception class! Generally, the information inside the exception is for diagnostics purposes only. In most cases, the only "business" information carried by an exception is the very fact of that particular exception being thrown. If you want to indicate a different type of condition, use a different exception class.

    It may be somewhat appropriate in cases of business exceptions to provide a public method or two on the exception class that would allow the clients/handlers to extract some supplementary data. Programmers should use their best judgement to decide what kind of data could be conveyed to the client via an exception, and whether it is appropriate to use that data in the client's business logic. Personally, I believe that in most cases it is not very appropriate or fair to expect the client to build its business logic based on some potential attribute values stored in the instance of the thrown exception class. Exception classes should not contain any "error codes" that are required to be correctly resolved by the client's business logic via all sorts of ugly conditional constructs. This, to a large extent, defeats the purpose of exceptions, which is, among other things, to eliminate the need for such conditional logic. Unfortunately, this ridiculous pattern of exception misuse is widespread, and it is even promoted by code examples in many Java books and articles! The worst possible example of bad usage of exception data is when clients are expected to parse the detail message and act upon whatever string tokens are detected.
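
    To illustrate "use a different exception class" instead of an error code, here is a small sketch. All class names are hypothetical, loosely based on the enrollment conditions mentioned earlier. The client selects its reaction by exception type alone, without any conditional logic over codes or message strings.

```java
// Hypothetical hierarchy: the exception class itself is the only "code".
class EnrollmentException extends RuntimeException {
    EnrollmentException(String message) {
        super(message);
    }
}

class EnrollmentFullException extends EnrollmentException {
    EnrollmentFullException() {
        super("maximum number of participants reached");
    }
}

class EnrollmentSuspendedException extends EnrollmentException {
    EnrollmentSuspendedException() {
        super("enrollment temporarily suspended");
    }
}

class Enrollment {
    static void enroll(int current, int max, boolean suspended) {
        if (suspended) {
            throw new EnrollmentSuspendedException();
        }
        if (current >= max) {
            throw new EnrollmentFullException();
        }
    }
}

class EnrollmentClient {
    // The catch clauses do the dispatching; no switch over error codes is needed.
    static String tryEnroll(int current, int max, boolean suspended) {
        try {
            Enrollment.enroll(current, max, suspended);
            return "enrolled";
        } catch (EnrollmentFullException e) {
            return "waitlist";
        } catch (EnrollmentSuspendedException e) {
            return "retry-later";
        }
    }
}
```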

    Exceptions must not carry any user-oriented messages inside them. That's another common form of exception abuse.  The "detail message" obtained via the Exception.getMessage() method is not meant for presentation purposes. It is for diagnostics. If you need to present a user-friendly message in the UI in the case of a particular exception, the appropriate message must be resolved on the client side as part of the handling of that particular exception class - by the dedicated handler. It is your presentation tier - and only the presentation tier - that should be aware of the locales and message resources. Therefore, it is absolutely incompetent to place your presentation-tier-specific messages or message constants into the exception classes, especially those originated in other tiers.

    Any mappings between business exceptions and appropriate client actions or presentation details must be implemented on the client itself in a separate dedicated module. For a good example of this concept, I encourage you to take a close look at the Spring MVC's SimpleMappingExceptionResolver class and its usage.

Use Unchecked/Runtime Exceptions

NOTE: You should carefully read the previous chapters in this article before reading this!

    Developers of the modules that have no business handling a particular error as it passes through do not need to be aware of that error at all. They do not need to bother with implementing any code to handle that error. Instead, only one module in the whole application must be implemented to listen to that particular exception, catch it when it bubbles up, and handle it according to the application requirements. Combined with the mandatory common-sense understanding that anything might potentially fail, runtime exceptions become a super-efficient tool that allows consolidating error handling in a few strategically placed modules, and with bullet-proof safety.

    Like any technology or tool, unchecked exceptions may be mishandled, true. However, it is very easy to spot - during testing - a case when a particular run-time exception bubbles up way too close to the surface (or crashes the application!) instead of being handled earlier. Detecting such a case indicates that a new handler must be introduced at a deeper level (as needed by the system) that will intercept that particular error before the more generic handler. No modifications to any business code would be required. It is significantly easier and less time-consuming than chasing a usually obscure problem caused by a mishandled checked exception.

    While there are cases where a checked exception does no harm and serves good purpose, for the sake of consistency and to avoid error-handling chaos, I recommend using only unchecked exceptions. Unfortunately, the particular implementation of the checked exceptions concept in Java introduces so many possibilities and encouragements for doing things wrong that any benefits of exception-re-enforced contracts between methods (where they may help) pale compared to the overall damage this mechanism causes to the software projects all over the world, and, most importantly, to the very perception of the error handling task and the role of a programmer in it.

    Checked JDK and 3rd-party exceptions can always be caught inside the implementations of your classes, wrapped into module-specific run-time exceptions (usually, with an additional clarifying message), and re-thrown only to be caught and appropriately handled by the designated handler upstream.
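
    A minimal sketch of such wrapping, assuming a hypothetical SettingsException and SettingsLoader. The original IOException is passed along as the cause, so its message and stack trace stay intact for the handler that eventually logs the failure.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical module-specific runtime exception.
class SettingsException extends RuntimeException {
    SettingsException(String message, Throwable cause) {
        super(message, cause);
    }
}

class SettingsLoader {
    // Reader.read() declares the checked IOException; callers of readAll never see it.
    static String readAll(Reader source) {
        StringBuilder text = new StringBuilder();
        try {
            int c;
            while ((c = source.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            // Supplement the original error, never replace it: keep e as the cause.
            throw new SettingsException("cannot read settings", e);
        }
        return text.toString();
    }
}
```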

Packaging Exception Classes

    Each exception class should normally reside inside the package with the classes that implement the exact functionality the exception relates to. It is a common mistake and bad practice to use a dedicated exceptions package that combines all exception classes for an application or component. Such an approach violates modularity by forcing unnecessary package couplings: all other packages in the application become conjoined at the common exceptions package.

Summary

    I think that one very important but simple point is missing from most discussions on this subject: anything may fail. Period. I just can't stress this enough. If we insist that something in the method signature must remind the programmer about it, then every method's signature must be equipped with such an indication, e.g. "throws Exception". The fact that some methods in Java are declared as throwing a (checked) exception and some are not is seriously misleading since it implies that some methods may never result in errors, or if they do, the programmers may not need to worry about it. And so many of them don't.

When an error occurs within a function - regardless of whether the function declares a checked exception or not - that should never be a surprise to a developer! Just know that it may happen, and stop looking at the compiler to tell you whether it may or may not. It may. Period. End of story.


    Programmers must never rely on the source of the potential error to tell them whether they should expect an error or not!!! The necessity for error handling must be unconditionally assumed by all programmers at any time. What this means is that the programmer must always KNOW and REMEMBER that any module or operation may potentially - in certain conditions - fail or misfire. Once this simple fact is understood, and the application programmer knows that the module her code depends on may fail, all she has to decide is WHERE in the application (not WHETHER!) she should handle that potential class of errors - without having to do something about the error immediately at its origin. This is what we should be teaching programmers! Instead, we are telling them: "Hey, we have this cool feature in Java that will tell you precisely when you need to worry about errors; in case a method signature declares an exception, you should pay attention..." Are we kidding ourselves? Does anyone see how horribly destructive - and foolish - such an approach is? It is one of the most harmful but well-concealed philosophies that has ever been burnt into the minds of millions of programmers. I can't tell you how many times I have heard a developer scream with rage at his application exploding in his face: "What the ...!!! I have taken care of all the exceptions and it still blows up!" What they are saying is that they had caught all the checked exceptions in their code, but didn't even think about any other potential failures. Checked exceptions reinforce this horrific mindset. Instead of consciously expecting negative results as a given possible outcome of any operation and thoughtfully designing against them, we wait for the compiler to tell us whether we should worry about a possible error or not! That is just ridiculous. The implication of checked exceptions (not intentional, of course) is that if the method does not throw a checked exception, there's nothing to worry about.
We know that it is not true because checked exceptions are just a subset of all possible exceptions that may be thrown by the code that implements the operation. And if we know that, we understand that even if we handle the checked exceptions, at some point we must take care of everything else. And if so, we don't need checked exceptions. The problem is that many of today's Java programmers raised on checked exceptions are not used to thinking that way. As a result, they often forget to handle the rest of the exceptions. And you can't really blame them, because - with checked exceptions - the error handling code is everywhere, literally in every single method, i.e. in thousands of places. It is virtually impossible to keep track of what is being handled and where. That is the reason why so many errors are not handled at all, countless errors are mishandled, and so much valuable error information is lost completely before it reaches the proper place in the system where sense could be made of it, where it could and should be properly handled.


    Fortunately, it's easy to build bullet-proof systems with only unchecked exceptions (much easier than ensuring proper handling of checked ones.) Any application, in reality, only needs to distinguish between a very limited set of exception classes that need to be handled separately by strategically placed dedicated handlers to which these classes of exceptions should propagate without being intercepted. Everything else should be allowed to bubble up to the "ultimate" top-level (the "toppest") handler that catches everything that was not handled before and handles it gracefully. (This means, once such an [unchecked] exception is thrown, you don't have to worry about it anywhere in your code, knowing that it will be caught and handled by your top-level handler.) So, you normally start with creating such a top-level handler that will catch everything inside the system.
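
    Outside of any framework, such a top-level handler might be sketched as below. TopLevelHandler and its run method are hypothetical; the logging uses the standard java.util.logging API. This sketch catches RuntimeException only and deliberately leaves JVM Errors alone, which is an assumption of the example rather than a rule.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

class TopLevelHandler {
    private static final Logger LOG = Logger.getLogger(TopLevelHandler.class.getName());

    // Runs a unit of work; anything not handled deeper in the system ends up here.
    static String run(Supplier<String> work) {
        try {
            return work.get();
        } catch (RuntimeException e) {
            // Log the complete error, stack trace included, exactly once.
            LOG.log(Level.SEVERE, "unhandled failure", e);
            // The user sees a polite generic message, never the diagnostics.
            return "Sorry, the request could not be processed.";
        }
    }
}
```

    Everything that runs inside the supplier can stay free of error-handling code unless a requirement calls for a more specific handler deeper in the call chain.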

    For example, in a Spring MVC application that would be an exception resolver class configured to catch and handle the most generic Exception class, and, perhaps, resolving the error to the generic error page that politely apologizes to the user and informs the user that the system is currently not able to process the request, etc. Of course, in the case of such a web application, there must also be a filter defined in the web.xml file that would ensure that any critical exceptions that may potentially occur in JSP tags on the pages would also be caught and the application redirected to an error page. Such a filter is required because any JSP tag exceptions occur outside the Spring-managed controller and will not be dispatched to the resolver registered in the Spring application context.

    Next, we need to identify the classes of conditions that must be handled individually in a distinct manner. We decide where such events should be handled in our system - based on our system's requirements, and nothing else! For example, in Spring MVC web applications, often a single generic exception resolver class may be all you need to handle all your server-side exceptions! Such an exception resolver would resolve all distinct exception classes by implementing the mappings between the exception classes and views (or, in the case of Spring WebFlow, flow states.) Based on the class of the exception thrown anywhere within the application, it will redirect the request to the appropriate view or state - possibly, after executing some logic, if necessary. The resolver would always fall back on the generic "catch-all" redirection, in case the caught exception class is not explicitly mapped to any view or state.
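
    The mapping idea can be sketched in plain Java as follows. This is not Spring's resolver and does not use any Spring API; ExceptionViewResolver and the view names are hypothetical stand-ins that only show the principle of class-to-view mappings with a catch-all fallback.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java stand-in for an exception-to-view resolver (hypothetical, not Spring's class).
class ExceptionViewResolver {
    private final Map<Class<? extends Throwable>, String> views = new LinkedHashMap<>();
    private final String defaultView;

    ExceptionViewResolver(String defaultView) {
        this.defaultView = defaultView;
    }

    ExceptionViewResolver map(Class<? extends Throwable> type, String view) {
        views.put(type, view);
        return this;
    }

    // Walks up the class hierarchy so a mapping for a base class also covers its subclasses.
    String resolve(Throwable error) {
        for (Class<?> c = error.getClass(); c != null; c = c.getSuperclass()) {
            String view = views.get(c);
            if (view != null) {
                return view;
            }
        }
        // Nothing mapped explicitly: fall back on the generic catch-all view.
        return defaultView;
    }
}
```

    With IllegalArgumentException mapped to "error/badInput", a NumberFormatException (its subclass) resolves to the same view, while an unmapped IllegalStateException falls through to the default.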

    If, during testing, we discover that some exception propagates too far (e.g. to the topmost handler) instead of being handled earlier, well, we immediately know that another lower-level handler is required (or that the particular exception should be caught by one of the existing lower-level handlers.) That's it. Very simple. And very reliable.

The Art and Craft of Software Engineering

With all the exciting new technologies, powerful intelligent tools and frameworks popping up almost every day, have we, perhaps, missed one strange and scary turn the software industry has taken at some point in the past years? Ask a typical hiring manager what they are looking for in a software engineer, architect, or programmer. You will typically be presented with a long list of technologies, methodologies, frameworks, and just plain flavor-of-the-day buzzwords that the manager is dying to see on the applicant's resume. Everybody is talking about Agile development, Scrum, fast-paced environments, etc. There would be nothing wrong with that - if one simple requirement and expectation had not been all but lost behind all these buzzwords. How about the basic talent for software craftsmanship? Since when does familiarity with specific technologies and processes (often quite superficial, by the way) replace the creative vision and true ability to design and code good quality software? One may argue that the latter is always implied and is such an essential requirement that it would be silly even to mention it! I might believe that if I hadn't seen so many software engineers who couldn't design or write decent code even if their lives depended on it! Oh, and, by the way, all those folks happily and proudly claim to have worked with the latest and greatest technologies and tools. It is only because of what I have seen with my own eyes on so many projects that I present you with the rant that follows... ;)

It should not be a secret to anyone that any tool, technology, or process may be misused by an incompetent person. An incompetent programmer will continue to write bad code even if he or she uses a brilliant framework such as Spring. There is no magic! No matter how many meetings the team has each day, how many status reports they produce, how much their managers talk about agile processes and Scrum, the project will still be a chaotic mess if there are no good programmers on the team.

Any tool, technology, or process is only as good as the people who use it.

In today's sea of programmers, architects, and all sorts of people who call themselves "software professionals", how many are indeed good, reliable Software Engineers capable of producing good quality software?

There is a common opinion that suggests that the usual “80-20” rule is applicable to the software community as well. Some people, however, are less forgiving. The brilliant Edsger W. Dijkstra - considered by many the Father of Computing Science - once suggested that only 10% of all software engineers are any good at all. The outspoken author and software professional Allen Holub in his brilliant article “The Terror of Code in the Wrong Hands” goes even further and suggests that only about 5% - the so-called elite programmers - can be trusted to get the job done without causing various degrees of harm. So, what is the magic ingredient that distinguishes a good software engineer from everybody else?

Wired for Intellectual Manageability

The most important quality of a true software engineer – in my opinion – is the ability (talent and vision) to effectively represent any complex problem as a set of simple, intellectually manageable parts where each of those parts can be viewed, analyzed, and worked on individually without appearing overwhelmingly complex. This involves the ability to identify, separate, and abstract distinct functional domains, concerns, and concepts within the subject system – however large or small that system itself is. The subject may be a high-level architectural view, or a single low-level operation where distinct groups of micro-steps may be identified and abstracted into smaller and more manageable subroutines. Regardless of the flavor-of-the-day technology or design methodology used, any approach absolutely must focus on de-coupling things that do not have to be - and should not be - hard-wired to one another.

What is even more important is that a good Software Engineer understands - and constantly feels - the necessity to maintain such separation and intellectual manageability of individual parts at any given moment of the development process!

Refactoring is not a buzzword. It is not a separate project scheduled for later either. It is the way of programming, a vital integral part of the development process - every single minute of it. Good programmers refactor subconsciously with almost every line of code they write - not because they want to keep things pretty but because their brains are wired to keep things well organized and intellectually manageable at any given moment. That is the key to crafting software that is always in the stable working condition. That is the only way agile development can really work!

In a reasonable amount of time any new technology or process may be learnt and adopted by anyone with sufficient general intelligence. Unfortunately, not everyone has the vision and ability to model well. Just like the gift of playing a musical instrument or creating works of art, that skill may only be acquired with practice by those who were born with the very specific talent for it. I am absolutely convinced that software engineering requires a unique combination of intelligence, scientific mindset, and artistic vision. Different people are born with different talents and inclinations, and those three qualities are absolutely essential for being a true Software Engineer. None of these three ingredients may be acquired with time: you either have them, or you don't.

A true Software Engineer, without a doubt, is an artist and a craftsman. And with that comes elegance, efficiency, fast results, manageability, and high quality of the software he or she designs.

I believe that creative artistic vision is especially essential in software modeling. The ability of the software engineer to visualize distinct concepts and their relationships, and to represent them in a clear, elegant, and non-convoluted model – that is what makes the difference between quick results and an endless stressful nightmare, between making a profit and losing money, between success and failure. When I talk about a “model” or “design” I don't necessarily mean a “design document”, a diagram, or a particular stage of the software development process. I am talking about any type of representation of the subject/problem, including – and foremost – mental visualization, at any given stage of the creative process of designing software. The ability to visualize and model in this way is crucial at any given moment, and at any given level – whether one is working on a high-level architecture or a single low-level method or function. That is the skill that allows the better developers to properly abstract the complexity in components, modules, classes, and subroutines, ensuring intellectual manageability of every individual piece. On the other hand, those who lack that essential ability end up writing excessively complex, convoluted, hard-to-manage, hard-to-maintain, and very costly software.

Invest in Competence: Why Is It So Difficult?

Building software is not trivial, by any means. Software engineers must deal with many complex issues at a time, overwhelming schedules, confusing and ever-changing business requirements. So why do companies continue to make things even more difficult and stressful by hiring mediocre engineers who are simply not capable of designing and coding well? Why is convoluted spaghetti code not only tolerated but is literally a norm on so many critical projects today?

Building a competent development team requires first of all a very competent technical manager who can identify a good software engineer in the sea of fly-bys and imposters. Such a manager must also have the freedom and means to offer good compensation and motivation to the people he or she hires. The manager must also have the freedom to easily let go of those team members who are not performing well and show no improvement or desire to improve. When it comes to offering compensation packages, more often than not, the managers are limited to working with what is given to them by the HR departments or other managers. Unfortunately, even if the company itself can afford to pay its employees well, among the people who actually determine the salary/rate caps for software engineers not everyone knows and appreciates the difference between a good software engineer and an "average" one. And, tragically, most non-software folks can't even imagine in their worst nightmare that the fine line between an "average" SE and a really good one in practice translates into a monstrous difference between a complete failure and a huge success. As a result, companies "save" pennies on salaries and waste millions of dollars on disastrous projects that never end.

Although the unwillingness of many companies to invest in truly high-quality engineers is a serious problem, it may be - at least partially - overcome by the dedication of a competent technical manager who finds other ways to interest and keep highly skilled, top-notch people. Such managers understand that salary, however important, is not the only motivational factor for a good SE. And this is why it is important to remember that true Software Engineers are artists who take pride in their craft and draw satisfaction from their daily work. For a true software engineer, interesting work on a team with like-minded people who are equally passionate about the common cause is, at the very least, as important as the salary. A good software engineer will think twice before turning down an interesting, creatively challenging assignment. On the other hand, such people are unlikely to stick around long if they find themselves surrounded by lazy unmotivated slouches and drones. Imagine an artist coming back to work each day only to find that some thoughtless fool has painted vulgarities over his masterpiece! Day after day! That is exactly how a good programmer often feels on a team with so-called "average software engineers."

A more prosaic and, unfortunately, very common reason why so many teams consist of poor programmers is - sadly - the incompetence and personal insecurities of some hiring managers. This is, of course, true for any industry, and has to do with human nature. While a good manager tries to hire the best people and let them do what they do best, an incompetent and insecure manager is more concerned with preserving his or her perceived status in the company. Therefore such folks naturally tend to surround themselves with the kind of employees who are not likely to challenge them in any way, who will not expose their weaknesses. It is not uncommon for a good software engineer who's passionate about his/her work to be outspoken. Some managers don't like that and may act very annoyed and "concerned" when an "ordinary" engineer speaks up in meetings with new ideas or suggestions to improve things. Go to any popular online software forum, and you are all but guaranteed to come across desperate outbursts and bitter stories about managers or "chief architects" who suppress creativity or enforce old-fashioned dogmas on their teams - just because they themselves don't know any better.

Unfortunately, a truly good development manager is just as hard to come by as a good programmer. Due to the unfortunate lack of appreciation for the value of a good software engineer, most organizations pay the best of their SEs significantly less than they pay development managers. If an SE wants to earn more, their only way - within the given organization - is to grow first into the position of an "Architect", and then into a "Development Manager".

Generally, I believe that the only major difference between an "Architect" and a "Programmer" is that a true architect is expected to have knowledge of a broader spectrum of platforms and technologies and the ability to choose and integrate the right ones, while a "programmer" may get away with simply focusing on a narrower set within a project. Both should be equally capable of understanding the principles of software design and programming. I am convinced that a poor programmer may never become a good software architect. And a truly good software architect is always a good programmer. Architecture and design are not limited to the high-level view. In software, every single artifact, however small, requires thoughtful design and architectural vision. In many organizations, however, the title of an Architect relates to a semi-bureaucratic position somewhere between the engineering team and the management - usually closer to the management. "Architects" in such organizations spend most of their time in meetings with the managers and business people. They rarely work closely with programmers, and never write code themselves. However, since such positions usually come with higher salaries, good programmers face a dilemma: to move away from programming into a managerial position, or to look for a way out of that company in favor of something better. Some choose to become free-lance consultants just so that they can continue to do what they do best and what they love, while getting paid fairly well without competing for titles with more political types. It is not a secret that brilliant software engineers do not always have great managerial skills. If a good programmer becomes a manager, the organization often loses a good programmer and - quite possibly - acquires a mediocre manager. Some of the good engineers, indeed, make for excellent engineering managers. I am fortunate to have worked with such people.
Some, however, become lousy managers - just as before they were sub-par programmers.

In my opinion, for things to improve, the companies (and I mean, the top management and HR) must realize and admit the following:

  • not everyone who calls himself/herself a Software Engineer actually is one; quite a few of such people do more harm than good by creating a false impression that the project is in full swing, while their incompetent actions are actually leading it away from successful completion; if there's no one to identify such people on the project, the management may never know that the job that takes over a year could have been completed in a couple of months by just one or two real software engineers;
  • one good software engineer can do more work - faster and better - than a team of (sometimes 10 or 20!) "average" software engineers; therefore, hiring only good software engineers would allow companies to significantly reduce the sizes of teams and departments and get the jobs done faster and better;
  • good software engineers are the ones who do all the work and are the decisive factor in ensuring the success of the project;
  • it is okay to pay good SEs at least as much as managers - based on the value they bring and uniqueness of their skills; I have witnessed a situation when the hiring manager's ego stood in the way of hiring an experienced consultant only because the manager could not live with the fact that the consultant's hourly rate - if multiplied by 40 hours and 52 weeks - amounted to a yearly income that apparently was higher than the manager's salary (the manager in question had actually openly voiced that concern); managers often claim that they can't find good people for the job, but what they are often not saying is that they can't find good people for the kind of salary they offer...
  • and finally, it should be made easier for companies to fire those who consistently underperform or demonstrate incompetence and inability to create good software - in order to avoid paying the big bucks to those who don't deserve it.
It seems quite unrealistic to expect all these things to happen any time soon. I am only hoping that more and more companies - at least gradually - realize and appreciate the true value of a good software engineer, as well as the magnitude of the potential harm a so-called "average" software engineer may bring.

Conclusion

It is not the purpose of a Software Architect to produce thousands of pages of unreadable documentation. It is not to impress the business people and upper management by dropping buzzwords and talking about things none of them really understands. I believe that the real duty of any Software Engineer - be it an architect who designs a large system or a programmer who develops a particular module - is to work relentlessly to minimize the complexity and ensure the intellectual manageability of every single artifact. That is what ensures fast, efficient, and successful development. Not everyone is capable of that, and those who are not usually argue that they are not given enough time - under the provided timelines - to write quality software. I am convinced that such arguments are nothing but lame excuses.

Good software engineers often half-jokingly explain their obsession with quality by saying that they are simply... lazy! A good software engineer doesn't want to do the same thing over and over again. Or even twice! A good software engineer hates having to waste hours and days on deciphering cryptic unreadable code, including their own. That is why they write code that reads like a book.

Finally, good software engineers hate working endless hours on chasing obscure bugs. They shudder at the thought of being part of some "sustaining team" whose only purpose is fixing bugs and putting patches on sloppily written code. Good software engineers prefer to get things done well the first time around. That doesn't mean no bugs at all. There may always be something that was not accounted for during the initial iterations of the code. However, in a well-designed quality system, any defect in a module is easily traceable and may be corrected quickly without any - or with minimum - impact on the other modules. If your organization has a "sustaining" team, that may be an indication that your "development" team is not very good after all...

No one should be proud of how complex and crafty their systems are. On the contrary, a software engineer should feel ashamed of anything that looks complex, convoluted, unreadable, and difficult to understand.

Of course, it's just my opinion. I could be wrong...