What Conditions (Exceptions) are Really About
March 24, 2008
The “condition” (exception) feature of Common Lisp is important, but widely misunderstood, as can be seen by the frequent confusion between “conditions” and “errors”. I’ve been thinking about conditions and exceptions for many years, and here’s how I explain them.
Notes: I’m going to avoid using the word “error”, which has become overloaded. Some of the following applies to Java, but not all; I might write about Java exceptions in the future. I’ll omit the use of explicit catch/throw, for brevity. I’m only talking here about the simple heart of the condition feature, not fancy things like restarts.
Contracts, Bugs, and the Failstop Principle
Every function has a “contract” which defines what the function is supposed to do. If any function call violates the contract, the program must be incorrect: a “bug” has happened. The actual incorrect behavior might have started any time before we detect that there’s a bug.
If a program detects that a bug has happened, it should stop. That’s because if it keeps on going, there’s no way to know what it might do: write the wrong data to a file or database, display wrong answers, hang, etc. This is called the “failstop” principle.
(Exactly what “stop” means depends on the context. An interactive command might return to its event loop. A server thread might go back to its wait-for-input step. These are not perfectly safe, since the program might have corrupted transient state before the bug was detected. A safer way to stop is to kill the entire process. In Erlang, you only have to kill a thread, since each thread has its own transient state.)
Outcomes
The contract of a function specifies, among other things, the possible “outcomes” of calling the function. There is always one “usual” or “straight-line” kind of outcome, and then there can be zero or more “unusual” outcomes.
In Common Lisp, every function call either returns zero or more values, or else signals a condition. The caller discriminates on which kind of outcome this is by scrutinizing the values returned, or scrutinizing the condition that was signaled. The contract specifies the circumstances under which each kind of outcome happens, saying what values are returned or what condition object is signaled (plus what side-effects occurred) for each kind of outcome.
For example, suppose we call (open pathname :if-does-not-exist nil). Possible kinds of outcome are:
- It returns a stream object. This means that the specified file has been opened for input.
- It returns nil. This means that there was no file by this name in the file system. There are no side-effects.
- It signals inappropriate-wildcard. This means that the pathname was a wildcard pathname; it doesn’t make sense to open one. There are no side-effects.
- It signals undefined-logical-host, and the instance’s undefined-logical-host-name is the name of the logical host. This means that it was a logical pathname whose host was not found in the set of translations.
(There are many other kinds of outcome. Sadly, Common Lisp does not actually specify what condition classes are signaled. Your own contracts always should!)
If the call to open does any of these things, it is working properly and there is no bug. If the call to open returns something other than nil or an open-for-input stream to the specified file, or if it signals any other condition class, a bug has happened and the program should stop.
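As a rough illustration of how a caller discriminates between these kinds of outcome, here is a minimal sketch. The function name describe-open-outcome and the keywords it returns are invented for this example. Because the standard does not promise classes such as inappropriate-wildcard, the sketch falls back on the standard file-error class for the condition case.

```lisp
(defun describe-open-outcome (pathname)
  "Try to open PATHNAME for input and report which kind of outcome occurred.
Hypothetical helper: the returned keywords are made up for this sketch."
  (handler-case
      (let ((stream (open pathname :if-does-not-exist nil)))
        (cond (stream
               ;; Usual outcome: a stream was returned, so the file is open.
               (close stream)
               :opened)
              ;; Unusual outcome expressed as a returned value: NIL.
              (t :no-such-file)))
    ;; Unusual outcome expressed as a condition.
    (file-error () :signaled-file-error)))
```

All three branches are correct outcomes of the call; none of them means that a bug has happened.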
Conditions and bugs are entirely orthogonal. If you call open (as shown above) with a wildcard pathname, and it signals inappropriate-wildcard, that’s not a bug; that’s exactly what it’s supposed to do. If you call open and it returns a symbol, that’s a bug, but no condition is signaled.
Commonly, when a function call ends with an unusual outcome, that’s specified to mean that there were no side-effects. There’s nothing theoretically wrong with specifying in the contract that a certain unusual outcome also has some side effects, but it’s not customary.
Tasteful Contract Design
When you design a function, you should first think of all the possible kinds of (correct) outcome. Then you should decide how each outcome will look to the caller: certain specific returned value(s), or certain specific conditions. This all becomes part of the contract for the function.
The general principle for making this choice is to consider which outcomes are the ones that a programmer is likely to expect and desire. You can’t always know for sure: different programmers might call the same function with different expectations. But it’s usually not hard to guess accurately. The “usual”, “straight-line” outcomes should always be a kind of returned value. The more an unusual outcome seems likely to be expected and important, the more likely you’d be to represent it by a kind of returned value rather than by a condition. All other unusual outcomes should be indicated by signaling conditions.
The main clue is the appearance of the function call. That’s mainly the function’s name, but it can also include the names of keyword arguments.
For example, (open "/a/b") should be defined to return a value only when it has actually opened a file, in which case it returns a stream. All other outcomes should be signals of conditions. However, (open "/a/b" :if-does-not-exist nil) suggests strongly that some outcomes (there’s no “b” in directory “/a”, or directory “/a” does not exist) should be indicated by returning nil, and conditions should be used for other outcomes.
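The same idea can be applied to your own functions. The following is only a sketch: lookup-entry, entry-not-found, and the :if-does-not-exist keyword on a table lookup are hypothetical names, chosen to mirror the way open lets the appearance of the call decide how the "not found" outcome is delivered.

```lisp
;; Hypothetical condition class for this sketch.
(define-condition entry-not-found (error)
  ((key :initarg :key :reader entry-not-found-key))
  (:report (lambda (condition stream)
             (format stream "No entry for key ~S."
                     (entry-not-found-key condition)))))

(defun lookup-entry (table key &key (if-does-not-exist :error))
  "Return the value stored under KEY in the hash table TABLE.
If KEY is absent, signal ENTRY-NOT-FOUND when IF-DOES-NOT-EXIST is :ERROR
(the default), or return NIL when it is NIL."
  (multiple-value-bind (value foundp) (gethash key table)
    (cond (foundp value)
          ((eq if-does-not-exist :error)
           (error 'entry-not-found :key key))
          (t nil))))
```

A caller who writes (lookup-entry table key) is saying that absence is not expected; a caller who writes (lookup-entry table key :if-does-not-exist nil) is announcing that absence is expected and will be checked for.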
Why Conditions are Better Than Special Return Values
It’s sometimes tempting to indicate unusual outcomes by having a function return a special value, or by having it return a second value. However, there are two drawbacks to this.
First, experience over many long years has shown that programmers often forget to check for the special values. Coding is hard and demands a lot of concentration. When a programmer is hard at work figuring out how to write an algorithm, it can be difficult to keep in mind all the possible outcomes of every call. There’s no excuse for it, but in real life, this is a common bug.
Bruce Eckel, in Thinking in Java, 2nd edition, correctly says:
In C and other earlier languages, there could be several of these formalities, and they were generally established by convention and not as part of the programming language. Typically, you returned a special value or set a flag, and the recipient was supposed to look at the value or the flag and determine that something was amiss. However, as the years passed, it was discovered that programmers who use a library tend to think of themselves as invincible — as in, “Yes, errors might happen to others, but not in my code.” So, not too surprisingly, they wouldn’t check for the error conditions (and sometimes the error conditions were too silly to check for [such as all the error values from printf]). If you were thorough enough to check for an error every time you called a method, your code could turn into an unreadable nightmare. Because programmers could still coax systems out of these languages they were resistant to admitting the truth: This approach to handling errors was a major limitation to creating large, robust, maintainable programs.
If an algorithm forgets to check for the special values, it will proceed as if the usual outcome happened. This means that the program is malfunctioning. A bug has happened but it has not been detected.
But if that unusual outcome is expressed as a signal of a condition, and the programmer forgets to handle it, the program will stop. This is what we want: failstop behavior.
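Here is a small sketch of the difference. All of the names are hypothetical. Both callers "forget" about the missing-key outcome; only the second one stops.

```lisp
;; Hypothetical condition class for this sketch.
(define-condition no-such-key (error)
  ((key :initarg :key :reader no-such-key-key)))

(defun tags-or-nil (table key)
  "Return the list of tags stored under KEY, or NIL if KEY is absent."
  (values (gethash key table)))

(defun tags-or-signal (table key)
  "Return the list of tags stored under KEY; signal NO-SUCH-KEY if absent."
  (multiple-value-bind (tags foundp) (gethash key table)
    (if foundp
        tags
        (error 'no-such-key :key key))))

(defun count-tags-unchecked (table key)
  ;; (length nil) is 0, so a missing key silently looks like "no tags".
  (length (tags-or-nil table key)))

(defun count-tags-failstop (table key)
  ;; The unhandled NO-SUCH-KEY condition propagates out of this function.
  (length (tags-or-signal table key)))
```

With a missing key, count-tags-unchecked returns 0 and the program carries on with a wrong answer, while count-tags-failstop never returns a value at all.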
(Exactly what “stop” means depends on context. In a server, there would probably be a handler-bind near the base of the stack that handles all conditions. This “ultimate handler” is called when a bug has been detected. It might write a stack trace to a log file, and then cause the thread to be restarted, for example.)
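A minimal sketch of such an "ultimate handler" might look like the following. The name call-with-ultimate-handler and the :stopped return value are assumptions made for illustration. For simplicity the sketch binds the handler to the error class only, so that warnings are left alone, and it logs just the condition itself, since producing a stack trace is implementation-specific.

```lisp
(defun call-with-ultimate-handler (thunk &key (log-stream *error-output*))
  "Call THUNK with no arguments and return its values.
If an error reaches this frame unhandled, log it and return :STOPPED."
  (block unit-of-work
    (handler-bind ((error (lambda (condition)
                            ;; A real server might also write a stack trace
                            ;; here; that part is omitted from this sketch.
                            (format log-stream "~&Bug detected: ~A~%" condition)
                            ;; Abandon the unit of work: a non-local exit
                            ;; back to the base of the stack.
                            (return-from unit-of-work :stopped))))
      (funcall thunk))))
```

A server loop could wrap each request in this function and restart the thread, or simply go back to waiting for input, whenever it sees :stopped.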
Second, even if you do remember to check for the special value, it often makes the program cluttered and harder to read. This is particularly annoying in Lisp, where it’s customary to write applicative forms where arguments to one form are themselves non-trivial forms.
I only have room here for a short example. The problems discussed above come up more often, and are harder to deal with, in much larger programs.
Suppose we have a configuration module that associates keys with URLs. Looking up a key has two possible outcomes: the URL is found (usual) and no URL is found (unusual). The function url-host-name extracts the host name from a URL. If the URL does not specify a host name, that’s an unusual outcome. Finally, make-host creates and returns a host object, with the given host name.
We want to write a new function, get-host-from-configuration, which takes a configuration and key, and returns the host object for the specified configuration entry. There are two possible outcomes: the host, or an indication that we could not obtain it.
Version 1 disregards unusual outcomes:
(defun get-host-from-configuration (configuration key)
  "Returns the host associated with the key and the configuration."
  (make-host :name (url-host-name (read-url configuration key))))
Version 2 indicates unusual outcomes by returning nil:
(defun get-host-from-configuration (configuration key)
  "Returns the host associated with the key and the configuration,
or nil if it cannot be obtained."
  (let ((url (read-url configuration key)))
    (when url
      (let ((host-name (url-host-name url)))
        (when host-name
          (make-host :name host-name))))))
Version 3 uses conditions:
(defun get-host-from-configuration (configuration key)
  "Returns the host associated with the key and the configuration,
or signals cannot-make-host-from-key if the host cannot be obtained."
  (handler-case
      (make-host :name (url-host-name (read-url configuration key)))
    ((or configuration-entry-not-found url-has-no-host) ()
      (error 'cannot-make-host-from-key :key key))))
Version 1 is nice and simple, but it doesn’t take into account the possibility of the unusual outcomes of its callees. Its contract cannot possibly be fulfilled.
Version 2 works, but it loses the applicative form. Every time we call a function, we have to stop, give the result a name, and check it before we can go on.
Version 3 keeps the applicative form. As long as everything has the usual outcome, it’s just like the simple code in Version 1. The “straight-line” code path is all in one place and easy to see. The infrequent unusual condition handlers are out of the way.
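To try Version 3 out, the callees and condition classes need definitions. The following is a self-contained sketch under several assumptions that are mine, not part of the example above: the configuration is a hash table, a URL is a string, a host is a simple structure, and the host name is whatever lies between "://" and the next "/". The handler clause names both callee conditions with an (or ...) type specifier.

```lisp
;; Condition classes; the slot and reader names are assumptions.
(define-condition configuration-entry-not-found (error)
  ((key :initarg :key :reader configuration-entry-not-found-key)))

(define-condition url-has-no-host (error)
  ((url :initarg :url :reader url-has-no-host-url)))

(define-condition cannot-make-host-from-key (error)
  ((key :initarg :key :reader cannot-make-host-from-key-key)))

;; DEFSTRUCT defines MAKE-HOST and the accessor HOST-NAME.
(defstruct host name)

(defun read-url (configuration key)
  "Return the URL stored under KEY in the hash table CONFIGURATION."
  (multiple-value-bind (url foundp) (gethash key configuration)
    (if foundp
        url
        (error 'configuration-entry-not-found :key key))))

(defun url-host-name (url)
  "Return the text between \"://\" and the next \"/\" in the string URL.
Signal URL-HAS-NO-HOST if that text is missing or empty."
  (let* ((marker (search "://" url))
         (start (and marker (+ marker 3)))
         (end (and start
                   (or (position #\/ url :start start) (length url)))))
    (if (and start (< start end))
        (subseq url start end)
        (error 'url-has-no-host :url url))))

(defun get-host-from-configuration (configuration key)
  "Return the host for KEY, or signal CANNOT-MAKE-HOST-FROM-KEY."
  (handler-case
      (make-host :name (url-host-name (read-url configuration key)))
    ((or configuration-entry-not-found url-has-no-host) ()
      (error 'cannot-make-host-from-key :key key))))
```

With a table that maps :site to "http://example.com/index.html", the call (get-host-from-configuration table :site) returns a host named "example.com"; a missing key, or a URL such as "file:///etc/hosts" that has no host part, produces cannot-make-host-from-key instead.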
Conditions at the Right Level of Abstraction
You may be thinking: why not fix Version 1 by keeping the code, and just changing its contract to say
“Returns the host associated with the key and the configuration, signals configuration-entry-not-found if the URL was not found in the configuration, and signals url-has-no-host if the URL doesn’t have a host.”
In other words, we could make the callees use conditions, as with version 3, but just let the conditions propagate to the caller.
The problem with this is that it’s a modularity violation. The caller of get-host-from-configuration has no business knowing that there are URLs involved at all. That’s an underlying implementation detail. Instead, get-host-from-configuration should indicate the unusual outcome, that it can’t make the host object, by signaling the cannot-make-host-from-key condition. It’s OK for the condition object to contain the key: our caller clearly knows about the concept of keys, since a key is an argument to get-host-from-configuration.
Similarly, it’s good for the read-url function applied to a configuration to indicate that it can’t find an entry by signaling configuration-entry-not-found rather than, say, file-not-found if the whole configuration file was missing. The caller of read-url has no business knowing whether the configuration is stored in a file or a database. We might even have two subclasses of configuration, file-configuration and database-configuration, but this would be hidden from the caller of get-host-from-configuration. Whether the configuration is stored in a file or a database is an internal implementation detail.
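One way this translation could look for a file-backed configuration is sketched below. Everything specific here is an assumption for illustration: the class and slot names, the one-entry-per-line "key url" file format, and the choice to treat any file-error from opening the file as "entry not found".

```lisp
(define-condition configuration-entry-not-found (error)
  ((key :initarg :key :reader configuration-entry-not-found-key)))

(defclass configuration () ())

;; Hypothetical subclass: entries live in a text file, one "key url" per line.
(defclass file-configuration (configuration)
  ((path :initarg :path :reader configuration-path)))

(defgeneric read-url (configuration key)
  (:documentation
   "Return the URL stored under the string KEY,
or signal CONFIGURATION-ENTRY-NOT-FOUND."))

(defmethod read-url ((configuration file-configuration) key)
  (handler-case
      (with-open-file (in (configuration-path configuration))
        (loop for line = (read-line in nil)
              while line
              do (let ((space (position #\Space line)))
                   ;; Compare KEY with the text before the first space.
                   (when (and space (string= key line :end2 space))
                     (return (subseq line (1+ space)))))
              finally (error 'configuration-entry-not-found :key key)))
    ;; The caller never sees that a file was involved: a storage-level
    ;; condition is translated into one at the configuration level.
    (file-error ()
      (error 'configuration-entry-not-found :key key))))
```

A database-configuration subclass would have its own method that translates database-level conditions in the same way, so callers of read-url see one contract regardless of where the entries are stored.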
condition, serious-condition, and error Are Meaningless
Common Lisp defines three base condition classes named condition, serious-condition, and error. This is based on the misconception that you can tell whether the signaling of a condition is an “error” (bug) simply by knowing the class. But you can’t. Whether the signaling of a condition is a bug or not depends entirely on whether the function signaling it is defined to do so, or not. If I were designing a new dialect of Lisp, I would omit the classes serious-condition and error.
Why This Philosophy is Unconventional
Most explanations of conditions put little or no emphasis on functions having contracts that specify conditions. Few other explanations refer to the propensity of programmers to neglect to check special “error codes”.
Major Lisp texts, such as “Practical Common Lisp” and “Common Lisp: The Language, 2nd Edition”, start off by acknowledging that signaling does not always mean that there’s an “error”, but they soon give up on that distinction. The word “error” is sometimes used to mean what I call an “unusual outcome” and other times used to mean what I call a “bug”. I see these as extremely different phenomena that must be carefully distinguished.
The fact that the usual function for signaling a condition is called error greatly amplifies the confusion. If I were designing a new Lisp dialect, I would not call it that.
Bruce Eckel’s book says:
With an exceptional condition, you cannot continue processing because you don’t have the information necessary to deal with the problem in the current context. All you can do is jump out of the current context and relegate that problem to a higher context. This is what happens when you throw an exception.
As you see, that’s not how I would explain it at all. An unusual outcome isn’t even necessarily a “problem”. It doesn’t mean you “cannot continue processing” any more than returning from the function means that.
Joel Spolsky doesn’t like exceptions at all. He considers them like “goto” statements, which everybody “considers harmful”, whereas I think that structured non-local exits do not have the problems cited in the “considered harmful” paper. He objects that “there is no way to see which exceptions might be thrown and from where”. But how are you supposed to program with functions whose contracts you do not know, exceptions or no exceptions? He says “they create too many possible exit points”; but whether you express unusual outcomes with exceptions or with special returned values, there are just the same number of them. He advocates using error codes, even though he admits that it makes programs far bulkier and makes it impossible to nest function calls.
Implementation and Portability Considerations
The Common Lisp specification makes tradeoffs between clean contracts and speed. For example, the addition function “+” ideally ought to be defined to signal a condition when either argument is a symbol. But, in order to allow generation of fast code on non-specialized hardware, its contract says that given a symbol, it may either signal, or return some integer value.
Some contracts in Common Lisp are deliberately incomplete in order to allow some implementations to add non-standard extensions.
Many contracts in Common Lisp do not specify particular condition classes to be signaled, but rather merely say that some outcome’s behavior is “a condition is signaled” without specifying a particular condition class or instance variable values.
Topics For the Future
unwind-protect, unhandled conditions in cleanup handlers, chained conditions, Java exceptions, debugging, handler-bind, handling all condition classes, *break-on-signal*, polymorphism, with-error-context, condition names should say what happened, not where it happened.