Zumi's Scratchpad

Log, don't throw


Or, the actual title: My dumb error-handling idea.

Eternal struggle

It's ${CURRENT_YEAR}, and error handling is STILL not a "solved problem", especially when it comes to balancing ergonomics (how good it feels to code with) and performance. It's still very much debatable which approach to use: exceptions, error codes, or Result/Option types, whether monadic (Rust etc.) or tuple-based (Go, Odin).

I have made the mistake of reading programmer "Disc Horse" (particularly on Hacker News and Lobsters) where ✨the enlightened✨ make fun of the Blub programmers for having such an error-prone and inefficient way of handling errors, among other things. All of this hasn't added clarity, only confusion. Enough for me to add fuel to the fire, especially as someone completely unqualified to yap about any of this.

So let's first enter these rough points from said discourses into "evidence", which I won't elaborate on or argue for:

  1. The design of exceptions is bad and you should feel bad, prefer error codes and result types.
  2. Error handling should be front-and-center and not just afterthoughts, therefore they should be part of your code.
  3. "Careful thought and strategically-placed print statements" still reign supreme.
  4. The if err != nil: return nil, err ("simple re-throwing") pattern in Go.
  5. fmt.Errorf("%w", err) ("error wrapping") in Go.
  6. Log and rethrow can be considered an anti-pattern.
  7. Premature abstraction is the root of all evil.
  8. It is still useful to have different types of error, because not every error is equal.

Next, consider this log of an unhandled exception of a web service:

ValueError exception: Can't parse "blerp" as Date
from: std/parsedate:100
from: /home/zumi/dev/myDumbProject/src/parsetools/object:245
from: /home/zumi/dev/myDumbProject/src/fetch/object:124
from: /home/zumi/dev/myDumbProject/src/controller/getObject:24
from: /home/zumi/dev/myDumbProject/src/handleRequests:24

You can probably infer from this that there has been an error when accessing some Object, and that it has received a nonsense value for a date input. But there are two bits of context I'm missing here:

  1. What kind of date was it trying to parse, exactly? As in, which database column? (and it'll most likely be from a database)
  2. Which object ID did it choke on?

In a trivial database, you can probably go to the line and then go to the database to find the odd value. But all "serious" web services will have to talk about scale. You can imagine that plus a wild request, and suddenly it becomes a bit harder to rectify.

And you could argue that I should have caught it at the lowest level I could. That's a bit easier in a language that has checked exceptions, because the compiler will tell you which ones to catch.

Besides, the whole point of exceptions is so that you can get the error-handling cruft out of the way, right? So then, you'd be likely to wrap your entire function inside of a giant try statement rather than try-catch individually. In some languages, try is its own block. Imagine trying to initialize variables this way when blocks can't also be expressions.

But what if I can have this instead:

[!!] can't parse object's creation date. e: "ValueError: Can't parse "blerp" as Date". value = "blerp". [std/parsedate:100]
[!!] can't parse object. [parsetools/object.src:245]
[!!] can't fetch object. id = 99. [fetch/object.src:124]
[!!] can't display object. id = 99. [controller/getObject.src:24]

Right out of the gate, I have the answer to the questions of context, and I can fix the issue faster:

  1. blerp for some reason was the value of the creation date in the database.
  2. The issue was with ID 99.

This can be achieved any way you like. But what if the log is the stack trace?

There's a practice I know of that simply has strings as the error value, so basically Result[T, string]. I think the idea is that they're just values, which keeps the function pure and gives you the option of logging them or not.

What if I, like a dumbass, don't care about "keeping it pure"? I just assume that a logger package is present, initialized, and I utterly depend on it, and I don't want the option of not logging the error?

And instead of having only one or maybe two kinds of error, results or error codes or exceptions or whatever... what if I have all of them??

My dumb idea

I came up with these error types: Outcome, DifferentiatedFail, Option[T], Result[T, E], and ContextualFail.

It's important to note that this is NOT a general error-handling strategy. This shouldn't be used to "take over the world", but instead to ONLY be used in YOUR code. The libraries you use can do whatever, exceptions, results, whatever. The strategy described here should be used in code that YOU care about. It's really a way of marking boundaries between app code and "library" code. A.k.a, "app code that faces the user" where this approach can be used vs. "app code that can be used elsewhere" where e.g. exceptions can be used.

And instead of the focus being on the function itself, the focus is in whoever calls it. I think it fits into the "code what you need" mindset instead of prematurely preparing abstractions that will not hold up.

For the pseudo-code in this article let's assume something that looks kind-of-like C, and has exceptions.

Outcome

A value of either Fail, or Ok. Implementing this can simply be a boolean true/false—whichever denotes the error value depends on what kind of standards you have. For example, C functions usually say "return true if there is an error", which results in "non-zero value denotes an error".

DifferentiatedFail

An enum that just says what kinds of errors are possible that you care about. They can be literally anything depending on what you need, perhaps in the form of these constants, which map neatly to HTTP errors:

InternalFail // 500
ExistsFail // 404
PermissionFail // 403
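
Here's a rough Go sketch of that enum and its HTTP mapping. The StatusFor helper is a name I made up for illustration, and starting the constants at iota + 1 (so the zero value isn't mistaken for a real failure) is an assumption of this sketch, not part of the idea itself.

```go
package main

import (
	"fmt"
	"net/http"
)

// DifferentiatedFail enumerates the failure kinds the caller cares about.
type DifferentiatedFail int

const (
	// iota + 1 keeps the zero value free (an assumption of this sketch).
	InternalFail DifferentiatedFail = iota + 1
	ExistsFail
	PermissionFail
)

// StatusFor is a hypothetical helper that maps a failure kind to an HTTP status.
func StatusFor(f DifferentiatedFail) int {
	switch f {
	case ExistsFail:
		return http.StatusNotFound
	case PermissionFail:
		return http.StatusForbidden
	default:
		// Anything unknown is treated as an internal error.
		return http.StatusInternalServerError
	}
}

func main() {
	for _, f := range []DifferentiatedFail{InternalFail, ExistsFail, PermissionFail} {
		fmt.Println(f, StatusFor(f))
	}
}
```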

Option[T]

Your bog-standard "safe nullable reference" optionals type that you need to "unwrap" to use, forcing you to check the thing. In absence of this, you could just use the nullable reference type, but you carry the risk that comes with it.

Here it communicates that either there is only one possible way that the call can error out, or that the caller should not care what kind of error occurred. After all it just needs to know whether it would receive the value or not.
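
Go has no built-in optional, so here's a minimal sketch of one using generics (Go 1.18 or later). Option, Some, None, Get and findPort are all names invented for this example; the point is only that the caller gets "value or nothing" and no reason why.

```go
package main

import "fmt"

// Option is a hypothetical minimal optional type.
type Option[T any] struct {
	value T
	ok    bool
}

// Some wraps a present value.
func Some[T any](v T) Option[T] { return Option[T]{value: v, ok: true} }

// None is the absent value.
func None[T any]() Option[T] { return Option[T]{} }

// Get returns the value together with a presence flag,
// so the caller has to look at the flag to "unwrap" it.
func (o Option[T]) Get() (T, bool) { return o.value, o.ok }

// findPort is a made-up lookup with exactly one way to fail: not found.
func findPort(name string) Option[int] {
	ports := map[string]int{"http": 80, "https": 443}
	if p, ok := ports[name]; ok {
		return Some(p)
	}
	return None[int]()
}

func main() {
	if p, ok := findPort("https").Get(); ok {
		fmt.Println("port:", p)
	}
	if _, ok := findPort("gopher").Get(); !ok {
		fmt.Println("no such service")
	}
}
```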

Result[T, E]

The ok-or-error monad that Rust convinced me was the One True Way to Go. And speaking of, yeah, Go fits this model, although it's like the "standard nullable reference" version of this model, since you can forget to check it. oooOOOOooOOo BILLION DOLLAR MISTAKE!!!!!! BAD!!!!!! or something. Hate this type of antagonism.
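
For comparison, this is roughly what a Result[T, E] could look like in Go if you wanted the "can't forget to check" flavor. It's a sketch under my own assumptions: Result, Ok, Err, Match and halve are invented names, and Match is just one way of forcing both branches to be handled.

```go
package main

import "fmt"

// Result is a hypothetical ok-or-fail container.
type Result[T, E any] struct {
	value T
	fail  E
	ok    bool
}

// Ok builds a successful Result.
func Ok[T, E any](v T) Result[T, E] { return Result[T, E]{value: v, ok: true} }

// Err builds a failed Result.
func Err[T, E any](e E) Result[T, E] { return Result[T, E]{fail: e} }

// Match calls exactly one of the two handlers, so neither case can be skipped.
func (r Result[T, E]) Match(onOk func(T), onFail func(E)) {
	if r.ok {
		onOk(r.value)
		return
	}
	onFail(r.fail)
}

// halve is a made-up function that fails on odd input.
func halve(n int) Result[int, string] {
	if n%2 != 0 {
		return Err[int, string]("odd input")
	}
	return Ok[int, string](n / 2)
}

func main() {
	halve(8).Match(
		func(v int) { fmt.Println("half:", v) },
		func(e string) { fmt.Println("failed:", e) },
	)
}
```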

ContextualFail

The closest thing to "exception objects", but it's lighter because it just contains this:

struct ContextualFail
{
    kind: DifferentiatedFail;
    message: string;
};

Which one to pick

In short:

A flowchart showing what error type to pick. Can the function error out? If it cannot error out, does it return a value? If it returns, then the type is T, otherwise nothing or a void. If it can error out, does the caller need to distinguish between different kinds of error? If not, does it return a value? If it returns a value, then the type is Option of T, otherwise the Outcome enum. If the caller does need to differentiate, does it return a value? If it doesn't return a value, the type is a DifferentiatedFail. Otherwise, does the caller need different error messages? If not, return a Result of T and DifferentiatedFail. Otherwise, return a result of T and ContextualFail.
Graph source
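
If you'd rather read the flowchart as code, the same decisions can be sketched as one Go function. pickType and its parameter names are mine, and it returns the type names as plain strings just to show the branching.

```go
package main

import "fmt"

// pickType walks the flowchart's questions in order and
// returns the name of the type the function should return.
func pickType(canFail, returnsValue, needsKinds, needsMessages bool) string {
	switch {
	case !canFail && returnsValue:
		return "T"
	case !canFail:
		return "void"
	case !needsKinds && returnsValue:
		return "Option[T]"
	case !needsKinds:
		return "Outcome"
	case !returnsValue:
		return "DifferentiatedFail"
	case !needsMessages:
		return "Result[T, DifferentiatedFail]"
	default:
		return "Result[T, ContextualFail]"
	}
}

func main() {
	// Can fail, returns a value, caller doesn't care which kind of failure.
	fmt.Println(pickType(true, true, false, false))
}
```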

The thing about DifferentiatedFail is—again—it can be anything, even function-specific:

UsernameVibeCheckFailed
PasswordTooStrong
ConfirmationPasswordMismatch

This is fine for standalone console apps or whatever, but if you're writing a web service, you may as well fold it into a ContextualFail:

ContextualFail{
    .kind: ValidationFail,
    .message: "Invalid user name, must have at least one em-dash in it"
}
ContextualFail{
    .kind: ValidationFail,
    .message: "Password too strong, must have a maximum of 3 characters"
}
ContextualFail{
    .kind: ValidationFail,
    .message: "Password confirmation doesn't match password input"
}

The idea here is that ValidationFail would map to a 400, and the message can then be flashed alongside the re-rendered form.
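
A rough Go version of that, to make it concrete. validatePassword and statusFor are hypothetical helpers, and returning a nil pointer for "no failure" is a shortcut of this sketch (it's the nullable-reference flavor, with the risk that comes with it).

```go
package main

import (
	"fmt"
	"net/http"
)

type DifferentiatedFail int

const (
	InternalFail DifferentiatedFail = iota + 1
	ValidationFail
)

// ContextualFail carries the kind of failure plus a message for the user.
type ContextualFail struct {
	Kind    DifferentiatedFail
	Message string
}

// validatePassword is a made-up validator; nil means the input passed.
func validatePassword(password, confirm string) *ContextualFail {
	if len(password) > 3 {
		return &ContextualFail{
			Kind:    ValidationFail,
			Message: "Password too strong, must have a maximum of 3 characters",
		}
	}
	if password != confirm {
		return &ContextualFail{
			Kind:    ValidationFail,
			Message: "Password confirmation doesn't match password input",
		}
	}
	return nil
}

// statusFor maps the kind to an HTTP status; ValidationFail becomes a 400.
func statusFor(kind DifferentiatedFail) int {
	if kind == ValidationFail {
		return http.StatusBadRequest
	}
	return http.StatusInternalServerError
}

func main() {
	if f := validatePassword("hunter2", "hunter2"); f != nil {
		// The status goes to the response, the message gets flashed on the form.
		fmt.Println(statusFor(f.Kind), f.Message)
	}
}
```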

Show code pls

Alright. I'm a bit more familiar with Go so that's what I'll use as reference. In Go, you might do:

func callee() (T, error) {
    // ...
    if err != nil {
        // you can choose to bury errors here
        return nil, errors.New("can't do x")
    }
    //
    return ...
}

func caller() error {
    a, err := callee()
    if err != nil {
        // if callee uses this pattern, you can easily
        // get a mile long string...
        return fmt.Errorf("can't get a: %w", err)
    }
    _ = a // do stuff with a
    return nil
}

func main() {
    err := caller()
    if err != nil {
        // you could log only in here, but then you
        // don't get WHY it happens...
        // %w only works in fmt.Errorf, so print with %v here
        log.Fatalf("%v", err)
    }
}

In my dumb error handling proposal, I'd do, in pseudo-code:

Option[T] callee()
{
    try
    {
        x = // ...
        return x.some()
    }
    catch SomeError as e
    {
        log.errorf("cannot get x: %s", e.message)
        return none(T)
    }
}

Outcome caller()
{
#if you_prefer_pattern_matching
    match (callee())
    {
        some(a)
        {
            // do stuff with a
            return Ok
        }
        none()
        {
            // note how you don't return this log itself
            // as a value, but it's just a log
            log.errorf("cannot get a")
            return Fail
        }
    }
#else
    a = {
        result = callee()
        if (result.isNone())
        {
            log.errorf("cannot get a")
            // this is an early *explicit* return,
            // exits the entire function
            return Fail
        }
        // *implicit* return assigns to `a`
        // and continues
        result.get()
    }
    // do stuff with a
    return Ok
#endif
}

int main()
{
    if (caller() != Ok)
    {
        log.errorf("cannot do thing")
        return 1;
    }
    return 0;
}

You'd get:

cannot get x: some internal error
cannot get a
cannot do thing
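
For the Go-inclined, the same flow can be sketched with the standard log package. I'm assuming a (value, bool) pair as a stand-in for Option[T] and a plain bool for Outcome; fetchX is a made-up "library" call, and log.Lshortfile is what tacks the file:line onto each entry.

```go
package main

import (
	"errors"
	"log"
	"os"
)

// logger is the always-available global logger this approach depends on.
var logger = log.New(os.Stderr, "[!!] ", log.Lshortfile)

// fetchX stands in for library code that reports failure its own way.
func fetchX() (int, error) {
	return 0, errors.New("some internal error")
}

// callee logs the cause on the spot and reports only "no value".
func callee() (int, bool) {
	x, err := fetchX()
	if err != nil {
		logger.Printf("cannot get x: %v", err)
		return 0, false
	}
	return x, true
}

// caller returns an Outcome, here just a bool where true means Ok.
func caller() bool {
	a, ok := callee()
	if !ok {
		// The log is not returned as a value, it's just a log.
		logger.Printf("cannot get a")
		return false
	}
	_ = a // do stuff with a
	return true
}

func main() {
	if !caller() {
		// A real program would also exit with a non-zero status here.
		logger.Printf("cannot do thing")
	}
}
```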

Notice how you don't need to return error strings, allocate exception objects, or define new exception types.

How might an error string be used, aside from logging, anyway? Go already has a hard time differentiating errors (you need to declare sentinel errors beforehand and compare against them with errors.Is), so I think that says something about returning error strings as a concept.

Exception object implementations are varied, but they usually need to have a stack trace. The common complaint is that they need heap allocations for composing the error messages and such. No difference here if you do formatting for everything, but you could define a bunch of strings in .data to mitigate that. Again, what do you plan on using all of that for?

In some languages you might even want to create new Exception types because at a high level you don't need to care about the internals of whatever it is you're calling, e.g. a controller shouldn't need to care about a DbError. I think here even that would fit.

If you want to use this in a web service in particular, there's another point that could be in favor of this:

Users don't need to know the precise error logs. But the server admin does.

"Precise error logs" in this model, then, is an explicit opt-in, rather than an opt-out. Let's assume the standard handler-repository pattern.

Here's some rough, contrived "repository" code. In an exception-laden language, you might want to try-and-catch at the most granular level. But sometimes you don't want to handle things like DbError which can happen at every step of the way.

// as in this tiny example the kind of error we're expecting is only
// a server-side error, we can use an Option[T] here.
Option[T] getPostDate(int postId)
/** error kind: internal **/
{
    db = getGlobalDatabase()
    try
    {
        q = db.prepareStatement(makePostQuery(postId))
        s = db.execute(q)
        r = db.getColumn(s, "post_date")
        try
        {
            return r.asDate().some()
        }
        catch ValueError as e
        {
            // you can tell the problem value here
            log.errorf("cannot parse post date: %s. value=%s", e.message, r)
            return none()
        }
    }
    catch DbError as e
    {
        // since this is a catch-all, we won't get which one
        // of these db calls (prepareStatement, execute, getColumn)
        // errored out. so we might wanna print its stack trace anyway.
        log.errorf("%s", e.getStackTrace())
        log.errorf("db error: %s", e.message)
        return none()
    }
}

And here's the "handler" that takes this function.

void endpointGetPost(HttpServer s, int id)
{
    // assume the post's availability is handled differently
    date = {
        i = getPostDate(id)
        if (i.isNone())
        {
            log.errorf("unable to get post date of id %d", id)
            s.reply(status=500, body=makeErrorPage(500))
            return
        }
        i.get()
    }
    // ...
}

The user only sees the 500 error, while you see:

[!] cannot parse post date: Can't parse "null" as Date. value=null [controller/post:163]
[!] unable to get post date of id 19 [controller/post:145]

Or:

std/dbtools/private/backwarddb:140
std/dbtools:1100
fetch/object:154
[!] db error: The database server didn't respond [fetch/object:174]
[!] unable to get post date of id 19 [controller/post:145]

From the handler code's PoV, you as a caller don't need to know WHY the lower layer failed, like a DbError or whatever. But you want to know WHAT the effects of it are, like whether it's an "internal error" or a "does not exist error", so you can return a 500 or a 404.

From the administrator's PoV, you want to know WHY the lower layer failed, because WHAT the effects of it are is already reported to the user of your service.

Drawbacks

But in case you're convinced this is a good idea, consider the following.

Again, you need some global log object that is always available, and accessible by every function. If you work in something that expects pure functions, tough luck, you need to pass that logger object around like hot potatoes.

For Outcome and Option[T] in particular, since they communicate ONE kind of error, you still need to diligently document that yourself.

// wait, does this error or not?
Option[T] derpity()
{
// ahh okay, i see.
Option[T] derpity()
/** error kind: internal **/
{

You find that you suddenly need to differentiate between an internal error and a validation error, but what you called was an Outcome or Option[T]. Okay, let's change that to DifferentiatedFail or Result[T, DifferentiatedFail]. Oops, turns out there are 10 other functions that use it, and you have to change them too. That's right, it's viral! But the solace is in the fact that it's not happening in libraries, only your own code. Besides, abstractions that work do tend to take a while.

Depending on how granular you made your functions, you can't easily shut off some errors from your callers. Your only option here is to set the log levels from the called functions themselves. Incidentally, that also means you can't just "swallow" errors like you might be tempted to do with regular ol' exceptions.

And let's not forget the fact that you now have FIVE types to choose from before writing anything, instead of just the one. In which case, you should refer to the flowchart of what to choose.

In conclusion

XKCD 2119.

I'll have to put this into practice for my aforementioned web app, and see how I like it. But until then it's just a theory.

A GAME THEORY. THANKS FOR FLAMING.