| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were previously exporting root causes from one layer of abstraction to the next. For example, anything that called into the database could cause an `sqlx::Error`, so anything that transitively called into that logic exported a `Database(sqlx::Error)` error variant of its own, using `From` to map errors from inner type to outer type.
This had a couple of side effects. First, it required each layer of error handling to carry with it a `From` implementation unwrapping and rewrapping root causes from the next layer down. This was particularly apparent in the event and boot endpoints, which had separate error cases unique to crypto key processing errors solely because they happened to involve handling events that contained those keys. There were others, including the pervasive `Database(sqlx::Error)` error variants.
Separately, none of the error variants introduced for this purpose were being used for anything other than printing to stderr. All the complexity of From impls and all the structure of the error types was being thrown away at top-level error handlers.
This change replaces most of those error types with a generic `Failed` error. A `Failed` carries with it two pieces of information: a (boxed) underlying error, of any boxable `Error` type, and text meant to explain the context and cause of an error. Code which acts on errors can treat `Failed` as a catch-all case, while individually handling errors that signify important cases. Errors can be moved into or out of the `Failed` case by refactoring, as needed.
The design of `Failed` is heavily motivated by [anyhow's `context` system][context] as a way for the programmer to capture immediate intention as an explanation for some underlying error. However, instead of accepting the full breadth of types that implement `Display`, a `Failed` can only carry strings as explanation. We don't need the generality at this time, and the implementation underlying it is pretty complex for what it does.
[context]: https://docs.rs/anyhow/latest/anyhow/struct.Error.html#method.context
This change also means that the full source chain for an error is now available to top-level error handlers, allowing more complete error messages. For example, starting `pilcrow` with an invalid network listen address produces
Failed to bind to www.google.com:64209
Caused by:
Can't assign requested address (os error 49)
instead of the previous
Error: Io(Os { code: 49, kind: AddrNotAvailable, message: "Can't assign requested address" })
which previously captured the same _cause_, but without the formatting (see previous commit) and without the _context_ (this commit). Similar improvements are available for many of the error scenarios Pilcrow is designed to give up on.
When deciding which errors to use `Failed` with, I've used the heuristic that if something can fail for more than one underlying reason, and if the caller will only ever need to be able to differentiate those reasons after substantial refactoring anyways, then the reasons should collase into `Failed`. If there's either only a single underlying failure reason possible, or only errors arising out of the function body possible, then I've left error handling alone.
In the process I've refactored most request-handler-level error mappings to explicitly map `Failed` to `Internal`, rather than having a catch-all mapping for all unhandled errors, to make it easier to remember to add request-level error representations when adding app-level error cases.
This also includes helper traits for `Error` and `Result`, to make constructing `Failed` (and errors that include `Failed` as an alternative) easier to do, and some constants for the recurring error messages related to transaction demarcation. I'm not completely happy with the repetitive nature of those error cases, but this is the best I've arrived at so far.
As errors are no longer expected to be convertible up the call stack, the `NotFound` and `Duplicate` helper traits for database errors had to change a bit. Those previously assumed that they would be used in the context of an error type implementing `From<sqlx::Error>` (or from another error type with similar characteristics), and that's not the case any more. The resulting idiom for converting a missing value into a domain error is `foo.await.optional().fail(MESSAGE)?.ok_or(DOMAIN ERROR)?`, which is rather clunky, but I've opted not to go further with it. The `Duplicate` helper is just plain gone, as it's not easily generalizable in this structure and using `match` is more tractable for me.
Finally, I've changed the convention for error messages from `all lowercase messages in whatever tense i feel like at the moment` to `Sentence-case messages in the past tense`, frequently starting with `Failed to` and a short summary of the task at hand. This, as above, makes error message capitalization between Pilcrow's own messages and messages coming from other libraries/the Rust stdlib much more coherent and less jarring to read.
|
|
|
When an error is printed, if that error has a source, print the source. Then repeat that process until there are no more sources.
For an error with no source:
the error message
For an error with a single source:
the error message
caused by:
the source's error message
For an error message with chained sources:
the error message
caused by:
the immediate source's error message
the first transitive source's error message
the second transitive source's error message
…
the ultimate source's error message
That is, sources are printed from the outermost inwards, terminating with the lowest-level source.
This printing occurs in two places: when pilcrow starts, and whenever it processes an internal server error. For internal server errors, the error message is prefixed with the error ID, as before, to allow operators to correlate log messages with responses if they choose to do so:
[E1234] the error message
caused by:
the immediate source's error message
the first transitive source's error message
the second transitive source's error message
…
the ultimate source's error message
Both call sites lock stderr to try to prevent interleaving messages. No promises about how well it works.
I've not made any effort to ensure that multi-line error messages are correctly indented. So far as I know, pilcrow never emits a multi-line error message (except for long messages which are wrapped), so this likely isn't necessary. It's also surprisingly hard to do well, though there is [a crate for it][indenter].
[indenter]: https://docs.rs/indenter/latest/indenter/
There's a widespread convention for unix tools to prefix stderr output with the name of the program. For example, cat on my system prints the following when asked to open a nonexistent file:
cat: nope: No such file or directory
I've opted not to do that for Pilcrow. By my understanding, that convention is intended to help users identify which of several programs emitted a message when several programs share a stderr channel, for example when a program is used as part of a shell pipeline.
Programs that are largely intended to be freestanding do not consistently follow the same convention - Cargo, clang, and gcc all write diagnostics to stderr with no prefix, for example. Pilcrow is, similarly, meant to be run in isolation, either as the single thing a user is running interactively, or because it's being run as a service. In either case, the prefix is not necessary to identify Pilcrow as the source of a message.
I debated making the new `Exit` type crate-private, and having `cli::Args::run` return `impl Termination` to handle the interface between the crate and the outside world. For now, I've opted to instead return a result, so that any ill-advised callers trying to use Pilcrow as a library can at least attempt to understand the structure of its return value; this entails making `Exit` public as well, so that the CLI entry point (`main.rs`) can import it.
This is an enabling change, with little immediate impact on its own. It probably makes some error messages print less than perfectly, but Pilcrow generates errors so rarely in development that it's hard to check. I'll be revising individual errors in later
|