pilcrow - Run-It-Yourself web chat, maybe

	Commit message (Collapse)	Author	Age
*	Define a generic "Failed" case for app-level errors (and a few others).	Owen Jacobson	2025-11-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We were previously exporting root causes from one layer of abstraction to the next. For example, anything that called into the database could cause an `sqlx::Error`, so anything that transitively called into that logic exported a `Database(sqlx::Error)` error variant of its own, using `From` to map errors from inner type to outer type. This had a couple of side effects. First, it required each layer of error handling to carry with it a `From` implementation unwrapping and rewrapping root causes from the next layer down. This was particularly apparent in the event and boot endpoints, which had separate error cases unique to crypto key processing errors solely because they happened to involve handling events that contained those keys. There were others, including the pervasive `Database(sqlx::Error)` error variants. Separately, none of the error variants introduced for this purpose were being used for anything other than printing to stderr. All the complexity of From impls and all the structure of the error types was being thrown away at top-level error handlers. This change replaces most of those error types with a generic `Failed` error. A `Failed` carries with it two pieces of information: a (boxed) underlying error, of any boxable `Error` type, and text meant to explain the context and cause of an error. Code which acts on errors can treat `Failed` as a catch-all case, while individually handling errors that signify important cases. Errors can be moved into or out of the `Failed` case by refactoring, as needed. The design of `Failed` is heavily motivated by [anyhow's `context` system][context] as a way for the programmer to capture immediate intention as an explanation for some underlying error. However, instead of accepting the full breadth of types that implement `Display`, a `Failed` can only carry strings as explanation. We don't need the generality at this time, and the implementation underlying it is pretty complex for what it does. [context]: https://docs.rs/anyhow/latest/anyhow/struct.Error.html#method.context This change also means that the full source chain for an error is now available to top-level error handlers, allowing more complete error messages. For example, starting `pilcrow` with an invalid network listen address produces Failed to bind to www.google.com:64209 Caused by: Can't assign requested address (os error 49) instead of the previous Error: Io(Os { code: 49, kind: AddrNotAvailable, message: "Can't assign requested address" }) which previously captured the same _cause_, but without the formatting (see previous commit) and without the _context_ (this commit). Similar improvements are available for many of the error scenarios Pilcrow is designed to give up on. When deciding which errors to use `Failed` with, I've used the heuristic that if something can fail for more than one underlying reason, and if the caller will only ever need to be able to differentiate those reasons after substantial refactoring anyways, then the reasons should collase into `Failed`. If there's either only a single underlying failure reason possible, or only errors arising out of the function body possible, then I've left error handling alone. In the process I've refactored most request-handler-level error mappings to explicitly map `Failed` to `Internal`, rather than having a catch-all mapping for all unhandled errors, to make it easier to remember to add request-level error representations when adding app-level error cases. This also includes helper traits for `Error` and `Result`, to make constructing `Failed` (and errors that include `Failed` as an alternative) easier to do, and some constants for the recurring error messages related to transaction demarcation. I'm not completely happy with the repetitive nature of those error cases, but this is the best I've arrived at so far. As errors are no longer expected to be convertible up the call stack, the `NotFound` and `Duplicate` helper traits for database errors had to change a bit. Those previously assumed that they would be used in the context of an error type implementing `From<sqlx::Error>` (or from another error type with similar characteristics), and that's not the case any more. The resulting idiom for converting a missing value into a domain error is `foo.await.optional().fail(MESSAGE)?.ok_or(DOMAIN ERROR)?`, which is rather clunky, but I've opted not to go further with it. The `Duplicate` helper is just plain gone, as it's not easily generalizable in this structure and using `match` is more tractable for me. Finally, I've changed the convention for error messages from `all lowercase messages in whatever tense i feel like at the moment` to `Sentence-case messages in the past tense`, frequently starting with `Failed to` and a short summary of the task at hand. This, as above, makes error message capitalization between Pilcrow's own messages and messages coming from other libraries/the Rust stdlib much more coherent and less jarring to read.
*	Print source chains for errors.	Owen Jacobson	2025-11-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When an error is printed, if that error has a source, print the source. Then repeat that process until there are no more sources. For an error with no source: the error message For an error with a single source: the error message caused by: the source's error message For an error message with chained sources: the error message caused by: the immediate source's error message the first transitive source's error message the second transitive source's error message … the ultimate source's error message That is, sources are printed from the outermost inwards, terminating with the lowest-level source. This printing occurs in two places: when pilcrow starts, and whenever it processes an internal server error. For internal server errors, the error message is prefixed with the error ID, as before, to allow operators to correlate log messages with responses if they choose to do so: [E1234] the error message caused by: the immediate source's error message the first transitive source's error message the second transitive source's error message … the ultimate source's error message Both call sites lock stderr to try to prevent interleaving messages. No promises about how well it works. I've not made any effort to ensure that multi-line error messages are correctly indented. So far as I know, pilcrow never emits a multi-line error message (except for long messages which are wrapped), so this likely isn't necessary. It's also surprisingly hard to do well, though there is [a crate for it][indenter]. [indenter]: https://docs.rs/indenter/latest/indenter/ There's a widespread convention for unix tools to prefix stderr output with the name of the program. For example, cat on my system prints the following when asked to open a nonexistent file: cat: nope: No such file or directory I've opted not to do that for Pilcrow. By my understanding, that convention is intended to help users identify which of several programs emitted a message when several programs share a stderr channel, for example when a program is used as part of a shell pipeline. Programs that are largely intended to be freestanding do not consistently follow the same convention - Cargo, clang, and gcc all write diagnostics to stderr with no prefix, for example. Pilcrow is, similarly, meant to be run in isolation, either as the single thing a user is running interactively, or because it's being run as a service. In either case, the prefix is not necessary to identify Pilcrow as the source of a message. I debated making the new `Exit` type crate-private, and having `cli::Args::run` return `impl Termination` to handle the interface between the crate and the outside world. For now, I've opted to instead return a result, so that any ill-advised callers trying to use Pilcrow as a library can at least attempt to understand the structure of its return value; this entails making `Exit` public as well, so that the CLI entry point (`main.rs`) can import it. This is an enabling change, with little immediate impact on its own. It probably makes some error messages print less than perfectly, but Pilcrow generates errors so rarely in development that it's hard to check. I'll be revising individual errors in later
*	De minimis "send me a notification" implementation.	Owen Jacobson	2025-11-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a user clicks "send a test notification," Pilcrow delivers a push message (with a fixed payload) to all active subscriptions. The included client then displays this as a notification, using browser APIs to do so. This lets us verify that push notification works, end to end - and it appears to. The API endpoint for sending a test notification is not documented. I didn't feel it prudent to extensively document an endpoint that is intended to be temporary and whose side effects are very much subject to change. However, for posterity, the endpoint is POST /api/push/ping {} and the push message payload is ping Subscriptions with permanent delivery failures are nuked when we encounter them. Subscriptions with temporary failures cause the `ping` endpoint to return an internal server error, and are not retried. We'll likely want retry logic - including retry logic to handle server restarts - for any more serious use, but for a smoke test, giving up immediately is fine. To make the push implementation testable, `App` is now generic over it. Tests use a dummy implementation that stores sent messages in memory. This has some significant limitations, documented in the test suite, but it beats sending real notifications to nowhere in tests.
*	Stop passing an unused timestamp around when rotating VAPID keys.	Owen Jacobson	2025-10-08
\|
*	Allow administrators to rotate VAPID keys immediately if needed.	Owen Jacobson	2025-08-30
\| \| \| \| \| \| \| \|	In spite of my design preference away from CLI tools, this is a CLI tool: `pilcrow --database-url <URL> rotate-vapid-key`. This is something we can implement here and now, which does not require us to confront the long-avoided issue of how to handle the idea that some users are allowed to make server-operational changes and some aren't, by delegating the problem back to the OS. The implementation is a little half-baked to make it easier to rip out later. I would ordinarily prefer to push both `serve` (the default verb, not actually named in this change) and `rotate-vapid-key` into their own, separate CLI submodules, with their own argument structs, but that change is much more intrusive and would make this effectively permanent. This can be yanked out in a few minutes by deleting a few lines of `cli.rs` and inlining the `serve` function. Nonetheless, nothing is as permanent as a temporary solution, so I've written at least some bare-minimum operations documentation on how to use this and what it does.
*	Stop mentioning private error types in doctest boilerplate.	Owen Jacobson	2025-08-13
\| \| \| \|	In 792de8e49fa8a3c04bfb747adadf71572d753055, `crate::cli::Error` was made private. I forgot to update the doctest that mentions it.
*	Remove `pilcrow::cli::Error` from the lib crate's public interface.	Owen Jacobson	2025-07-22
\| \| \| \|	This might be the pettiest rude change I've ever made to a Rust program. If I saw this - or did this - in code _intend_ to be used as a library, I'd be appalled.
*	Stop linking to private documentation items in public docs.	Owen Jacobson	2025-07-22
\| \| \| \|	The Pilcrow crate library docs are something of a wart; Pilcrow isn't meant to be used as a library, and the only public interface it exposes is the CLI entry point. However, we will likely be publishing Pilcrow via crates.io (among other options), and so it will _be usable_ as a library if you're desperate enough to try. The docs should at least be coherent.
*	Add a `--umask` option to determine what permissions new files/databases get.	Owen Jacobson	2025-07-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The new `--umask` option takes one of three values: * `--umask masked`, the default, takes the inherited umask and forces o+rwx on. * `--umask inherit` takes the inherited umask as-is. * `--umask OCTAL` sets the umask to exactly `OCTAL` and is broadly equivalent to `umask OCTAL && pilcrow --umask inherit`. This fell out of a conversation with @wlonk, who is working on notifications. Since notifications may require [VAPID] keys, the server will need a way to store those keys. That would generally be "in the pilcrow database," which lead me to the observation that Pilcrow creates that database as world-readable by default. "World-readable" and "encryption/signing keys" are not things that belong in the same sentence. [VAPID]: https://datatracker.ietf.org/doc/html/rfc8292 The most "obvious" solution would be to set the permissions used for the sqlite database when it's created. That's harder than it sounds: sqlite has no built-in facility for doing this. The closest thing that exists today is the [`modeof`] query parameter, which copies the permissions (and ownership) from some other file. We also can't reliably set the permissions ourselves, as sqlite may - depending on build options and configuration - [create multiple files][wal]. [`modeof`]: https://www.sqlite.org/uri.html [wal]: https://www.sqlite.org/wal.html Using `umask` is a whole-process solution to this. As Pilcrow doesn't attempt to create other files, there's little issue with doing it this way, but this is a design risk for future work if it creates files that are _intended_ to be readable by more than just the Pilcrow daemon user.
*	Reorganize and consolidate HTTP routes.	Owen Jacobson	2025-06-18
\| \| \| \| \| \| \| \|	HTTP routes are now defined in a single, unified module, pulling them out of the topical modules they were formerly part of. This is intended to improve the navigability of the codebase. Previously, finding the handler corresponding to a specific endpoint required prior familiarity, though in practice you could usually guess from topic area. Now, all routes are defined in `crate::routes`. Other than changing visibility, I've avoided making changes to the handlers at the ends of those routes.
*	Use a fluent style for the middleware layers.	Owen Jacobson	2025-06-17
\| \| \| \| \| \|	For endpoints that are unavailable, that default behaviour no longer needs to be specified: `Required(app)` will do that for you. For endpoints that are redirects until setup is completed, `Require(app).with_fallback(…response…)` will do that. To make this a bit harder to break by accident, the default unavailable response is now its own type.
*	Unify `setup_required` middlewares.	Owen Jacobson	2025-06-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The two middlewares were identical but for the specific `IntoResponse` impl used to generate the response when setup has not been completed. However, unifying them while still using `from_fn_with_state` lead to this horrorshow: .route_layer(middleware::from_fn_with_state( app.clone(), \|state, req, next\| { setup::middeware::setup_required(UNAVAILABLE, state, req, next) } )) It's a lot to read, and it surfaces the entire signature of a state-driven middleware `fn` into the call site solely to close over one argument (`UNAVAILABLE`). Rather than doing that, I've converted this middleware into a full blown Tower middleware, following <https://docs.rs/axum/latest/axum/middleware/index.html#towerservice-and-pinboxdyn-future>. I considered taking this further and implementing a custom future to remove the allocation for `Box::pin`, but honestly, that allocation isn't hurting anyone and this code already got long enough in the translation. The new API looks like: .route_layer(setup::Required::or_unavailable(app.clone())) Or like: .route_layer(setup::Required::with_fallback(app.clone(), RESPONSE)) One thing I would have liked to have avoided is the additional `app.clone()` argument, but there isn't a way to extract the _state_ from a request inside of an Axum middleware. It has to be passed in externally - that's what `from_fn_with_state` is doing under the hood, as well. Using `State` as an extractor doesn't work; the `State` extractor is special in a _bunch_ of ways, and this is one of them. Other extractors would work. Realistically, I'd probably want to explore interfaces like .route_layer(setup::Required(app).or_unavailable()) or .route_layer(app.setup().required().or_unavailable())
*	Rename the `login` module to `user`.	Owen Jacobson	2025-03-23
\|
*	Upgrade to Rust 1.85 and Rust 2024 edition.	Owen Jacobson	2025-02-20
\| \| \| \| \| \| \| \|	There are a couple of migration suggestions from `cargo fix --edition` that I have deliberately skipped, which are intended to make sure that the changes to `if let` scoping don't bite us. They don't, I'm pretty sure, and if I turn out to be wrong, I'd rather fix the scoping issues (as they arise) than use `match` (`cargo fix --edition`'s suggestion). This change also includes a bulk reformat and a clippy cleanup. NOTA BENE: As this requires a new Rust toolchain, you'll need to update Rust (`rustup update`, normally) or the server won't build. This also applies to the Debian builder Docker image; it'll need to be rebuilt (from scratch, pulling its base image again) as well.
*	Rename the project to `pilcrow`.	Owen Jacobson	2024-11-08
\|
*	Limit background expiry to the API.	Owen Jacobson	2024-10-31
\| \| \| \| \| \|	Using requests to drive background work (expiring things, mainly) is a hack to avoid the complexity of background workers, but it's reaching its limits. In the live deployment at `hi.grimoire.ca`, we found that requests for the UI were taking 300+ milliseconds as the expiry process required database access. The DB there is slow, which is a separate issue, but also being accessed many times for little benefit. Since only the API is actually _affected_ by expiry, I've scoped the middleware down to just those endpoints.
*	Remove `hi-recanonicalize`.	Owen Jacobson	2024-10-30
\| \| \| \|	This utility was needed to support a database migration with existing data. I have it on good authority that no further databases exist that are in the state that made this tool necessary.
*	Provide `hi-recanonicalize` to recover from canonicalized-name problems.	Owen Jacobson	2024-10-22
\|
*	Organizational pass on endpoints and routes.	Owen Jacobson	2024-10-16
\|
*	Create APIs for inviting users.	Owen Jacobson	2024-10-11
\|
*	Provide a separate "initial setup" endpoint that creates a user.	Owen Jacobson	2024-10-11
\|
*	Add a `server` header to responses.	Owen Jacobson	2024-10-11
\|
*	Separate `/api/boot` into its own module.	Owen Jacobson	2024-10-05
\|
*	Merge branch 'wip/ui'	Owen Jacobson	2024-10-05
\|\
\| *	Render the UI at /.	Owen Jacobson	2024-10-05
\| \|
* \|	Make a backup of the `.hi` database before applying migrations.	Owen Jacobson	2024-10-05
\| \| \| \| \| \| \| \|	This was motivated by Kit and I both independently discovering that sqlite3 will happily partially apply migrations, leaving the DB in a broken state.
* \|	Start fresh with database migrations.	Owen Jacobson	2024-10-04
\|/ \| \| \| \| \|	The migration path from the original project inception to now was complicated and buggy, and stranded _both_ Kit and I with broken databases due to oversights and incomplete migrations. We've agreed to start fresh, once. If this is mistakenly started with an original-schema-flavour DB, startup will be aborted.
*	Add endpoints for deleting channels and messages.	Owen Jacobson	2024-10-03
\| \| \| \|	It is deliberate that the expire() functions do not use them. To avoid races, the transactions must be committed before events get sent, in both cases, which makes them structurally pretty different.
*	Retire top-level `repo`.	Owen Jacobson	2024-10-02
\| \| \| \|	This helped me discover an organizational scheme I like more.
*	First pass on reorganizing the backend.	Owen Jacobson	2024-10-02
\| \| \| \|	This is primarily renames and repackagings.
*	Expire channels, too.	Owen Jacobson	2024-09-28
\|
*	Delete expired messages out of band.	Owen Jacobson	2024-09-28
\| \| \| \| \| \| \| \|	Trying to reliably do expiry mid-request was causing some anomalies: * Creating a channel with a dup name would fail, then succeed after listing channels. It was very hard to reason about which operations needed to trigger expiry, to fix this "correctly," so now expiry runs on every request.
*	Make `/api/events` a firehose endpoint.	Owen Jacobson	2024-09-27
\| \| \| \| \| \| \| \|	It now includes events for all channels. Clients are responsible for filtering. The schema for channel events has changed; it now includes a channel name and ID, in the same format as the sender's name and ID. They also now include a `"type"` field, whose only valid value (as of this writing) is `"message"`. This is groundwork for delivering message deletion (expiry) events to clients, and notifying clients of channel lifecycle events.
*	rustdoc comment for the (very limited) public API of the crate.	Owen Jacobson	2024-09-25
\| \| \| \| \| \| \| \|	This silences some `-Wclippy::pedantic` warning, and it's just a good thing to do. I've made the choice to have the docs comment face programmers, and to provide `hi --help` and `hi -h` content via Clap attributes instead of inferring it from the docs comment. Internal (private) "rustdoc" comments have been converted to regular comments until I learn how to write better rustdoc.
*	Put database prep somewhere tests can call it.	Owen Jacobson	2024-09-20
\|
*	Remove the HTML client, and expose a JSON API.	Owen Jacobson	2024-09-20
\| \| \| \| \| \| \| \| \| \| \| \| \|	This API structure fell out of a conversation with Kit. Described loosely: kit: ok kit: Here's what I'm picturing in a client kit: list channels, make-new-channel, zero to one active channels, post-to-active. kit: login/sign-up, logout owen: you will likely also want "am I logged in" here kit: sure, whoami
*	Most pass-through errors do not need additional message text	Owen Jacobson	2024-09-18
\|
*	Make BoxedError an implementation detail of InternalError.	Owen Jacobson	2024-09-18
\|
*	Consolidate channel events into a single stream endpoint.	Owen Jacobson	2024-09-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	While reviewing [MDN], I noticed this note: > SSE suffers from a limitation to the maximum number of open connections, which can be specially painful when opening various tabs as the limit is per browser and set to a very low number (6). […] This limit is per browser + domain, so that means that you can open 6 SSE connections across all of the tabs to www.example1.com and another 6 SSE connections to www.example2.com. I tested it in Safari; this is true, and once six streams are open, _no_ more requests can be made - in any tab, even a fresh one. Since the design _was_ that each channel had its own events endpoint, this is an obvious operations risk. Any client that tries to read multiple channels' streams will hit this limit quickly. This change consolidates all channel events into a single endpoint: `/events`. This takes a list of channel IDs (as query parameters, one `channel=` param per channel), and streams back events from all listed channels. The previous `/:channel/events` endpoint has been removed. Clients can selectively request events for the channels they're interested in. [MDN]: https://developer.mozilla.org/en-US/docs/Web/API/EventSource
*	Transmit messages via `/:chan/send` and `/:chan/events`.	Owen Jacobson	2024-09-13
\|
*	Push most endpoint and extractor logic into functoins of `App`.	Owen Jacobson	2024-09-12
\| \| \| \| \| \| \| \|	This is, again, groundwork for logic that requires more than just a database connection. The login process has been changed to be more conventional, attempting login _before_ account creation rather than after it. This was not previously possible, because the data access methods used to perform these steps did not return enough information to carry out the workflow in that order. Separating storage from password validation and hashing forces the issue, and makes it clearer _at the App_ whether an account exists or not. This does introduce the possibility of two racing inserts trying to lay claim to the same username. Transaction isolation should ensure that only one of them "wins," which is what you get before this change anyways.
*	Wrap the database pool in an App struct.	Owen Jacobson	2024-09-12
\| \| \| \| \| \|	This is a jumping-off point for adding logic that needs more than just the DB for state, such as chat message handling. The name sucks, but it's the best I've got.
*	Allow any login to create channels.	Owen Jacobson	2024-09-04
\|
*	Expire sessions after 90 days.	Owen Jacobson	2024-09-04
\|
*	Allow login creation and authentication.	Owen Jacobson	2024-09-03
\| \| \| \| \| \| \| \| \| \|	This is a beefy change, as it adds a TON of smaller pieces needed to make this all function: * A database migration. * A ton of new crates for things like password validation, timekeeping, and HTML generation. * A first cut at a module structure for routes, templates, repositories. * A family of ID types, for identifying various kinds of domain thing. * AppError, which _doesn't_ implement Error but can be sent to clients.
*	Store state in sqlite. Default to .hi in the cwd.	Owen Jacobson	2024-08-30
\|
*	Make it an HTTP server	Owen Jacobson	2024-08-30