| |
We were previously exporting root causes from one layer of abstraction to the next. For example, anything that called into the database could cause an `sqlx::Error`, so anything that transitively called into that logic exported a `Database(sqlx::Error)` error variant of its own, using `From` to map errors from inner type to outer type.
This had a couple of side effects. First, it required each layer of error handling to carry with it a `From` implementation unwrapping and rewrapping root causes from the next layer down. This was particularly apparent in the event and boot endpoints, which had separate error cases unique to crypto key processing errors solely because they happened to involve handling events that contained those keys. There were others, including the pervasive `Database(sqlx::Error)` error variants.
Separately, none of the error variants introduced for this purpose were being used for anything other than printing to stderr. All the complexity of From impls and all the structure of the error types was being thrown away at top-level error handlers.
This change replaces most of those error types with a generic `Failed` error. A `Failed` carries with it two pieces of information: a (boxed) underlying error, of any boxable `Error` type, and text meant to explain the context and cause of an error. Code which acts on errors can treat `Failed` as a catch-all case, while individually handling errors that signify important cases. Errors can be moved into or out of the `Failed` case by refactoring, as needed.
The design of `Failed` is heavily motivated by [anyhow's `context` system][context] as a way for the programmer to capture immediate intention as an explanation for some underlying error. However, instead of accepting the full breadth of types that implement `Display`, a `Failed` can only carry strings as explanation. We don't need the generality at this time, and the implementation underlying it is pretty complex for what it does.
[context]: https://docs.rs/anyhow/latest/anyhow/struct.Error.html#method.context
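The shape is roughly the following; this is an illustrative sketch rather than the literal implementation, and the field names and signatures may differ:
```rust
use std::error::Error;
use std::fmt;

/// A catch-all failure: a human-readable explanation plus the underlying cause.
#[derive(Debug)]
pub struct Failed {
    context: String,
    source: Box<dyn Error + Send + Sync>,
}

impl Failed {
    pub fn new(
        context: impl Into<String>,
        source: impl Error + Send + Sync + 'static,
    ) -> Self {
        Failed {
            context: context.into(),
            source: Box::new(source),
        }
    }
}

impl fmt::Display for Failed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.context)
    }
}

impl Error for Failed {
    // Exposing the cause is what lets a top-level handler walk and print the
    // full "Caused by:" chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(self.source.as_ref())
    }
}
```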
This change also means that the full source chain for an error is now available to top-level error handlers, allowing more complete error messages. For example, starting `pilcrow` with an invalid network listen address produces
    Failed to bind to www.google.com:64209
    Caused by:
        Can't assign requested address (os error 49)
instead of the previous
    Error: Io(Os { code: 49, kind: AddrNotAvailable, message: "Can't assign requested address" })
which previously captured the same _cause_, but without the formatting (see previous commit) and without the _context_ (this commit). Similar improvements are available for many of the error scenarios Pilcrow is designed to give up on.
When deciding which errors to use `Failed` with, I've used the heuristic that if something can fail for more than one underlying reason, and if the caller will only ever need to differentiate those reasons after substantial refactoring anyway, then the reasons should collapse into `Failed`. If there's either only a single underlying failure reason possible, or only errors arising out of the function body possible, then I've left error handling alone.
In the process I've refactored most request-handler-level error mappings to explicitly map `Failed` to `Internal`, rather than having a catch-all mapping for all unhandled errors, to make it easier to remember to add request-level error representations when adding app-level error cases.
This also includes helper traits for `Error` and `Result`, to make constructing `Failed` (and errors that include `Failed` as an alternative) easier to do, and some constants for the recurring error messages related to transaction demarcation. I'm not completely happy with the repetitive nature of those error cases, but this is the best I've arrived at so far.
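For illustration, the `Result` helper could be along these lines (hypothetical names and signatures, not necessarily the real trait):
```rust
// Hypothetical extension trait, shown only to illustrate the `fail` idiom.
pub trait FailExt<T> {
    /// Wrap any error in `Failed`, attaching `context` as the explanation.
    fn fail(self, context: &str) -> Result<T, Failed>;
}

impl<T, E> FailExt<T> for Result<T, E>
where
    E: std::error::Error + Send + Sync + 'static,
{
    fn fail(self, context: &str) -> Result<T, Failed> {
        self.map_err(|err| Failed::new(context, err))
    }
}

// Usage (hypothetical): bind_result.fail("Failed to bind to the listen address")?;
```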
As errors are no longer expected to be convertible up the call stack, the `NotFound` and `Duplicate` helper traits for database errors had to change a bit. Those previously assumed that they would be used in the context of an error type implementing `From<sqlx::Error>` (or from another error type with similar characteristics), and that's not the case any more. The resulting idiom for converting a missing value into a domain error is `foo.await.optional().fail(MESSAGE)?.ok_or(DOMAIN ERROR)?`, which is rather clunky, but I've opted not to go further with it. The `Duplicate` helper is just plain gone, as it's not easily generalizable in this structure and using `match` is more tractable for me.
Finally, I've changed the convention for error messages from `all lowercase messages in whatever tense i feel like at the moment` to `Sentence-case messages in the past tense`, frequently starting with `Failed to` and a short summary of the task at hand. This, as above, makes error message capitalization between Pilcrow's own messages and messages coming from other libraries/the Rust stdlib much more coherent and less jarring to read.
|
| |
| |
The semantics of this endpoint are somewhat complex, and are incompletely captured in the associated docs change. For posterity, the intended workflow is:
1. Obtain Pilcrow's current VAPID key by connecting (it's in the events, either from boot or from the event stream).
2. Use the browser push APIs to create a push subscription, using that VAPID key.
3. Send Pilcrow the push subscription endpoint and keys, plus the VAPID key the client used to create it so that the server can detect race conditions with key rotation.
4. Wait for messages to arrive.
This commit does not introduce any actual messages, just subscription management endpoints.
When the server's VAPID key is rotated, all existing subscriptions are discarded. Without the VAPID key, the server cannot service those subscriptions. We can't exactly notify the broker to stop processing messages on those subscriptions, so this is an incomplete solution to what to do if the key is being rotated due to a compromise, but it's better than nothing.
The shape of the API endpoint is heavily informed by the [JSON payload][web-push-json] provided by browser Web Push implementations, to ease client development in a browser-based context. The idea is that a client can take that JSON and send it to the server verbatim, without needing to transform it in any way, to submit the subscription to the server for use.
[web-push-json]: https://developer.mozilla.org/en-US/docs/Web/API/PushSubscription/toJSON
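For reference, the browser's `toJSON()` output is roughly the following shape, so a serde definition along these lines would accept it verbatim. This is illustrative only; Pilcrow's actual request type (including wherever the client's VAPID key rides along) may differ:
```rust
use serde::Deserialize;

// Roughly mirrors PushSubscription.toJSON(); not Pilcrow's real type.
#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct PushSubscriptionJson {
    endpoint: String,
    #[serde(default)]
    expiration_time: Option<f64>,
    keys: PushSubscriptionKeys,
}

#[derive(Deserialize)]
struct PushSubscriptionKeys {
    p256dh: String,
    auth: String,
}
```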
Push subscriptions are operationally associated with a specific _user agent_, and have no inherent relationship with a Pilcrow login or token (session). Taken as-is, a subscription created by user A could be reused by user B if they share a user agent, even if user A logs out before user B logs in. Pilcrow therefore _logically_ associates push subscriptions with specific tokens, and abandons those subscriptions when the token is invalidated by
* logging out,
* expiry, or
* changing passwords.
(There are no other token invalidation workflows at this time.)
Stored subscriptions are also abandoned when the server's VAPID key changes.
|
| |
|
|
| |
As with the `Setup` component, I've generalized the associated middleware across anything that can provide a `Tokens`, where possible.
|
| | |
|
| |
|
|
|
|
| |
This conversion, from an iterator of type-specific events (say, `user::Event` or `message::Event`) into a `Vec<event::Event>`, is pervasive, and it needs to be done each time. Having Broadcaster expose a support method for this cuts down on the repetition, at the cost of a slightly alarming amount of type-system nonsense in `broadcast_from`.
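Stripped of the surrounding machinery, the conversion reduces to something like this sketch (with stand-in types; the real method hangs off `Broadcaster` and feeds its internal queue):
```rust
// Minimal stand-ins so the sketch compiles on its own; the real types live in
// Pilcrow's `event` and `user` modules.
mod user {
    pub struct Event;
}
mod event {
    pub enum Event {
        User(super::user::Event),
    }
    impl From<super::user::Event> for Event {
        fn from(e: super::user::Event) -> Self {
            Event::User(e)
        }
    }
}

/// The conversion described above: any iterator of type-specific events,
/// collected into a `Vec<event::Event>` via `From`.
fn collect_events<I>(events: I) -> Vec<event::Event>
where
    I: IntoIterator,
    event::Event: From<I::Item>,
{
    events.into_iter().map(event::Event::from).collect()
}
```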
Historical footnote: the internal message structure is a Vec and not an individual message so that bulk operations, like expiring channels and messages, won't disconnect everyone if they happen to dispatch more than sixteen messages (current queue depth limit) at once. We trade allocation and memory pressure for keeping the connections alive. _Most_ event publishing is an iterator of one item, so the Vec allocation is redundant.
|
| |
| |
The taxonomy is now as follows:
* A _login_ is someone's identity for the purposes of authenticating to the service. Logins are not synchronized, and in fact are not published anywhere in the current API. They have a login ID, a name and a password.
* A _user_ is someone's identity for the purpose of participating in conversations. Users _are_ synchronized, as before. They have a user ID, a name, and a creation instant for the purposes of synchronization.
In practice, a user exists for every login - in fact, users' names are stored in the login table and are joined in, rather than being stored redundantly in the user table. A login ID and its corresponding user ID are always equal, and the user and login ID types support conversion and comparison to facilitate their use in this context.
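As a rough illustration of the conversion and comparison support (the real ID types are not literal `u64` newtypes):
```rust
// Illustrative newtypes; Pilcrow's actual ID types are richer than this.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct LoginId(u64);

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct UserId(u64);

// "Always equal" in practice: converting one to the other is just a relabel.
impl From<LoginId> for UserId {
    fn from(id: LoginId) -> Self {
        UserId(id.0)
    }
}

// Direct comparison across the two ID kinds.
impl PartialEq<UserId> for LoginId {
    fn eq(&self, other: &UserId) -> bool {
        self.0 == other.0
    }
}
```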
Tokens are now associated with logins, not users. The currently-acting identity is passed down into app types as a login, not a user, and then resolved to a user where appropriate within the app methods.
As a side effect, the `GET /api/boot` method now returns a `login` key instead of a `user` key. The structure of the nested value is unchanged.
|
| |
|
|
| |
This is the leading edge of a larger storage refactoring, where repo types stop doing things like generating secrets or deciding whether to carry out an operation. To make this work, there is now a `Token` type that holds the complete state of a token, in memory.
|
| |
|
|
| |
We'll be building separate entities around this in future commits, to better separate the authentication data (non-synchronized and indeed "not public") from the chat data (synchronized and public).
|
| |
|
|
|
|
| |
identity token.
This is a small refactoring that's been possible for a while, and we only just noticed.
|
| | |
|
| |
|
|
| |
As with `/api/setup`, the response was an ad-hoc choice, which we are not using and which constrains future development just by existing.
|
| |
|
|
| |
Having this buried under `crate::user` makes it hard to split up the roles `user` fulfils right now. Moving it out to its own module makes it a bit tidier to reuse it in a separate, authentication-only way.
|
| |
| |
Rust 1.89 added a new warning:
    warning: hiding a lifetime that's elided elsewhere is confusing
     --> src/setup/repo.rs:4:14
      |
    4 |     fn setup(&mut self) -> Setup;
      |              ^^^^^^^^^     ----- the same lifetime is hidden here
      |              |
      |              the lifetime is elided here
      |
      = help: the same lifetime is referred to in inconsistent ways, making the signature confusing
    help: use `'_` for type paths
      |
    4 |     fn setup(&mut self) -> Setup<'_>;
      |                                 ++++
I don't entirely agree with the style advice here, but lifetime elision style is an evolving area in Rust and I'd rather track the Rust team's recommendations than invent my own, so I've applied all of the suggested `'_` annotations.
|
| |
| |
This is based heavily on the work done for normalized strings, in `crate::normalize`. The key realization in that module is that the logic distinguishing one kind of thing (normalized strings in that case, IDs in this case) can be packaged up as a type token, and that doing so may reduce the overall complexity. This implementation for ID also borrows heavily from the implementation for normalized strings.
It's less flexible: an ID implemented this way can't expose _less_ of `crate::id::ID`'s interface, whereas newtype wrappers can, for example. However, our code doesn't use that flexibility on purpose anywhere and we're relatively unlikely to change that. In return, the individual ID types require substantially less code - they do not, for example, need to re-implement `Display` for themselves.
I very nearly made the trait `Prefix`:
```rust
pub trait Prefix {
const PREFIX: &str;
}
```
however, I think having an effectively-constant method is less surprising overall.
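The method-based version presumably looks something along these lines (illustrative; the actual trait name, signature, and prefix values may differ):
```rust
pub trait Prefix {
    /// Effectively a constant: the short prefix that identifies this kind of ID.
    fn prefix() -> &'static str;
}

// Hypothetical ID kind, shown only to demonstrate the shape of an impl.
pub struct Channel;

impl Prefix for Channel {
    fn prefix() -> &'static str {
        "channel"
    }
}
```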
|
| |
|
|
| |
Missed in 9f7f82dbd9adee8ae18ae7ff2600b3e1dc8fadbc.
|
| |\ |
|
| | |
| |
| |
| |
| |
| | |
357116366c1307bedaac6a3dfe9c5ed8e0e0c210 wasn't updated (and wasn't quite correct then, either).
I haven't found a way to derive it from the name of the type.
|
| | | |
|
| | | |
|
| |/ |
|
| | |
|
| |
| |
There are a couple of migration suggestions from `cargo fix --edition` that I have deliberately skipped, which are intended to make sure that the changes to `if let` scoping don't bite us. They don't, I'm pretty sure, and if I turn out to be wrong, I'd rather fix the scoping issues (as they arise) than use `match` (`cargo fix --edition`'s suggestion).
This change also includes a bulk reformat and a clippy cleanup.
NOTA BENE: As this requires a new Rust toolchain, you'll need to update Rust (`rustup update`, normally) or the server won't build. This also applies to the Debian builder Docker image; it'll need to be rebuilt (from scratch, pulling its base image again) as well.
|
| | |
|
| |
|
|
| |
The protocol here re-checks the caller's password, as an "I left myself logged in" anti-pranking check.
|
| |
| |
* A `cookie::Identity` (`IdentityCookie`) is a specialized CookieJar for working with identities.
* An `Identity` is a token/login pair.
I hope for this to be a bit more legible.
In service of this, `Login` is no longer extractable. You have to get an identity.
|
| |
| |
Canonicalization does two things:
* It prevents duplicate names that differ only by case or only by normalization/encoding sequence; and
* It makes certain name-based comparisons "case-insensitive" (generalizing via Unicode's case-folding rules).
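Concretely, storing both forms might look something like this rough sketch (the real canonicalization rules are more involved; `to_lowercase` here only approximates Unicode case folding):
```rust
use unicode_normalization::UnicodeNormalization;

/// Keep the name as entered for display, plus a canonical form used for
/// uniqueness checks and "case-insensitive" lookups.
struct StoredName {
    display: String,
    canonical: String,
}

impl StoredName {
    fn new(name: &str) -> Self {
        StoredName {
            display: name.to_string(),
            canonical: canonicalize(name),
        }
    }
}

/// Approximation only: normalize, then case-fold. `to_lowercase` stands in
/// for the full Unicode case-folding algorithm here.
fn canonicalize(name: &str) -> String {
    name.nfc().collect::<String>().to_lowercase()
}
```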
This change is complicated, as it means that every name now needs to be stored in two forms. Unfortunately, this is _very likely_ a breaking schema change. The migrations in this commit make a best-effort attempt to canonicalize existing channel and login names, but it's likely that any existing channels or logins with non-ASCII characters will not be canonicalized correctly. Since clients look at all channel names and all login names on boot, and since the code in this commit verifies canonicalization when reading from the database, this will effectively make the server unusable until any incorrectly-canonicalized values are either manually canonicalized or removed.
It might be possible to do better with [the `icu` sqlite3 extension][icu], but (a) I'm not convinced of that and (b) this commit is already huge; adding database extension support would make it far larger.
[icu]: https://sqlite.org/src/dir/ext/icu
For some references on why it's worth storing usernames this way, see <https://www.b-list.org/weblog/2018/nov/26/case/> and the referenced talk, as well as <https://www.b-list.org/weblog/2018/feb/11/usernames/>. Bennett's treatment of this issue is, to my eye, much more readable than the referenced Unicode technical reports, and I'm inclined to trust his opinion given that he maintains a widely-used, internet-facing user registration library for Django.
|
| |
| |
This normalizes the following values:
* login names
* passwords
* channel names
* message bodies, because why not
The goal here is to have a canonical representation of these values, so that, for example, the service does not inadvertently host two channels whose names are semantically identical but differ in the specifics of how diacritics are encoded, or two users whose names are identical.
Normalization is done on input from the wire, using Serde hooks, and when reading from the database. The `crate::nfc::String` type implements these normalizations (as well as normalizing whenever converted from a `std::string::String` generally).
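A minimal sketch of that idea, assuming the `unicode-normalization` crate (the real `crate::nfc::String` may differ in detail):
```rust
pub mod nfc {
    use unicode_normalization::{is_nfc, UnicodeNormalization};

    /// A string that is always in NFC form.
    #[derive(Clone, Debug, PartialEq, Eq)]
    pub struct String(std::string::String);

    impl From<std::string::String> for String {
        fn from(s: std::string::String) -> Self {
            // Skip re-normalizing when the input is already in NFC.
            if is_nfc(&s) {
                String(s)
            } else {
                String(s.nfc().collect())
            }
        }
    }

    impl std::ops::Deref for String {
        type Target = str;

        fn deref(&self) -> &str {
            &self.0
        }
    }
}
```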
This change does not cover:
* Trying to cope with passwords that were created as non-normalized strings, which are now non-verifiable as all the paths to verify passwords normalize the input.
* Trying to ensure that non-normalized data in the database compares reasonably to normalized data. Fortunately, we don't _do_ very many string comparisons (I think only login names), so this isn't a huge deal at this stage. Login names will probably have to Get Fixed later on, when we figure out how to handle case folding for login name verification.
|
| |
| |
In general:
* If the client can only assume the response is immediately valid (mostly, login creation, where the client cannot monitor the event stream), then 200 Okay, with data describing the server's view of the request.
* If the client can monitor for completion by watching the event stream, then 202 Accepted, with data describing the server's view of the request.
This comes on the heels of a comment I made on Discord:
> hrm
>
> creating a login: 204 No Content, no body
> sending a message: 202 Accepted, no body
> creating a channel: 200 Okay, has a body
>
> past me, what were you on
There wasn't any principled reason for this inconsistency; it happened as the endpoints were written at different times and with different states of mind.
|
| |
|
|
| |
MSRV is now 1.82.
|
| |
| |
Previously, when a channel (message) was deleted, `hi` would send events to all _connected_ clients to inform them of the deletion, then delete all memory of the channel (message). Any disconnected client, on reconnecting, would not receive the deletion event, and would de-synch with the service. The creation events were also immediately retconned out of the event stream.
With this change, `hi` keeps a record of deleted channels (messages). When replaying events, these records are used to replay the deletion event. After 7 days, the retained data is deleted, both to keep storage under control and to conform to users' expectations that deleted means gone.
To match users' likely intuitions about what deletion does, deleting a channel (message) _does_ immediately delete some of its associated data. Channels' names are blanked, and messages' bodies are also blanked. When the event stream is replayed, the original channel.created (message.sent) event is "tombstoned", with an additional `deleted_at` field to inform clients. The included client does not use this field, at least yet.
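As a rough illustration only (the field names here are guesses, not the actual event schema), a tombstoned creation event might serialize along these lines:
```rust
use serde::Serialize;

// Illustrative shape only; the real `channel.created` payload may differ.
#[derive(Serialize)]
struct ChannelCreated {
    id: String,
    // Blanked once the channel has been deleted.
    name: String,
    // Omitted for live channels; present on tombstoned replays.
    #[serde(skip_serializing_if = "Option::is_none")]
    deleted_at: Option<String>,
}
```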
The migration is, once again, screamingly complicated due to sqlite's limited ALTER TABLE … ALTER COLUMN support.
This change also contains capabilities that would allow the API to return 410 Gone for deleted channels or messages, instead of 404. I did experiment with this, but it's tricky to do pervasively, especially since most app-level interfaces return an `Option<Channel>` or `Option<Message>`. Redesigning these to return either `Ok(Channel)` (`Ok(Message)`) or `Err(Error::NotFound)` or `Err(Error::Deleted)` is more work than I wanted to take on for this change, and the utility of 410 Gone responses is not obvious to me. We have other, more pressing API design warts to address.
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
|
|
| |
We now (try to) use the identity cookie in `/ch/:channel`. This will not work, because the cookie's path doesn't include `/ch/`.
|
| | |
|
| |
|
|
|
|
| |
This separates the code that figures out what happened to an entity from the code that represents it to a user, and makes it easier to compute a snapshot at a point in time (for things like bootstrap). It also makes the internal logic a bit easier to follow, since it's easier to tell whether you're working with a point in time or with the whole recorded history.
This is hefty.
|
| |
|
|
| |
This helped me discover an organizational scheme I like more.
|
| | |
|
|
|
This is primarily renames and repackagings.
|