pilcrow - Run-It-Yourself web chat, maybe

	Commit message (Collapse)	Author	Age
*	Define a generic "Failed" case for app-level errors (and a few others).	Owen Jacobson	2025-11-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We were previously exporting root causes from one layer of abstraction to the next. For example, anything that called into the database could cause an `sqlx::Error`, so anything that transitively called into that logic exported a `Database(sqlx::Error)` error variant of its own, using `From` to map errors from inner type to outer type. This had a couple of side effects. First, it required each layer of error handling to carry with it a `From` implementation unwrapping and rewrapping root causes from the next layer down. This was particularly apparent in the event and boot endpoints, which had separate error cases unique to crypto key processing errors solely because they happened to involve handling events that contained those keys. There were others, including the pervasive `Database(sqlx::Error)` error variants. Separately, none of the error variants introduced for this purpose were being used for anything other than printing to stderr. All the complexity of From impls and all the structure of the error types was being thrown away at top-level error handlers. This change replaces most of those error types with a generic `Failed` error. A `Failed` carries with it two pieces of information: a (boxed) underlying error, of any boxable `Error` type, and text meant to explain the context and cause of an error. Code which acts on errors can treat `Failed` as a catch-all case, while individually handling errors that signify important cases. Errors can be moved into or out of the `Failed` case by refactoring, as needed. The design of `Failed` is heavily motivated by [anyhow's `context` system][context] as a way for the programmer to capture immediate intention as an explanation for some underlying error. However, instead of accepting the full breadth of types that implement `Display`, a `Failed` can only carry strings as explanation. We don't need the generality at this time, and the implementation underlying it is pretty complex for what it does. [context]: https://docs.rs/anyhow/latest/anyhow/struct.Error.html#method.context This change also means that the full source chain for an error is now available to top-level error handlers, allowing more complete error messages. For example, starting `pilcrow` with an invalid network listen address produces Failed to bind to www.google.com:64209 Caused by: Can't assign requested address (os error 49) instead of the previous Error: Io(Os { code: 49, kind: AddrNotAvailable, message: "Can't assign requested address" }) which previously captured the same _cause_, but without the formatting (see previous commit) and without the _context_ (this commit). Similar improvements are available for many of the error scenarios Pilcrow is designed to give up on. When deciding which errors to use `Failed` with, I've used the heuristic that if something can fail for more than one underlying reason, and if the caller will only ever need to be able to differentiate those reasons after substantial refactoring anyways, then the reasons should collase into `Failed`. If there's either only a single underlying failure reason possible, or only errors arising out of the function body possible, then I've left error handling alone. In the process I've refactored most request-handler-level error mappings to explicitly map `Failed` to `Internal`, rather than having a catch-all mapping for all unhandled errors, to make it easier to remember to add request-level error representations when adding app-level error cases. This also includes helper traits for `Error` and `Result`, to make constructing `Failed` (and errors that include `Failed` as an alternative) easier to do, and some constants for the recurring error messages related to transaction demarcation. I'm not completely happy with the repetitive nature of those error cases, but this is the best I've arrived at so far. As errors are no longer expected to be convertible up the call stack, the `NotFound` and `Duplicate` helper traits for database errors had to change a bit. Those previously assumed that they would be used in the context of an error type implementing `From<sqlx::Error>` (or from another error type with similar characteristics), and that's not the case any more. The resulting idiom for converting a missing value into a domain error is `foo.await.optional().fail(MESSAGE)?.ok_or(DOMAIN ERROR)?`, which is rather clunky, but I've opted not to go further with it. The `Duplicate` helper is just plain gone, as it's not easily generalizable in this structure and using `match` is more tractable for me. Finally, I've changed the convention for error messages from `all lowercase messages in whatever tense i feel like at the moment` to `Sentence-case messages in the past tense`, frequently starting with `Failed to` and a short summary of the task at hand. This, as above, makes error message capitalization between Pilcrow's own messages and messages coming from other libraries/the Rust stdlib much more coherent and less jarring to read.
*	Add an endpoint for creating push subscriptions.	Owen Jacobson	2025-11-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The semantics of this endpoint are somewhat complex, and are incompletely captured in the associated docs change. For posterity, the intended workflow is: 1. Obtain Pilcrow's current VAPID key by connecting (it's in the events, either from boot or from the event stream). 2. Use the browser push APIs to create a push subscription, using that VAPID key. 3. Send Pilcrow the push subscription endpoint and keys, plus the VAPID key the client used to create it so that the server can detect race conditions with key rotation. 4. Wait for messages to arrive. This commit does not introduce any actual messages, just subscription management endpoints. When the server's VAPID key is rotated, all existing subscriptions are discarded. Without the VAPID key, the server cannot service those subscriptions. We can't exactly notify the broker to stop processing messages on those subscriptions, so this is an incomplete solution to what to do if the key is being rotated due to a compromise, but it's better than nothing. The shape of the API endpoint is heavily informed by the [JSON payload][web-push-json] provided by browser Web Push implementations, to ease client development in a browser-based context. The idea is that a client can take that JSON and send it to the server verbatim, without needing to transform it in any way, to submit the subscription to the server for use. [web-push-json]: https://developer.mozilla.org/en-US/docs/Web/API/PushSubscription/toJSON Push subscriptions are operationally associated with a specific _user agent_, and have no inherent relationship with a Pilcrow login or token (session). Taken as-is, a subscription created by user A could be reused by user B if they share a user agent, even if user A logs out before user B logs in. Pilcrow therefore _logically_ associates push subscriptions with specific tokens, and abandons those subscriptions when the token is invalidated by * logging out, * expiry, or * changing passwords. (There are no other token invalidation workflows at this time.) Stored subscriptions are also abandoned when the server's VAPID key changes.
*	Convert the `Tokens` component into a freestanding struct.	Owen Jacobson	2025-10-28
\| \| \| \|	As with the `Setup` component, I've generalized the associated middleware across anything that can provide a `Tokens`, where possible.
*	Convert `Logins` into a freestanding component.	Owen Jacobson	2025-10-28
\|
*	Automatically reorder imports to my preferred style.	Owen Jacobson	2025-08-30
\|
*	Consolidate `events.map(…).collect()` calls into `Broadcaster`.	Owen Jacobson	2025-08-26
\| \| \| \| \| \|	This conversion, from an iterator of type-specific events (say, `user::Event` or `message::Event`), into a `Vec<event::Event>`, is prevasive, and it needs to be done each time. Having Broadcaster expose a support method for this cuts down on the repetition, at the cost of a slightly alarming amount of type-system nonsense in `broadcast_from`. Historical footnote: the internal message structure is a Vec and not an individual message so that bulk operations, like expiring channels and messages, won't disconnect everyone if they happen to dispatch more than sixteen messages (current queue depth limit) at once. We trade allocation and memory pressure for keeping the connections alive. _Most_ event publishing is an iterator of one item, so the Vec allocation is redundant.
*	Split `user` into a chat-facing entity and an authentication-facing entity.	Owen Jacobson	2025-08-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The taxonomy is now as follows: * A _login_ is someone's identity for the purposes of authenticating to the service. Logins are not synchronized, and in fact are not published anywhere in the current API. They have a login ID, a name and a password. * A _user_ is someone's identity for the purpose of participating in conversations. Users _are_ synchronized, as before. They have a user ID, a name, and a creation instant for the purposes of synchronization. In practice, a user exists for every login - in fact, users' names are stored in the login table and are joined in, rather than being stored redundantly in the user table. A login ID and its corresponding user ID are always equal, and the user and login ID types support conversion and comparison to facilitate their use in this context. Tokens are now associated with logins, not users. The currently-acting identity is passed down into app types as a login, not a user, and then resolved to a user where appropriate within the app methods. As a side effect, the `GET /api/boot` method now returns a `login` key instead of a `user` key. The structure of the nested value is unchanged.
*	Rename the `login` module to `user`.	Owen Jacobson	2025-03-23
\|
*	Rename `user` to `login` at the database.	Owen Jacobson	2025-03-23
\|
*	Ensure `must_use` warnings fire even after results are unwrapped.	Owen Jacobson	2025-02-21
\|
*	Upgrade to Rust 1.85 and Rust 2024 edition.	Owen Jacobson	2025-02-20
\| \| \| \| \| \| \| \|	There are a couple of migration suggestions from `cargo fix --edition` that I have deliberately skipped, which are intended to make sure that the changes to `if let` scoping don't bite us. They don't, I'm pretty sure, and if I turn out to be wrong, I'd rather fix the scoping issues (as they arise) than use `match` (`cargo fix --edition`'s suggestion). This change also includes a bulk reformat and a clippy cleanup. NOTA BENE: As this requires a new Rust toolchain, you'll need to update Rust (`rustup update`, normally) or the server won't build. This also applies to the Debian builder Docker image; it'll need to be rebuilt (from scratch, pulling its base image again) as well.
*	Resume points are no longer optional.	Owen Jacobson	2024-10-30
\| \| \| \|	This is an inconsequential change for actual clients, since "resume from the beginning" was never a preferred mode of operation, and it simplifies some internals. It should also mean we get better query plans where `coalesce(cond, true)` was previously being used.
*	Remove `hi-recanonicalize`.	Owen Jacobson	2024-10-30
\| \| \| \|	This utility was needed to support a database migration with existing data. I have it on good authority that no further databases exist that are in the state that made this tool necessary.
*	Add `change password` UI + API.	Owen Jacobson	2024-10-29
\| \| \| \|	The protocol here re-checks the caller's password, as a "I left myself logged in" anti-pranking check.
*	fixup! Restrict login names.	Owen Jacobson	2024-10-29
\|
*	Create a dedicated workflow type for creating logins.	Owen Jacobson	2024-10-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Nasty design corner. Logins need to be created in three places: 1. In tests, using app.logins().create(…); 2. On initial setup, using app.setup().initial(…); and 3. When accepting invites, using app.invites().accept(…). These three places do the same thing with respect to logins, but also do a varying mix of other things. Testing is the simplest and _only_ creates a login. Initial setup and invite acceptance both issue a token for the newly-created login. Accepting an invite also invalidates the invite. Previously, those three functions have been copy-pasted variations on a theme. Now that we have validation, the copy-paste approach is no longer tenable; it will become increasingly hard to ensure that the three functions (plus any future functions) remain in synch. To accommodate the variations while consolidating login creation, I've added a typestate-based state machine, which is driven by method calls: * A creation attempt begins with `let create = Create::begin()`. This always succeeds; it packages up arguments used in later steps, but does nothing else. * A creation attempt can be validated using `let validated = create.validate()?`. This may fail. Input validation and password hashing are carried out at this stage, making it potentially expensive. * A validated attempt can be stored in the DB, using `let stored = validated.store(&mut tx).await?`. This may fail. The login will be written to the DB; the caller is responsible for transaction demarcation, to allow other things to take place in the same transaction. * A fully-stored attempt can be used to publish events, using `let login = stored.publish(self.events)`. This always succeeds, and unwraps the state machine to its final product (a `login::History`).
*	Restrict login names.	Owen Jacobson	2024-10-29
\| \| \| \| \| \| \| \|	There's no good reason to use an empty string as your login name, or to use one so long as to annoy others. Names beginning or ending with whitespace, or containing runs of whitespace, are also a technical problem, so they're also prohibited. This change does not implement [UTS #39], as I haven't yet fully understood how to do so. [UTS #39]: https://www.unicode.org/reports/tr39/
*	Tests for accepting invites	Owen Jacobson	2024-10-24
\|
*	Make sure (most) queries avoid table scans.	Owen Jacobson	2024-10-23
\| \| \| \| \| \| \| \| \| \| \| \|	I've exempted inserts (they never scan in the first place), queries on `event_sequence` (at most one row), and the coalesce()s used for event replay (for now; these are obviously a performance risk area and need addressing). Method: ``` find .sqlx -name 'query-*.json' -exec jq -r '"explain query plan " + .query + ";"' {} + > explain.sql ``` Then go query by query through the resulting file.
*	Sort out the naming of the various parts of an identity.	Owen Jacobson	2024-10-22
\| \| \| \| \| \| \| \| \|	* A `cookie::Identity` (`IdentityCookie`) is a specialized CookieJar for working with identities. * An `Identity` is a token/login pair. I hope for this to be a bit more legible. In service of this, `Login` is no longer extractable. You have to get an identity.
*	Provide `hi-recanonicalize` to recover from canonicalized-name problems.	Owen Jacobson	2024-10-22
\|
*	Canonicalize login and channel names.	Owen Jacobson	2024-10-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Canonicalization does two things: * It prevents duplicate names that differ only by case or only by normalization/encoding sequence; and * It makes certain name-based comparisons "case-insensitive" (generalizing via Unicode's case-folding rules). This change is complicated, as it means that every name now needs to be stored in two forms. Unfortunately, this is _very likely_ a breaking schema change. The migrations in this commit perform a best-effort attempt to canonicalize existing channel or login names, but it's likely any existing channels or logins with non-ASCII characters will not be canonicalize correctly. Since clients look at all channel names and all login names on boot, and since the code in this commit verifies canonicalization when reading from the database, this will effectively make the server un-usuable until any incorrectly-canonicalized values are either manually canonicalized, or removed It might be possible to do better with [the `icu` sqlite3 extension][icu], but (a) I'm not convinced of that and (b) this commit is already huge; adding database extension support would make it far larger. [icu]: https://sqlite.org/src/dir/ext/icu For some references on why it's worth storing usernames this way, see <https://www.b-list.org/weblog/2018/nov/26/case/> and the refernced talk, as well as <https://www.b-list.org/weblog/2018/feb/11/usernames/>. Bennett's treatment of this issue is, to my eye, much more readable than the referenced Unicode technical reports, and I'm inclined to trust his opinion given that he maintains a widely-used, internet-facing user registration library for Django.
*	Unicode normalization on input.	Owen Jacobson	2024-10-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This normalizes the following values: * login names * passwords * channel names * message bodies, because why not The goal here is to have a canonical representation of these values, so that, for example, the service does not inadvertently host two channels whose names are semantically identical but differ in the specifics of how diacritics are encoded, or two users whose names are identical. Normalization is done on input from the wire, using Serde hooks, and when reading from the database. The `crate::nfc::String` type implements these normalizations (as well as normalizing whenever converted from a `std::string::String` generally). This change does not cover: * Trying to cope with passwords that were created as non-normalized strings, which are now non-verifiable as all the paths to verify passwords normalize the input. * Trying to ensure that non-normalized data in the database compares reasonably to normalized data. Fortunately, we don't _do_ very many string comparisons (I think only login names), so this isn't a huge deal at this stage. Login names will probably have to Get Fixed later on, when we figure out how to handle case folding for login name verification.
*	Make the responses for various data creation requests more consistent.	Owen Jacobson	2024-10-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In general: * If the client can only assume the response is immediately valid (mostly, login creation, where the client cannot monitor the event stream), then 200 Okay, with data describing the server's view of the request. * If the client can monitor for completion by watching the event stream, then 202 Accepted, with data describing the server's view of the request. This comes on the heels of a comment I made on Discord: > hrm > > creating a login: 204 No Content, no body > sending a message: 202 Accepted, no body > creating a channel: 200 Okay, has a body > > past me, what were you on There wasn't any principled reason for this inconsistency; it happened as the endpoints were written at different times and with different states of mind.
*	Retain deleted messages and channels temporarily, to preserve events for replay.	Owen Jacobson	2024-10-17
\| \| \| \| \| \| \| \| \| \| \| \|	Previously, when a channel (message) was deleted, `hi` would send events to all _connected_ clients to inform them of the deletion, then delete all memory of the channel (message). Any disconnected client, on reconnecting, would not receive the deletion event, and would de-synch with the service. The creation events were also immediately retconned out of the event stream, as well. With this change, `hi` keeps a record of deleted channels (messages). When replaying events, these records are used to replay the deletion event. After 7 days, the retained data is deleted, both to keep storage under control and to conform to users' expectations that deleted means gone. To match users' likely intuitions about what deletion does, deleting a channel (message) _does_ immediately delete some of its associated data. Channels' names are blanked, and messages' bodies are also blanked. When the event stream is replayed, the original channel.created (message.sent) event is "tombstoned", with an additional `deleted_at` field to inform clients. The included client does not use this field, at least yet. The migration is, once again, screamingingly complicated due to sqlite's limited ALTER TABLE … ALTER COLUMN support. This change also contains capabilities that would allow the API to return 410 Gone for deleted channels or messages, instead of 404. I did experiment with this, but it's tricky to do pervasively, especially since most app-level interfaces return an `Option<Channel>` or `Option<Message>`. Redesigning these to return either `Ok(Channel)` (`Ok(Message)`) or `Err(Error::NotFound)` or `Err(Error::Deleted)` is more work than I wanted to take on for this change, and the utility of 410 Gone responses is not obvious to me. We have other, more pressing API design warts to address.
*	Organizational pass on endpoints and routes.	Owen Jacobson	2024-10-16
\|
*	Split the login transaction, to reduce database contention during login	Owen Jacobson	2024-10-11
\|
*	Stop creating accounts during login.	Owen Jacobson	2024-10-11
\|
*	Blind debug output for StoredHash.	Owen Jacobson	2024-10-11
\| \| \| \|	This is a little excessive, as PasswordHash (which StoredHash converts to) _does_ derive Debug and exposes the hash, but I'll feel better if the hash never ends up in logs.
*	Provide a view of logins to clients.	Owen Jacobson	2024-10-09
\|
*	Separate `/api/boot` into its own module.	Owen Jacobson	2024-10-05
\|
*	Use `/api/boot` to bootstrap the client.	Owen Jacobson	2024-10-05
\| \| \| \| \| \| \| \| \| \| \|	The client now takes an initial snapshot from the response to `/api/boot`, then picks up the event stream at the immediately-successive event to the moment the snapshot was taken. This commit removes the following unused endpoints: * `/api/channels` (GET) * `/api/channels/:channel/messages` (GET) The information therein is now part of the boot response. We can always add 'em back, but I wanted to clear the deck for designing something more capable, for dealing with client needs.
*	Retire top-level `repo`.	Owen Jacobson	2024-10-02
\| \| \| \|	This helped me discover an organizational scheme I like more.
*	Split login and token handling.	Owen Jacobson	2024-10-02
\|
*	First pass on reorganizing the backend.	Owen Jacobson	2024-10-02
\| \| \| \|	This is primarily renames and repackagings.
*	Organize IDs into top-level namespaces.	Owen Jacobson	2024-10-01
\| \| \| \|	(This is part of a larger reorganization.)
*	Provide a resume point to bridge clients from state snapshots to the event ↵	Owen Jacobson	2024-10-01
\| \| \| \|	sequence.
*	Prevent racing between `limit_stream` and logging out.	Owen Jacobson	2024-10-01
\|
*	Reimplement the logout machinery in terms of token IDs, not token secrets.	Owen Jacobson	2024-09-29
\| \| \| \| \| \|	This (a) reduces the amount of passing secrets around that's needed, and (b) allows tests to log out in a more straightforwards manner. Ish. The fixtures are a mess, but so is the nomenclature. Fix the latter and the former will probably follow.
*	Shut down the `/api/events` stream when the user logs out or their token ↵	Owen Jacobson	2024-09-29
\| \| \| \| \| \| \| \|	expires. When tokens are revoked (logout or expiry), the server now publishes an internal event via the new `logins` event broadcaster. These events are used to guard the `/api/events` stream. When a token revocation event arrives for the token used to subscribe to the stream, the stream is cut short, disconnecting the client. In service of this, tokens now have IDs, which are non-confidential values that can be used to discuss tokens without their secrets being passed around unnecessarily. These IDs are not (at this time) exposed to clients, but they could be.
*	Wrap credential and credential-holding types to prevent `Debug` leaks.	Owen Jacobson	2024-09-28
\| \| \| \| \| \| \| \| \| \| \| \|	The following values are considered confidential, and should never be logged, even by accident: * `Password`, which is a durable bearer token for a specific Login; * `IdentitySecret`, which is an ephemeral but potentially long-lived bearer token for a specific Login; or * `IdentityToken`, which may hold cookies containing an `IdentitySecret`. These values are now wrapped in types whose `Debug` impls output opaque values, so that they can be included in structs that `#[derive(Debug)]` without requiring any additional care. The wrappers also avoid implementing `Display`, to prevent inadvertent `to_string()`s. We don't bother obfuscating `IdentitySecret`s in memory or in the `.hi` database. There's no point: we'd also need to store the information needed to de-obfuscate them, and they can be freely invalidated and replaced by blanking that table and asking everyone to log in again. Passwords _are_ obfuscated for storage, as they're intended to be durable.
*	Expire channels, too.	Owen Jacobson	2024-09-28
\|
*	Browsers default Path= to the directory part of the request URI.	Owen Jacobson	2024-09-27
\| \| \| \|	This change makes the identity cookie available throughout `/api`.
*	Retire `fixtures::error::expected!`.	Owen Jacobson	2024-09-25
\| \| \| \|	I had no idea `std` included a `matches!` macro, and I feel we're better off using it.
*	Crank up the Clippy warnings.	Owen Jacobson	2024-09-25
\| \| \| \|	This'll catch style issues, mostly.
*	rustdoc comment for the (very limited) public API of the crate.	Owen Jacobson	2024-09-25
\| \| \| \| \| \| \| \|	This silences some `-Wclippy::pedantic` warning, and it's just a good thing to do. I've made the choice to have the docs comment face programmers, and to provide `hi --help` and `hi -h` content via Clap attributes instead of inferring it from the docs comment. Internal (private) "rustdoc" comments have been converted to regular comments until I learn how to write better rustdoc.
*	Write tests.	Owen Jacobson	2024-09-20
\|
*	Pass dates around by ref more consistently	Owen Jacobson	2024-09-20
\|
*	Remove the HTML client, and expose a JSON API.	Owen Jacobson	2024-09-20
\| \| \| \| \| \| \| \| \| \| \| \| \|	This API structure fell out of a conversation with Kit. Described loosely: kit: ok kit: Here's what I'm picturing in a client kit: list channels, make-new-channel, zero to one active channels, post-to-active. kit: login/sign-up, logout owen: you will likely also want "am I logged in" here kit: sure, whoami
*	Expire messages after 90 days.	Owen Jacobson	2024-09-20
\| \| \| \| \| \| \| \| \| \|	This is intended to manage storage growth. A community with broadly steady traffic will now reach a steady state (ish) where the amount of storage in use stays within a steady band. The 90 day threshold is a spitball; this should be made configurable for the community's needs. I've also hoisted expiry out into the `app` classes, to reduce the amount of non-database work repo types are doing. This should make it easier to make expiry configurable later on. Includes incidental cleanup and style changes.