pilcrow - Run-It-Yourself web chat, maybe

	Commit message (Collapse)	Author	Age
*	Use freestanding structs for `App` components.	ojacobson	2025-10-28
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A "component" is a struct that provides domain-specific operations on the service. `App` largely acts as a way to obtain components for domain-specific operations: for example, given `let app: App = todo!();`, then `app.tokens()` provides a component (`Tokens`) that supports operations on authentication tokens, and `app.conversations()` provides a component (`Conversations`) that supports operations on conversations. This has been a major piece of the server's internal organization for a long time. Historically, these components have been built as short-lived view structs, which hold onto their functional dependencies by reference. A given component was therefore bound to its source `App`, and had a lifetime bounded by the life of that `App` instance or reference. This change turns components into freestanding structs - that is, they can outlive the `App` that provided them. They hold their dependencies by value, not by reference; `App` provides clones when creating a component, instead of borrowing its own state. For the functional dependencies we have today, cloning is a supported and cheap way to share access; details are documented in the individual commits. I'm making this change because we're working on web push, and we discovered while prototyping that it will be useful to be able to support multiple distinct types of web push client. A running Pilcrow server would use a "real" client (which sends real HTTP requests to deliver push messages), while tests would use a client stub (which doesn't). However, to make that work, we'll need to make `App` generic over the client, and the resulting type parameter would then end up in every handler and in most other things that touch the `App` type. 
This refactoring dramatically reduces the number of places we mention the `App` type, by making most uses rely on specific components instead of relying on `App` generally. There are still a few places that work on `App` generally, rather than on specific components, because an operation requires the use of two or more components. I don't love all this cloning, even if I know in my head that it's fine. The alternatives that we looked at include: * Provider traits, as we have for `Transaction`, that allow endpoints to specify that they want any type that can provide a `Tokens` or a `Conversation` instead of specifically an `App` (`App<PushClient>`). This is wordy enough that we've opted to punt on that approach for now. * Accept the type parameter as the cost of doing business. This is still an open alternative. * Use `dyn` dispatch instead of a type parameter for the push client. This is still an open alternative, though not one I love as we'd be incurring function call indirection without getting any generality out of it. Merges freestanding-app-components into main.
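The motivation above can be illustrated with a small sketch. Everything here is an assumption made for illustration rather than Pilcrow's actual code: `Pool` stands in for a cheaply cloneable handle, and `HttpPushClient`, `StubPushClient`, and `revoke_all` are hypothetical names.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a handle that is shared by cloning.
#[derive(Clone)]
struct Pool(Arc<str>);

// Hypothetical push client types: one "real", one stub for tests.
struct HttpPushClient;
struct StubPushClient;

// `App` made generic over the push client, as the commit anticipates.
struct App<P> {
    db: Pool,
    // Unused in this sketch; present only to show where `P` would live.
    #[allow(dead_code)]
    push: P,
}

// A freestanding component: it owns a clone of what it needs, so it
// mentions neither a lifetime nor the type parameter `P`.
struct Tokens {
    db: Pool,
}

impl<P> App<P> {
    fn tokens(&self) -> Tokens {
        Tokens { db: self.db.clone() }
    }
}

// A handler-like function that depends on `Tokens` only. It needs no type
// parameter, whichever push client the `App` was built with.
fn revoke_all(tokens: Tokens) -> String {
    format!("revoking tokens in {}", tokens.db.0)
}

fn main() {
    let real = App { db: Pool(Arc::from("pilcrow.db")), push: HttpPushClient };
    let test = App { db: Pool(Arc::from(":memory:")), push: StubPushClient };

    println!("{}", revoke_all(real.tokens()));
    println!("{}", revoke_all(test.tokens()));
}
```

Only code that genuinely needs the push client would have to name `P`; everything written against a component stays unchanged.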
\| *	Convert the last stray tests to be generic over components derivable from ↵	Owen Jacobson	2025-10-28
\| \| \| \| \| \| \| \| \| \| \| \|	an App. There are a few places in the test fixtures that still call `App` methods directly, as they call `app.users()` (which, as per previous commits, has no `FromRef` impl).
\| *	Convert the `Users` component into a freestanding struct.	Owen Jacobson	2025-10-28
\| \| \| \| \| \| \| \|	Because `Users` is test-only and is not used in any endpoints, it doesn't need a FromRef impl.
\| *	Convert the `Tokens` component into a freestanding struct.	Owen Jacobson	2025-10-28
\| \| \| \| \| \| \| \|	As with the `Setup` component, I've generalized the associated middleware across anything that can provide a `Tokens`, where possible.
\| *	Convert the `Setup` component into a freestanding struct.	Owen Jacobson	2025-10-28
\| \| \| \| \| \| \| \|	The changes to the setup-requiring middleware are probably more general than was strictly needed, but they will make it work with anything that can provide a `Setup` component rather than being bolted to `App` specifically, which feels tidier.
\| *	Convert the `Messages` component to a freestanding struct.	Owen Jacobson	2025-10-28
\| \|
\| *	Convert `Logins` into a freestanding component.	Owen Jacobson	2025-10-28
\| \|
\| *	Convert `Invites` into a freestanding component.	Owen Jacobson	2025-10-28
\| \|
\| *	Convert the `Events` app component into a freestanding struct.	Owen Jacobson	2025-10-28
\| \| \| \| \| \| \| \|	This one doesn't need a FromRef impl at this time, as it's only ever used in a handler that also uses other components and so will need to continue receiving `App`. However, there's little reason not to make the implementation of the `Events` struct consistent.
\| *	Convert the `Conversations` component into a freestanding struct.	Owen Jacobson	2025-10-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unlike the previous example, this involves cloning an event broadcaster, as well. This is, per the documentation, how the type may be used. From <https://docs.rs/tokio/latest/tokio/sync/broadcast/fn.channel.html>: > The Sender can be cloned to send to the same channel from multiple points in the process or it can be used concurrently from an `Arc`. The language is less firm than the language sqlx uses for its pool, but the intent is clear enough, and it works in practice.
\| *	Make `Boot` a freestanding app type, rather than a view of ↵	Owen Jacobson	2025-10-28
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	`crate::app::App`'s internals. In the course of working on web push, I determined that we probably need to make `App` generic over the web push client we're using, so that tests can use a dummy client while the real app uses a client created at startup and maintained over the life of the program's execution. The most direct implementation of that is to render App as `App<P>`, where the parameter is occupied by the specific web push client type in use. However, doing this requires refactoring at _every_ site that mentions `App`, including every handler, even though the vast majority of those sites will not be concerned with web push. I reviewed a few options with @wlonk: * Accept the type parameter and apply it everywhere, as the cost of supporting web push. * Hard-code the use of a specific web push client. * Insulate handlers &c from `App` via provider traits, mimicking what we do for repository provider traits today. * Treat each app type as a freestanding state in its own right, so that only push-related components need to consider push clients (as far as is feasible). This is a prototype towards that last point, using a simple app component (boot) as a testbed. `FromRef` allows handlers that take a `Boot` to be used in routes that provide an `App`, so this is a contained change. However, the structure of `FromRef` prevents `Boot` from carrying any lifetime narrower than `'static`, so it now holds clones of the state fields it acquires from App, instead of references. This is fine - that's just a database pool, and sqlx's pool type is designed to be shared via cloning. From <https://docs.rs/sqlx/latest/sqlx/struct.Pool.html>: > Cloning Pool is cheap as it is simply a reference-counted handle to the inner pool state.
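A rough sketch of the `FromRef` arrangement this commit describes. To stay dependency-free, it defines a local `FromRef` trait with the same shape as `axum::extract::FromRef` (assumed here to be the trait the commit refers to). `Pool`, the field names, and `describe` are hypothetical.

```rust
use std::sync::Arc;

// Local stand-in with the same shape as `axum::extract::FromRef`, defined
// here so the sketch compiles without any dependencies.
trait FromRef<T> {
    fn from_ref(input: &T) -> Self;
}

// Hypothetical stand-in for a pool type that is shared by cloning.
#[derive(Clone)]
struct Pool(Arc<str>);

struct App {
    db: Pool,
}

// `Boot` owns a clone of the pool, so it has no lifetime parameter.
struct Boot {
    db: Pool,
}

impl FromRef<App> for Boot {
    fn from_ref(app: &App) -> Self {
        // `Self` cannot borrow from `app`: the trait signature gives the
        // output type no way to name the lifetime of `input`.
        Boot { db: app.db.clone() }
    }
}

// A handler-like function that only needs a `Boot`, not the whole `App`.
fn describe(boot: Boot) -> String {
    format!("booting from {}", boot.db.0)
}

fn main() {
    let app = App { db: Pool(Arc::from("pilcrow.db")) };
    println!("{}", describe(Boot::from_ref(&app)));
}
```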
*	Raise minimum Rust version to 1.90.	Owen Jacobson	2025-10-24
\| \| \| \|	We've made stylistic changes that follow from Rust 1.86 through 1.89 anyways, and I'm not at all confident we aren't using APIs that only exist in those versions.
*	Automatically reorder imports to my preferred style.	Owen Jacobson	2025-08-30
\|
*	Implement storage of synchronized entities in terms of events, not state.	ojacobson	2025-08-27
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conversations, users, messages, and all other "synchronized" entities now have an in-memory implementation of their lifecycle, rather than a database-backed one. These operations take a history, apply one lifecycle change to that history, and emit a new history. Storage is then implemented by applying the events in this new history to the database. The storage methods in repo types, which process these events by emitting SQL statements, make necessary assumptions that the events being passed to them are coherent with the data already in storage. For example, the code to handle a conversation's delete event is allowed to assume that the database already contains a row for that conversation, inserted in response to a prior conversation creation event. Data retrieval is not modified in this commit, and probably never will be without a more thorough storage rewrite. The whole intention of the data modelling approach I've been using is that a single row per entity represents its entire history, in turn so that the data in the database should be legible to people approaching it using normal SQL tools. Developed as an aesthetic response to increasing unease with the lack of an ORM versus the boring-ness of our actual queries. Merges event-based-storage into main.
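The split between in-memory lifecycle and event-applying storage might look roughly like this. It is a sketch under stated assumptions: the `Event`, `History`, `Row`, and `Store` types are hypothetical, and a `HashMap` stands in for the database table.

```rust
use std::collections::HashMap;

// Hypothetical lifecycle events for a single kind of entity.
#[derive(Clone, Debug, PartialEq)]
enum Event {
    Created { id: u32, name: String },
    Deleted { id: u32 },
}

// An entity's history: everything that has happened to it so far.
#[derive(Clone, Debug)]
struct History {
    id: u32,
    events: Vec<Event>,
}

impl History {
    fn begin(id: u32, name: &str) -> Self {
        let created = Event::Created { id, name: name.to_string() };
        History { id, events: vec![created] }
    }

    // A lifecycle change: take a history, apply one change, emit a new
    // history. The rules are enforced here, in memory, not in storage.
    fn delete(&self) -> Result<History, &'static str> {
        if self.events.iter().any(|e| matches!(e, Event::Deleted { .. })) {
            return Err("already deleted");
        }
        let mut next = self.clone();
        next.events.push(Event::Deleted { id: self.id });
        Ok(next)
    }
}

// One row per entity, standing in for a database table.
#[derive(Debug, PartialEq)]
struct Row {
    name: String,
    deleted: bool,
}

#[derive(Default)]
struct Store {
    rows: HashMap<u32, Row>,
}

impl Store {
    // Storage applies events without re-checking lifecycle rules.
    fn apply(&mut self, event: &Event) {
        match event {
            Event::Created { id, name } => {
                self.rows.insert(*id, Row { name: name.clone(), deleted: false });
            }
            Event::Deleted { id } => {
                // Assumes a prior `Created` event already inserted the row.
                let row = self.rows.get_mut(id).expect("row exists");
                row.deleted = true;
            }
        }
    }
}

fn main() {
    let created = History::begin(7, "general");
    let deleted = created.delete().expect("first delete succeeds");

    let mut store = Store::default();
    for event in &deleted.events {
        store.apply(event);
    }
    println!("{:?}", store.rows.get(&7));
}
```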
\| *	Remove entirely-redundant synchronization inside of Broadcaster.	Owen Jacobson	2025-08-26
\| \| \| \| \| \| \| \| \| \| \| \|	Per <https://docs.rs/tokio/latest/tokio/sync/broadcast/struct.Sender.html>, a `Sender` is safe to share between threads. The clone behaviour we want is also provided by its `Clone` impl directly, and we don't need to wrap the sender in an `Arc` to share it. It's amazing what you can find in the docs.
\| *	Consolidate `events.map(…).collect()` calls into `Broadcaster`.	Owen Jacobson	2025-08-26
\| \| \| \| \| \| \| \| \| \| \| \|	This conversion, from an iterator of type-specific events (say, `user::Event` or `message::Event`), into a `Vec<event::Event>`, is pervasive, and it needs to be done each time. Having Broadcaster expose a support method for this cuts down on the repetition, at the cost of a slightly alarming amount of type-system nonsense in `broadcast_from`. Historical footnote: the internal message structure is a Vec and not an individual message so that bulk operations, like expiring channels and messages, won't disconnect everyone if they happen to dispatch more than sixteen messages (current queue depth limit) at once. We trade allocation and memory pressure for keeping the connections alive. _Most_ event publishing is an iterator of one item, so the Vec allocation is redundant.
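One plausible shape for such a helper is sketched below. The event variants are hypothetical and a plain `Vec` stands in for the broadcast channel; only the name `broadcast_from` and the conversion into `Vec<Event>` come from the commit message.

```rust
// Hypothetical type-specific event modules.
mod user {
    #[derive(Clone, Debug, PartialEq)]
    pub enum Event {
        Created(String),
    }
}

mod message {
    #[derive(Clone, Debug, PartialEq)]
    pub enum Event {
        Sent(String),
    }
}

// The unified event type that subscribers receive.
#[derive(Clone, Debug, PartialEq)]
enum Event {
    User(user::Event),
    Message(message::Event),
}

impl From<user::Event> for Event {
    fn from(event: user::Event) -> Self {
        Event::User(event)
    }
}

impl From<message::Event> for Event {
    fn from(event: message::Event) -> Self {
        Event::Message(event)
    }
}

// Stand-in for the real broadcaster: it records each batch instead of
// sending it over a channel.
#[derive(Default)]
struct Broadcaster {
    sent: Vec<Vec<Event>>,
}

impl Broadcaster {
    // Accepts any iterable of anything convertible into `Event`, so callers
    // no longer write `events.map(…).collect()` themselves.
    fn broadcast_from<I, E>(&mut self, events: I)
    where
        I: IntoIterator<Item = E>,
        E: Into<Event>,
    {
        let batch: Vec<Event> = events.into_iter().map(Into::into).collect();
        self.sent.push(batch);
    }
}

fn main() {
    let mut broadcaster = Broadcaster::default();
    broadcaster.broadcast_from([user::Event::Created("alice".to_string())]);
    broadcaster.broadcast_from(vec![message::Event::Sent("hello".to_string())]);
    println!("{:?}", broadcaster.sent);
}
```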
\| *	Store `User` instances using their events.	Owen Jacobson	2025-08-26
\| \|
\| *	Store `Message` instances using their events.	Owen Jacobson	2025-08-26
\| \| \| \| \| \| \| \|	I found a test bug! The tests for deleting previously-deleted or previously-expired messages were using the wrong user to try to delete those messages. The tests happened to pass anyways because the message authorship check was done after the message lifecycle check. They would no longer have passed; the tests are fixed to use the sender, instead.
\| *	Store `Conversation` instances using their events.	Owen Jacobson	2025-08-26
\| \| \| \| \| \| \| \|	This replaces the approach of having the repo type know about conversation lifecycle in detail. Instead, the repo type accepts events and applies them to the DB blindly. The SQL written to implement each event does, however, embed assumptions about what order events will happen in.
\| *	Allow callers to pass `Instant`s to `Sequence` predicate constructors.	Owen Jacobson	2025-08-26
\|/
*	Split `user` into a chat-facing entity and an authentication-facing entity.	ojacobson	2025-08-26
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The taxonomy is now as follows: * A _login_ is someone's identity for the purposes of authenticating to the service. Logins are not synchronized, and in fact are not published anywhere in the current API. They have a login ID, a name and a password. * A _user_ is someone's identity for the purpose of participating in conversations. Users _are_ synchronized, as before. They have a user ID, a name, and a creation instant for the purposes of synchronization. ## API changes * `GET /api/boot` method now returns a `login` key instead of a `user` key. The structure of the nested value is unchanged. This change is not backwards-compatible; the included client and the docs have been updated accordingly. ## Server implementation * Most app methods that took a `&User` as an identity now take a `&Login` as an identity, instead. Where a `User` is needed, the new `tx.users().for_login(&login)` database access method resolves a `Login` to its corresponding `user::History`, which can then be turned into a `User` at whatever point in time is most appropriate. This adds a few new error cases to methods that traverse the login-to-history-to-user chain. Those cases are presently unreachable, but I've fully fleshed them out so that they don't bite us later. Most of the resulting errors, however, are captured as internal server errors. * There is a new `app.logins()` application entry point, dealing with login identities and password-based logins. * `app.tokens()` is a bit more limited in scope to only things that work with an existing token. That has the side effect of splitting up logging in (in `app.logins().with_password(…)`) and logging out (in `app.tokens().logout(…)`). 
## Schema changes The `user` table has been split: * `login` holds the data needed for the user to log in - their login ID, their name, and their password. * `user` now holds only the user ID and the event data for the user's `created` instant. Reconstructing a `User` struct requires joining in data from both `login` and `user`. In theory, the relationship is one-way: every user has a login. In practice, it's reciprocal: every login has a user and every user has a login. Relationships with downstream tables have been modified to suit: * `message` still refers to `user` for authorship information. * `invite` still refers to `user` for originator information. * `token` refers to `login` for authentication information. ## Blimey, that's big Yeah, I know. It's hard to avoid and I'm not sure the effort of making this in incremental steps is worth it. Authentication logic has a way of getting into all sorts of corners, and Pilcrow is no different. In order for the new taxonomy to make sense, all of the places that previously used `User` as a representation of an authenticated identity have to be updated, and it's easier to do that all at once, so that we can retire all the code that _supports_ using a `User` that way. Merges split-user into main.
\| *	Split `user` into a chat-facing entity and an authentication-facing entity.	Owen Jacobson	2025-08-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The taxonomy is now as follows: * A _login_ is someone's identity for the purposes of authenticating to the service. Logins are not synchronized, and in fact are not published anywhere in the current API. They have a login ID, a name and a password. * A _user_ is someone's identity for the purpose of participating in conversations. Users _are_ synchronized, as before. They have a user ID, a name, and a creation instant for the purposes of synchronization. In practice, a user exists for every login - in fact, users' names are stored in the login table and are joined in, rather than being stored redundantly in the user table. A login ID and its corresponding user ID are always equal, and the user and login ID types support conversion and comparison to facilitate their use in this context. Tokens are now associated with logins, not users. The currently-acting identity is passed down into app types as a login, not a user, and then resolved to a user where appropriate within the app methods. As a side effect, the `GET /api/boot` method now returns a `login` key instead of a `user` key. The structure of the nested value is unchanged.
\| *	Generate tokens in memory and then store them.	Owen Jacobson	2025-08-26
\| \| \| \| \| \| \| \|	This is the leading edge of a larger storage refactoring, where repo types stop doing things like generating secrets or deciding whether to carry out an operation. To make this work, there is now a `Token` type that holds the complete state of a token, in memory.
\| *	Split the `user` table into an authentication portion and a chat portion.	Owen Jacobson	2025-08-26
\| \| \| \| \| \| \| \|	We'll be building separate entities around this in future commits, to better separate the authentication data (non-synchronized and indeed "not public") from the chat data (synchronized and public).
\| *	Factor out common authentication test verification steps into helpers.	Owen Jacobson	2025-08-26
\| \| \| \| \| \| \| \|	These checks tended to be wordy, and were prone to being done subtly differently in different locations for no good reason. Centralizing them cleans this up and makes the tests easier to follow, at the expense of making it somewhat harder to follow what the test is specifically checking.
\| *	Return an identity, rather than the parts of an identity, when validating an ↵	Owen Jacobson	2025-08-25
\| \| \| \| \| \| \| \| \| \| \| \|	identity token. This is a small refactoring that's been possible for a while, and we only just noticed.
* \|	Use the imported name, since we have it.	Owen Jacobson	2025-08-26
\|/
*	Group Rust imports by crate.	Owen Jacobson	2025-08-25
\| \| \| \| \| \|	I've been doing this by hand anyways, and this makes it a _ton_ less tedious to maintain. I think it looks nice. This does, however, require nightly - for formatting only.
*	Remove unused response bodies from a number of API endpoints.	ojacobson	2025-08-26
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This removes the response body from the following methods: * `POST /api/setup` * `POST /api/auth/login` * `POST /api/invite/:id` * `POST /api/password` The bodies returned from these methods were something of a rough guess as to what might be useful. Actual client development has shown that we don't use _any_ of the data from any of these API responses, so let's not tie ourselves to future compatibility by continuing to send them. We can add a body to a bodyless method a _lot_ more easily than we can change the body of a method that already returns one, after all. These changes are not backwards compatible for clients which care about the existing bodies. To my knowledge, there are no such clients; the included client definitely doesn't care. ## Internals Not only does this change stop returning bodies at the API surface, but it also stops retrieving and returning values used internally to construct those responses, simplifying the code a bit in the process. One side effect of this is that tests that need to log in a user now need to manually verify the returned token secret, to convert it back into a user, whereas the previous versions returned both a token secret and a user during password login. I don't love the increase in the size of the tests, but I think it's the right tradeoff (and this change is code net-negative anyways). Merges no-content into main.
\| *	Add a missing docs note about the behaviour of `POST /api/auth/logout` when ↵	Owen Jacobson	2025-08-24
\| \| \| \| \| \| \| \| \| \| \| \|	the current token is invalid. It's inconsistent with the behaviour when the current token is unset. Shrug.
\| *	Stop returning a body from `POST /api/password`.	Owen Jacobson	2025-08-24
\| \|
\| *	Remove the now-unused return value from the final stage of user creation.	Owen Jacobson	2025-08-24
\| \|
\| *	Stop returning an HTTP body from `POST /api/invite/:id`.	Owen Jacobson	2025-08-24
\| \| \| \| \| \| \| \|	As with the previous commits, the body was never actually being used.
\| *	Stop returning body data from `POST /api/auth/login`.	Owen Jacobson	2025-08-24
\| \| \| \| \| \| \| \|	As with `/api/setup`, the response was an ad-hoc choice, which we are not using and which constrains future development just by existing.
\| *	Stop returning body data from `POST /api/setup`.	Owen Jacobson	2025-08-24
\| \| \| \| \| \| \| \|	This API response was always ad-hoc, and the client doesn't use it. To free up some maneuvering room for server refactorings, stop sending it. We can add a response in the future if there's a need.
\| *	Define a canonical "empty" response.	Owen Jacobson	2025-08-24
\|/ \| \| \|	This is a bit tidier and easier to assert on than returning a bare HTTP status code, but is otherwise interchangeable with it.
*	Collapse redundant "deleted_at" timestamps and "deleted" event instants.	Owen Jacobson	2025-08-24
\| \| \| \|	These were separated as there wasn't an obvious way to serialize two fields with the same _type_ with different _prefixes_. Turns out this is a common problem, and someone's written a crate for it that remaps the names for you.
*	Hoist `password` out to the top level.	Owen Jacobson	2025-08-24
\| \| \| \|	Having this buried under `crate::user` makes it hard to split up the roles `user` fulfils right now. Moving it out to its own module makes it a bit tidier to reuse it in a separate, authentication-only way.
*	Add conversions between String and Id<T>.	Owen Jacobson	2025-08-24
\| \| \| \|	There's already an implicit conversion (via serialization), it's just awkward to use. However, we now need those conversions more directly.
*	Include tests (as well as benchmarks, examples, and anything else we add ↵	Owen Jacobson	2025-08-24
\| \| \| \| \| \|	later on) when checking validity of Rust code. I inadvertently broke a test and my pre-commit hook, which runs `tools/check-lint`, didn't catch it.
*	Factor data-to-JSON-string construction out of stitches.	Owen Jacobson	2025-08-21
\| \| \| \|	This is a recurring and nameable operation; let's give it a name before we use it further.
*	Merge branch 'no-prerendered-markdown'	Owen Jacobson	2025-08-19
\|\
\| *	Render message markdown to HTML inside of `<Message />`.	Owen Jacobson	2025-08-19
\|/ \| \| \|	This simplifies data flow, at the potential expense of re-rendering HTML more often than strictly necessary. Requiring every path that produces a message-shaped object to pre-render markdown made things more interdependent than intended and slowed me down.
*	Rust 1.89: Add elided lifetime parameters (`'_`) where appropriate.	Owen Jacobson	2025-08-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rust 1.89 added a new warning: warning: hiding a lifetime that's elided elsewhere is confusing --> src/setup/repo.rs:4:14 \| 4 \| fn setup(&mut self) -> Setup; \| ^^^^^^^^^ ----- the same lifetime is hidden here \| \| \| the lifetime is elided here \| = help: the same lifetime is referred to in inconsistent ways, making the signature confusing help: use `'_` for type paths \| 4 \| fn setup(&mut self) -> Setup<'_>; \| ++++ I don't entirely agree with the style advice here, but lifetime elision style is an evolving area in Rust and I'd rather track the Rust team's recommendations than invent my own, so I've added all of them.
*	Stop mentioning private error types in doctest boilerplate.	Owen Jacobson	2025-08-13
\| \| \| \|	In 792de8e49fa8a3c04bfb747adadf71572d753055, `crate::cli::Error` was made private. I forgot to update the doctest that mentions it.
*	Define ID types as specializations, rather than newtypes.	Owen Jacobson	2025-07-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is based heavily on the work done for normalized strings, in `crate::normalize`. The key realization in that module is that the logic distinguishing one kind of thing (normalized strings in that case, IDs in this case) can be packaged up as a type token, and that doing so may reduce the overall complexity. This implementation for ID also borrows heavily from the implementation for normalized strings. It's less flexible: an ID implemented this way can't expose _less_ of `crate::id::ID`'s interface, whereas newtype wrappers can, for example. However, our code doesn't use that flexibility on purpose anywhere and we're relatively unlikely to change that. In return, the individual ID types require substantially less code - they do not, for example, need to re-implement `Display` for themselves. I very nearly made the trait `Prefix`: ```rust pub trait Prefix { const PREFIX: &'static str; } ``` however, I think having an effectively-constant method is less surprising overall.
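A minimal sketch of the type-token approach, using an effectively-constant method rather than an associated constant. The names `Kind`, `UserKind`, and `MessageKind`, the prefixes, and the internal layout of `Id` are assumptions for illustration and are not taken from Pilcrow's source.

```rust
use std::fmt;
use std::marker::PhantomData;

// Hypothetical trait implemented by type tokens. The prefix is exposed as a
// method that always returns the same value.
trait Kind {
    fn prefix() -> &'static str;
}

// One generic ID type; the token `T` distinguishes one kind of ID from
// another at compile time without storing anything at runtime.
struct Id<T> {
    suffix: String,
    kind: PhantomData<T>,
}

impl<T: Kind> Id<T> {
    fn new(suffix: &str) -> Self {
        Id { suffix: suffix.to_string(), kind: PhantomData }
    }
}

// `Display` is written once here, not once per ID type.
impl<T: Kind> fmt::Display for Id<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}{}", T::prefix(), self.suffix)
    }
}

// Type tokens: uninhabited enums that exist only at the type level.
enum UserKind {}
enum MessageKind {}

impl Kind for UserKind {
    fn prefix() -> &'static str {
        "U"
    }
}

impl Kind for MessageKind {
    fn prefix() -> &'static str {
        "M"
    }
}

// Each concrete ID type is a specialization of `Id`, not a newtype.
type UserId = Id<UserKind>;
type MessageId = Id<MessageKind>;

fn main() {
    let user = UserId::new("abc123");
    let message = MessageId::new("abc123");
    println!("{user} {message}");
}
```

Because `UserId` and `MessageId` are distinct instantiations, passing one where the other is expected is still a type error, which is the main guarantee the newtype approach provided.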
*	Fix some minor weirdness when Pilcrow is (unwisely) used as a library.	ojacobson	2025-07-23
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pilcrow isn't meant to be used as a library, and the only public interface the `pilcrow` lib crate exposes is the CLI entry point. However, we will likely be publishing Pilcrow via crates.io (among other options) one day, and so it will _be usable_ as a library if someone's desperate enough to try. To that end, let's try to be good citizens. This change fixes two issues: * The docs contained links to internal items, which are not actually included in the library documentation. The links are simply removed; the uses of those items were already private anyways. * The CLI `Error` type is no longer part of the public interface, using `impl Trait` (`impl std::error::Error`) shenanigans to hide the error type from callers. (To be clear, this would be _extremely_ rude in code intended for library use.) This frees us up to change the structure of the error type - or to replace it entirely - without making the world's most pedantic semver change in the process. Merges lib-crate-weirdness into main.
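The `impl Trait` technique mentioned above can be sketched as follows. The module layout, the `MissingDatabase` variant, and the `run` signature are hypothetical. The point is only that callers can use the returned error through `std::error::Error` without being able to name its type.

```rust
mod cli {
    use std::fmt;

    // Private to this module: callers cannot name or match on this type.
    #[derive(Debug)]
    enum Error {
        MissingDatabase,
    }

    impl fmt::Display for Error {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                Error::MissingDatabase => write!(f, "no database specified"),
            }
        }
    }

    impl std::error::Error for Error {}

    // The public entry point promises only "some type implementing
    // `std::error::Error`", so the concrete type can change freely.
    pub fn run(database: Option<&str>) -> Result<(), impl std::error::Error> {
        match database {
            Some(_) => Ok(()),
            None => Err(Error::MissingDatabase),
        }
    }
}

fn main() {
    if let Err(err) = cli::run(None) {
        eprintln!("error: {err}");
    }
}
```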
\| *	Remove `pilcrow::cli::Error` from the lib crate's public interface.	Owen Jacobson	2025-07-22
\| \| \| \| \| \| \| \|	This might be the pettiest rude change I've ever made to a Rust program. If I saw this - or did this - in code _intended_ to be used as a library, I'd be appalled.
\| *	Stop linking to private documentation items in public docs.	Owen Jacobson	2025-07-22
\|/ \| \| \|	The Pilcrow crate library docs are something of a wart; Pilcrow isn't meant to be used as a library, and the only public interface it exposes is the CLI entry point. However, we will likely be publishing Pilcrow via crates.io (among other options), and so it will _be usable_ as a library if you're desperate enough to try. The docs should at least be coherent.
*	Add a `--umask` option to determine what permissions new files/databases get.	ojacobson	2025-07-23
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The new `--umask` option takes one of three values: * `--umask masked`, the default, takes the inherited umask and forces o+rwx on. * `--umask inherit` takes the inherited umask as-is. * `--umask OCTAL` sets the umask to exactly `OCTAL` and is broadly equivalent to `umask OCTAL && pilcrow --umask inherit`. This fell out of a conversation with @wlonk, who is working on notifications. Since notifications may require [VAPID] keys, the server will need a way to store those keys. That would generally be "in the pilcrow database," which led me to the observation that Pilcrow creates that database as world-readable by default. "World-readable" and "encryption/signing keys" are not things that belong in the same sentence. [VAPID]: https://datatracker.ietf.org/doc/html/rfc8292 The most "obvious" solution would be to set the permissions used for the sqlite database when it's created. That's harder than it sounds: sqlite has no built-in facility for doing this. The closest thing that exists today is the [`modeof`] query parameter, which copies the permissions (and ownership) from some other file. We also can't reliably set the permissions ourselves, as sqlite may - depending on build options and configuration - [create multiple files][wal]. [`modeof`]: https://www.sqlite.org/uri.html [wal]: https://www.sqlite.org/wal.html Using `umask` is a whole-process solution to this. As Pilcrow doesn't attempt to create other files, there's little issue with doing it this way, but this is a design risk for future work if it creates files that are _intended_ to be readable by more than just the Pilcrow daemon user. Merges options-umask into main.
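The three option values could be modelled along these lines. This sketch only parses the option and computes the resulting mask; it does not call `umask(2)`. The `Umask` type and `effective` method are hypothetical, and reading "forces o+rwx on" as "OR the inherited mask with `0o007`" is an assumption.

```rust
use std::num::ParseIntError;
use std::str::FromStr;

// Hypothetical representation of the three `--umask` values.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Umask {
    Masked,
    Inherit,
    Exact(u32),
}

impl FromStr for Umask {
    type Err = ParseIntError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "masked" => Ok(Umask::Masked),
            "inherit" => Ok(Umask::Inherit),
            // Anything else must be an octal number.
            octal => u32::from_str_radix(octal, 8).map(Umask::Exact),
        }
    }
}

impl Umask {
    // Computes the mask the process would run with, given the mask it
    // inherited from its parent.
    fn effective(self, inherited: u32) -> u32 {
        match self {
            // Assumed meaning of "forces o+rwx on": always mask out every
            // permission bit for "other".
            Umask::Masked => inherited | 0o007,
            Umask::Inherit => inherited,
            Umask::Exact(mask) => mask,
        }
    }
}

fn main() {
    let inherited = 0o022;
    for arg in ["masked", "inherit", "077"] {
        let option: Umask = arg.parse().expect("valid --umask value");
        println!("--umask {arg}: {:03o}", option.effective(inherited));
    }
}
```

Under this reading, a file requested with mode `0o666` under an inherited umask of `0o022` would be created as `0o640` with `--umask masked`, instead of `0o644`.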