pilcrow - Run-It-Yourself web chat, maybe

	Commit message (Collapse)	Author	Age
*	Consolidate `events.map(…).collect()` calls into `Broadcaster`.	Owen Jacobson	2025-08-26
\| \| \| \| \| \|	This conversion, from an iterator of type-specific events (say, `user::Event` or `message::Event`), into a `Vec<event::Event>`, is prevasive, and it needs to be done each time. Having Broadcaster expose a support method for this cuts down on the repetition, at the cost of a slightly alarming amount of type-system nonsense in `broadcast_from`. Historical footnote: the internal message structure is a Vec and not an individual message so that bulk operations, like expiring channels and messages, won't disconnect everyone if they happen to dispatch more than sixteen messages (current queue depth limit) at once. We trade allocation and memory pressure for keeping the connections alive. _Most_ event publishing is an iterator of one item, so the Vec allocation is redundant.
*	Split `user` into a chat-facing entity and an authentication-facing entity.	Owen Jacobson	2025-08-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The taxonomy is now as follows: * A _login_ is someone's identity for the purposes of authenticating to the service. Logins are not synchronized, and in fact are not published anywhere in the current API. They have a login ID, a name and a password. * A _user_ is someone's identity for the purpose of participating in conversations. Users _are_ synchronized, as before. They have a user ID, a name, and a creation instant for the purposes of synchronization. In practice, a user exists for every login - in fact, users' names are stored in the login table and are joined in, rather than being stored redundantly in the user table. A login ID and its corresponding user ID are always equal, and the user and login ID types support conversion and comparison to facilitate their use in this context. Tokens are now associated with logins, not users. The currently-acting identity is passed down into app types as a login, not a user, and then resolved to a user where appropriate within the app methods. As a side effect, the `GET /api/boot` method now returns a `login` key instead of a `user` key. The structure of the nested value is unchanged.
*	Rename the `login` module to `user`.	Owen Jacobson	2025-03-23
\|
*	Upgrade to Rust 1.85 and Rust 2024 edition.	Owen Jacobson	2025-02-20
\| \| \| \| \| \| \| \|	There are a couple of migration suggestions from `cargo fix --edition` that I have deliberately skipped, which are intended to make sure that the changes to `if let` scoping don't bite us. They don't, I'm pretty sure, and if I turn out to be wrong, I'd rather fix the scoping issues (as they arise) than use `match` (`cargo fix --edition`'s suggestion). This change also includes a bulk reformat and a clippy cleanup. NOTA BENE: As this requires a new Rust toolchain, you'll need to update Rust (`rustup update`, normally) or the server won't build. This also applies to the Debian builder Docker image; it'll need to be rebuilt (from scratch, pulling its base image again) as well.
*	Remove `hi-recanonicalize`.	Owen Jacobson	2024-10-30
\| \| \| \|	This utility was needed to support a database migration with existing data. I have it on good authority that no further databases exist that are in the state that made this tool necessary.
*	Create a dedicated workflow type for creating logins.	Owen Jacobson	2024-10-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Nasty design corner. Logins need to be created in three places: 1. In tests, using app.logins().create(…); 2. On initial setup, using app.setup().initial(…); and 3. When accepting invites, using app.invites().accept(…). These three places do the same thing with respect to logins, but also do a varying mix of other things. Testing is the simplest and _only_ creates a login. Initial setup and invite acceptance both issue a token for the newly-created login. Accepting an invite also invalidates the invite. Previously, those three functions have been copy-pasted variations on a theme. Now that we have validation, the copy-paste approach is no longer tenable; it will become increasingly hard to ensure that the three functions (plus any future functions) remain in synch. To accommodate the variations while consolidating login creation, I've added a typestate-based state machine, which is driven by method calls: * A creation attempt begins with `let create = Create::begin()`. This always succeeds; it packages up arguments used in later steps, but does nothing else. * A creation attempt can be validated using `let validated = create.validate()?`. This may fail. Input validation and password hashing are carried out at this stage, making it potentially expensive. * A validated attempt can be stored in the DB, using `let stored = validated.store(&mut tx).await?`. This may fail. The login will be written to the DB; the caller is responsible for transaction demarcation, to allow other things to take place in the same transaction. * A fully-stored attempt can be used to publish events, using `let login = stored.publish(self.events)`. This always succeeds, and unwraps the state machine to its final product (a `login::History`).
*	Restrict login names.	Owen Jacobson	2024-10-29
\| \| \| \| \| \| \| \|	There's no good reason to use an empty string as your login name, or to use one so long as to annoy others. Names beginning or ending with whitespace, or containing runs of whitespace, are also a technical problem, so they're also prohibited. This change does not implement [UTS #39], as I haven't yet fully understood how to do so. [UTS #39]: https://www.unicode.org/reports/tr39/
*	Provide `hi-recanonicalize` to recover from canonicalized-name problems.	Owen Jacobson	2024-10-22
\|
*	Canonicalize login and channel names.	Owen Jacobson	2024-10-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Canonicalization does two things: * It prevents duplicate names that differ only by case or only by normalization/encoding sequence; and * It makes certain name-based comparisons "case-insensitive" (generalizing via Unicode's case-folding rules). This change is complicated, as it means that every name now needs to be stored in two forms. Unfortunately, this is _very likely_ a breaking schema change. The migrations in this commit perform a best-effort attempt to canonicalize existing channel or login names, but it's likely any existing channels or logins with non-ASCII characters will not be canonicalize correctly. Since clients look at all channel names and all login names on boot, and since the code in this commit verifies canonicalization when reading from the database, this will effectively make the server un-usuable until any incorrectly-canonicalized values are either manually canonicalized, or removed It might be possible to do better with [the `icu` sqlite3 extension][icu], but (a) I'm not convinced of that and (b) this commit is already huge; adding database extension support would make it far larger. [icu]: https://sqlite.org/src/dir/ext/icu For some references on why it's worth storing usernames this way, see <https://www.b-list.org/weblog/2018/nov/26/case/> and the refernced talk, as well as <https://www.b-list.org/weblog/2018/feb/11/usernames/>. Bennett's treatment of this issue is, to my eye, much more readable than the referenced Unicode technical reports, and I'm inclined to trust his opinion given that he maintains a widely-used, internet-facing user registration library for Django.
*	Unicode normalization on input.	Owen Jacobson	2024-10-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This normalizes the following values: * login names * passwords * channel names * message bodies, because why not The goal here is to have a canonical representation of these values, so that, for example, the service does not inadvertently host two channels whose names are semantically identical but differ in the specifics of how diacritics are encoded, or two users whose names are identical. Normalization is done on input from the wire, using Serde hooks, and when reading from the database. The `crate::nfc::String` type implements these normalizations (as well as normalizing whenever converted from a `std::string::String` generally). This change does not cover: * Trying to cope with passwords that were created as non-normalized strings, which are now non-verifiable as all the paths to verify passwords normalize the input. * Trying to ensure that non-normalized data in the database compares reasonably to normalized data. Fortunately, we don't _do_ very many string comparisons (I think only login names), so this isn't a huge deal at this stage. Login names will probably have to Get Fixed later on, when we figure out how to handle case folding for login name verification.
*	Organizational pass on endpoints and routes.	Owen Jacobson	2024-10-16
\|
*	Provide a view of logins to clients.	Owen Jacobson	2024-10-09
\|
*	Separate `/api/boot` into its own module.	Owen Jacobson	2024-10-05
\|
*	Retire top-level `repo`.	Owen Jacobson	2024-10-02
\| \| \| \|	This helped me discover an organizational scheme I like more.
*	Split login and token handling.	Owen Jacobson	2024-10-02
\|
*	First pass on reorganizing the backend.	Owen Jacobson	2024-10-02
\| \| \| \|	This is primarily renames and repackagings.
*	Organize IDs into top-level namespaces.	Owen Jacobson	2024-10-01
\| \| \| \|	(This is part of a larger reorganization.)
*	Provide a resume point to bridge clients from state snapshots to the event ↵	Owen Jacobson	2024-10-01
\| \| \| \|	sequence.
*	Prevent racing between `limit_stream` and logging out.	Owen Jacobson	2024-10-01
\|
*	Reimplement the logout machinery in terms of token IDs, not token secrets.	Owen Jacobson	2024-09-29
\| \| \| \| \| \|	This (a) reduces the amount of passing secrets around that's needed, and (b) allows tests to log out in a more straightforwards manner. Ish. The fixtures are a mess, but so is the nomenclature. Fix the latter and the former will probably follow.
*	Shut down the `/api/events` stream when the user logs out or their token ↵	Owen Jacobson	2024-09-29
\| \| \| \| \| \| \| \|	expires. When tokens are revoked (logout or expiry), the server now publishes an internal event via the new `logins` event broadcaster. These events are used to guard the `/api/events` stream. When a token revocation event arrives for the token used to subscribe to the stream, the stream is cut short, disconnecting the client. In service of this, tokens now have IDs, which are non-confidential values that can be used to discuss tokens without their secrets being passed around unnecessarily. These IDs are not (at this time) exposed to clients, but they could be.
*	Wrap credential and credential-holding types to prevent `Debug` leaks.	Owen Jacobson	2024-09-28
\| \| \| \| \| \| \| \| \| \| \| \|	The following values are considered confidential, and should never be logged, even by accident: * `Password`, which is a durable bearer token for a specific Login; * `IdentitySecret`, which is an ephemeral but potentially long-lived bearer token for a specific Login; or * `IdentityToken`, which may hold cookies containing an `IdentitySecret`. These values are now wrapped in types whose `Debug` impls output opaque values, so that they can be included in structs that `#[derive(Debug)]` without requiring any additional care. The wrappers also avoid implementing `Display`, to prevent inadvertent `to_string()`s. We don't bother obfuscating `IdentitySecret`s in memory or in the `.hi` database. There's no point: we'd also need to store the information needed to de-obfuscate them, and they can be freely invalidated and replaced by blanking that table and asking everyone to log in again. Passwords _are_ obfuscated for storage, as they're intended to be durable.
*	Expire channels, too.	Owen Jacobson	2024-09-28
\|
*	Write tests.	Owen Jacobson	2024-09-20
\|
*	Pass dates around by ref more consistently	Owen Jacobson	2024-09-20
\|
*	Expire messages after 90 days.	Owen Jacobson	2024-09-20
\| \| \| \| \| \| \| \| \| \|	This is intended to manage storage growth. A community with broadly steady traffic will now reach a steady state (ish) where the amount of storage in use stays within a steady band. The 90 day threshold is a spitball; this should be made configurable for the community's needs. I've also hoisted expiry out into the `app` classes, to reduce the amount of non-database work repo types are doing. This should make it easier to make expiry configurable later on. Includes incidental cleanup and style changes.
*	Less Option calisthenic	Owen Jacobson	2024-09-20
\|
*	Most pass-through errors do not need additional message text	Owen Jacobson	2024-09-18
\|
*	App methods now return errors that allow not-found cases to be distinguished.	Owen Jacobson	2024-09-18
\|
*	Express record dependencies through types.	Owen Jacobson	2024-09-17
\| \| \| \|	This provides a convenient place to _stick_ "not found" errors, though actually introducing them will come in a later commit.
*	Consolidate most repository types into a repo module.	Owen Jacobson	2024-09-16
\| \| \| \| \| \| \| \| \| \| \| \|	Having them contained in the individual endpoint groups conveyed an unintended sense that their intended scope was _only_ that endpoint group. It also made most repo-related import paths _quite_ long. This splits up the repos as follows: * "General applicability" repos - those that are only loosely connected to a single task, and are likely to be shared between tasks - go in crate::repo. * Specialized repos - those tightly connected to a specific task - go in the module for that task, under crate::PATH::repo. In both cases, each repo goes in its own submodule, to make it easier to use the module name as a namespace. Which category a repo goes in is a judgment call. `crate::channel::repo::broadcast` (formerly `channel::repo::messages`) is used outside of `crate::channel`, for example, but its main purpose is to support channel message broadcasts. It could arguably live under `crate::event::repo::channel`, but the resulting namespace is less legible to me.
*	Suggested fixes from Clippy, via nursery and pedantic sets.	Owen Jacobson	2024-09-13
\|
*	Consolidate the (now) two definitions of DateTime.	Owen Jacobson	2024-09-12
\|
*	Push most endpoint and extractor logic into functoins of `App`.	Owen Jacobson	2024-09-12
	This is, again, groundwork for logic that requires more than just a database connection. The login process has been changed to be more conventional, attempting login _before_ account creation rather than after it. This was not previously possible, because the data access methods used to perform these steps did not return enough information to carry out the workflow in that order. Separating storage from password validation and hashing forces the issue, and makes it clearer _at the App_ whether an account exists or not. This does introduce the possibility of two racing inserts trying to lay claim to the same username. Transaction isolation should ensure that only one of them "wins," which is what you get before this change anyways.