pilcrow - Run-It-Yourself web chat, maybe

	Commit message	Author	Age
*	Don't leave field binding vars uninitialized.	Owen Jacobson	2024-10-30
\| \| \| \| \| \|	This was causing problems for changing passwords: if the user didn't type anything in the "original password" field, the code path that sends that field to the server omitted the field from the message entirely, rather than setting it to the empty string, causing a 422 Unprocessable Entity. On investigation, we had latent bugs related to this in a bunch of spots.
*	Index to support mass invalidation of tokens during password change.	Owen Jacobson	2024-10-30
\|
*	Track an index-friendly sequence range for both channels and messages.	Owen Jacobson	2024-10-30
\|	This is meant to limit the number of messages that event replay needs to examine. Previously, the query required a table scan; table scans on `message` can be quite large, and should really be avoided. The new schema allows replays to be carried out using an index scan.
\|
\|	The premise of this change is that, for each (channel, message), there is a range of event sequence numbers that the (channel, message) may appear in. We'll notate that as `[start, end]` in the general case, but they are:
\|
\|	* for active channels, `[ch.created_sequence, ch.created_sequence]`.
\|	* for deleted channels, `[ch.created_sequence, ch_del.deleted_sequence]`.
\|	* for active messages, `[mg.sent_sequence, mg.sent_sequence]`.
\|	* for deleted messages, `[mg.sent_sequence, mg_del.deleted_sequence]`.
\|
\|	(The two "active" ranges may grow in future releases, to support things like channel renames and message editing. That won't change the logic, but those two things will need to update the new `last_sequence` field.)
\|
\|	There are two families of operations that need to retrieve based on these ranges:
\|
\|	* Boot creates a snapshot as of a specific `resume_at` sequence number, and thus must include any record whose `start` falls on or before `resume_at`. We can't exclude records whose `end` is also before it, as their terminal state may be one that is included in boot (eg. active channels).
\|	* Event replay needs to include any events that fall after the same `resume_at`, and thus must include events from any record whose `end` falls strictly after `resume_at`. We can't exclude records whose `start` is also strictly after `resume_at`, as we'd omit them from replay, inappropriately, if we did.
\|
\|	This gives three interesting cases:
\|
\|	1. Record fully after `resume_at`:
\|
\|	       event sequence --increasing-->
\|	       x-a        …  x      …  x+k  …
\|	       resume_at     start     end
\|
\|	   This record should be omitted by boot, but included for event replay.
\|
\|	2. Record fully before `resume_at`:
\|
\|	       event sequence --increasing-->
\|	       x      …  x+k  …  x+a
\|	       start     end     resume_at
\|
\|	   This record should be omitted for event replay, but included for boot.
\|
\|	3. Record overlapping `resume_at`:
\|
\|	       event sequence --increasing-->
\|	       x      …  x+a        …  x+k
\|	       start     resume_at     end
\|
\|	   This record needs to be included for both boot and event replay.
\|
\|	However, the bounds of that range were previously stored in two tables (`channel` and `channel_deleted`, or `message` and `message_deleted`, respectively), which sqlite (indeed most SQL implementations) cannot cover with a single index. This forced a table scan, leading to the program considering every possible (channel, message) during event replay.
\|
\|	This commit adds a `last_sequence` field to channels and messages, which is set to the above values as channels and messages are operated on. This field is indexed, and queries can use it to rapidly identify relevant rows for event replay, cutting down the amount of reading needed to generate events on resume.
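The boot and replay rules from this commit message can be sketched as two predicates over a `[start, end]` range. This is a minimal illustration, not the project's code: the `SequenceRange` type and both method names are hypothetical, and plain `u64` values stand in for whatever sequence type the real schema uses.

```rust
/// Hypothetical stand-in for the `[start, end]` range of a channel or message.
/// `start` plays the role of `created_sequence` / `sent_sequence`;
/// `end` plays the role of the indexed `last_sequence` field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SequenceRange {
    start: u64,
    end: u64,
}

impl SequenceRange {
    /// Boot snapshots include any record that started on or before `resume_at`.
    fn included_in_boot(&self, resume_at: u64) -> bool {
        self.start <= resume_at
    }

    /// Replay includes any record whose range ends strictly after `resume_at`.
    fn included_in_replay(&self, resume_at: u64) -> bool {
        self.end > resume_at
    }
}
```

With `resume_at = 10`, a record spanning `[11, 15]` is replay-only, `[3, 7]` is boot-only, and `[8, 12]` is included by both, matching the three cases above.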
*	Resume points are no longer optional.	Owen Jacobson	2024-10-30
\| \| \| \|	This is an inconsequential change for actual clients, since "resume from the beginning" was never a preferred mode of operation, and it simplifies some internals. It should also mean we get better query plans where `coalesce(cond, true)` was previously being used.
*	Remove `hi-recanonicalize`.	Owen Jacobson	2024-10-30
\| \| \| \|	This utility was needed to support a database migration with existing data. I have it on good authority that no further databases exist that are in the state that made this tool necessary.
*	Avoid hard-coding the assumption that delete comes-after create.	Owen Jacobson	2024-10-30
\| \| \| \|	I mean, it always does, but I'd rather get a panic during message/channel reconstruction than wrong results if that assumption is ever violated inadvertently.
*	Prevent deletion of non-empty channels.	Owen Jacobson	2024-10-30
\|
*	Load DB paths from a file, rather than hard-coding them in the systemd unit.	Owen Jacobson	2024-10-30
\|
*	Incrementally less jank invite listing.	Owen Jacobson	2024-10-29
\|
*	Add `change password` UI + API.	Owen Jacobson	2024-10-29
\| \| \| \|	The protocol here re-checks the caller's password, as a "I left myself logged in" anti-pranking check.
*	Restrict deletion to deleting your own messages.	Owen Jacobson	2024-10-29
\|
*	Restrict channel names, too.	Owen Jacobson	2024-10-29
\| \| \| \|	Thankfully, channel creation only happens in one place, so we don't need a state machine for this.
*	fixup! Restrict login names.	Owen Jacobson	2024-10-29
\|
*	Create a dedicated workflow type for creating logins.	Owen Jacobson	2024-10-29
\|	Nasty design corner. Logins need to be created in three places:
\|
\|	1. In tests, using app.logins().create(…);
\|	2. On initial setup, using app.setup().initial(…); and
\|	3. When accepting invites, using app.invites().accept(…).
\|
\|	These three places do the same thing with respect to logins, but also do a varying mix of other things. Testing is the simplest and _only_ creates a login. Initial setup and invite acceptance both issue a token for the newly-created login. Accepting an invite also invalidates the invite.
\|
\|	Previously, those three functions have been copy-pasted variations on a theme. Now that we have validation, the copy-paste approach is no longer tenable; it will become increasingly hard to ensure that the three functions (plus any future functions) remain in synch.
\|
\|	To accommodate the variations while consolidating login creation, I've added a typestate-based state machine, which is driven by method calls:
\|
\|	* A creation attempt begins with `let create = Create::begin()`. This always succeeds; it packages up arguments used in later steps, but does nothing else.
\|	* A creation attempt can be validated using `let validated = create.validate()?`. This may fail. Input validation and password hashing are carried out at this stage, making it potentially expensive.
\|	* A validated attempt can be stored in the DB, using `let stored = validated.store(&mut tx).await?`. This may fail. The login will be written to the DB; the caller is responsible for transaction demarcation, to allow other things to take place in the same transaction.
\|	* A fully-stored attempt can be used to publish events, using `let login = stored.publish(self.events)`. This always succeeds, and unwraps the state machine to its final product (a `login::History`).
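The shape of that typestate machine can be sketched as follows. This is a simplified, synchronous illustration under stated assumptions, not the project's implementation: the arguments to `begin`, the `Error` variants, the `Vec`-based stand-ins for the transaction and the event broadcaster, the fake "hashing", and the `Login` result type (in place of `login::History`) are all hypothetical.

```rust
/// Hypothetical error type; the real validation errors are not shown in the log.
#[derive(Debug, PartialEq)]
enum Error {
    InvalidName,
    DuplicateName,
}

/// Final product of the state machine (stand-in for `login::History`).
#[derive(Debug, PartialEq)]
struct Login {
    name: String,
}

struct Create {
    name: String,
    password: String,
}

struct Validated {
    name: String,
    password_hash: String,
}

struct Stored {
    name: String,
}

impl Create {
    /// Always succeeds: only packages up arguments for later steps.
    fn begin(name: &str, password: &str) -> Self {
        Create { name: name.to_string(), password: password.to_string() }
    }

    /// May fail: validates input and "hashes" the password (placeholder only).
    fn validate(self) -> Result<Validated, Error> {
        if self.name.trim().is_empty() {
            return Err(Error::InvalidName);
        }
        let password_hash = format!("hashed:{}", self.password);
        Ok(Validated { name: self.name, password_hash })
    }
}

impl Validated {
    /// May fail: writes to the (stand-in) database; the caller owns the "transaction".
    fn store(self, db: &mut Vec<(String, String)>) -> Result<Stored, Error> {
        if db.iter().any(|(name, _)| *name == self.name) {
            return Err(Error::DuplicateName);
        }
        db.push((self.name.clone(), self.password_hash));
        Ok(Stored { name: self.name })
    }
}

impl Stored {
    /// Always succeeds: publishes an event and unwraps to the final product.
    fn publish(self, events: &mut Vec<String>) -> Login {
        events.push(format!("login created: {}", self.name));
        Login { name: self.name }
    }
}
```

Because each step consumes the previous state by value, a caller cannot store an unvalidated attempt or publish an unstored one; the compiler rejects the out-of-order call.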
*	Restrict login names.	Owen Jacobson	2024-10-29
\|	There's no good reason to use an empty string as your login name, or to use one so long as to annoy others. Names beginning or ending with whitespace, or containing runs of whitespace, are also a technical problem, so they're also prohibited.
\|
\|	This change does not implement [UTS #39], as I haven't yet fully understood how to do so.
\|
\|	[UTS #39]: https://www.unicode.org/reports/tr39/
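A check along the lines described in this commit message might look like the sketch below. The function name and the 64-character limit are assumptions for illustration; the log does not state the actual maximum length or how the real code is structured.

```rust
/// Assumed upper bound; the actual limit is not given in the commit message.
const MAX_NAME_CHARS: usize = 64;

/// Hypothetical validator: rejects empty names, over-long names, names with
/// leading or trailing whitespace, and names containing runs of whitespace.
fn is_valid_name(name: &str) -> bool {
    if name.is_empty() || name.chars().count() > MAX_NAME_CHARS {
        return false;
    }
    if name.starts_with(char::is_whitespace) || name.ends_with(char::is_whitespace) {
        return false;
    }
    // A "run" here means two or more consecutive whitespace characters.
    let mut previous_was_whitespace = false;
    for c in name.chars() {
        let is_whitespace = c.is_whitespace();
        if is_whitespace && previous_was_whitespace {
            return false;
        }
        previous_was_whitespace = is_whitespace;
    }
    true
}
```

Length is counted in `char`s rather than bytes so that non-ASCII names are not penalized for their encoding; whether the real code counts characters, bytes, or grapheme clusters is not stated.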
*	Update stored sqlx queries	Owen Jacobson	2024-10-29
\|
*	Stop logging every step of postinst	Owen Jacobson	2024-10-29
\|
*	Merge branch 'docker-deb-build'	Owen Jacobson	2024-10-29
\|\
\| *	Package `hi` for Debian.	Owen Jacobson	2024-10-29
\|/	This commit provides a Docker-based build process for generating `.deb` packages, which can be run in Docker Desktop. I don't love it, but it's the best option I have _right now_ for doing this.
\|
\|	The resulting packages:
\|
\|	* Install `hi` (and `hi-recanonicalize`), in `/usr/bin`.
\|	* Create a user (`hi`) and a data directory (`/var/lib/hi`).
\|	* Create and start a systemd service unit for `hi`.
\|
\|	Packages are built for arm64 and amd64 (aka x86_64).
*	Update saved SQL	Owen Jacobson	2024-10-26
\|
*	Invite accept error is Error	Owen Jacobson	2024-10-26
\|
*	Take a swing at putting an invite UI in place.	Owen Jacobson	2024-10-25
\|
*	To make it easier to correlate deletes to the event stream, have deletes ↵	Owen Jacobson	2024-10-25
\| \| \| \|	return the ID of the affected entity.
*	Tests for purged channels and messages.	Owen Jacobson	2024-10-25
\| \| \| \|	This required a re-think of the `.immediately()` combinator, to generalize it to cases where a message is _not_ expected. That (more or less immediately) suggested some mixed combinators, particularly for stream futures (futures of `Option<T>`).
*	Consolidate test helper event functions	Owen Jacobson	2024-10-24
\|
*	Tests for channel, invite, setup, and message deletion events.	Owen Jacobson	2024-10-24
\| \| \| \|	This also found a bug! No live event was being emitted during invite accept. The only way to find out about invites was to reconnect.
*	Tests for initial setup	Owen Jacobson	2024-10-24
\|
*	Tests for accepting invites	Owen Jacobson	2024-10-24
\|
*	Tests for retrieving invites	Owen Jacobson	2024-10-24
\|
*	Tests for channel delete endpoint	Owen Jacobson	2024-10-23
\|
*	Tests for `DELETE /api/messages/:id`	Owen Jacobson	2024-10-23
\|
*	Channel creation tests for expiry, conflicting names	Owen Jacobson	2024-10-23
\|
*	Test boot more thoroughly.	Owen Jacobson	2024-10-23
\|
*	Make sure (most) queries avoid table scans.	Owen Jacobson	2024-10-23
\|	I've exempted inserts (they never scan in the first place), queries on `event_sequence` (at most one row), and the coalesce()s used for event replay (for now; these are obviously a performance risk area and need addressing).
\|
\|	Method:
\|
\|	```
\|	find .sqlx -name 'query-*.json' -exec jq -r '"explain query plan " + .query + ";"' {} + > explain.sql
\|	```
\|
\|	Then go query by query through the resulting file.
*	Merge branch 'broken-tests'	Owen Jacobson	2024-10-23
\|\
\| *	Spell the module name right in the recanonicalize code sample	Owen Jacobson	2024-10-23
\| \|
* \|	Remove tabs in Rust files.	Owen Jacobson	2024-10-22
\| \|
* \|	Sort out the naming of the various parts of an identity.	Owen Jacobson	2024-10-22
\| \|	* A `cookie::Identity` (`IdentityCookie`) is a specialized CookieJar for working with identities.
\| \|	* An `Identity` is a token/login pair.
\| \|
\| \|	I hope for this to be a bit more legible.
\| \|
\| \|	In service of this, `Login` is no longer extractable. You have to get an identity.
* \|	Set `charset` params on returned content types.	Owen Jacobson	2024-10-22
\| \| \| \| \| \| \| \|	This is a somewhat indirect change; it removes `mime_guess` in favour of some very, uh, "bespoke" mime detection logic that hardcodes mime types for the small repertoire of file extensions actually present in the UI. `mime_guess` doesn't provide a way to set params as it exports its own `Mime` struct, which doesn't provide `with_params()`.
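A hardcoded extension table of the kind this commit message describes could be sketched as below. The function name and the specific set of extensions are assumptions for illustration; the log does not list which file types the UI actually contains.

```rust
use std::path::Path;

/// Hypothetical lookup: maps a small, fixed set of file extensions to a
/// `Content-Type` value, adding a `charset` parameter for text formats.
fn content_type_for(path: &str) -> &'static str {
    let extension = Path::new(path)
        .extension()
        .and_then(|ext| ext.to_str())
        .unwrap_or("");
    match extension {
        "html" => "text/html; charset=utf-8",
        "css" => "text/css; charset=utf-8",
        "js" => "text/javascript; charset=utf-8",
        "json" => "application/json",
        "png" => "image/png",
        "svg" => "image/svg+xml",
        // Anything unrecognized falls back to an opaque binary type.
        _ => "application/octet-stream",
    }
}
```

Returning the full header value as a static string sidesteps the need to attach parameters to a `Mime` value, at the cost of having to extend the table whenever a new file type is added to the UI.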
* \|	Verify node deps during pre-commit	Owen Jacobson	2024-10-22
\| \|
* \|	Let `cargo` handle building the UI, where possible.	Owen Jacobson	2024-10-22
\| \| \| \| \| \| \| \|	This allows skipping the `target/ui` rebuild if the UI has not changed, which has otherwise been a bit of a source of drag on my development speed.
* \|	Merge branch 'unicode-normalization'	Owen Jacobson	2024-10-22
\|\\|
\| *	Provide `hi-recanonicalize` to recover from canonicalized-name problems.	Owen Jacobson	2024-10-22
\| \|
\| *	Canonicalize login and channel names.	Owen Jacobson	2024-10-22
\| \|	Canonicalization does two things:
\| \|
\| \|	* It prevents duplicate names that differ only by case or only by normalization/encoding sequence; and
\| \|	* It makes certain name-based comparisons "case-insensitive" (generalizing via Unicode's case-folding rules).
\| \|
\| \|	This change is complicated, as it means that every name now needs to be stored in two forms. Unfortunately, this is _very likely_ a breaking schema change. The migrations in this commit perform a best-effort attempt to canonicalize existing channel or login names, but it's likely any existing channels or logins with non-ASCII characters will not be canonicalized correctly. Since clients look at all channel names and all login names on boot, and since the code in this commit verifies canonicalization when reading from the database, this will effectively make the server unusable until any incorrectly-canonicalized values are either manually canonicalized, or removed.
\| \|
\| \|	It might be possible to do better with [the `icu` sqlite3 extension][icu], but (a) I'm not convinced of that and (b) this commit is already huge; adding database extension support would make it far larger.
\| \|
\| \|	[icu]: https://sqlite.org/src/dir/ext/icu
\| \|
\| \|	For some references on why it's worth storing usernames this way, see <https://www.b-list.org/weblog/2018/nov/26/case/> and the referenced talk, as well as <https://www.b-list.org/weblog/2018/feb/11/usernames/>. Bennett's treatment of this issue is, to my eye, much more readable than the referenced Unicode technical reports, and I'm inclined to trust his opinion given that he maintains a widely-used, internet-facing user registration library for Django.
\| *	Unicode normalization on input.	Owen Jacobson	2024-10-21
\|/	This normalizes the following values:
\|
\|	* login names
\|	* passwords
\|	* channel names
\|	* message bodies, because why not
\|
\|	The goal here is to have a canonical representation of these values, so that, for example, the service does not inadvertently host two channels whose names are semantically identical but differ in the specifics of how diacritics are encoded, or two users whose names are identical.
\|
\|	Normalization is done on input from the wire, using Serde hooks, and when reading from the database. The `crate::nfc::String` type implements these normalizations (as well as normalizing whenever converted from a `std::string::String` generally).
\|
\|	This change does not cover:
\|
\|	* Trying to cope with passwords that were created as non-normalized strings, which are now non-verifiable as all the paths to verify passwords normalize the input.
\|	* Trying to ensure that non-normalized data in the database compares reasonably to normalized data. Fortunately, we don't _do_ very many string comparisons (I think only login names), so this isn't a huge deal at this stage. Login names will probably have to Get Fixed later on, when we figure out how to handle case folding for login name verification.
*	Mention the message deleted events, and that deleted channels cannot receive ↵	Owen Jacobson	2024-10-19
\| \| \| \|	messages.
*	Make the responses for various data creation requests more consistent.	Owen Jacobson	2024-10-19
\|	In general:
\|
\|	* If the client can only assume the response is immediately valid (mostly, login creation, where the client cannot monitor the event stream), then 200 OK, with data describing the server's view of the request.
\|	* If the client can monitor for completion by watching the event stream, then 202 Accepted, with data describing the server's view of the request.
\|
\|	This comes on the heels of a comment I made on Discord:
\|
\|	> hrm
\|	>
\|	> creating a login: 204 No Content, no body
\|	> sending a message: 202 Accepted, no body
\|	> creating a channel: 200 Okay, has a body
\|	>
\|	> past me, what were you on
\|
\|	There wasn't any principled reason for this inconsistency; it happened as the endpoints were written at different times and with different states of mind.
*	Package upgrades (Node)	Owen Jacobson	2024-10-19
\|
*	Dependency upgrades (Rust)	Owen Jacobson	2024-10-18
\|
*	Cargo fmt	Owen Jacobson	2024-10-18
\|