| Commit message | Author | Age |
|
There are a couple of migration suggestions from `cargo fix --edition` that I have deliberately skipped, which are intended to make sure that the changes to `if let` scoping don't bite us. They don't, I'm pretty sure, and if I turn out to be wrong, I'd rather fix the scoping issues (as they arise) than use `match` (`cargo fix --edition`'s suggestion).
This change also includes a bulk reformat and a clippy cleanup.
NOTA BENE: As this requires a new Rust toolchain, you'll need to update Rust (`rustup update`, normally) or the server won't build. This also applies to the Debian builder Docker image; it'll need to be rebuilt (from scratch, pulling its base image again) as well.
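For anyone who hasn't hit it yet, here is a minimal sketch (not code from this repository) of the pattern the skipped suggestions are guarding against:
```
// Illustrative only: the `if let` scoping change in question. Under the new
// edition, the temporary `MutexGuard` returned by `counters.lock()` is
// dropped at the end of the `if let` arm rather than after the `else` block,
// so the `else` branch runs without holding the lock. `cargo fix --edition`
// proposes rewriting this as a `match` to preserve the old drop order.
use std::sync::Mutex;

fn bump_or_init(counters: &Mutex<Option<u32>>) {
    if let Some(n) = *counters.lock().unwrap() {
        println!("current count: {n}");
    } else {
        // New edition: the first lock is already released here, so this
        // re-lock no longer deadlocks; code that relied on still holding the
        // guard at this point is what the migration lint worries about.
        *counters.lock().unwrap() = Some(0);
    }
}
```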
|
The original retention values were loosely based on Slack's retention, for lack of a more specific motivator. Today's election results have changed my views; the service now defaults to retention more in line with the needs of communities for which deep message history may be a risk:
* Unused channels expire after 7 days.
* Used channels expire when their last message expires (as before).
* Deleted channels are purged after 6 hours (which is in line with the purge behaviour of messages).
* Messages expire after 15 days.
* Deleted messages are purged after 6 hours (as before).
No changes have been made to token expiry.
|
This is meant to limit the number of messages that event replay needs to examine. Previously, the query required a table scan; `message` can grow quite large, so table scans over it should really be avoided. The new schema allows replays to be carried out using an index scan.
The premise of this change is that, for each (channel, message), there is a range of event sequence numbers in which the (channel, message) may appear. We'll notate that as `[start, end]` in the general case; concretely, the ranges are:
* for active channels, `[ch.created_sequence, ch.created_sequence]`.
* for deleted channels, `[ch.created_sequence, ch_del.deleted_sequence]`.
* for active messages, `[mg.sent_sequence, mg.sent_sequence]`.
* for deleted messages, `[mg.sent_sequence, mg_del.deleted_sequence]`.
(The two "active" ranges may grow in future releases, to support things like channel renames and message editing. That won't change the logic, but those two things will need to update the new `last_sequence` field.)
There are two families of operations that need to retrieve based on these ranges:
* Boot creates a snapshot as of a specific `resume_at` sequence number, and thus must include any record whose `start` falls on or before `resume_at`. We can't exclude records whose `end` is also before it, as their terminal state may be one that is included in boot (eg. active channels).
* Event replay needs to include any events that fall after the same `resume_at`, and thus must include events from any record whose `end` falls strictly after `resume_at`. We can't exclude records whose `start` is also strictly after `resume_at`, as we'd omit them from replay, inappropriately, if we did.
This gives three interesting cases:
1. Record fully after `resume_at`:
       event sequence --increasing-->
       x-a   …    x  …   x+k …
       resume_at  start  end
   This record should be omitted by boot, but included for event replay.
2. Record fully before `resume_at`:
       event sequence --increasing-->
       x   …   x+k … x+a
       start   end   resume_at
   This record should be omitted for event replay, but included for boot.
3. Record overlapping `resume_at`:
       event sequence --increasing-->
       x   …   x+a    …    x+k
       start   resume_at   end
   This record needs to be included for both boot and event replay.
However, the bounds of that range were previously split across two tables (`channel` and `channel_deleted`, or `message` and `message_deleted`, respectively), and sqlite (like most SQL implementations) cannot index across tables. This forced a table scan, with the program considering every possible (channel, message) during event replay.
This commit adds a `last_sequence` field to channels and messages, which is set to the above values as channels and messages are operated on. This field is indexed, and queries can use it to rapidly identify relevant rows for event replay, cutting down the amount of reading needed to generate events on resume.
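As a rough illustration of what the new queries can look like (table and column names beyond `last_sequence` are assumptions here, not the actual queries in this commit):
```
// Sketch only: a replay-side query that the new `last_sequence` index can
// serve with an index range scan instead of a full scan of `message`.
// Only rows whose [start, end] range can still yield events after
// `resume_at` are touched.
async fn replay_candidates(
    pool: &sqlx::SqlitePool,
    resume_at: i64,
) -> sqlx::Result<Vec<i64>> {
    sqlx::query_scalar::<sqlx::Sqlite, i64>(
        "SELECT id FROM message WHERE last_sequence > ? ORDER BY last_sequence",
    )
    .bind(resume_at)
    .fetch_all(pool)
    .await
}
```
The boot-side filter is the mirror image: it keeps rows whose `start` (`sent_sequence` / `created_sequence`) falls on or before `resume_at`.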
|
This is an inconsequential change for actual clients, since "resume from the beginning" was never a preferred mode of operation, and it simplifies some internals. It should also mean we get better query plans where `coalesce(cond, true)` was previously being used.
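To make the query-plan point concrete, here is an illustrative before/after of the kind of predicate involved (the column name and exact SQL are assumptions, not the project's literal queries):
```
// Illustrative only. With an optional resume point, a filter had to tolerate
// a NULL bind parameter:
const OLD_FILTER: &str = "coalesce(sent_sequence > ?, true)";
// SQLite generally won't drive an index through the coalesce() wrapper, so a
// term like this tends to become a post-scan filter. With resumption always
// starting from a concrete sequence number, the bare comparison can be
// served by an index range scan:
const NEW_FILTER: &str = "sent_sequence > ?";
```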
|
This utility was needed to support a database migration with existing data. I have it on good authority that no further databases exist that are in the state that made this tool necessary.
|
I mean, it always does, but I'd rather get a panic during message/channel reconstruction than wrong results if that assumption is ever violated inadvertently.
|
Thankfully, channel creation only happens in one place, so we don't need a state machine for this.
|
return the ID of the affected entity.
|
This required a re-think of the `.immediately()` combinator, to generalize it to cases where a message is _not_ expected. That (more or less immediately) suggested some mixed combinators, particularly for stream futures (futures of `Option<T>`).
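The repository's own combinators aren't reproduced here, but as a hedged sketch of the two shapes described, using the `futures` crate's `now_or_never()`:
```
// Sketch only: illustrative stand-ins for an `.immediately()`-style helper
// and its "no message expected" counterpart, for streams.
use futures::{FutureExt, Stream, StreamExt};

fn expect_immediate<S: Stream + Unpin>(stream: &mut S) -> S::Item {
    // Polling once must yield an item straight away.
    stream
        .next()
        .now_or_never()
        .flatten()
        .expect("expected a message immediately")
}

fn expect_nothing_yet<S: Stream + Unpin>(stream: &mut S) {
    // Polling once must come back Pending: no message is available right now.
    assert!(
        stream.next().now_or_never().is_none(),
        "expected no message immediately"
    );
}
```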
|
I've exempted inserts (they never scan in the first place), queries on `event_sequence` (at most one row), and the coalesce()s used for event replay (for now; these are obviously a performance risk area and need addressing).
Method:
```
find .sqlx -name 'query-*.json' -exec jq -r '"explain query plan " + .query + ";"' {} + > explain.sql
```
Then go query by query through the resulting file.
|
* A `cookie::Identity` (`IdentityCookie`) is a specialized CookieJar for working with identities.
* An `Identity` is a token/login pair.
I hope for this to be a bit more legible.
In service of this, `Login` is no longer extractable. You have to get an identity.
|
Canonicalization does two things:
* It prevents duplicate names that differ only by case or only by normalization/encoding sequence; and
* It makes certain name-based comparisons "case-insensitive" (generalizing via Unicode's case-folding rules).
This change is complicated, as it means that every name now needs to be stored in two forms. Unfortunately, this is _very likely_ a breaking schema change. The migrations in this commit perform a best-effort attempt to canonicalize existing channel or login names, but it's likely that any existing channels or logins with non-ASCII characters will not be canonicalized correctly. Since clients look at all channel names and all login names on boot, and since the code in this commit verifies canonicalization when reading from the database, this will effectively make the server unusable until any incorrectly-canonicalized values are either manually canonicalized or removed.
It might be possible to do better with [the `icu` sqlite3 extension][icu], but (a) I'm not convinced of that and (b) this commit is already huge; adding database extension support would make it far larger.
[icu]: https://sqlite.org/src/dir/ext/icu
For some references on why it's worth storing usernames this way, see <https://www.b-list.org/weblog/2018/nov/26/case/> and the referenced talk, as well as <https://www.b-list.org/weblog/2018/feb/11/usernames/>. Bennett's treatment of this issue is, to my eye, much more readable than the referenced Unicode technical reports, and I'm inclined to trust his opinion given that he maintains a widely-used, internet-facing user registration library for Django.
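For a concrete picture of the technique (a sketch of the general approach, not necessarily the exact scheme in this commit), assuming the `unicode-normalization` crate:
```
// Sketch only: NFKC-normalize, then case-fold. `to_lowercase()` stands in
// for full Unicode case folding, which a real implementation may do
// differently.
use unicode_normalization::UnicodeNormalization;

fn canonical(name: &str) -> String {
    name.nfkc().collect::<String>().to_lowercase()
}

fn main() {
    // Composed vs. decomposed "Å", plus a case difference, map to the same
    // canonical form, so they cannot coexist as distinct names.
    assert_eq!(canonical("Åke"), canonical("A\u{030A}KE"));
}
```
Storing a canonical form of this kind alongside the original is the "two forms" mentioned above: the entered form for display, the canonical form for uniqueness and comparisons.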
|
This normalizes the following values:
* login names
* passwords
* channel names
* message bodies, because why not
The goal here is to have a canonical representation of these values, so that, for example, the service does not inadvertently host two channels whose names are semantically identical but differ in the specifics of how diacritics are encoded, or two users whose names differ only in the same way.
Normalization is done on input from the wire, using Serde hooks, and when reading from the database. The `crate::nfc::String` type implements these normalizations (as well as normalizing whenever converted from a `std::string::String` generally).
This change does not cover:
* Trying to cope with passwords that were created as non-normalized strings, which are now unverifiable, since every path that verifies passwords normalizes the input.
* Trying to ensure that non-normalized data in the database compares reasonably to normalized data. Fortunately, we don't _do_ very many string comparisons (I think only login names), so this isn't a huge deal at this stage. Login names will probably have to Get Fixed later on, when we figure out how to handle case folding for login name verification.
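As a sketch of the normalizing-wrapper idea (the real `crate::nfc::String` may differ in detail), assuming the `unicode-normalization` and `serde` crates:
```
// Sketch only: a string type that is always NFC-normalized, including when
// it arrives over the wire via Serde.
use unicode_normalization::UnicodeNormalization;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct NfcString(String);

impl From<String> for NfcString {
    fn from(s: String) -> Self {
        // "é" as one code point and as "e" + combining acute normalize to
        // the same sequence, so they compare equal from here on.
        NfcString(s.as_str().nfc().collect())
    }
}

impl<'de> serde::Deserialize<'de> for NfcString {
    fn deserialize<D: serde::Deserializer<'de>>(de: D) -> Result<Self, D::Error> {
        String::deserialize(de).map(NfcString::from)
    }
}
```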
|
In general:
* If the client can only assume the response is immediately valid (mostly, login creation, where the client cannot monitor the event stream), then 200 Okay, with data describing the server's view of the request.
* If the client can monitor for completion by watching the event stream, then 202 Accepted, with data describing the server's view of the request.
This comes on the heels of a comment I made on Discord:
> hrm
>
> creating a login: 204 No Content, no body
> sending a message: 202 Accepted, no body
> creating a channel: 200 Okay, has a body
>
> past me, what were you on
There wasn't any principled reason for this inconsistency; it happened as the endpoints were written at different times and with different states of mind.
|
This accomplishes two things:
* It removes the need for an additional `channel_name_reservation` table, since `channel.name` now only contains non-null values for active channels, and
* It nicely dovetails with the idea that `null` means an unknown value in SQL-land.
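A sketch of why that works (not the project's actual migration): SQLite never considers two NULLs equal in a UNIQUE index, so deleted channels can all carry a NULL name while active channels still contend for uniqueness.
```
// Illustrative only: with names NULLed out on deletion, a plain unique index
// is enough; any number of deleted channels coexist, but no two active
// channels can share a name.
const CHANNEL_NAME_UNIQUE: &str =
    "CREATE UNIQUE INDEX IF NOT EXISTS channel_name_unique ON channel (name)";
```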
|
Previously, when a channel (message) was deleted, `hi` would send events to all _connected_ clients to inform them of the deletion, then delete all memory of the channel (message). Any disconnected client, on reconnecting, would not receive the deletion event, and would de-synch with the service. The creation events were also immediately retconned out of the event stream.
With this change, `hi` keeps a record of deleted channels (messages). When replaying events, these records are used to replay the deletion event. After 7 days, the retained data is deleted, both to keep storage under control and to conform to users' expectations that deleted means gone.
To match users' likely intuitions about what deletion does, deleting a channel (message) _does_ immediately delete some of its associated data. Channels' names are blanked, and messages' bodies are also blanked. When the event stream is replayed, the original channel.created (message.sent) event is "tombstoned", with an additional `deleted_at` field to inform clients. The included client does not use this field, at least yet.
The migration is, once again, screamingly complicated due to sqlite's limited ALTER TABLE … ALTER COLUMN support.
This change also contains capabilities that would allow the API to return 410 Gone for deleted channels or messages, instead of 404. I did experiment with this, but it's tricky to do pervasively, especially since most app-level interfaces return an `Option<Channel>` or `Option<Message>`. Redesigning these to return either `Ok(Channel)` (`Ok(Message)`) or `Err(Error::NotFound)` or `Err(Error::Deleted)` is more work than I wanted to take on for this change, and the utility of 410 Gone responses is not obvious to me. We have other, more pressing API design warts to address.
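The wire shape isn't spelled out above; as a purely illustrative sketch of a tombstoned creation event (field names other than `deleted_at` are assumptions):
```
// Illustrative only: a channel.created event that can carry a tombstone.
#[derive(serde::Serialize)]
struct ChannelCreated {
    id: i64,
    sequence: i64,
    // Blanked (None) once the channel has been deleted.
    name: Option<String>,
    // Present only on tombstoned events, so a client replaying the stream
    // learns that the channel was later deleted.
    #[serde(skip_serializing_if = "Option::is_none")]
    deleted_at: Option<String>,
}
```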
|
I've also aligned channel creation with this (it's 409 Conflict). To make server setup more distinct, it now returns 503 Service Unavailable if setup has not been completed.
|
This will make it much easier to slot in new event types (login events!).
|
This structure didn't accomplish anything and made certain refactorings harder.
|
The client now takes an initial snapshot from the response to `/api/boot`, then picks up the event stream at the first event after the point where the snapshot was taken.
This commit removes the following unused endpoints:
* `/api/channels` (GET)
* `/api/channels/:channel/messages` (GET)
The information therein is now part of the boot response. We can always add 'em back, but I wanted to clear the deck for designing something more capable of meeting client needs.
|
It is deliberate that the expire() functions do not use them. To avoid races, the transactions must be committed before events get sent, in both cases, which makes them structurally pretty different.
|
This separates the code that figures out what happened to an entity from the code that represents it to a user, and makes it easier to compute a snapshot at a point in time (for things like bootstrap). It also makes the internal logic a bit easier to follow, since it's easier to tell whether you're working with a point in time or with the whole recorded history.
This is hefty.
|
This helped me discover an organizational scheme I like more.
|
This is primarily renames and repackagings.
|
(This is part of a larger reorganization.)
|
sequence.
|
Per-channel event sequences were a cute idea, but they made reasoning about event resumption much, much harder (case in point: recovering the order of events in a partially-ordered collection is quadratic, since it's basically a graph sort). The minor overhead of a global sequence number is likely tolerable, and this simplifies both the API and the internals.
|
expires.
When tokens are revoked (logout or expiry), the server now publishes an internal event via the new `logins` event broadcaster. These events are used to guard the `/api/events` stream. When a token revocation event arrives for the token used to subscribe to the stream, the stream is cut short, disconnecting the client.
In service of this, tokens now have IDs, which are non-confidential values that can be used to discuss tokens without their secrets being passed around unnecessarily. These IDs are not (at this time) exposed to clients, but they could be.
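Roughly, the guard looks like the following (names and channel types are illustrative, not the server's actual internals):
```
// Sketch only: an /api/events loop that forwards events until the token
// backing the stream is revoked. Assumes a tokio broadcast channel of
// revoked token IDs, fed by the `logins` broadcaster described above.
use tokio::sync::broadcast;

async fn guarded_event_stream(
    my_token_id: u64,
    mut events: broadcast::Receiver<String>,
    mut revocations: broadcast::Receiver<u64>,
) {
    loop {
        tokio::select! {
            Ok(event) = events.recv() => {
                // Forward `event` to the connected client here.
                let _ = event;
            }
            Ok(revoked_id) = revocations.recv() => {
                if revoked_id == my_token_id {
                    // This stream's token was revoked: cut the stream short,
                    // disconnecting the client.
                    break;
                }
            }
            else => break,
        }
    }
}
```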
|
Trying to reliably do expiry mid-request was causing some anomalies:
* Creating a channel with a dup name would fail, then succeed after listing channels.
It was very hard to reason about which operations would need to trigger expiry in order to fix this "correctly," so now expiry runs on every request.
|