pilcrow - Run-It-Yourself web chat, maybe

	Commit message (Collapse)	Author	Age
*	Provide `hi-recanonicalize` to recover from canonicalized-name problems.	Owen Jacobson	2024-10-22
\|
*	Canonicalize login and channel names.	Owen Jacobson	2024-10-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Canonicalization does two things: * It prevents duplicate names that differ only by case or only by normalization/encoding sequence; and * It makes certain name-based comparisons "case-insensitive" (generalizing via Unicode's case-folding rules). This change is complicated, as it means that every name now needs to be stored in two forms. Unfortunately, this is _very likely_ a breaking schema change. The migrations in this commit perform a best-effort attempt to canonicalize existing channel or login names, but it's likely any existing channels or logins with non-ASCII characters will not be canonicalize correctly. Since clients look at all channel names and all login names on boot, and since the code in this commit verifies canonicalization when reading from the database, this will effectively make the server un-usuable until any incorrectly-canonicalized values are either manually canonicalized, or removed It might be possible to do better with [the `icu` sqlite3 extension][icu], but (a) I'm not convinced of that and (b) this commit is already huge; adding database extension support would make it far larger. [icu]: https://sqlite.org/src/dir/ext/icu For some references on why it's worth storing usernames this way, see <https://www.b-list.org/weblog/2018/nov/26/case/> and the refernced talk, as well as <https://www.b-list.org/weblog/2018/feb/11/usernames/>. Bennett's treatment of this issue is, to my eye, much more readable than the referenced Unicode technical reports, and I'm inclined to trust his opinion given that he maintains a widely-used, internet-facing user registration library for Django.
*	Unicode normalization on input.	Owen Jacobson	2024-10-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This normalizes the following values: * login names * passwords * channel names * message bodies, because why not The goal here is to have a canonical representation of these values, so that, for example, the service does not inadvertently host two channels whose names are semantically identical but differ in the specifics of how diacritics are encoded, or two users whose names are identical. Normalization is done on input from the wire, using Serde hooks, and when reading from the database. The `crate::nfc::String` type implements these normalizations (as well as normalizing whenever converted from a `std::string::String` generally). This change does not cover: * Trying to cope with passwords that were created as non-normalized strings, which are now non-verifiable as all the paths to verify passwords normalize the input. * Trying to ensure that non-normalized data in the database compares reasonably to normalized data. Fortunately, we don't _do_ very many string comparisons (I think only login names), so this isn't a huge deal at this stage. Login names will probably have to Get Fixed later on, when we figure out how to handle case folding for login name verification.
*	Mention the message deleted events, and that deleted channels cannot receive ↵	Owen Jacobson	2024-10-19
\| \| \| \|	messages.
*	Make the responses for various data creation requests more consistent.	Owen Jacobson	2024-10-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In general: * If the client can only assume the response is immediately valid (mostly, login creation, where the client cannot monitor the event stream), then 200 Okay, with data describing the server's view of the request. * If the client can monitor for completion by watching the event stream, then 202 Accepted, with data describing the server's view of the request. This comes on the heels of a comment I made on Discord: > hrm > > creating a login: 204 No Content, no body > sending a message: 202 Accepted, no body > creating a channel: 200 Okay, has a body > > past me, what were you on There wasn't any principled reason for this inconsistency; it happened as the endpoints were written at different times and with different states of mind.
*	Merge branch 'wip/retain-deleted'	Owen Jacobson	2024-10-18
\|\
\| *	Explain (some of) the rationale for returning "empty" values in tombstone ↵	Owen Jacobson	2024-10-18
\| \| \| \| \| \| \| \|	events in the docs.
\| *	Retain deleted messages and channels temporarily, to preserve events for replay.	Owen Jacobson	2024-10-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, when a channel (message) was deleted, `hi` would send events to all _connected_ clients to inform them of the deletion, then delete all memory of the channel (message). Any disconnected client, on reconnecting, would not receive the deletion event, and would de-synch with the service. The creation events were also immediately retconned out of the event stream, as well. With this change, `hi` keeps a record of deleted channels (messages). When replaying events, these records are used to replay the deletion event. After 7 days, the retained data is deleted, both to keep storage under control and to conform to users' expectations that deleted means gone. To match users' likely intuitions about what deletion does, deleting a channel (message) _does_ immediately delete some of its associated data. Channels' names are blanked, and messages' bodies are also blanked. When the event stream is replayed, the original channel.created (message.sent) event is "tombstoned", with an additional `deleted_at` field to inform clients. The included client does not use this field, at least yet. The migration is, once again, screamingingly complicated due to sqlite's limited ALTER TABLE … ALTER COLUMN support. This change also contains capabilities that would allow the API to return 410 Gone for deleted channels or messages, instead of 404. I did experiment with this, but it's tricky to do pervasively, especially since most app-level interfaces return an `Option<Channel>` or `Option<Message>`. Redesigning these to return either `Ok(Channel)` (`Ok(Message)`) or `Err(Error::NotFound)` or `Err(Error::Deleted)` is more work than I wanted to take on for this change, and the utility of 410 Gone responses is not obvious to me. We have other, more pressing API design warts to address.
* \|	Get loaded data using `export let data`, instead of fishing around in $page.	Owen Jacobson	2024-10-17
\|/ \| \| \| \| \|	This is mostly a how-to-Svelte thing. I've also made the API responses for invites a bit more caller-friendly by flattening them and adding the ID field into them. The ID is redundant (the client knows it because the client has the invitation URL), but it makes presenting invitations and actioning them a bit easier.
*	API docs rewrite.	Owen Jacobson	2024-10-16
\| \| \| \| \| \|	Having the whole API in a single file was starting to feel very cramped and constraining. This rewrite breaks it out into sections; as a side effect, the docs are now about 2.5x as long as they were, as the rewrite allows more space for each idea without crowding the page. The docs are best read by running `tools/docs-api`.
*	Return a distinct error when an invite username is in use.	Owen Jacobson	2024-10-11
\| \| \| \|	I've also aligned channel creation with this (it's 409 Conflict). To make server setup more distinct, it now returns 503 Service Unavailable if setup has not been completed.
*	Create APIs for inviting users.	Owen Jacobson	2024-10-11
\|
*	Provide a separate "initial setup" endpoint that creates a user.	Owen Jacobson	2024-10-11
\|
*	Automatically delete database backups if automatic restore is successful.	Owen Jacobson	2024-10-10
\| \| \| \|	Operational experience with the server has shown that leaving the backup in place is not helpful. The near-automatic choice is to immediately delete it, and the server won't start until it has been deleted. If the backup restore succeeded, then we know the user has a copy of their database, since the sqlite3 online backups API promises to make the target database bitwise-identical to the source database, so there's little chance the user will need a duplicate.
*	Align send request fields with message fields by renaming `message` to `body`.	Owen Jacobson	2024-10-09
\|
*	Return a flat message list on boot, not nested lists by channel.	Owen Jacobson	2024-10-09
\| \| \| \|	This is a bit easier to compute, and sets us up nicely for pulling message boot out of the `/api/boot` response entirely.
*	Provide a view of logins to clients.	Owen Jacobson	2024-10-09
\|
*	Simplify channel IDs in events. Remove redundant ones.	Owen Jacobson	2024-10-09
\|
*	Use a two-tier hierarchy for events.	Owen Jacobson	2024-10-09
\| \| \| \|	This will make it much easier to slot in new event types (login events!).
*	Flatten nested `channel` and `message` structs in events and API responses.	Owen Jacobson	2024-10-09
\| \| \| \|	This structure didn't accomplish anything and made certain refactorings harder.
*	Use `/api/boot` to bootstrap the client.	Owen Jacobson	2024-10-05
\| \| \| \| \| \| \| \| \| \| \|	The client now takes an initial snapshot from the response to `/api/boot`, then picks up the event stream at the immediately-successive event to the moment the snapshot was taken. This commit removes the following unused endpoints: * `/api/channels` (GET) * `/api/channels/:channel/messages` (GET) The information therein is now part of the boot response. We can always add 'em back, but I wanted to clear the deck for designing something more capable, for dealing with client needs.
*	Wrote down the DB recovery process	Owen Jacobson	2024-10-05
\|
*	List messages per channel.	Owen Jacobson	2024-10-03
\|
*	Add endpoints for deleting channels and messages.	Owen Jacobson	2024-10-03
\| \| \| \|	It is deliberate that the expire() functions do not use them. To avoid races, the transactions must be committed before events get sent, in both cases, which makes them structurally pretty different.
*	Provide a resume point to bridge clients from state snapshots to the event ↵	Owen Jacobson	2024-10-01
\| \| \| \|	sequence.
*	Expire channels, too.	Owen Jacobson	2024-09-28
\|
*	Push message body into its own object in events	Owen Jacobson	2024-09-28
\|
*	Send created events when channels are added.	Owen Jacobson	2024-09-28
\|
*	Make `/api/events` a firehose endpoint.	Owen Jacobson	2024-09-27
\| \| \| \| \| \| \| \|	It now includes events for all channels. Clients are responsible for filtering. The schema for channel events has changed; it now includes a channel name and ID, in the same format as the sender's name and ID. They also now include a `"type"` field, whose only valid value (as of this writing) is `"message"`. This is groundwork for delivering message deletion (expiry) events to clients, and notifying clients of channel lifecycle events.
*	Use a vector of sequence numbers, not timestamps, to restart /api/events ↵	Owen Jacobson	2024-09-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	streams. The timestamp-based approach had some formal problems. In particular, it assumed that time always went forwards, which isn't necessarily the case: * Alice calls `/api/channels/Cfoo` to send a message. * The server assigns time T to the request. * The server stalls somewhere in send() for a while, before storing and broadcasting the message. If it helps, imagine blocking on `tx.begin().await?` for a while. * In this interval, Bob calls `/api/events?channel=Cfoo`, receives historical messages up to time U (after T), and disconnects. * The server resumes Alice's request and finishes it. * Bob reconnects, setting his Last-Event-Id header to timestamp U. In this scenario, Bob never sees Alice's message unless he starts over. It wasn't in the original stream, since it wasn't broadcast while Bob was subscribed, and it's not in the new stream, since Bob's resume point is after the timestamp on Alice's message. The new approach avoids this. Each message is assigned a _sequence number_ when it's stored. Bob can be sure that his stream included every event, since the resume point is identified by sequence number even if the server processes them out of chronological order: * Alice calls `/api/channels/Cfoo` to send a message. * The server assigns time T to the request. * The server stalls somewhere in send() for a while, before storing and broadcasting. * In this interval, Bob calls `/api/events?channel=Cfoo`, receives historical messages up to sequence Cfoo=N, and disconnects. * The server resumes Alice's request, assigns her message sequence M (after N), and finishes it. * Bob resumes his subscription at Cfoo=N. * Bob receives Alice's message at Cfoo=M. There's a natural mutual exclusion on sequence numbers, enforced by sqlite, which ensures that no two messages have the same sequence number. Since sqlite promises that transactions are serializable by default (and enforces this with a whole-DB write lock), we can be confident that sequence numbers are monotonic, as well. This scenario is, to put it mildly, contrived and unlikely - which is what motivated me to fix it. These kinds of bugs are fiendishly hard to identify, let alone reproduce or understand. I wonder how costly cloning a map is going to turn out to be… A note on database migrations: sqlite3 really, truly has no `alter table … alter column` statement. The only way to modify an existing column is to add the column to a new table. If `alter column` existed, I would create the new `sequence` column in `message` in a much less roundabout way. Fortunately, these migrations assume that they are being run _offline_, so operations like "replace the whole table" are reasonable.
*	Small design doc	Owen Jacobson	2024-09-23
\|
*	Docs updates	Owen Jacobson	2024-09-21
\|
*	Write tests.	Owen Jacobson	2024-09-20
\|
*	Docs updates:	Owen Jacobson	2024-09-20
\| \| \| \| \|	* Document message expiry. * More warnings about Last-Event-Id.
*	Remove the HTML client, and expose a JSON API.	Owen Jacobson	2024-09-20
	This API structure fell out of a conversation with Kit. Described loosely: kit: ok kit: Here's what I'm picturing in a client kit: list channels, make-new-channel, zero to one active channels, post-to-active. kit: login/sign-up, logout owen: you will likely also want "am I logged in" here kit: sure, whoami