pilcrow - Run-It-Yourself web chat, maybe

	Commit message (Collapse)	Author	Age
*	Upgrade to Rust 1.85 and Rust 2024 edition.	Owen Jacobson	2025-02-20
\| \| \| \| \| \| \| \|	There are a couple of migration suggestions from `cargo fix --edition` that I have deliberately skipped, which are intended to make sure that the changes to `if let` scoping don't bite us. They don't, I'm pretty sure, and if I turn out to be wrong, I'd rather fix the scoping issues (as they arise) than use `match` (`cargo fix --edition`'s suggestion). This change also includes a bulk reformat and a clippy cleanup. NOTA BENE: As this requires a new Rust toolchain, you'll need to update Rust (`rustup update`, normally) or the server won't build. This also applies to the Debian builder Docker image; it'll need to be rebuilt (from scratch, pulling its base image again) as well.
*	Shorten the default retention, dramatically.	Owen Jacobson	2024-11-07
\| \| \| \| \| \| \| \| \| \| \| \|	The original retention values were loosely based on Slack's retention, for lack of a more specific motivator. Today's election results have changed my views; the service now defaults to retention more in line with the needs of communities for which deep message history may be a risk: * Unused channels expire after 7 days. * Used channels expire when their last message expires (as before). * Deleted channels are purged after 6 hours (which is in line with the purge behaviour of messages). * Messages expire after 15 days. * Deleted messages are purged after 6 hours (as before). No changes have been made to token expiry.
*	Remove `hi-recanonicalize`.	Owen Jacobson	2024-10-30
\| \| \| \|	This utility was needed to support a database migration with existing data. I have it on good authority that no further databases exist that are in the state that made this tool necessary.
*	Prevent deletion of non-empty channels.	Owen Jacobson	2024-10-30
\|
*	Restrict channel names, too.	Owen Jacobson	2024-10-29
\| \| \| \|	Thankfully, channel creation only happens in one place, so we don't need a state machine for this.
*	Tests for purged channels and messages.	Owen Jacobson	2024-10-25
\| \| \| \|	This required a re-think of the `.immediately()` combinator, to generalize it to cases where a message is _not_ expected. That (more or less immediately) suggested some mixed combinators, particularly for stream futures (futures of `Option<T>`).
*	Provide `hi-recanonicalize` to recover from canonicalized-name problems.	Owen Jacobson	2024-10-22
\|
*	Canonicalize login and channel names.	Owen Jacobson	2024-10-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Canonicalization does two things: * It prevents duplicate names that differ only by case or only by normalization/encoding sequence; and * It makes certain name-based comparisons "case-insensitive" (generalizing via Unicode's case-folding rules). This change is complicated, as it means that every name now needs to be stored in two forms. Unfortunately, this is _very likely_ a breaking schema change. The migrations in this commit perform a best-effort attempt to canonicalize existing channel or login names, but it's likely any existing channels or logins with non-ASCII characters will not be canonicalize correctly. Since clients look at all channel names and all login names on boot, and since the code in this commit verifies canonicalization when reading from the database, this will effectively make the server un-usuable until any incorrectly-canonicalized values are either manually canonicalized, or removed It might be possible to do better with [the `icu` sqlite3 extension][icu], but (a) I'm not convinced of that and (b) this commit is already huge; adding database extension support would make it far larger. [icu]: https://sqlite.org/src/dir/ext/icu For some references on why it's worth storing usernames this way, see <https://www.b-list.org/weblog/2018/nov/26/case/> and the refernced talk, as well as <https://www.b-list.org/weblog/2018/feb/11/usernames/>. Bennett's treatment of this issue is, to my eye, much more readable than the referenced Unicode technical reports, and I'm inclined to trust his opinion given that he maintains a widely-used, internet-facing user registration library for Django.
*	Unicode normalization on input.	Owen Jacobson	2024-10-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This normalizes the following values: * login names * passwords * channel names * message bodies, because why not The goal here is to have a canonical representation of these values, so that, for example, the service does not inadvertently host two channels whose names are semantically identical but differ in the specifics of how diacritics are encoded, or two users whose names are identical. Normalization is done on input from the wire, using Serde hooks, and when reading from the database. The `crate::nfc::String` type implements these normalizations (as well as normalizing whenever converted from a `std::string::String` generally). This change does not cover: * Trying to cope with passwords that were created as non-normalized strings, which are now non-verifiable as all the paths to verify passwords normalize the input. * Trying to ensure that non-normalized data in the database compares reasonably to normalized data. Fortunately, we don't _do_ very many string comparisons (I think only login names), so this isn't a huge deal at this stage. Login names will probably have to Get Fixed later on, when we figure out how to handle case folding for login name verification.
*	Switch to blanking tombstoned data with null, not empty string.	Owen Jacobson	2024-10-18
\| \| \| \| \| \| \|	This accomplishes two things: * It removes the need for an additional `channel_name_reservation` table, since `channel.name` now only contains non-null values for active channels, and * It nicely dovetails with the idea that `null` means an unknown value in SQL-land.
*	Retain deleted messages and channels temporarily, to preserve events for replay.	Owen Jacobson	2024-10-17
\| \| \| \| \| \| \| \| \| \| \| \|	Previously, when a channel (message) was deleted, `hi` would send events to all _connected_ clients to inform them of the deletion, then delete all memory of the channel (message). Any disconnected client, on reconnecting, would not receive the deletion event, and would de-synch with the service. The creation events were also immediately retconned out of the event stream, as well. With this change, `hi` keeps a record of deleted channels (messages). When replaying events, these records are used to replay the deletion event. After 7 days, the retained data is deleted, both to keep storage under control and to conform to users' expectations that deleted means gone. To match users' likely intuitions about what deletion does, deleting a channel (message) _does_ immediately delete some of its associated data. Channels' names are blanked, and messages' bodies are also blanked. When the event stream is replayed, the original channel.created (message.sent) event is "tombstoned", with an additional `deleted_at` field to inform clients. The included client does not use this field, at least yet. The migration is, once again, screamingingly complicated due to sqlite's limited ALTER TABLE … ALTER COLUMN support. This change also contains capabilities that would allow the API to return 410 Gone for deleted channels or messages, instead of 404. I did experiment with this, but it's tricky to do pervasively, especially since most app-level interfaces return an `Option<Channel>` or `Option<Message>`. Redesigning these to return either `Ok(Channel)` (`Ok(Message)`) or `Err(Error::NotFound)` or `Err(Error::Deleted)` is more work than I wanted to take on for this change, and the utility of 410 Gone responses is not obvious to me. We have other, more pressing API design warts to address.
*	Organizational pass on endpoints and routes.	Owen Jacobson	2024-10-16
\|
*	Return a distinct error when an invite username is in use.	Owen Jacobson	2024-10-11
\| \| \| \|	I've also aligned channel creation with this (it's 409 Conflict). To make server setup more distinct, it now returns 503 Service Unavailable if setup has not been completed.
*	Return an instance of the client when opening a channel URL directly.	Owen Jacobson	2024-10-10
\|
*	Provide a view of logins to clients.	Owen Jacobson	2024-10-09
\|
*	Separate `/api/boot` into its own module.	Owen Jacobson	2024-10-05
\|
*	Clean up naming and semantics of history accessors.	Owen Jacobson	2024-10-04
\|
*	Stray warnings	Owen Jacobson	2024-10-03
\|
*	List messages per channel.	Owen Jacobson	2024-10-03
\|
*	Add endpoints for deleting channels and messages.	Owen Jacobson	2024-10-03
\| \| \| \|	It is deliberate that the expire() functions do not use them. To avoid races, the transactions must be committed before events get sent, in both cases, which makes them structurally pretty different.
*	Represent channels and messages using a split "History" and "Snapshot" model.	Owen Jacobson	2024-10-03
\| \| \| \| \| \|	This separates the code that figures out what happened to an entity from the code that represents it to a user, and makes it easier to compute a snapshot at a point in time (for things like bootstrap). It also makes the internal logic a bit easier to follow, since it's easier to tell whether you're working with a point in time or with the whole recorded history. This hefty.
*	Package up common event fields as Instant	Owen Jacobson	2024-10-02
\|
*	Retire top-level `repo`.	Owen Jacobson	2024-10-02
\| \| \| \|	This helped me discover an organizational scheme I like more.
*	First pass on reorganizing the backend.	Owen Jacobson	2024-10-02
\| \| \| \|	This is primarily renames and repackagings.
*	Provide a resume point to bridge clients from state snapshots to the event ↵	Owen Jacobson	2024-10-01
\| \| \| \|	sequence.
*	Track event sequences globally, not per channel.	Owen Jacobson	2024-10-01
\| \| \| \|	Per-channel event sequences were a cute idea, but it made reasoning about event resumption much, much harder (case in point: recovering the order of events in a partially-ordered collection is quadratic, since it's basically graph sort). The minor overhead of a global sequence number is likely tolerable, and this simplifies both the API and the internals.
*	Shut down the `/api/events` stream when the user logs out or their token ↵	Owen Jacobson	2024-09-29
\| \| \| \| \| \| \| \|	expires. When tokens are revoked (logout or expiry), the server now publishes an internal event via the new `logins` event broadcaster. These events are used to guard the `/api/events` stream. When a token revocation event arrives for the token used to subscribe to the stream, the stream is cut short, disconnecting the client. In service of this, tokens now have IDs, which are non-confidential values that can be used to discuss tokens without their secrets being passed around unnecessarily. These IDs are not (at this time) exposed to clients, but they could be.
*	Expire channels, too.	Owen Jacobson	2024-09-28
\|
*	Send created events when channels are added.	Owen Jacobson	2024-09-28
\|
*	Make `/api/events` a firehose endpoint.	Owen Jacobson	2024-09-27
\| \| \| \| \| \| \| \|	It now includes events for all channels. Clients are responsible for filtering. The schema for channel events has changed; it now includes a channel name and ID, in the same format as the sender's name and ID. They also now include a `"type"` field, whose only valid value (as of this writing) is `"message"`. This is groundwork for delivering message deletion (expiry) events to clients, and notifying clients of channel lifecycle events.
*	More reorganizing.	Owen Jacobson	2024-09-25
\|
*	Code organization changes considered during implementation of ↵	Owen Jacobson	2024-09-25
\| \| \| \|	vector-of-sequence-numbers stream resume.
*	Redundant code missed in previous commit.	Owen Jacobson	2024-09-25
\|
*	Use a vector of sequence numbers, not timestamps, to restart /api/events ↵	Owen Jacobson	2024-09-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	streams. The timestamp-based approach had some formal problems. In particular, it assumed that time always went forwards, which isn't necessarily the case: * Alice calls `/api/channels/Cfoo` to send a message. * The server assigns time T to the request. * The server stalls somewhere in send() for a while, before storing and broadcasting the message. If it helps, imagine blocking on `tx.begin().await?` for a while. * In this interval, Bob calls `/api/events?channel=Cfoo`, receives historical messages up to time U (after T), and disconnects. * The server resumes Alice's request and finishes it. * Bob reconnects, setting his Last-Event-Id header to timestamp U. In this scenario, Bob never sees Alice's message unless he starts over. It wasn't in the original stream, since it wasn't broadcast while Bob was subscribed, and it's not in the new stream, since Bob's resume point is after the timestamp on Alice's message. The new approach avoids this. Each message is assigned a _sequence number_ when it's stored. Bob can be sure that his stream included every event, since the resume point is identified by sequence number even if the server processes them out of chronological order: * Alice calls `/api/channels/Cfoo` to send a message. * The server assigns time T to the request. * The server stalls somewhere in send() for a while, before storing and broadcasting. * In this interval, Bob calls `/api/events?channel=Cfoo`, receives historical messages up to sequence Cfoo=N, and disconnects. * The server resumes Alice's request, assigns her message sequence M (after N), and finishes it. * Bob resumes his subscription at Cfoo=N. * Bob receives Alice's message at Cfoo=M. There's a natural mutual exclusion on sequence numbers, enforced by sqlite, which ensures that no two messages have the same sequence number. Since sqlite promises that transactions are serializable by default (and enforces this with a whole-DB write lock), we can be confident that sequence numbers are monotonic, as well. This scenario is, to put it mildly, contrived and unlikely - which is what motivated me to fix it. These kinds of bugs are fiendishly hard to identify, let alone reproduce or understand. I wonder how costly cloning a map is going to turn out to be… A note on database migrations: sqlite3 really, truly has no `alter table … alter column` statement. The only way to modify an existing column is to add the column to a new table. If `alter column` existed, I would create the new `sequence` column in `message` in a much less roundabout way. Fortunately, these migrations assume that they are being run _offline_, so operations like "replace the whole table" are reasonable.
*	Write tests.	Owen Jacobson	2024-09-20
\|
*	Push the handling of the `Last-Event-Id` _format_ inside of `channels::app`.	Owen Jacobson	2024-09-20
\| \| \| \|	This is intended to make it a bit more opaque to callers, and to free me up to experiment with the event ID format. It also makes event IDs tractable for testing.
*	Push events into a module structure consistent with the rest of the project.	Owen Jacobson	2024-09-20
\|
*	Remove the HTML client, and expose a JSON API.	Owen Jacobson	2024-09-20
\| \| \| \| \| \| \| \| \| \| \| \| \|	This API structure fell out of a conversation with Kit. Described loosely: kit: ok kit: Here's what I'm picturing in a client kit: list channels, make-new-channel, zero to one active channels, post-to-active. kit: login/sign-up, logout owen: you will likely also want "am I logged in" here kit: sure, whoami
*	Expire messages after 90 days.	Owen Jacobson	2024-09-20
\| \| \| \| \| \| \| \| \| \|	This is intended to manage storage growth. A community with broadly steady traffic will now reach a steady state (ish) where the amount of storage in use stays within a steady band. The 90 day threshold is a spitball; this should be made configurable for the community's needs. I've also hoisted expiry out into the `app` classes, to reduce the amount of non-database work repo types are doing. This should make it easier to make expiry configurable later on. Includes incidental cleanup and style changes.
*	Somewhere along the line this lifetime bound became redundant.	Owen Jacobson	2024-09-18
\|
*	Most pass-through errors do not need additional message text	Owen Jacobson	2024-09-18
\|
*	Make BoxedError an implementation detail of InternalError.	Owen Jacobson	2024-09-18
\|
*	App methods now return errors that allow not-found cases to be distinguished.	Owen Jacobson	2024-09-18
\|
*	Express record dependencies through types.	Owen Jacobson	2024-09-17
\| \| \| \|	This provides a convenient place to _stick_ "not found" errors, though actually introducing them will come in a later commit.
*	Consolidate most repository types into a repo module.	Owen Jacobson	2024-09-16
\| \| \| \| \| \| \| \| \| \| \| \|	Having them contained in the individual endpoint groups conveyed an unintended sense that their intended scope was _only_ that endpoint group. It also made most repo-related import paths _quite_ long. This splits up the repos as follows: * "General applicability" repos - those that are only loosely connected to a single task, and are likely to be shared between tasks - go in crate::repo. * Specialized repos - those tightly connected to a specific task - go in the module for that task, under crate::PATH::repo. In both cases, each repo goes in its own submodule, to make it easier to use the module name as a namespace. Which category a repo goes in is a judgment call. `crate::channel::repo::broadcast` (formerly `channel::repo::messages`) is used outside of `crate::channel`, for example, but its main purpose is to support channel message broadcasts. It could arguably live under `crate::event::repo::channel`, but the resulting namespace is less legible to me.
*	Move channel list into the `app.channels()` namespace.	Owen Jacobson	2024-09-15
\| \| \| \|	This is groundwork for a JSON-based API, after a conversation with Kit.
*	Consolidate channel events into a single stream endpoint.	Owen Jacobson	2024-09-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	While reviewing [MDN], I noticed this note: > SSE suffers from a limitation to the maximum number of open connections, which can be specially painful when opening various tabs as the limit is per browser and set to a very low number (6). […] This limit is per browser + domain, so that means that you can open 6 SSE connections across all of the tabs to www.example1.com and another 6 SSE connections to www.example2.com. I tested it in Safari; this is true, and once six streams are open, _no_ more requests can be made - in any tab, even a fresh one. Since the design _was_ that each channel had its own events endpoint, this is an obvious operations risk. Any client that tries to read multiple channels' streams will hit this limit quickly. This change consolidates all channel events into a single endpoint: `/events`. This takes a list of channel IDs (as query parameters, one `channel=` param per channel), and streams back events from all listed channels. The previous `/:channel/events` endpoint has been removed. Clients can selectively request events for the channels they're interested in. [MDN]: https://developer.mozilla.org/en-US/docs/Web/API/EventSource
*	Support Last-Event-Id as a method of resuming channel events after a disconnect	Owen Jacobson	2024-09-13
\|
*	Generate the required structure for broadcasting from a join, not from O(n) ↵	Owen Jacobson	2024-09-13
\| \| \| \|	queries.
*	Tolerate panics in channel::app where they can only be triggered by ↵	Owen Jacobson	2024-09-13
\| \| \| \|	implementation errors.