path: root/src
* Move the `/ch` channel view to `/c` (for conversation). (Owen Jacobson, 2025-07-03)
* Replace `channel` with `conversation` throughout the API. (Owen Jacobson, 2025-07-03)
  This is a **breaking change** for essentially all clients. Thankfully, there's presently just the one, so we don't need to go to much effort to accommodate that; the client is modified in this commit to adapt, users can reload their client, and life will go on.
* Rename "channel" to "conversation" within the server.Owen Jacobson2025-07-03
| | | | | | I've split this from the schema and API changes because, frankly, it's huge. Annoyingly so. There are no semantic changes in this, it's all symbol changes, but there are a _lot_ of them because the term "channel" leaks all over everything in a service whose primary role is managing messages sent to channels (now, conversations). I found a buggy test while working on this! It's not fixed in this commit, because it felt mean to hide a real change in the middle of this much chaff.
* Rename "channel" to "conversation" in the database.Owen Jacobson2025-07-03
| | | | | | I've - somewhat arbitrarily - started renaming column aliases, as well, though the corresponding Rust data model, API fields and nouns, and client code still references them as "channel" (or as derived terms). As with so many schema changes, this entails a complete rebuild of a substantial portion of the schema. sqlite3 still doesn't have very many `alter table` primitives, for renaming columns in particular.
* Prevent sending messages to deleted channels. (Owen Jacobson, 2025-07-03)
  I've opted to make it clear in the error message which scenario - deleted vs. non-existent - a channel falls into. This isn't particularly consistent with the rest of the API, so we might need to review this decision later, but it's at least relatively harmless if it's mistaken. (Formally, they're both 404s, so clients that go by error code won't notice.)
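  A rough sketch of the shape of that distinction, with a hypothetical error type and messages (not the actual code); both cases map to 404, so clients keying off the status code see no difference:

      use axum::http::StatusCode;
      use axum::response::{IntoResponse, Response};

      // Hypothetical error type; the real names and messages may differ.
      enum SendError {
          ChannelDeleted,
          ChannelNotFound,
      }

      impl IntoResponse for SendError {
          fn into_response(self) -> Response {
              // Both variants are 404s; only the message distinguishes them.
              let message = match self {
                  SendError::ChannelDeleted => "channel has been deleted",
                  SendError::ChannelNotFound => "channel does not exist",
              };
              (StatusCode::NOT_FOUND, message).into_response()
          }
      }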
* Sending messages to a deleted channel should send to the deleted channel's ID, not to a fictitious ID. (Owen Jacobson, 2025-07-01)
  The existing test scenario:

  * Create a channel (with ID C1).
  * Delete channel C1.
  * Roll the dice to invent a channel ID C2.
  * Send a message to channel C2.
  * Observe that sending fails.

  This was not verifying anything about the deleted channel C1 - it was basically reproducing the `nonexistent_channel` test scenario with the most marginal of garnishes on it. This is probably copy-paste damage from when this test was originally written. Sending did fail, so this test scenario passed, but it passed effectively by accident.

  The new test scenario:

  * Create a channel (with ID C1).
  * Delete channel C1.
  * Send a message to channel C1.
  * Observe that sending fails.

  Concerningly, sending does _not_ fail in this scenario (i.e., the test _does_ fail), so the broken test case was masking a real bug.
* Send back the current state as events, not snapshots, during client boot. (ojacobson, 2025-07-01)
  There are a couple of contributing reasons for this.

  * Each client's author - even ourselves - is best positioned to know how best to convert history into state to meet the needs of that specific client. There is (probably) no universal solution. You can already see this with the built-in client, where unread tracking gets stapled onto snapshots locally and maintained as events roll in, and I would expect this to happen more and more regularly over time. If we ever sprout other clients, I'd also expect their local state to be different. The API, on the other hand, must expose a surface that's universal to all clients. For boot, that was a very rote list-of-nouns data model. The other option is to expose a surface specific to one client and make other clients accommodate, which is contrary to the goals of this project.
  * The need to compute snapshots adds friction when adding or changing the behaviour of the API, even when those changes only tangentially touch `/api/boot`. For example, my work on adding messages to multiple conversations got hung up in trying to figure out how to represent that at boot time, plus how to represent that in the event stream.
  * The rationale for sending back a computed snapshot of the state was to avoid having the client replay events from the beginning of time, and to limit the amount of data sent back. This didn't pan out - most snapshots in practice consisted of the same data you'd get from the event stream anyways, almost with a 1:1 correspondence (with a `sent` or `created` event standing in for a `messages`, `channels`, or `users` entry). Exceptions - deleted messages and channels - were rare, and are ephemeral.
  * Generating the snapshots requires loading the entire history into memory anyways. We're not saving any server-side IO by computing snapshots, but we are spending server-side compute time to generate them for clients that are then going to throw them away, as above.

  This change resolves these tensions by delegating state management _entirely_ to the client, removing the server-side state snapshots. The server communicates in events only.

  ## Alternatives

  I joked to @wlonk that the "2.0-bis" version of this change always returns `resume_point` 0 and an empty events list. That would be correct, and compatible with the client logic in this change, and would actually work. In fact, we could get rid of the event part of `/api/boot` entirely, and require clients to consume the event stream from the beginning every time they reconnect.

  The main reason I _don't_ want to do this has to do with reconnects. Right now - both with snapshots, before this change, and with events, after - the client can cleanly delineate "historical" events (to be applied while the state is not presented to the user) and "current" events (to be presented to the user immediately). The `application/event-stream` protocol has no way to make that distinction out of the box, and while we can hack something in, all the approaches I can think of are nasty.

  Merges boot-events into main.
| * Remove the snapshot fields from `/api/boot`. (Owen Jacobson, 2025-06-20)
  Clients now _must_ construct their state from the event stream; it is no longer possible for them to delegate that work to the server.
| * Include historical events in the boot response. (Owen Jacobson, 2025-06-20)
  The returned events are all events up to and including the `resume_point` in the same response. If combined with the events from `/api/events?resume_point=x`, using the same `resume_point`, the client will have a complete event history, less any events from histories that have been purged.
| * Support querying event sequences via iterators or streams. (Owen Jacobson, 2025-06-20)
  These filters are meant to be used with, respectively, `Iterator::filter_map` and `StreamExt::filter_map`. The two operations are conceptually the same - they pass an item from the underlying sequence to a function that returns an option, drop the values for which the function returns `None`, and yield the value inside of `Some` in the resulting sequence.

  However, `Iterator::filter_map` takes a function from the iterator elements to `Option<T>`. `StreamExt::filter_map` takes a function from the iterator elements to _a `Future` whose output is `Option<T>`_. As such, you can't easily use functions designed for one use case for the other. You need an adapter - conventionally, `futures::ready`, if you have a non-async function and need an async one.

  This provides two sets of sequence filters:

  * `crate::test::fixtures::event` contains functions which return `Option` directly, and which are intended for use with `Iterator::filter_map`.
  * `crate::test::fixtures::event::stream` contains lifted versions that return a `Future`, and which are intended for use with `StreamExt::filter_map`.

  The lifting is done purely manually. I spent a lot of time writing clever-er versions before deciding on this; those were fun to write, but hell to read and not meaningfully better, and this is test support code, so we want it to be dumb and obvious. Complexity for the sake of intellectual satisfaction is a huge antifeature in this context.
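  As a rough illustration of the manual lifting, with a made-up event type standing in for the real fixtures in `crate::test::fixtures::event`:

      use std::future::{ready, Ready};

      // Stand-in event type; the real event enum is richer than this.
      enum Event {
          Sent(String),
          Deleted(String),
      }

      // Plain filter, for use with `Iterator::filter_map`.
      fn sent(event: Event) -> Option<String> {
          match event {
              Event::Sent(body) => Some(body),
              _ => None,
          }
      }

      // Manually lifted version, for use with `StreamExt::filter_map`, which
      // expects a function returning a `Future` whose output is `Option<T>`.
      // (`futures::ready` would work equally well as the adapter here.)
      fn sent_stream(event: Event) -> Ready<Option<String>> {
          ready(sent(event))
      }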
| * Hoist heartbeat configuration out to the web handler. (Owen Jacobson, 2025-06-20)
  The _snapshot_ is specifically a snapshot of app state. The purpose of the response struct is to annotate the snapshot with information that isn't from the app, but rather from the request or the web layer.

  The heartbeat timeout isn't ever used by the app layer in any way; it's used by the Axum handler for `/api/events`, instead. I straight-up missed this when I wrote the original heartbeat changes.
* | Remove misleading and long-since stale "panic" notes. (Owen Jacobson, 2025-06-23)
  These were invalidated by eff129bc1f29bcb1b2b9d10c6b49ab886edc83d6, back in September.
* | Remove a stray reference to users being called "logins." (Owen Jacobson, 2025-06-23)
  Missed in 9f7f82dbd9adee8ae18ae7ff2600b3e1dc8fadbc.
* Handlers are _named operations_, which can be exposed via routes. (Owen Jacobson, 2025-06-18)
  Each domain module that exposes handlers does so through a `handlers` child module, ideally as a top-level symbol that can be plugged directly into Axum's `MethodRouter`. Modules could make exceptions to this - kill the doctrinaire inside yourself, after all - but none of the API modules that actually exist need such exceptions, and consistency is useful.

  The related details of request types, URL types, response types, errors, &c &c are then organized into modules under `handlers`, along with their respective tests.
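  In rough outline, the convention looks something like this; the module and route names are invented for illustration, not taken from the codebase:

      use axum::{routing::post, Router};

      mod message {
          pub mod handlers {
              // Top-level symbol, pluggable directly into a `MethodRouter`.
              // Request, URL, and response types (and their tests) would
              // live in child modules of `handlers`.
              pub async fn send() -> &'static str {
                  "sent"
              }
          }
      }

      fn api_router() -> Router {
          Router::new().route("/api/messages", post(message::handlers::send))
      }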
* Reorganize and consolidate HTTP routes. (Owen Jacobson, 2025-06-18)
  HTTP routes are now defined in a single, unified module, pulling them out of the topical modules they were formerly part of. This is intended to improve the navigability of the codebase. Previously, finding the handler corresponding to a specific endpoint required prior familiarity, though in practice you could usually guess from topic area. Now, all routes are defined in `crate::routes`.

  Other than changing visibility, I've avoided making changes to the handlers at the ends of those routes.
* Fix stray reference to a nonexistent file. (Owen Jacobson, 2025-06-18)
  I've replaced it with something more general, which will be applicable no matter how we restructure the routing.
* Use a fluent style for the middleware layers. (Owen Jacobson, 2025-06-17)
  For endpoints that should be unavailable until setup is completed, the default behaviour no longer needs to be specified: `Required(app)` will do that for you. For endpoints that should redirect until setup is completed, `Required(app).with_fallback(…response…)` will do that. To make this a bit harder to break by accident, the default unavailable response is now its own type.
* Unify `setup_required` middlewares. (Owen Jacobson, 2025-06-17)
  The two middlewares were identical but for the specific `IntoResponse` impl used to generate the response when setup has not been completed. However, unifying them while still using `from_fn_with_state` led to this horrorshow:

      .route_layer(middleware::from_fn_with_state(
          app.clone(),
          |state, req, next| {
              setup::middleware::setup_required(UNAVAILABLE, state, req, next)
          },
      ))

  It's a lot to read, and it surfaces the entire signature of a state-driven middleware `fn` into the call site solely to close over one argument (`UNAVAILABLE`).

  Rather than doing that, I've converted this middleware into a full-blown Tower middleware, following <https://docs.rs/axum/latest/axum/middleware/index.html#towerservice-and-pinboxdyn-future>. I considered taking this further and implementing a custom future to remove the allocation for `Box::pin`, but honestly, that allocation isn't hurting anyone and this code already got long enough in the translation.

  The new API looks like:

      .route_layer(setup::Required::or_unavailable(app.clone()))

  Or like:

      .route_layer(setup::Required::with_fallback(app.clone(), RESPONSE))

  One thing I would have liked to have avoided is the additional `app.clone()` argument, but there isn't a way to extract the _state_ from a request inside of an Axum middleware. It has to be passed in externally - that's what `from_fn_with_state` is doing under the hood, as well. Using `State` as an extractor doesn't work; the `State` extractor is special in a _bunch_ of ways, and this is one of them. Other extractors would work.

  Realistically, I'd probably want to explore interfaces like

      .route_layer(setup::Required(app).or_unavailable())

  or

      .route_layer(app.setup().required().or_unavailable())
* tools/reformat (Owen Jacobson, 2025-06-11)
* Fix a couple of stray `clippy` lints. (ojacobson, 2025-05-30)
  The added suppression for `manual_ok_err` is a mixed choice; I'd prefer `r.ok()` in most senses, but `BroadcastStream` is still new enough that I wouldn't be entirely surprised if the Tokio team added new error variants that we'd want to expressly handle.

  I do feel a bit better suppressing individual [`clippy::pedantic`][pedantic] lints; they're allow-by-default for this reason anyways, and I opted into them (see 452c8d0d9edb9894c108b6d577806c7c9d0071dd) knowing that not all of them would be perfectly appropriate.

  [pedantic]: https://doc.rust-lang.org/clippy/

  Merges prop/missed-lints into main.
| * Fix a couple of stray `clippy` lints. (Owen Jacobson, 2025-05-27)
  The added suppression for `manual_ok_err` is a mixed choice; I'd prefer `r.ok()` in most senses, but `BroadcastStream` is still new enough that I wouldn't be entirely surprised if the Tokio team added new error variants that we'd want to expressly handle.

  I do feel a bit better suppressing individual [`clippy::pedantic`][pedantic] lints; they're allow-by-default for this reason anyways, and I opted into them (see 452c8d0d9edb9894c108b6d577806c7c9d0071dd) knowing that not all of them would be perfectly appropriate.

  [pedantic]: https://doc.rust-lang.org/clippy/
* | Build the Sveltekit UI into Cargo's OUT_DIR. (Owen Jacobson, 2025-05-28)
  This has a couple of material consequences:

  * It will be (much) easier to reorganize the source tree, as the path to the output is no longer relative to where the config files are when building the final binary. If we do decide to move `ui` into its own child crate, we won't have to make a bunch of (very similar) changes to the Svelte build process at that time.
  * There is less chance of a stale build contaminating a new one, since changes to the crate change the project hash in `OUT_DIR`. For example, while working on this change, `OUT_DIR` was at various points:
    * `target/debug/build/pilcrow-7cfeef3536ddd3e7/out`
    * `target/debug/build/pilcrow-09d4ddbc12bef36b/out`
    * `target/release/build/pilcrow-070d373bd5f850a1`

    This may use more space on disk, but it's all reclaimable with `cargo clean`, and Rust is _far_ more profligate with disk space than Svelte will ever be.
  * It's more consistent with Cargo's expectations around generated source files, and thus potentially easier to onboard Rust developers into.
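  A rough sketch of what a `build.rs` along these lines could look like; the `ui` directory layout, npm script, and environment variable are assumptions for illustration, not the actual build wiring:

      // build.rs (illustrative)
      use std::{env, path::PathBuf, process::Command};

      fn main() {
          // Cargo provides OUT_DIR; it includes the project hash, so stale
          // builds don't leak into new ones.
          let out_dir = PathBuf::from(env::var("OUT_DIR").expect("cargo sets OUT_DIR"));

          let status = Command::new("npm")
              .args(["run", "build"])
              .current_dir("ui")
              // Hypothetical variable read by the Svelte adapter config to
              // pick its output directory.
              .env("BUILD_DIR", out_dir.join("ui"))
              .status()
              .expect("failed to run the UI build");
          assert!(status.success(), "UI build failed");

          println!("cargo:rerun-if-changed=ui/src");
      }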
* Remove a bunch of clippy suppressions. (Owen Jacobson, 2025-05-21)
  Notably, one of them was hiding a real (if unreachable) bug, by converting a "the token you have presented is not valid" scenario into an internal server error.
* Make creation time an intrinsic fact about channels, the way it is for events. (Owen Jacobson, 2025-05-13)
  To make unread handling of empty channels coherent (and to make it possible to mark an empty channel as having been read), they need to be associated with a specific point in time. This change exposes their creation time in the snapshot - it was already part of the event view, though the client doesn't know that yet.
* Heartbeats are part of the event protocol. (Owen Jacobson, 2025-04-08)
  A heartbeat is an event that the server synthesizes any time an event stream has been idle for longer than some timeout. They allow clients to detect disconnection and network problems, which would otherwise go unnoticed because event streams are a one-way channel. Most network problems only become clear when the offended party tries to _send_ something, and subscribing to an event stream only sends something during the request phase.

  Technically, Pilcrow has always sent these, since we started using Axum's SSE support: it defaults to sending a dummy event after 15 seconds (consisting of `":\n\n"`, which is then ignored). I've built Pilcrow's heartbeat support out of that, by customizing the event sent back. The results _mostly_ look like existing events, but there are two key differences:

  * Heartbeats don't have `id` fields in the event stream. They're synthetic, and they don't participate in either the "resume at" sequence management, or the last-event-id header-based resumption management.
  * Heartbeats have an `event` but no `type` field in the message body. There are no subtypes.

  To make it less likely that clients will race with the server on expiring timeouts, heartbeats are sent about five seconds early. In this change, heartbeats are due after 20 seconds, but are sent after 15. If it takes longer than five seconds for a heartbeat to arrive, a client can and should treat that as a network problem and reconnect, but I'd really like to avoid that happening over differences smaller than a second, so I've left a margin.

  I originally sketched this out in conversation with @wlonk as having each event carry a deadline for the next one. I ultimately opted not to do that for a few reasons. First, Axum makes it hard - the built-in keep-alive support only works with a static event, and cannot make dynamic ones whose payloads might vary (for example if the deadline is variable). Second, it's complex, to no apparent gain, and adds deadline information to _every_ event type.

  This implementation, instead, sends deadline information as part of boot, as a fixed interval in seconds. Clients are responsible for working out deadlines based on message arrivals. This is fine; heartbeat-based connection management is best effort at the best of times, so a few milliseconds of slop in either direction won't hurt anything. The existing client ignores these events entirely, which is convenient.

  The new heartbeat event type is defined alongside the main event type, to make it less likely that we'll inadvertently make changes to one but not the other. We can still do so advertently, I just don't want it to be an accident.
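  For flavour, a minimal sketch of customizing Axum's SSE keep-alive into a heartbeat event, assuming an Axum version whose `KeepAlive` accepts a custom event; the handler shape, payload, and timings are illustrative rather than the actual code:

      use std::{convert::Infallible, time::Duration};

      use axum::response::sse::{Event, KeepAlive, Sse};
      use futures::stream::Stream;

      fn events_response(
          events: impl Stream<Item = Result<Event, Infallible>>,
      ) -> Sse<impl Stream<Item = Result<Event, Infallible>>> {
          // The heartbeat carries an `event` name but no `id`, so it never
          // participates in resumption.
          let heartbeat = Event::default().event("heartbeat").data("{}");

          // Due after 20 seconds, sent after 15, leaving the client a margin.
          Sse::new(events).keep_alive(
              KeepAlive::new()
                  .interval(Duration::from_secs(15))
                  .event(heartbeat),
          )
      }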
* Merge branch 'prop/minor-cleanups' (Owen Jacobson, 2025-04-03)
| * Hopefully make the "no control characters" criterion for names easier to follow. (Owen Jacobson, 2025-03-24)
| * The label used to mask "Secret" strings in 357116366c1307bedaac6a3dfe9c5ed8e0e0c210 wasn't updated (and wasn't quite correct then, either). (Owen Jacobson, 2025-03-24)
  I haven't found a way to derive it from the name of the type.
* | Rename a bunch of straggler references to `login`. (Owen Jacobson, 2025-03-24)
* | Rename `login` to `user` throughout the server (Owen Jacobson, 2025-03-23)
* | Change the prefix for newly-generated user IDs to `U`, for `User`. (Owen Jacobson, 2025-03-23)
* | Rename the `login` module to `user`. (Owen Jacobson, 2025-03-23)
* | Rename `user` to `login` at the database. (Owen Jacobson, 2025-03-23)
* Expire messages after 30 days. (Owen Jacobson, 2025-03-23)
  In a discussion with wlonk, we both agreed that 15 days is _too_ aggressive, but also that it's not quite time to implement configurable expiry.
* Ensure `must_use` warnings fire even after results are unwrapped. (Owen Jacobson, 2025-02-21)
* Be a little more pedantic about constant str ref lifetimes (Owen Jacobson, 2025-02-21)
* Reorder impl to match trait (Owen Jacobson, 2025-02-21)
* Upgrade to Rust 1.85 and Rust 2024 edition. (Owen Jacobson, 2025-02-20)
  There are a couple of migration suggestions from `cargo fix --edition` that I have deliberately skipped, which are intended to make sure that the changes to `if let` scoping don't bite us. They don't, I'm pretty sure, and if I turn out to be wrong, I'd rather fix the scoping issues (as they arise) than use `match` (`cargo fix --edition`'s suggestion).

  This change also includes a bulk reformat and a clippy cleanup.

  NOTA BENE: As this requires a new Rust toolchain, you'll need to update Rust (`rustup update`, normally) or the server won't build. This also applies to the Debian builder Docker image; it'll need to be rebuilt (from scratch, pulling its base image again) as well.
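  The scoping change in question, in a contrived example (general 2024-edition behaviour, not code from this repository): in Rust 2021 the temporary read guard below lives to the end of the whole `if let`/`else`, so the `else` arm deadlocks; in Rust 2024 it is dropped before the `else` arm runs.

      use std::sync::RwLock;

      fn get_or_init(value: &RwLock<Option<bool>>) -> bool {
          if let Some(x) = *value.read().unwrap() {
              x
          } else {
              // Rust 2024: the read guard is already dropped here, so taking
              // the write lock is fine. Rust 2021: deadlock.
              let mut guard = value.write().unwrap();
              *guard.get_or_insert(true)
          }
      }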
* Upgrade to latest thiserror (Owen Jacobson, 2025-02-19)
* Upgrade Axum to 0.8.1. (Owen Jacobson, 2025-02-19)
* Merge branch 'main' into prop/shorter-expiry (Kit La Touche, 2024-11-15)
| * Rename the project to `pilcrow`. (Owen Jacobson, 2024-11-08)
* | Shorten the default retention, dramatically. (Owen Jacobson, 2024-11-07)
  The original retention values were loosely based on Slack's retention, for lack of a more specific motivator. Today's election results have changed my views; the service now defaults to retention more in line with the needs of communities for which deep message history may be a risk:

  * Unused channels expire after 7 days.
  * Used channels expire when their last message expires (as before).
  * Deleted channels are purged after 6 hours (which is in line with the purge behaviour of messages).
  * Messages expire after 15 days.
  * Deleted messages are purged after 6 hours (as before).

  No changes have been made to token expiry.
* Limit background expiry to the API. (Owen Jacobson, 2024-10-31)
  Using requests to drive background work (expiring things, mainly) is a hack to avoid the complexity of background workers, but it's reaching its limits. In the live deployment at `hi.grimoire.ca`, we found that requests for the UI were taking 300+ milliseconds, as the expiry process required database access. The DB there is slow, which is a separate issue, but it was also being accessed many times for little benefit.

  Since only the API is actually _affected_ by expiry, I've scoped the middleware down to just those endpoints.
* Track an index-friendly sequence range for both channels and messages. (Owen Jacobson, 2024-10-30)
  This is meant to limit the number of messages that event replay needs to examine. Previously, the query required a table scan; table scans on `message` can be quite large, and should really be avoided. The new schema allows replays to be carried out using an index scan.

  The premise of this change is that, for each (channel, message), there is a range of event sequence numbers that the (channel, message) may appear in. We'll notate that as `[start, end]` in the general case, but they are:

  * for active channels, `[ch.created_sequence, ch.created_sequence]`.
  * for deleted channels, `[ch.created_sequence, ch_del.deleted_sequence]`.
  * for active messages, `[mg.sent_sequence, mg.sent_sequence]`.
  * for deleted messages, `[mg.sent_sequence, mg_del.deleted_sequence]`.

  (The two "active" ranges may grow in future releases, to support things like channel renames and message editing. That won't change the logic, but those two things will need to update the new `last_sequence` field.)

  There are two families of operations that need to retrieve based on these ranges:

  * Boot creates a snapshot as of a specific `resume_at` sequence number, and thus must include any record whose `start` falls on or before `resume_at`. We can't exclude records whose `end` is also before it, as their terminal state may be one that is included in boot (eg. active channels).
  * Event replay needs to include any events that fall after the same `resume_at`, and thus must include events from any record whose `end` falls strictly after `resume_at`. We can't exclude records whose `start` is also strictly after `resume_at`, as we'd omit them from replay, inappropriately, if we did.

  This gives three interesting cases:

  1. Record fully after `resume_at`:

         event sequence --increasing-->
         x-a         …  x      …  x+k
         resume_at      start      end

     This record should be omitted by boot, but included for event replay.

  2. Record fully before `resume_at`:

         event sequence --increasing-->
         x      …  x+k  …  x+a
         start      end     resume_at

     This record should be omitted for event replay, but included for boot.

  3. Record overlapping `resume_at`:

         event sequence --increasing-->
         x      …  x+a        …  x+k
         start      resume_at      end

     This record needs to be included for both boot and event replay.

  However, the bounds of that range were previously stored in two tables (`channel` and `channel_deleted`, or `message` and `message_deleted`, respectively), which sqlite (indeed most SQL implementations) cannot index. This forced a table scan, leading to the program considering every possible (channel, message) during event replay.

  This commit adds a `last_sequence` field to channels and messages, which is set to the above values as channels and messages are operated on. This field is indexed, and queries can use it to rapidly identify relevant rows for event replay, cutting down the amount of reading needed to generate events on resume.
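  Expressed as a predicate, a sketch with assumed field names that mirrors the rules above rather than quoting the actual queries:

      // Assumed shape: `created_sequence` stands in for `sent_sequence` on
      // messages; `last_sequence` is the new, indexed upper bound.
      struct SequenceRange {
          created_sequence: i64,
          last_sequence: i64,
      }

      impl SequenceRange {
          // Boot: include any record whose range starts on or before resume_at.
          fn needed_for_boot(&self, resume_at: i64) -> bool {
              self.created_sequence <= resume_at
          }

          // Event replay: include any record whose range ends strictly after
          // resume_at; this is the indexable `last_sequence > ?` comparison.
          fn needed_for_replay(&self, resume_at: i64) -> bool {
              self.last_sequence > resume_at
          }
      }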
* Resume points are no longer optional. (Owen Jacobson, 2024-10-30)
  This is an inconsequential change for actual clients, since "resume from the beginning" was never a preferred mode of operation, and it simplifies some internals. It should also mean we get better query plans where `coalesce(cond, true)` was previously being used.
* Remove `hi-recanonicalize`. (Owen Jacobson, 2024-10-30)
  This utility was needed to support a database migration with existing data. I have it on good authority that no further databases exist that are in the state that made this tool necessary.
* Avoid hard-coding the assumption that delete comes-after create. (Owen Jacobson, 2024-10-30)
  I mean, it always does, but I'd rather get a panic during message/channel reconstruction than wrong results if that assumption is ever violated inadvertently.
* Prevent deletion of non-empty channels. (Owen Jacobson, 2024-10-30)
* Add `change password` UI + API. (Owen Jacobson, 2024-10-29)
  The protocol here re-checks the caller's password, as an "I left myself logged in" anti-pranking check.