This covers two things:
* For _any_ static asset, send back an `ETag` header based on its content. If the request had an `If-None-Match` header, use that header to instead send back a `Not Modified` response, with no payload, if appropriate, saving the cost of transferring data the client already has.
* For _immutable_ static assets, send back a `Cache-Control` header to allow browsers to replay the response from cache, without making a request, for up to 90 days, saving whole HTTP round-trips.
In practice, startup time is now dominated by the time needed to check whether the user is logged in, and the time needed to satisfy the `/api/boot` request if they are.
"Static" here includes anything that's served in the HTML itself, so things like `/`, `/login`, `/c/:channelid`, and `/me` are all "static," and are all affected by this change even though the logical content of those endpoints also includes data that will vary from user to user and from time to time. All the dynamic data for those responses comes from separate API requests, which don't affect the cacheability of those pages. However, this change is conservative and _does not_ instruct the browser to cache those pages for long periods; it only supports telling the browser that the cached copy they have is fine, via a `Not Modified` response.
This change also includes a minimum Rust version bump, to the recently-released 1.91 - as it uses `Duration::from_hours` to compute the `Cache-Control` header. I could use `Duration::new` or `Duration::from_secs`, but really, working in increments of days from such tiny units is awkward. I'd prefer to use `Duration::from_days`, but it's not stable yet.
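For illustration, a minimal std-only sketch of the scheme - all names here are hypothetical, and real `If-None-Match` handling also has to cope with weak validators and comma-separated lists:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Derive an ETag value from the asset's bytes. DefaultHasher::new() uses
// fixed keys, so the value is stable across server restarts.
fn etag_for(content: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    format!("\"{:016x}\"", hasher.finish())
}

// Decide between a full 200 (with an ETag header) and a payload-free 304.
fn respond(content: &[u8], if_none_match: Option<&str>) -> (u16, Option<Vec<u8>>) {
    let etag = etag_for(content);
    match if_none_match {
        Some(seen) if seen == etag => (304, None), // client's copy is current
        _ => (200, Some(content.to_vec())),
    }
}
```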
Merges asset-etags into main.
Immutable assets include compiled script chunks, as well as stylesheets, fonts, and images baked into the SvelteKit app bundle. They do _not_ include the service worker, the PWA manifest, or the version and environment JSON files that SvelteKit uses to configure the application on startup.
This complements Pilcrow's `If-None-Match` support: if a browser asks for an asset it already has cached, we use the `If-None-Match` header to detect that and send back a response instructing the client to use that copy, saving bandwidth but not round trips. This change instructs clients not to even try asking, if they're willing to cache the response, and to instead satisfy those assets entirely from cache, saving round trips.
`Duration::from_hours` was added in Rust 1.91, so this change also includes a minimum Rust version bump.
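For illustration, roughly how the 90-day value falls out of `Duration::from_hours` (the actual header-building code is assumed):

```rust
use std::time::Duration;

// 90 days, via from_hours, since Duration::from_days is not yet stable.
fn immutable_cache_control() -> String {
    let max_age = Duration::from_hours(90 * 24);
    format!("public, max-age={}, immutable", max_age.as_secs())
}
```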
When Pilcrow returns static assets, it now sets an `ETag` header, derived from the content of the asset. This header will change if the asset changes.
Conforming browsers _may_ cache the response, and make a conditional request with an `If-None-Match` header on future requests for the same asset. If we see that header, Pilcrow will check whether the loaded asset is the same as the one the browser already had, and respond with a payload-free 304 if appropriate. This cuts down on the number of times clients will load the same script files and the same assets from the server.
Endpoints that route to the index after doing some logic got a pretty major cleanup. The tangled logic between the actual handler and the error type made them challenging to follow, and there wasn't a clean way to pass the `If-None-Match` header through into the error for use when determining the final response. This version instead combines the negative cases with _success_, which produces the desired responses with much more straightforward code.
I've also opted to support `If-None-Match` for these endpoints, even though they do logically change if the underlying chat state changes - because the _response body_ does not change, and that's what the HTTP spec (and HTTP clients) care about in this context. They will, however, return the response in full for situations like Not Found.
(I know it looks like these endpoints now _require_ the `If-None-Match` header. Trust me on this: they do not. The `headers` crate cooks up an empty `If-None-Match` header if none is supplied, and that empty header differs from any non-empty ETag.)
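A hypothetical sketch of the combined shape - one outcome type covers success, Not Modified, and Not Found, so the `If-None-Match` header is still in scope when the final response is chosen:

```rust
// Names invented for illustration.
enum IndexResponse {
    Fresh { etag: String, body: Vec<u8> }, // 200, full payload
    NotModified { etag: String },          // 304, no payload
    NotFound,                              // 404, returned in full
}

fn choose(found: Option<(String, Vec<u8>)>, if_none_match: Option<&str>) -> IndexResponse {
    match found {
        None => IndexResponse::NotFound,
        Some((etag, _)) if if_none_match == Some(etag.as_str()) => {
            IndexResponse::NotModified { etag }
        }
        Some((etag, body)) => IndexResponse::Fresh { etag, body },
    }
}
```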
apply immediately when downloaded.
Merges 'immediate-service-worker-registration'
The default behaviour of a service worker is to check for updates any time the site associated with the worker is visited, but to only activate the new worker once no clients are open for that site. That effectively means that you need to open the site, close the site, and then open the site again to use new service worker capabilities, which is counterintuitive at best.
I believe that the spec works this way for good reasons: probably because it means that the running service worker never changes out from underneath a page while it's open, and only changes _between_ visits to a site. That's a good default.
However, it does make testing service worker changes more difficult, as you need to restart the client application twice to see the results (or, at least, restart it once and then exit it). Since most of what we intend to use a service worker for mediates live app behaviour, it will be convenient to see those effects immediately when the service worker changes.
This was added as part of the original service worker spike, without much consideration for design goals or correctness, and while it _works_, it doesn't meet any specific needs. We can get most of the same behaviour by letting the browser handle fetches directly.
The main thing we lose is offline rendering of the Pilcrow UI, but that only worked partially and only by accident. We should build that from the ground up.
We were previously exporting root causes from one layer of abstraction to the next. For example, anything that called into the database could cause an `sqlx::Error`, so anything that transitively called into that logic exported a `Database(sqlx::Error)` error variant of its own, using `From` to map errors from inner type to outer type.
This had a couple of side effects. First, it required each layer of error handling to carry with it a `From` implementation unwrapping and rewrapping root causes from the next layer down. This was particularly apparent in the event and boot endpoints, which had separate error cases unique to crypto key processing errors solely because they happened to involve handling events that contained those keys. There were others, including the pervasive `Database(sqlx::Error)` error variants.
Separately, none of the error variants introduced for this purpose were being used for anything other than printing to stderr. All the complexity of From impls and all the structure of the error types was being thrown away at top-level error handlers.
This change replaces most of those error types with a generic `Failed` error. A `Failed` carries with it two pieces of information: a (boxed) underlying error, of any boxable `Error` type, and text meant to explain the context and cause of an error. Code which acts on errors can treat `Failed` as a catch-all case, while individually handling errors that signify important cases. Errors can be moved into or out of the `Failed` case by refactoring, as needed.
The design of `Failed` is heavily motivated by [anyhow's `context` system][context] as a way for the programmer to capture immediate intention as an explanation for some underlying error. However, instead of accepting the full breadth of types that implement `Display`, a `Failed` can only carry strings as explanation. We don't need the generality at this time, and the implementation underlying it is pretty complex for what it does.
[context]: https://docs.rs/anyhow/latest/anyhow/struct.Error.html#method.context
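A minimal sketch of that shape, with the field layout assumed:

```rust
use std::error::Error;
use std::fmt;

// A boxed underlying error plus text explaining what was being attempted.
#[derive(Debug)]
pub struct Failed {
    context: &'static str,
    source: Box<dyn Error + Send + Sync>,
}

impl fmt::Display for Failed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.context)
    }
}

impl Error for Failed {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        let source: &(dyn Error + 'static) = self.source.as_ref();
        Some(source)
    }
}
```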
This change also means that the full source chain for an error is now available to top-level error handlers, allowing more complete error messages. For example, starting `pilcrow` with an invalid network listen address produces
    Failed to bind to www.google.com:64209
    Caused by:
    Can't assign requested address (os error 49)

instead of the previous

    Error: Io(Os { code: 49, kind: AddrNotAvailable, message: "Can't assign requested address" })
which previously captured the same _cause_, but without the formatting (see previous commit) and without the _context_ (this commit). Similar improvements are available for many of the error scenarios Pilcrow is designed to give up on.
When deciding which errors to use `Failed` with, I've used the heuristic that if something can fail for more than one underlying reason, and if the caller will only ever need to be able to differentiate those reasons after substantial refactoring anyways, then the reasons should collapse into `Failed`. If there's either only a single underlying failure reason possible, or only errors arising out of the function body possible, then I've left error handling alone.
In the process I've refactored most request-handler-level error mappings to explicitly map `Failed` to `Internal`, rather than having a catch-all mapping for all unhandled errors, to make it easier to remember to add request-level error representations when adding app-level error cases.
This also includes helper traits for `Error` and `Result`, to make constructing `Failed` (and errors that include `Failed` as an alternative) easier to do, and some constants for the recurring error messages related to transaction demarcation. I'm not completely happy with the repetitive nature of those error cases, but this is the best I've arrived at so far.
As errors are no longer expected to be convertible up the call stack, the `NotFound` and `Duplicate` helper traits for database errors had to change a bit. Those previously assumed that they would be used in the context of an error type implementing `From<sqlx::Error>` (or from another error type with similar characteristics), and that's not the case any more. The resulting idiom for converting a missing value into a domain error is `foo.await.optional().fail(MESSAGE)?.ok_or(DOMAIN ERROR)?`, which is rather clunky, but I've opted not to go further with it. The `Duplicate` helper is just plain gone, as it's not easily generalizable in this structure and using `match` is more tractable for me.
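A hypothetical sketch of the `Result` helper behind that idiom, assuming the `Failed` sketch above is in scope in the same module:

```rust
use std::error::Error;

trait FailExt<T> {
    // Wrap any concrete error in Failed, attaching explanatory context.
    fn fail(self, context: &'static str) -> Result<T, Failed>;
}

impl<T, E> FailExt<T> for Result<T, E>
where
    E: Error + Send + Sync + 'static,
{
    fn fail(self, context: &'static str) -> Result<T, Failed> {
        self.map_err(|source| Failed { context, source: Box::new(source) })
    }
}
```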
Finally, I've changed the convention for error messages from `all lowercase messages in whatever tense i feel like at the moment` to `Sentence-case messages in the past tense`, frequently starting with `Failed to` and a short summary of the task at hand. This, as above, makes error message capitalization between Pilcrow's own messages and messages coming from other libraries/the Rust stdlib much more coherent and less jarring to read.
When an error is printed, if that error has a source, print the source. Then repeat that process until there are no more sources.
For an error with no source:

    the error message

For an error with a single source:

    the error message
    caused by:
    the source's error message

For an error message with chained sources:

    the error message
    caused by:
    the immediate source's error message
    the first transitive source's error message
    the second transitive source's error message
    …
    the ultimate source's error message

That is, sources are printed from the outermost inwards, terminating with the lowest-level source.
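A sketch of that loop in terms of the std `Error` trait (exact formatting aside):

```rust
use std::error::Error;
use std::io::Write;

fn print_chain(err: &dyn Error) {
    let mut out = std::io::stderr().lock(); // locked to reduce interleaving
    let _ = writeln!(out, "{err}");
    let mut source = err.source();
    if source.is_some() {
        let _ = writeln!(out, "caused by:");
    }
    while let Some(cause) = source {
        let _ = writeln!(out, "{cause}");
        source = cause.source();
    }
}
```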
This printing occurs in two places: when pilcrow starts, and whenever it processes an internal server error. For internal server errors, the error message is prefixed with the error ID, as before, to allow operators to correlate log messages with responses if they choose to do so:
    [E1234] the error message
    caused by:
    the immediate source's error message
    the first transitive source's error message
    the second transitive source's error message
    …
    the ultimate source's error message
Both call sites lock stderr to try to prevent interleaving messages. No promises about how well it works.
I've not made any effort to ensure that multi-line error messages are correctly indented. So far as I know, pilcrow never emits a multi-line error message (except for long messages which are wrapped), so this likely isn't necessary. It's also surprisingly hard to do well, though there is [a crate for it][indenter].
[indenter]: https://docs.rs/indenter/latest/indenter/
There's a widespread convention for unix tools to prefix stderr output with the name of the program. For example, cat on my system prints the following when asked to open a nonexistent file:
    cat: nope: No such file or directory
I've opted not to do that for Pilcrow. By my understanding, that convention is intended to help users identify which of several programs emitted a message when several programs share a stderr channel, for example when a program is used as part of a shell pipeline.
Programs that are largely intended to be freestanding do not consistently follow the same convention - Cargo, clang, and gcc all write diagnostics to stderr with no prefix, for example. Pilcrow is, similarly, meant to be run in isolation, either as the single thing a user is running interactively, or because it's being run as a service. In either case, the prefix is not necessary to identify Pilcrow as the source of a message.
I debated making the new `Exit` type crate-private, and having `cli::Args::run` return `impl Termination` to handle the interface between the crate and the outside world. For now, I've opted to instead return a result, so that any ill-advised callers trying to use Pilcrow as a library can at least attempt to understand the structure of its return value; this entails making `Exit` public as well, so that the CLI entry point (`main.rs`) can import it.
This is an enabling change, with little immediate impact on its own. It probably makes some error messages print less than perfectly, but Pilcrow generates errors so rarely in development that it's hard to check. I'll be revising individual errors in later commits.
* Remove push subscriptions on logout, to trigger a resubscription when logging back in.
* Don't try to call `this.push` when there's no `this` to have a `push` on.
The `jsdom` upgrade entails upgrading Node, as something changed internally to jsdom in how it imports other modules.
Upgrading vitest caused the details of some of our test cases to change, but the semantics of the affected tests are the same. They also split resetting mocks from resetting mocks' behaviours, which required a small config change to preserve our tests' correctness.
We're still constrained by the matching dependency on libsqlite3-sys between sqlx and rusqlite, but we can at least upgrade sqlx freely. Later versions of rusqlite require newer libsqlite3-sys than sqlx supports.
This does not upgrade rand/rand_core, as there's a breaking API change in 0.9 that needs more work to integrate.
migration streams.
All known instances of that migration stream have long since disappeared, and this lets us get rid of a dependency.
In 1f44cd930cdff94bb8cf04f645a5b035507438d9, we added `web_push` to the server's dependencies. The `web_push` crate indirectly depends on OpenSSL for crypto operations, which in turn depends on OpenSSL being installed on the target system. This adds it to the builder. It's not actually needed at runtime (and does not appear in the resulting package's dependencies), for reasons that are somewhat obscure but which probably have to do with static linking.
This was missed during that change because we weren't trying to deploy it.
We're going to move forwards with this for now, as low-utility as it is, so that we can more easily iterate on it in a real-world environment (hi.grimoire.ca).
When a user clicks "send a test notification," Pilcrow delivers a push message (with a fixed payload) to all active subscriptions. The included client then displays this as a notification, using browser APIs to do so. This lets us verify that push notification works, end to end - and it appears to.
The API endpoint for sending a test notification is not documented. I didn't feel it prudent to extensively document an endpoint that is intended to be temporary and whose side effects are very much subject to change. However, for posterity, the endpoint is
    POST /api/push/ping
    {}

and the push message payload is

    ping
Subscriptions with permanent delivery failures are nuked when we encounter them. Subscriptions with temporary failures cause the `ping` endpoint to return an internal server error, and are not retried. We'll likely want retry logic - including retry logic to handle server restarts - for any more serious use, but for a smoke test, giving up immediately is fine.
To make the push implementation testable, `App` is now generic over it. Tests use a dummy implementation that stores sent messages in memory. This has some significant limitations, documented in the test suite, but it beats sending real notifications to nowhere in tests.
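A hypothetical sketch of that seam - a delivery trait, plus the recording double tests can use:

```rust
use std::sync::Mutex;

#[derive(Debug)]
struct PushError;

// App would be generic over implementations of this trait.
trait PushClient {
    fn send(&self, endpoint: &str, payload: &[u8]) -> Result<(), PushError>;
}

// Test double: records messages in memory instead of delivering them.
#[derive(Default)]
struct RecordingPush {
    sent: Mutex<Vec<(String, Vec<u8>)>>,
}

impl PushClient for RecordingPush {
    fn send(&self, endpoint: &str, payload: &[u8]) -> Result<(), PushError> {
        self.sent.lock().unwrap().push((endpoint.to_owned(), payload.to_vec()));
        Ok(())
    }
}
```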
A subscription allows an application server (here, the Pilcrow server) to send web push messages to a user agent.
On the server, Pilcrow records subscriptions verbatim, in the clear. Each subscription has an associated key, which will be used to encrypt messages for the corresponding client, but we store them in the clear, for the same broad reason that we store the VAPID key in the clear. They allow anyone who obtains them to impersonate the server and send push messages to clients, but they're rotated regularly - clients must rotate them whenever the server's VAPID key changes.
On the client, we monitor VAPID key change events to drive automatic subscription management, once the user sets up an initial subscription manually (which we must do as it can involve a user-interaction-only prompt for permission to send notifications). This isn't the final UI, but rather a bare-minimum version to let us move on with testing push notifications.
Merges push-subscribe into push-notify.
Once a user has set up a push subscription, the client will re-establish it as needed whenever possible, falling back to manual intervention only when it is unable to create a push subscription.
This change imposes some architectural changes to the client, though they're not huge: the `session` type now includes a body of state (`push`) whose methods also call into the Pilcrow API. Previously, calls to the API were not made within the `session` types, and were instead only made by page and layout code, but orchestrating that for the push subscription lifecycle proved too complex to deal with. This is an experimental alternative, but it might be something we explore further in the future.
The semantics of this endpoint are somewhat complex, and are incompletely captured in the associated docs change. For posterity, the intended workflow is:
1. Obtain Pilcrow's current VAPID key by connecting (it's in the events, either from boot or from the event stream).
2. Use the browser push APIs to create a push subscription, using that VAPID key.
3. Send Pilcrow the push subscription endpoint and keys, plus the VAPID key the client used to create it so that the server can detect race conditions with key rotation.
4. Wait for messages to arrive.
This commit does not introduce any actual messages, just subscription management endpoints.
When the server's VAPID key is rotated, all existing subscriptions are discarded. Without the VAPID key, the server cannot service those subscriptions. We can't exactly notify the broker to stop processing messages on those subscriptions, so this is an incomplete solution to what to do if the key is being rotated due to a compromise, but it's better than nothing.
The shape of the API endpoint is heavily informed by the [JSON payload][web-push-json] provided by browser Web Push implementations, to ease client development in a browser-based context. The idea is that a client can take that JSON and send it to the server verbatim, without needing to transform it in any way, to submit the subscription to the server for use.
[web-push-json]: https://developer.mozilla.org/en-US/docs/Web/API/PushSubscription/toJSON
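A sketch of the request shape under serde; `endpoint` and `keys` mirror the linked JSON, while the VAPID-key field name is invented here:

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct SubscribeRequest {
    endpoint: String,       // broker URL the server will push messages to
    keys: SubscriptionKeys,
    vapid_key: String,      // key the client subscribed with, for race detection
}

#[derive(Deserialize)]
struct SubscriptionKeys {
    p256dh: String, // client public key (base64url)
    auth: String,   // shared auth secret (base64url)
}
```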
Push subscriptions are operationally associated with a specific _user agent_, and have no inherent relationship with a Pilcrow login or token (session). Taken as-is, a subscription created by user A could be reused by user B if they share a user agent, even if user A logs out before user B logs in. Pilcrow therefore _logically_ associates push subscriptions with specific tokens, and abandons those subscriptions when the token is invalidated by
* logging out,
* expiry, or
* changing passwords.
(There are no other token invalidation workflows at this time.)
Stored subscriptions are also abandoned when the server's VAPID key changes.
The [Serde attribute docs][serde-attr] don't spell out that this will work, but experimentally, it looks like a module used with `#[serde(with)]` only needs to have the `serialize`/`deserialize` functions that are actually used, and can be "incomplete" if the missing ones are also unused in your code. That's the case here: we serialize VAPID keys, but never deserialize them.
[serde-attr]: https://serde.rs/field-attrs.html#with
This improves organization a bit in my view, but more importantly it also sets us up for a coming change where we _will_ start deserializing VAPID keys, and where I'd like to use the same logic: giving it its own module will make that easier to organize.
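A sketch of the idea, with a stand-in key type:

```rust
use serde::Serialize;

mod vapid {
    use serde::Serializer;

    // Stand-in for the real key type.
    pub struct Key(pub String);

    // Only the `serialize` half exists; `deserialize` gets added when a
    // later change actually starts deserializing keys.
    pub fn serialize<S: Serializer>(key: &Key, s: S) -> Result<S::Ok, S::Error> {
        s.serialize_str(&key.0)
    }
}

#[derive(Serialize)]
struct Event {
    #[serde(with = "vapid")]
    key: vapid::Key,
}
```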
This is the same "use a component directly rather than obtaining one from the `App`" change that was previously applied to most endpoints and middleware. I just forgot to do it here when making that change.
In 4a91792e023a5877f8ac9b8a352e99c4486d698f, I merged in the app component struct changes, but neglected to notice that the `app.vapid()` method had ended up attached to the wrong impl block during the merge. This fixes that.
I've also carried the change to component structs through, so `Vapid` is now a freestanding component, rather than a view of the `App` struct's internals.
The `web-push` crate's VAPID signing support requires a private key. The `p256` crate is more than capable of generating one, but the easiest way to get a key from a `p256::ecdsa::SigningKey` to a `web_push::PartialVapidSignature` is via PKCS #8 PEM, not via the bytes. Since we'll need it in that form anyways, store it that way, so that we don't have to decode it using `p256`, re-encode to PEM, then decode to `PartialVapidSignature`.
The migration in this commit invalidates existing VAPID keys. We could include support for re-encoding them on read, but there's little point: this code is still in flux anyways, and only development deployments exist. By the time this is final, the schema will have settled.
Merges pem-stored-vapid into push-notify.
The `web-push` crate's VAPID signing support requires a private key. The `p256` crate is more than capable of generating one, but the easiest way to get a key from a `p256::ecdsa::SigningKey` to a `web_push::PartialVapidSignature` is via PKCS #8 PEM, not via the bytes. Since we'll need it in that form anyways, store it that way, so that we don't have to decode it using `p256`, re-encode to PEM, then decode to `PartialVapidSignature`.
The migration in this commit invalidates existing VAPID keys. We could include support for re-encoding them on read, but there's little point: this code is still in flux anyways, and only development deployments exist. By the time this is final, the schema will have settled.
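A sketch of generate-then-store-as-PEM, assuming p256's `pkcs8` feature and `rand_core`'s OsRng:

```rust
use p256::ecdsa::SigningKey;
use p256::pkcs8::{EncodePrivateKey, LineEnding};
use rand_core::OsRng;

// Generate a fresh key and encode it as PKCS #8 PEM, the form the web-push
// crate's VAPID builder wants, so it can be stored and used without re-encoding.
fn generate_vapid_pem() -> String {
    let key = SigningKey::random(&mut OsRng);
    key.to_pkcs8_pem(LineEnding::LF)
        .expect("encoding a freshly generated key should not fail")
        .to_string()
}
```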
A "component" is a struct that provides domain-specific operations on the service. `App` largely acts as a way to obtain components for domain-specific operations: for example, given `let app: App = todo!();`, then `app.tokens()` provides a component (`Tokens`) that supports operations on authentication tokens, and `app.conversations()` provides a component (`Conversations`) that supports operations on conversations.
This has been a major piece of the server's internal organization for a long time. Historically, these components have been built as short-lived view structs, which hold onto their functional dependencies by reference. A given component was therefore bound to its source `App`, and had a lifetime bounded by the life of that `App` instance or reference.
This change turns components into freestanding structs - that is, they can outlive the `App` that provided them. They hold their dependencies by value, not by reference; `App` provides clones when creating a component, instead of borrowing its own state. For the functional dependencies we have today, cloning is a supported and cheap way to share access; details are documented in the individual commits.
I'm making this change because we're working on web push, and we discovered while prototyping that it will be useful to be able to support multiple distinct types of web push client. A running Pilcrow server would use a "real" client (which sends real HTTP requests to deliver push messages), while tests would use a client stub (which doesn't). However, to make that work, we'll need to make `App` generic over the client, and the resulting type parameter would then end up in every handler and in most other things that touch the `App` type. This refactoring dramatically reduces the number of places we mention the `App` type, by making most uses rely on specific components instead of relying on `App` generally.
There are still a few places that work on `App` generally, rather than on specific components, because an operation requires the use of two or more components.
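A sketch of the resulting shape, with field layout assumed:

```rust
#[derive(Clone)]
struct App {
    pool: sqlx::SqlitePool,
}

#[derive(Clone)]
struct Tokens {
    pool: sqlx::SqlitePool,
}

impl App {
    // The component owns a cloned handle, so it can outlive this App.
    fn tokens(&self) -> Tokens {
        Tokens { pool: self.pool.clone() }
    }
}
```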
I don't love all this cloning, even if I know in my head that it's fine. The alternatives that we looked at include:
* Provider traits, as we have for `Transaction`, that allow endpoints to specify that they want any type that can provide a `Tokens` or a `Conversation` instead of specifically an `App` (`App<PushClient>`). This is wordy enough that we've opted to punt on that approach for now.
* Accept the type parameter as the cost of doing business. This is still an open alternative.
* Use `dyn` dispatch instead of a type parameter for the push client. This is still an open alternative, though not one I love as we'd be incurring function call indirection without getting any generality out of it.
Merges freestanding-app-components into main.
an App.
There are a few places in the test fixtures that still call `App` methods directly, as they call `app.users()` (which, as per previous commits, has no `FromRef` impl).
Because `Users` is test-only and is not used in any endpoints, it doesn't need a FromRef impl.
As with the `Setup` component, I've generalized the associated middleware across anything that can provide a `Tokens`, where possible.
The changes to the setup-requiring middleware are probably more general than was strictly needed, but they will make it work with anything that can provide a `Setup` component rather than being bolted to `App` specifically, which feels tidier.
This one doesn't need a FromRef impl at this time, as it's only ever used in a handler that also uses other components and so will need to continue receiving `App`. However, there's little reason not to make the implementation of the `Events` struct consistent.
Unlike the previous example, this involves cloning an event broadcaster, as well. This is, per the documentation, how the type may be used. From <https://docs.rs/tokio/latest/tokio/sync/broadcast/fn.channel.html>:
> The Sender can be cloned to send to the same channel from multiple points in the process or it can be used concurrently from an `Arc`.
The language is less firm than the language sqlx uses for its pool, but the intent is clear enough, and it works in practice.
`crate::app::App`'s internals.
In the course of working on web push, I determined that we probably need to make `App` generic over the web push client we're using, so that tests can use a dummy client while the real app uses a client created at startup and maintained over the life of the program's execution. The most direct implementation of that is to render App as `App<P>`, where the parameter is occupied by the specific web push client type in use. However, doing this requires refactoring at _every_ site that mentions `App`, including every handler, even though the vast majority of those sites will not be concerned with web push.
I reviewed a few options with @wlonk:
* Accept the type parameter and apply it everywhere, as the cost of supporting web push.
* Hard-code the use of a specific web push client.
* Insulate handlers &c from `App` via provider traits, mimicking what we do for repository provider traits today.
* Treat each app type as a freestanding state in its own right, so that only push-related components need to consider push clients (as far as is feasible).
This is a prototype towards that last point, using a simple app component (boot) as a testbed. `FromRef` allows handlers that take a `Boot` to be used in routes that provide an `App`, so this is a contained change. However, the structure of `FromRef` prevents `Boot` from carrying any lifetime narrower than `'static`, so it now holds clones of the state fields it acquires from App, instead of references. This is fine - that's just a database pool, and sqlx's pool type is designed to be shared via cloning. From <https://docs.rs/sqlx/latest/sqlx/struct.Pool.html>:
> Cloning Pool is cheap as it is simply a reference-counted handle to the inner pool state.
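A sketch of the prototype, assuming axum's `FromRef` (which matches the behaviour described) and invented field names:

```rust
use axum::extract::FromRef;

#[derive(Clone)]
struct App {
    pool: sqlx::SqlitePool,
}

// Boot holds a clone of the pool rather than a borrow, satisfying the
// 'static requirement that FromRef imposes.
#[derive(Clone)]
struct Boot {
    pool: sqlx::SqlitePool,
}

impl FromRef<App> for Boot {
    fn from_ref(app: &App) -> Self {
        Boot { pool: app.pool.clone() }
    }
}
```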
We've made stylistic changes that follow from Rust 1.86 through 1.89 anyways, and I'm not at all confident we aren't using APIs that only exist in those versions.
We've made stylistic changes that follow from Rust 1.86 through 1.89 anyways, and I'm not at all confident we aren't using APIs that only exist in those versions.
This is intended to be a low-operator-involvement, self-managing approach:
* The server generates its own VAPID key. The operator does not need to run complex `openssl` commands.
* The server rotates its VAPID key regularly. The operator does not need to maintain a calendar or run administrative tools.
* The included CLI changes allow for immediate key rotation if needed.
## Correctness
This change is speculative, to a degree, and is based on some guesses and loose understandings of the various Web Push and VAPID specs and docs. We use RustCrypto's p256 crate to generate keys rather than a dedicated VAPID crate (one does exist) as I trust their code review and security standards further, but that means that I've had to guess what kind of P256 key is actually appropriate for this use case, and what encoding clients need. If those elements are incorrect they're not hard to change, but they need further verification before we commit to them on `main`.
Changing the encoding can be done in the `src/vapid/event.rs` file. Changing the key type is more involved, but is still contained to `src/vapid/`. I already did that once, switching from `p256::SecretKey` to `p256::ecdsa::SigningKey`, and it's not too bad - takes maybe 20 minutes on a good day. The real risk is that we can't tell from the database what kind of key we have, so we need to settle this before making any permanent, durable changes to other peoples' data.
## Safety
This change stores the private key, entirely in the clear, in the Pilcrow database. Anyone who gets the database or who gets access to `vapid_signing_key` can impersonate the server in Web Push messages.
Operators are theoretically responsible for managing that risk. We are responsible for setting them up for success, and for making the right thing to do the easiest thing to do. This change tries to accomplish those goals in two ways:
* It relies on previous changes to the process' umask to ensure that the database is not readable by other OS users. OS permissions are pretty thin protection these days, as they only apply to operations carried out through the OS, but it's better than nothing at preventing casual snooping on the private key.
* Key rotation. The maximum period for which an exposed key can stay valid is 30 days, plus as long as it takes for clients to wake up, notice the key has changed, and invalidate their existing subscriptions.
The impact of a key leak is also moderate at best. We don't intend to send clients confidential information, or instruct clients to take compromising action, via push message. We intend to use them to notify clients of events that users may be interested in, like messages. Abusing the push system will, at worst, distract users or prompt them to disable notifications, or potentially deliver notifications for messages that never existed, which users can identify by reading the chat for themselves.
## Operator tooling
This change introduces the `pilcrow rotate-vapid-key` command, to immediately rotate the VAPID key. This is designed to be an online _or_ offline command: all it does is destroy the current key, so that the server will generate a new one. I've included some sketchy but complete documentation; it's probably inadequate, but then, so is that whole file.
I would resist the use of operator CLI tools as contrary to our low-admin-involvement aesthetic, but in this situation it's hard to avoid. The alternatives on the one hand are documenting the schema and telling operators how to use `sqlite3` to rotate keys, or splitting the keys into a separate database that the operator can delete outright to trigger key rotation, or on the other hand stopping to implement privileges and operator interfaces inside of Pilcrow. I think a CLI is an acceptable compromise between inadequate operator support from the one set of alternatives, and massive complexity and a large delay on the other. We will likely, however, have to revise this later.
Merges vapid-keys into push-notify.
In spite of my design preference away from CLI tools, this is a CLI tool: `pilcrow --database-url <URL> rotate-vapid-key`. This is something we can implement here and now, which does not require us to confront the long-avoided issue of how to handle the idea that some users are allowed to make server-operational changes and some aren't, by delegating the problem back to the OS.
The implementation is a little half-baked to make it easier to rip out later. I would ordinarily prefer to push both `serve` (the default verb, not actually named in this change) and `rotate-vapid-key` into their own, separate CLI submodules, with their own argument structs, but that change is much more intrusive and would make this effectively permanent. This can be yanked out in a few minutes by deleting a few lines of `cli.rs` and inlining the `serve` function.
Nonetheless, nothing is as permanent as a temporary solution, so I've written at least some bare-minimum operations documentation on how to use this and what it does.
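Purely for illustration, the described interface sketched with clap's derive API (Pilcrow's actual argument plumbing may differ):

```rust
use clap::{Parser, Subcommand};

#[derive(Parser)]
struct Args {
    #[arg(long)]
    database_url: String,

    // No subcommand means the default verb: serve.
    #[command(subcommand)]
    command: Option<Command>,
}

#[derive(Subcommand)]
enum Command {
    /// Destroy the current VAPID key so the server generates a fresh one.
    RotateVapidKey,
}
```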
VAPID is used to authenticate applications to push brokers, as part of the [Web Push] specification. It's notionally optional, but we believe [Apple requires it][apple], and in any case, making it impossible to use subscription URLs without the corresponding private key - and thus harder to impersonate the server - seems like good security practice.
[Web Push]: https://developer.mozilla.org/en-US/docs/Web/API/Push_API
[apple]: https://developer.apple.com/documentation/usernotifications/sending-web-push-notifications-in-web-apps-and-browsers
There are several implementations of VAPID for Rust:
* [web_push](https://docs.rs/web-push/latest/web_push/) includes an implementation of VAPID but requires callers to provision their own keys. We will likely use this crate for Web Push fulfilment, but we cannot use it for key generation.
* [vapid](https://docs.rs/vapid/latest/vapid/) includes an implementation of VAPID key generation. It delegates to `openssl` to handle cryptographic operations.
* [p256](https://docs.rs/p256/latest/p256/) implements NIST P-256 in Rust. It's maintained by the RustCrypto team, though as of this writing it is largely written by a single contributor. It isn't specifically designed for use with VAPID.
I opted to use p256 for this, as I believe the RustCrypto team are the most likely to produce a correct and secure implementation, and because openssl has consistently left a bad taste in my mouth for years. Because it's a general implementation of the algorithm, I expect that it will require more work for us to adapt it for use with VAPID specifically; I'm willing to chance it and we can swap it out for the vapid crate if it sucks.
This has left me with one area of uncertainty: I'm not actually sure I'm using the right parts of p256. The choice of `ecdsa::SigningKey` over `p256::SecretKey` is based on [the MDN docs] using phrases like "This value is part of a signing key pair generated by your application server, and usable with elliptic curve digital signature (ECDSA), over the P-256 curve." and on [RFC 8292]'s "The 'k' parameter includes an ECDSA public key in uncompressed form that is encoded using base64url encoding." However, we won't be able to test my implementation until we implement some other key parts of Web Push, which are out of scope of this commit.
[the MDN docs]: https://developer.mozilla.org/en-US/docs/Web/API/PushSubscription/options
[RFC 8292]: https://datatracker.ietf.org/doc/html/rfc8292#section-3.2
Following the design used for storing logins and users, VAPID keys are split into a non-synchronized part (consisting of the private key), whose exposure would allow others to impersonate the Pilcrow server, and a synchronized part (consisting of event coordinates and, notionally, the public key), which is non-sensitive and can be safely shared with any user. However, the public key is derived from the stored private key, rather than being stored directly, to minimize redundancy in the stored data.
Following the design used for expiring stale entities, the app checks for and creates, or rotates, its VAPID key using middleware that runs before most API requests. If, at that point, the key is either absent, or more than 30 days old, it is replaced. This imposes a small tax on API request latency, which is used to fund prompt and automatic key rotation without the need for an operator-facing key management interface.
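A sketch of that check, with names and the timestamp representation assumed:

```rust
use std::time::{Duration, SystemTime};

const MAX_KEY_AGE: Duration = Duration::from_secs(30 * 24 * 60 * 60); // 30 days

fn needs_rotation(created_at: Option<SystemTime>) -> bool {
    match created_at {
        None => true, // no key yet: generate one
        // If the clock went backwards, err on the side of rotating.
        Some(at) => at.elapsed().map_or(true, |age| age > MAX_KEY_AGE),
    }
}
```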
VAPID keys are delivered to clients via the event stream, as laid out in `docs/api/events.md`. There are a few reasons for this, but the big one is that changing the VAPID key would immediately invalidate push subscriptions: we throw away the private key, so we wouldn't be able to publish to them any longer. Clients must replace their push subscriptions in order to resume delivery, and doing so promptly when notified that the key has changed will minimize the gap.
This design is intended to allow for manual key rotation. The key can be rotated "immediately" by emptying the `vapid_key` and `vapid_signing_key` tables (which destroys the rotated key); the server will generate a new one before it is needed, and will notify clients that the key has been invalidated.
This change includes client support for tracking the current VAPID key. The client doesn't _use_ this information anywhere, yet, but it has it.
Conversations, users, messages, and all other "synchronized" entities now have an in-memory implementation of their lifecycle, rather than a database-backed one. These operations take a history, apply one lifecycle change to that history, and emit a new history. Storage is then implemented by applying the events in this new history to the database.
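A sketch of that lifecycle shape, with invented event and error types:

```rust
enum Event {
    Created { title: String },
    Deleted,
}

#[derive(Debug)]
enum LifecycleError {
    AlreadyDeleted,
}

struct History(Vec<Event>);

impl History {
    // Pure and in-memory: apply one lifecycle change, yielding the new
    // history. Storage then applies the appended event to the database.
    fn delete(mut self) -> Result<History, LifecycleError> {
        match self.0.last() {
            Some(Event::Deleted) => Err(LifecycleError::AlreadyDeleted),
            _ => {
                self.0.push(Event::Deleted);
                Ok(self)
            }
        }
    }
}
```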
The storage methods in repo types, which process these events by emitting SQL statements, make necessary assumptions that the events being passed to them are coherent with the data already in storage. For example, the code to handle a conversation's delete event is allowed to assume that the database already contains a row for that conversation, inserted in response to a prior conversation creation event.
Data retrieval is not modified in this commit, and probably never will be without a more thorough storage rewrite. The whole intention of the data modelling approach I've been using is that a single row per entity represents its entire history, in turn so that the data in the database should be legible to people approaching it using normal SQL tools.
Developed as an aesthetic response to increasing unease with the lack of an ORM versus the boring-ness of our actual queries.
Merges event-based-storage into main.
Per <https://docs.rs/tokio/latest/tokio/sync/broadcast/struct.Sender.html>, a `Sender` is safe to share between threads. The clone behaviour we want is also provided by its `Clone` impl directly, and we don't need to wrap the sender in an `Arc` to share it.
It's amazing what you can find in the docs.
This conversion, from an iterator of type-specific events (say, `user::Event` or `message::Event`), into a `Vec<event::Event>`, is pervasive, and it needs to be done each time. Having Broadcaster expose a support method for this cuts down on the repetition, at the cost of a slightly alarming amount of type-system nonsense in `broadcast_from`.
Historical footnote: the internal message structure is a Vec and not an individual message so that bulk operations, like expiring channels and messages, won't disconnect everyone if they happen to dispatch more than sixteen messages (current queue depth limit) at once. We trade allocation and memory pressure for keeping the connections alive. _Most_ event publishing is an iterator of one item, so the Vec allocation is redundant.
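A sketch of `broadcast_from`'s shape, with the surrounding types assumed:

```rust
use tokio::sync::broadcast;

mod event {
    #[derive(Clone)]
    pub enum Event {} // stand-in for the unified event type
}

struct Broadcaster {
    sender: broadcast::Sender<Vec<event::Event>>,
}

impl Broadcaster {
    // Accept any iterator of type-specific events convertible to the unified
    // type, and publish them as a single batch.
    fn broadcast_from<E, I>(&self, events: I)
    where
        I: IntoIterator<Item = E>,
        event::Event: From<E>,
    {
        let batch: Vec<event::Event> = events.into_iter().map(Into::into).collect();
        let _ = self.sender.send(batch); // Err just means no receivers right now
    }
}
```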
I found a test bug! The tests for deleting previously-deleted or previously-expired messages were using the wrong user to try to delete those messages. The tests happened to pass anyways because the message authorship check was done after the message lifecycle check. They would no longer have passed; the tests are fixed to use the sender, instead.
This replaces the approach of having the repo type know about conversation lifecycle in detail. Instead, the repo type accepts events and applies them to the DB blindly. The SQL written to implement each event does, however, embed assumptions about what order events will happen in.