Implement VAPID key support for Web Push.

This is intended to be a low-operator-involvement, self-managing approach: * The server generates its own VAPID key. The operator does not need to run complex `openssl` commands. * The server rotates its VAPID key regularly. The operator does not need to maintain a calendar or run administrative tools. * The included CLI changes allow for immediate key rotation if needed. ## Correctness This change is speculative, to a degree, and is based on some guesses and loose understandings of the various Web Push and VAPID specs and docs. We use RustCrypto's p256 crate to generate keys rather than a dedicated VAPID crate (one does exist) as I trust their code review and security standards further, but that means that I've had to guess what kind of P256 key is actually appropriate for this use case, and what encoding clients need. If those elements are incorrect they're not hard to change, but they need further verification before we commit to them on `main`. Changing the encoding can be done in the `src/vapid/event.rs` file. Changing the key type is more involved, but is still contained to `src/vapid/`. I already did that once, switching from `p256::SecretKey` to `p256::ecdsa::SingingKey`, and it's not too bad - takes maybe 20 minutes on a good day. The real risk is that we can't tell from the database what kind of key we have, so we need to settle this before making any permanent, durable changes to other peoples' data. ## Safety This change stores the private key, entirely in the clear, in the Pilcrow database. Anyone who gets the database or who gets access to `vapid_signing_key` can impersonate the server in Web Push messages. Operators are theoretically responsible for managing that risk. We are responsible for setting them up for success, and for making the right thing to do the easiest thing to do. This change tries to accomplish those goals in two ways: * It relies on previous changes to the process' umask to ensure that the database is not readable by other OS users. OS permissions are pretty thin protection these days, as they only apply to operations carried out through the OS, but it's better than nothing at preventing casual snooping on the private key. * Key rotation. The maximum period for which an exposed key can stay valid is 30 days, plus as long as it takes for clients to wake up, notice the key has changed, and invalidate their existing subscriptions. The impact of a key leak is also moderate at best. We don't intend to send clients confidential information, or instruct clients to take compromising action, via push message. We intend to use them to notify clients of events that users may be interested in, like messages. Abusing the push system will, at worst, distract users or prompt them to disable notifications, or potentially deliver notifications for messages that never existed, which users can identify by reading the chat for themselves. ## Operator tooling This change introduces the `pilcrow rotate-vapid-key` command, to immediately rotate the VAPID key. This is designed to be an online _or_ offline command: all it does is destroy the current key, so that the server will generate a new one. I've included some sketchy but complete documentation; it's probably inadequate, but then, so is that whole file. I would resist the use of operator CLI tools as contrary to our low-admin-involvement aesthetic, but in this situation it's hard to avoid. The alternatives on the one hand are documenting the schema and telling operators how to use `sqlite3` to rotate keys, or splitting the keys into a separate database that the operator can delete outright to trigger key rotation, or on the other hand stopping to implement privileges and operator interfaces inside of Pilcrow. I think a CLI is an acceptable compromise between inadequate operator support from the one set of alternatives, and massive complexity and a large delay on the other. We will likely, however, have to revise this later. Merges vapid-keys into push-notify.
author: ojacobson <ojacobson@noreply.codeberg.org> 2025-09-10 23:11:33 +0200
committer: ojacobson <ojacobson@noreply.codeberg.org> 2025-09-10 23:11:33 +0200
commit: 10ccf6afafa2b2c3beab991505fc780e7f8d8357 (patch)
tree: 1e10b660f2c68cb73d3c17703257ec0be51c36af /src/vapid/app.rs
parent: 342d4720576369f83d88b0dbde6ca5d58aebd252 (diff)
parent: 96b62f1018641b3dc28cc189d314a1bff475b751 (diff)
1 files changed, 104 insertions, 0 deletions
diff --git a/src/vapid/app.rs b/src/vapid/app.rs
new file mode 100644
index 0000000..ddb1f4d
--- /dev/null
+++ b/src/vapid/app.rs
@@ -0,0 +1,104 @@
+use chrono::{TimeDelta, Utc};
+use sqlx::SqlitePool;
+
+use super::{History, repo, repo::Provider as _};
+use crate::{
+    clock::DateTime,
+    db::NotFound as _,
+    event::{Broadcaster, Sequence, repo::Provider},
+};
+
+pub struct Vapid<'a> {
+    db: &'a SqlitePool,
+    events: &'a Broadcaster,
+}
+
+impl<'a> Vapid<'a> {
+    pub const fn new(db: &'a SqlitePool, events: &'a Broadcaster) -> Self {
+        Self { db, events }
+    }
+
+    pub async fn rotate_key(&self, rotate_at: &DateTime) -> Result<(), sqlx::Error> {
+        let mut tx = self.db.begin().await?;
+        // This is called from a separate CLI utility (see `cli.rs`), and we _can't_ deliver events
+        // to active clients from another process, so don't do anything that would require us to
+        // send events, like generating a new key.
+        //
+        // Instead, the server's next `refresh_key` call will generate a key and notify clients
+        // of the change. All we have to do is remove the existing key, so that the server can know
+        // to do so.
+        tx.vapid().clear().await?;
+        tx.commit().await?;
+
+        Ok(())
+    }
+
+    pub async fn refresh_key(&self, ensure_at: &DateTime) -> Result<(), Error> {
+        let mut tx = self.db.begin().await?;
+        let key = tx.vapid().current().await.optional()?;
+        if key.is_none() {
+            let changed_at = tx.sequence().next(ensure_at).await?;
+            let (key, secret) = History::begin(&changed_at);
+
+            tx.vapid().clear().await?;
+            tx.vapid().store_signing_key(&secret).await?;
+
+            let events = key.events().filter(Sequence::start_from(changed_at));
+            tx.vapid().record_events(events.clone()).await?;
+
+            tx.commit().await?;
+
+            self.events.broadcast_from(events);
+        } else if let Some(key) = key
+            // Somewhat arbitrarily, rotate keys every 30 days.
+            && key.older_than(ensure_at.to_owned() - TimeDelta::days(30))
+        {
+            // If you can think of a way to factor out this duplication, be my guest. I tried.
+            // The only approach I could think of mirrors `crate::user::create::Create`, encoding
+            // the process in a state machine made of types, and that's a very complex solution
+            // to a problem that doesn't seem to merit it. -o
+            let changed_at = tx.sequence().next(ensure_at).await?;
+            let (key, secret) = key.rotate(&changed_at);
+
+            tx.vapid().clear().await?;
+            tx.vapid().store_signing_key(&secret).await?;
+
+            // Refactoring constraint: this `events` iterator borrows `key`. Anything that moves
+            // `key` has to give it back, but it can't give both `key` back and an event iterator
+            // borrowing from `key` because Rust doesn't support types that borrow from other
+            // parts of themselves.
+            let events = key.events().filter(Sequence::start_from(changed_at));
+            tx.vapid().record_events(events.clone()).await?;
+
+            // Refactoring constraint: we _really_ want to commit the transaction before we send
+            // out events, so that anything acting on those events is guaranteed to see the state
+            // of the service at some point at or after the side effects of this. I'd also prefer
+            // to keep the commit in the same method that the transaction is begun in, for clarity.
+            tx.commit().await?;
+
+            self.events.broadcast_from(events);
+        }
+        // else, the key exists and is not stale. Don't bother allocating a sequence number, and
+        // in fact throw away the whole transaction.
+
+        Ok(())
+    }
+}
+
+#[derive(Debug, thiserror::Error)]
+pub enum Error {
+    #[error(transparent)]
+    Database(#[from] sqlx::Error),
+    #[error(transparent)]
+    Ecdsa(#[from] p256::ecdsa::Error),
+}
+
+impl From<repo::Error> for Error {
+    fn from(error: repo::Error) -> Self {
+        use repo::Error;
+        match error {
+            Error::Database(error) => error.into(),
+            Error::Ecdsa(error) => error.into(),
+        }
+    }
+}
author	ojacobson <ojacobson@noreply.codeberg.org>	2025-09-10 23:11:33 +0200
committer	ojacobson <ojacobson@noreply.codeberg.org>	2025-09-10 23:11:33 +0200
commit	10ccf6afafa2b2c3beab991505fc780e7f8d8357 (patch)
tree	1e10b660f2c68cb73d3c17703257ec0be51c36af /src/vapid/app.rs
parent	342d4720576369f83d88b0dbde6ca5d58aebd252 (diff)
parent	96b62f1018641b3dc28cc189d314a1bff475b751 (diff)