pilcrow - Run-It-Yourself web chat, maybe

	Commit message	Author	Age
*	Split `user` into a chat-facing entity and an authentication-facing entity.	Owen Jacobson	2025-08-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The taxonomy is now as follows: * A _login_ is someone's identity for the purposes of authenticating to the service. Logins are not synchronized, and in fact are not published anywhere in the current API. They have a login ID, a name and a password. * A _user_ is someone's identity for the purpose of participating in conversations. Users _are_ synchronized, as before. They have a user ID, a name, and a creation instant for the purposes of synchronization. In practice, a user exists for every login - in fact, users' names are stored in the login table and are joined in, rather than being stored redundantly in the user table. A login ID and its corresponding user ID are always equal, and the user and login ID types support conversion and comparison to facilitate their use in this context. Tokens are now associated with logins, not users. The currently-acting identity is passed down into app types as a login, not a user, and then resolved to a user where appropriate within the app methods. As a side effect, the `GET /api/boot` method now returns a `login` key instead of a `user` key. The structure of the nested value is unchanged.
*	Add a missing docs note about the behaviour of `POST /api/auth/logout` when ↵	Owen Jacobson	2025-08-24
\| \| \| \| \| \|	the current token is invalid. It's inconsistent with the behaviour when the current token is unset. Shrug.
*	Stop returning a body from `POST /api/password`.	Owen Jacobson	2025-08-24
\|
*	Stop returning an HTTP body from `POST /api/invite/:id`.	Owen Jacobson	2025-08-24
\| \| \| \|	As with the previous commits, the body was never actually being used.
*	Stop returning body data from `POST /api/auth/login`.	Owen Jacobson	2025-08-24
\| \| \| \|	As with `/api/setup`, the response was an ad-hoc choice, which we are not using and which constrains future development just by existing.
*	Stop returning body data from `POST /api/setup`.	Owen Jacobson	2025-08-24
\| \| \| \|	This API response was always ad-hoc, and the client doesn't use it. To free up some maneuvering room for server refactorings, stop sending it. We can add a response in the future if there's a need.
*	Add a `--umask` option to determine what permissions new files/databases get.	Owen Jacobson	2025-07-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The new `--umask` option takes one of three values: * `--umask masked`, the default, takes the inherited umask and forces o+rwx on. * `--umask inherit` takes the inherited umask as-is. * `--umask OCTAL` sets the umask to exactly `OCTAL` and is broadly equivalent to `umask OCTAL && pilcrow --umask inherit`. This fell out of a conversation with @wlonk, who is working on notifications. Since notifications may require [VAPID] keys, the server will need a way to store those keys. That would generally be "in the pilcrow database," which led me to the observation that Pilcrow creates that database as world-readable by default. "World-readable" and "encryption/signing keys" are not things that belong in the same sentence. [VAPID]: https://datatracker.ietf.org/doc/html/rfc8292 The most "obvious" solution would be to set the permissions used for the sqlite database when it's created. That's harder than it sounds: sqlite has no built-in facility for doing this. The closest thing that exists today is the [`modeof`] query parameter, which copies the permissions (and ownership) from some other file. We also can't reliably set the permissions ourselves, as sqlite may - depending on build options and configuration - [create multiple files][wal]. [`modeof`]: https://www.sqlite.org/uri.html [wal]: https://www.sqlite.org/wal.html Using `umask` is a whole-process solution to this. As Pilcrow doesn't attempt to create other files, there's little issue with doing it this way, but this is a design risk for future work if it creates files that are _intended_ to be readable by more than just the Pilcrow daemon user.
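The three `--umask` values described above can be sketched as a small function. This is an illustrative Python sketch, not Pilcrow's code: the function name `effective_umask` is hypothetical, and it reads "forces o+rwx on" as setting the three "other" bits in the mask, so that newly created files are not accessible to other users.

```python
def effective_umask(option: str, inherited: int) -> int:
    """Return the umask to apply for a given --umask value (hypothetical helper)."""
    if option == "masked":
        # Keep whatever the parent process masked, and also mask off
        # read, write, and execute for "other".
        return inherited | 0o007
    if option == "inherit":
        # Use the inherited umask unchanged.
        return inherited
    # Anything else is treated as an octal literal such as "077".
    value = int(option, 8)
    if not 0 <= value <= 0o777:
        raise ValueError(f"umask out of range: {option}")
    return value


def file_mode(requested: int, umask: int) -> int:
    """Permissions a new file ends up with once the umask is applied."""
    return requested & ~umask
```

With a typical inherited umask of `022`, the `masked` default yields `027`, so a file requested with mode `666` is created as `640` rather than the world-readable `644`.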
*	Do not support users entering bare HTML in swatches.	Owen Jacobson	2025-07-09
\| \| \| \| \| \| \| \|	You can inject JavaScript into a swatch that uses `{@html <expr>}` fairly easily. `<script>foo()</script>` doesn't appear to work, but `<img src="x" onerror="foo()">` does, for example. That code then runs with the same access to cookies, and the same access to local data, as the Pilcrow client. This change removes that capability, by replacing the two swatches that exposed it with more limited examples. I love the generality and flexibility of generic HTML entry here, and I think it might have been useful for swatching components that are generic DOM containers (which both `Message` and `MessageRun` are today), but swatches are a user interface and are exposed to _all_ users. If a user who is unfamiliar with HTML and JavaScript is persuaded to open a swatch and enter some code into it (think about an attacker who tells their victim "hey check out this funny thing that happens," preying on curiosity, while providing a lightly-obfuscated payload), the attacker can then impersonate that user, exfiltrate anything saved locally, or potentially install persistent code using JS's various background-processing APIs. Gnarly stuff. We're not up to mitigating that in place. Anyone who knows JS can likely learn to build the client from source, and can experiment with arbitrary input that way, taking responsibility for the results in the process, while anyone who doesn't is unlikely to be persuaded to set up an entire Node toolchain just for an exploit.
*	Set up a skeleton for swatches.	Owen Jacobson	2025-07-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A swatch is a live, and ideally editable, example of an element of the service. They serve as: * Documentation: what is this element, how do you use it, what does it do? * Demonstration: what does this element look like? * Manual test scaffolding: when I change this element like _so_, what happens? Swatches are collectively available under `/.swatch/` on a running instance, and are set up in a separate [group] from the rest of the UI. They do not require setup or login for simplicity's sake and because they don't _do_ anything that requires either of those things. [group]: https://svelte.dev/docs/kit/advanced-routing#Advanced-layouts-(group) Swatches are manually curated, for a couple of reasons: * We lack the technical infrastructure needed to do this based on static analysis; and * Manual curation lets us include affordances like "recommended values," that would be tricky to express as part of the type or schema for the component. The tradeoff, however, is that swatches may fall out of step with the components they depict, if not reviewed regularly. I hope that, by making them part of the development process, this risk will be mitigated through regular use.
*	Replace `channel` with `conversation` throughout the API.	Owen Jacobson	2025-07-03
\| \| \| \|	This is a breaking change for essentially all clients. Thankfully, there's presently just the one, so we don't need to go to much effort to accommodate that; the client is modified in this commit to adapt, users can reload their client, and life will go on.
*	Organize the developer docs into a "Pilcrow for Developers" book.	Owen Jacobson	2025-07-01
\| \| \| \| \| \|	The audience for this is developers looking to make changes to Pilcrow, either on the server, on the included client, or via its data model. Most of the material here is drawn from existing documents, but organized somewhat more coherently. I've left some space for client documentation, though no such documents exist yet.
*	Send back the current state as events, not snapshots, during client boot.	ojacobson	2025-07-01
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are a couple of contributing reasons for this. * Each client's author - even ourselves - is best positioned to know how best to convert history into state to meet the needs of that specific client. There is (probably) no universal solution. You can already see this with the built-in client, where unread tracking gets stapled onto snapshots locally and maintained as events roll in, and I would expect this to happen more and more regularly over time. If we ever sprout other clients, I'd also expect their local state to be different. The API, on the other hand, must expose a surface that's universal to all clients. For boot, that was a very rote list-of-nouns data model. The other option is to expose a surface specific to one client and make other clients accommodate, which is contrary to the goals of this project. * The need to compute snapshots adds friction when adding or changing the behaviour of the API, even when those changes only tangentially touch `/api/boot`. For example, my work on adding messages to multiple conversations got hung up in trying to figure out how to represent that at boot time, plus how to represent that in the event stream. * The rationale for sending back a computed snapshot of the state was to avoid having the client replay events from the beginning of time, and to limit the amount of data sent back. This didn't pan out - most snapshots in practice consisted of the same data you'd get from the event stream anyways, almost with a 1:1 correspondence (with a `sent` or `created` event standing in for a `messages`, `channels`, or `users` entry). Exceptions - deleted messages and channels - were rare, and are ephemeral. * Generating the snapshots requires loading the entire history into memory anyways. 
We're not saving any server-side IO by computing snapshots, but we are spending server-side compute time to generate them for clients that are then going to throw them away, as above. This change resolves these tensions by delegating state management _entirely_ to the client, removing the server-side state snapshots. The server communicates in events only. ## Alternatives I joked to @wlonk that the "2.0-bis" version of this change always returns `resume_point` 0 and an empty events list. That would be correct, and compatible with the client logic in this change, and would actually work. In fact, we could get rid of the event part of `/api/boot` entirely, and require clients to consume the event stream from the beginning every time they reconnect. The main reason I _don't_ want to do this has to do with reconnects. Right now - both with snapshots, before this change, and with events, after - the client can cleanly delineate "historical" events (to be applied while the state is not presented to the user) and "current" events (to be presented to the user immediately). The `application/event-stream` protocol has no way to make that distinction out of the box, and while we can hack something in, all the approaches I can think of are nasty. Merges boot-events into main.
\| *	Remove the snapshot fields from `/api/boot`.	Owen Jacobson	2025-06-20
\| \| \| \| \| \| \| \|	Clients now _must_ construct their state from the event stream; it is no longer possible for them to delegate that work to the server.
\| *	Include historical events in the boot response.	Owen Jacobson	2025-06-20
\| \| \| \| \| \| \| \|	The returned events are all events up to and including the `resume_point` in the same response. If combined with the events from `/api/events?resume_point=x`, using the same `resume_point`, the client will have a complete event history, less any events from histories that have been purged.
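The split between boot-time history and the live stream can be sketched from the client's side. This is a minimal Python sketch under stated assumptions: the `ClientState` class, the `events` key in the boot response, and the event dictionaries are hypothetical shapes for illustration; only `resume_point` and the ordering rule come from the commit messages above.

```python
from dataclasses import dataclass, field


@dataclass
class ClientState:
    """Hypothetical client-side state built purely from events."""
    resume_point: int = 0
    history: list = field(default_factory=list)
    presented: list = field(default_factory=list)

    def boot(self, boot_response: dict) -> None:
        # Historical events: folded into state without being shown to the user.
        for event in boot_response["events"]:
            self.history.append(event)
        # Events after this point come from /api/events?resume_point=<value>.
        self.resume_point = boot_response["resume_point"]

    def receive(self, event: dict) -> None:
        # Current events: folded into state and presented immediately.
        self.history.append(event)
        self.presented.append(event)
```

A client would call `boot` once with the `/api/boot` response, then call `receive` for each event read from the stream opened with the same `resume_point`. Together the two sources cover the full history, less anything that has been purged.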
* \|	Regularize the capitalization in the API docs table of contents.	Owen Jacobson	2025-06-23
\|/
*	tools/reformat	Owen Jacobson	2025-06-11
\|
*	Consolidate project linting into tool scripts.	Owen Jacobson	2025-06-09
\| \| \| \|	The new `tools/check-lint` script checks lints across _all_ lintable files - JS (through `eslint`), and Rust (through `clippy` and `cargo check`). It also checks `eslint` against the whole project, not just against what's in the `ui` subdir, which means it now catches lintable issues in various JS config files.
*	Document our tooling for code style.	Owen Jacobson	2025-05-30
\|
*	Make creation time an intrinsic fact about channels, the way it is for events.	Owen Jacobson	2025-05-13
\| \| \| \|	To make unread handling of empty channels coherent (and to make it possible to mark an empty channel as having been read), they need to be associated with a specific point in time. This change exposes their creation time in the snapshot - it was already part of the event view, though the client doesn't know that yet.
*	Example requests should correspond to example responses in the API docs.	Owen Jacobson	2025-05-09
\| \| \| \|	There may be other instances of this; this is just the one I found recently.
*	Heartbeats are part of the event protocol.	Owen Jacobson	2025-04-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A heartbeat is an event that the server synthesizes any time an event stream has been idle for longer than some timeout. They allow clients to detect disconnection and network problems, which would otherwise go unnoticed because event streams are a one-way channel. Most network problems only become clear when the offended party tries to _send_ something, and subscribing to an event stream only sends something during the request phase. Technically, Pilcrow has always sent these, since we started using Axum's SSE support: it defaults to sending a dummy event after 15 seconds (consisting of `":\n\n"`, which is then ignored). I've built Pilcrow's heartbeat support out of that, by customizing the event sent back. The results _mostly_ look like existing events, but there are two key differences: * Heartbeats don't have `id` fields in the event stream. They're synthetic, and they don't participate in either the "resume at" sequence management, or the last-event-id header-based resumption management. * Heartbeats have an `event` but no `type` field in the message body. There are no subtypes. To make it less likely that clients will race with the server on expiring timeouts, heartbeats are sent about five seconds early. In this change, heartbeats are due after 20 seconds, but are sent after 15. If it takes longer than five seconds for a heartbeat to arrive, a client can and should treat that as a network problem and reconnect, but I'd really like to avoid that happening over differences smaller than a second, so I've left a margin. I originally sketched this out in conversation with @wlonk as having each event carry a deadline for the next one. I ultimately opted not to do that for a few reasons. First, Axum makes it hard - the built-in keep-alive support only works with a static event, and cannot make dynamic ones whose payloads might vary (for example if the deadline is variable). 
Second, it's complex, to no apparent gain, and adds deadline information to _every_ event type. This implementation, instead, sends deadline information as part of boot, as a fixed interval in seconds. Clients are responsible for working out deadlines based on message arrivals. This is fine; heartbeat-based connection management is best effort at the best of times, so a few milliseconds of slop in either direction won't hurt anything. The existing client ignores these events entirely, which is convenient. The new heartbeat event type is defined alongside the main event type, to make it less likely that we'll inadvertently make changes to one but not the other. We can still do so advertently, I just don't want it to be an accident.
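Client-side deadline tracking along these lines could look like the following Python sketch. The `HeartbeatMonitor` class is hypothetical, and timestamps are passed in explicitly so the logic stays independent of any particular clock. The interval is the fixed number of seconds the client learns at boot (20 in this change).

```python
class HeartbeatMonitor:
    """Hypothetical tracker for the 'next event is due by' deadline."""

    def __init__(self, interval_seconds: float, now: float) -> None:
        self.interval = interval_seconds
        # The first deadline runs from when the stream was opened.
        self.deadline = now + interval_seconds

    def on_event(self, now: float) -> None:
        # Any arrival, heartbeat or otherwise, pushes the deadline out again.
        self.deadline = now + self.interval

    def is_overdue(self, now: float) -> bool:
        # Past the deadline with nothing received: treat it as a network
        # problem and reconnect.
        return now > self.deadline
```

Because the server sends heartbeats after 15 seconds against a 20-second deadline, a healthy connection resets the deadline with about five seconds to spare.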
*	Update the API docs to describe `user`s, not `login`s.	Owen Jacobson	2025-03-23
\|
*	Fix invalid JSON	Owen Jacobson	2025-02-21
\|
*	Write down a brief mission blurb.	Owen Jacobson	2025-02-21
\|
*	Rename the project to `pilcrow`.	Owen Jacobson	2024-11-08
\|
*	Resume points are no longer optional.	Owen Jacobson	2024-10-30
\| \| \| \|	This is an inconsequential change for actual clients, since "resume from the beginning" was never a preferred mode of operation, and it simplifies some internals. It should also mean we get better query plans where `coalesce(cond, true)` was previously being used.
*	Remove `hi-recanonicalize`.	Owen Jacobson	2024-10-30
\| \| \| \|	This utility was needed to support a database migration with existing data. I have it on good authority that no further databases exist that are in the state that made this tool necessary.
*	Prevent deletion of non-empty channels.	Owen Jacobson	2024-10-30
\|
*	Add `change password` UI + API.	Owen Jacobson	2024-10-29
\| \| \| \|	The protocol here re-checks the caller's password, as an "I left myself logged in" anti-pranking check.
*	Restrict deletion to deleting your own messages.	Owen Jacobson	2024-10-29
\|
*	Restrict channel names, too.	Owen Jacobson	2024-10-29
\| \| \| \|	Thankfully, channel creation only happens in one place, so we don't need a state machine for this.
*	fixup! Restrict login names.	Owen Jacobson	2024-10-29
\|
*	Restrict login names.	Owen Jacobson	2024-10-29
\| \| \| \| \| \| \| \|	There's no good reason to use an empty string as your login name, or to use one so long as to annoy others. Names beginning or ending with whitespace, or containing runs of whitespace, are also a technical problem, so they're also prohibited. This change does not implement [UTS #39], as I haven't yet fully understood how to do so. [UTS #39]: https://www.unicode.org/reports/tr39/
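The restrictions listed above can be sketched as a single predicate. This Python sketch is illustrative only: `is_valid_name` is a hypothetical name, and `MAX_NAME_LENGTH` is an assumed placeholder, since the commit message does not state the actual limit.

```python
MAX_NAME_LENGTH = 64  # assumption: the real limit is not given in the commit message


def is_valid_name(name: str) -> bool:
    """Check a login name against the rules described in the commit message."""
    # Reject empty names and names long enough to annoy others.
    if not name or len(name) > MAX_NAME_LENGTH:
        return False
    # Reject leading or trailing whitespace.
    if name[0].isspace() or name[-1].isspace():
        return False
    # Reject runs of whitespace (two or more whitespace characters in a row).
    return not any(a.isspace() and b.isspace() for a, b in zip(name, name[1:]))
```

A single interior space is allowed, so `"Owen Jacobson"` passes while `"Owen  Jacobson"` does not.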
*	Package `hi` for Debian.	Owen Jacobson	2024-10-29
\| \| \| \| \| \| \| \| \| \| \| \|	This commit provides a Docker-based build process for generating `.deb` packages, which can be run in Docker Desktop. I don't love it, but it's the best option I have _right now_ for doing this. The resulting packages: * Install `hi` (and `hi-recanonicalize`), in `/usr/bin`. * Create a user (`hi`) and a data directory (`/var/lib/hi`). * Create and start a systemd service unit for `hi`. Packages are built for arm64 and amd64 (aka x86_64).
*	To make it easier to correlate deletes to the event stream, have deletes ↵	Owen Jacobson	2024-10-25
\| \| \| \|	return the ID of the affected entity.
*	Provide `hi-recanonicalize` to recover from canonicalized-name problems.	Owen Jacobson	2024-10-22
\|
*	Canonicalize login and channel names.	Owen Jacobson	2024-10-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Canonicalization does two things: * It prevents duplicate names that differ only by case or only by normalization/encoding sequence; and * It makes certain name-based comparisons "case-insensitive" (generalizing via Unicode's case-folding rules). This change is complicated, as it means that every name now needs to be stored in two forms. Unfortunately, this is _very likely_ a breaking schema change. The migrations in this commit perform a best-effort attempt to canonicalize existing channel or login names, but it's likely any existing channels or logins with non-ASCII characters will not be canonicalized correctly. Since clients look at all channel names and all login names on boot, and since the code in this commit verifies canonicalization when reading from the database, this will effectively make the server unusable until any incorrectly-canonicalized values are either manually canonicalized, or removed. It might be possible to do better with [the `icu` sqlite3 extension][icu], but (a) I'm not convinced of that and (b) this commit is already huge; adding database extension support would make it far larger. [icu]: https://sqlite.org/src/dir/ext/icu For some references on why it's worth storing usernames this way, see <https://www.b-list.org/weblog/2018/nov/26/case/> and the referenced talk, as well as <https://www.b-list.org/weblog/2018/feb/11/usernames/>. Bennett's treatment of this issue is, to my eye, much more readable than the referenced Unicode technical reports, and I'm inclined to trust his opinion given that he maintains a widely-used, internet-facing user registration library for Django.
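The idea of storing each name in two forms can be sketched as follows. This is a Python illustration under assumptions, not the project's algorithm: the commit message does not say which normalization form the canonical name uses, so NFC plus Unicode case folding is assumed here, and `NameRegistry` is a hypothetical stand-in for the database's uniqueness constraint.

```python
import unicodedata


def display_form(name: str) -> str:
    """The form shown to users: normalized, with case preserved."""
    return unicodedata.normalize("NFC", name)


def canonical_form(name: str) -> str:
    """The form used for comparisons: normalized and case-folded (assumed recipe)."""
    folded = unicodedata.normalize("NFC", name).casefold()
    # Case folding can produce text that is no longer normalized, so normalize again.
    return unicodedata.normalize("NFC", folded)


class NameRegistry:
    """Hypothetical in-memory stand-in for a unique index on the canonical name."""

    def __init__(self) -> None:
        self._by_canonical = {}

    def register(self, name: str) -> bool:
        key = canonical_form(name)
        if key in self._by_canonical:
            return False  # duplicate up to case or encoding sequence
        self._by_canonical[key] = display_form(name)
        return True
```

Under this recipe, `"Straße"` and `"STRASSE"` collide, as do a composed and a decomposed spelling of `"Café"`.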
*	Unicode normalization on input.	Owen Jacobson	2024-10-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This normalizes the following values: * login names * passwords * channel names * message bodies, because why not The goal here is to have a canonical representation of these values, so that, for example, the service does not inadvertently host two channels whose names are semantically identical but differ in the specifics of how diacritics are encoded, or two users whose names are identical. Normalization is done on input from the wire, using Serde hooks, and when reading from the database. The `crate::nfc::String` type implements these normalizations (as well as normalizing whenever converted from a `std::string::String` generally). This change does not cover: * Trying to cope with passwords that were created as non-normalized strings, which are now non-verifiable as all the paths to verify passwords normalize the input. * Trying to ensure that non-normalized data in the database compares reasonably to normalized data. Fortunately, we don't _do_ very many string comparisons (I think only login names), so this isn't a huge deal at this stage. Login names will probably have to Get Fixed later on, when we figure out how to handle case folding for login name verification.
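The diacritics case mentioned above is easy to see with a small example. The project's own `crate::nfc::String` type is Rust; this Python sketch uses the standard `unicodedata` module only to show the effect of NFC normalization, and the `nfc` helper name is hypothetical.

```python
import unicodedata


def nfc(value: str) -> str:
    """Normalize a string to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", value)


# Two encodings of the same visible text.
composed = "caf\u00e9"      # "é" as a single code point
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent

# Without normalization these compare as different strings.
different_raw = composed != decomposed
# After normalization they are identical.
same_normalized = nfc(composed) == nfc(decomposed)
```

Normalizing at every input boundary means two channel names that look the same are also stored as the same sequence of code points.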
*	Mention the message deleted events, and that deleted channels cannot receive ↵	Owen Jacobson	2024-10-19
\| \| \| \|	messages.
*	Make the responses for various data creation requests more consistent.	Owen Jacobson	2024-10-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In general: * If the client can only assume the response is immediately valid (mostly, login creation, where the client cannot monitor the event stream), then 200 Okay, with data describing the server's view of the request. * If the client can monitor for completion by watching the event stream, then 202 Accepted, with data describing the server's view of the request. This comes on the heels of a comment I made on Discord: > hrm > > creating a login: 204 No Content, no body > sending a message: 202 Accepted, no body > creating a channel: 200 Okay, has a body > > past me, what were you on There wasn't any principled reason for this inconsistency; it happened as the endpoints were written at different times and with different states of mind.
*	Merge branch 'wip/retain-deleted'	Owen Jacobson	2024-10-18
\|\
\| *	Explain (some of) the rationale for returning "empty" values in tombstone ↵	Owen Jacobson	2024-10-18
\| \| \| \| \| \| \| \|	events in the docs.
\| *	Retain deleted messages and channels temporarily, to preserve events for replay.	Owen Jacobson	2024-10-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, when a channel (message) was deleted, `hi` would send events to all _connected_ clients to inform them of the deletion, then delete all memory of the channel (message). Any disconnected client, on reconnecting, would not receive the deletion event, and would de-synch with the service. The creation events were also immediately retconned out of the event stream. With this change, `hi` keeps a record of deleted channels (messages). When replaying events, these records are used to replay the deletion event. After 7 days, the retained data is deleted, both to keep storage under control and to conform to users' expectations that deleted means gone. To match users' likely intuitions about what deletion does, deleting a channel (message) _does_ immediately delete some of its associated data. Channels' names are blanked, and messages' bodies are also blanked. When the event stream is replayed, the original channel.created (message.sent) event is "tombstoned", with an additional `deleted_at` field to inform clients. The included client does not use this field, at least yet. The migration is, once again, screamingly complicated due to sqlite's limited ALTER TABLE … ALTER COLUMN support. This change also contains capabilities that would allow the API to return 410 Gone for deleted channels or messages, instead of 404. I did experiment with this, but it's tricky to do pervasively, especially since most app-level interfaces return an `Option<Channel>` or `Option<Message>`. Redesigning these to return either `Ok(Channel)` (`Ok(Message)`) or `Err(Error::NotFound)` or `Err(Error::Deleted)` is more work than I wanted to take on for this change, and the utility of 410 Gone responses is not obvious to me. We have other, more pressing API design warts to address.
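The tombstone behaviour described above can be sketched for channels. This Python sketch is illustrative: the `Channel` record, the function names, and the event dictionaries (including the `kind` key) are hypothetical shapes, not the service's actual schema. Only the blanked name, the `deleted_at` field, and the 7-day retention come from the commit message.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=7)


@dataclass
class Channel:
    id: str
    name: str
    created_at: datetime
    deleted_at: Optional[datetime] = None


def delete(channel: Channel, now: datetime) -> None:
    # Deletion blanks the name immediately but keeps the record for replay.
    channel.name = ""
    channel.deleted_at = now


def replay(channel: Channel) -> list:
    # Hypothetical event shapes, for illustration only.
    created = {"kind": "channel.created", "id": channel.id, "name": channel.name}
    if channel.deleted_at is None:
        return [created]
    # Tombstoned: the creation event carries deleted_at, and is followed
    # by the deletion event itself.
    created["deleted_at"] = channel.deleted_at
    return [created, {"kind": "channel.deleted", "id": channel.id}]


def is_expired(channel: Channel, now: datetime) -> bool:
    # After the retention period the record can be removed entirely.
    return channel.deleted_at is not None and now - channel.deleted_at >= RETENTION
```

A client that reconnects within the retention window sees both events and can reconcile its state; after the window, neither event is replayed.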
* \|	Get loaded data using `export let data`, instead of fishing around in $page.	Owen Jacobson	2024-10-17
\|/ \| \| \| \| \|	This is mostly a how-to-Svelte thing. I've also made the API responses for invites a bit more caller-friendly by flattening them and adding the ID field into them. The ID is redundant (the client knows it because the client has the invitation URL), but it makes presenting invitations and actioning them a bit easier.
*	API docs rewrite.	Owen Jacobson	2024-10-16
\| \| \| \| \| \|	Having the whole API in a single file was starting to feel very cramped and constraining. This rewrite breaks it out into sections; as a side effect, the docs are now about 2.5x as long as they were, as the rewrite allows more space for each idea without crowding the page. The docs are best read by running `tools/docs-api`.
*	Return a distinct error when an invite username is in use.	Owen Jacobson	2024-10-11
\| \| \| \|	I've also aligned channel creation with this (it's 409 Conflict). To make server setup more distinct, it now returns 503 Service Unavailable if setup has not been completed.
*	Create APIs for inviting users.	Owen Jacobson	2024-10-11
\|
*	Provide a separate "initial setup" endpoint that creates a user.	Owen Jacobson	2024-10-11
\|
*	Automatically delete database backups if automatic restore is successful.	Owen Jacobson	2024-10-10
\| \| \| \|	Operational experience with the server has shown that leaving the backup in place is not helpful. The near-automatic choice is to immediately delete it, and the server won't start until it has been deleted. If the backup restore succeeded, then we know the user has a copy of their database, since the sqlite3 online backups API promises to make the target database bitwise-identical to the source database, so there's little chance the user will need a duplicate.
*	Align send request fields with message fields by renaming `message` to `body`.	Owen Jacobson	2024-10-09
\|