pilcrow - Run-It-Yourself web chat, maybe

	Commit message (Collapse)	Author	Age
*	Fix invalid migration.	Owen Jacobson	2024-10-10
\| \| \| \| \| \|	The original version of this migration happened to work correctly, by accident, for databases with exactly one login. I missed this, and so did Kit, because both of our test databases _actually do_ contain exactly one login, and because I didn't run the tests before committing the migration. The fixed version works correctly for all scenarios I tested (zero, one, and two users, not super thorough). I've added code to patch out the original migration hash in databases that have it; no further corrective work is needed, as if the migration failed, then it got backed out anyways, and if it succeeded, you fell into the "one user" case.
*	Return a flat message list on boot, not nested lists by channel.	Owen Jacobson	2024-10-09
\| \| \| \|	This is a bit easier to compute, and sets us up nicely for pulling message boot out of the `/api/boot` response entirely.
*	Provide a view of logins to clients.	Owen Jacobson	2024-10-09
\|
*	Simplify channel IDs in events. Remove redundant ones.	Owen Jacobson	2024-10-09
\|
*	Use sqlx's API, not SQL groveling, to find unwanted migrations.	Owen Jacobson	2024-10-05
\|
*	Start fresh with database migrations.	Owen Jacobson	2024-10-04
\| \| \| \| \| \|	The migration path from the original project inception to now was complicated and buggy, and stranded _both_ Kit and I with broken databases due to oversights and incomplete migrations. We've agreed to start fresh, once. If this is mistakenly started with an original-schema-flavour DB, startup will be aborted.
*	List messages per channel.	Owen Jacobson	2024-10-03
\|
*	Add endpoints for deleting channels and messages.	Owen Jacobson	2024-10-03
\| \| \| \|	It is deliberate that the expire() functions do not use them. To avoid races, the transactions must be committed before events get sent, in both cases, which makes them structurally pretty different.
*	Represent channels and messages using a split "History" and "Snapshot" model.	Owen Jacobson	2024-10-03
\| \| \| \| \| \|	This separates the code that figures out what happened to an entity from the code that represents it to a user, and makes it easier to compute a snapshot at a point in time (for things like bootstrap). It also makes the internal logic a bit easier to follow, since it's easier to tell whether you're working with a point in time or with the whole recorded history. This hefty.
*	First pass on reorganizing the backend.	Owen Jacobson	2024-10-02
\| \| \| \|	This is primarily renames and repackagings.
*	Provide a resume point to bridge clients from state snapshots to the event ↵	Owen Jacobson	2024-10-01
\| \| \| \|	sequence.
*	Track event sequences globally, not per channel.	Owen Jacobson	2024-10-01
\| \| \| \|	Per-channel event sequences were a cute idea, but it made reasoning about event resumption much, much harder (case in point: recovering the order of events in a partially-ordered collection is quadratic, since it's basically graph sort). The minor overhead of a global sequence number is likely tolerable, and this simplifies both the API and the internals.
*	Prevent racing between `limit_stream` and logging out.	Owen Jacobson	2024-10-01
\|
*	Reimplement the logout machinery in terms of token IDs, not token secrets.	Owen Jacobson	2024-09-29
\| \| \| \| \| \|	This (a) reduces the amount of passing secrets around that's needed, and (b) allows tests to log out in a more straightforwards manner. Ish. The fixtures are a mess, but so is the nomenclature. Fix the latter and the former will probably follow.
*	Shut down the `/api/events` stream when the user logs out or their token ↵	Owen Jacobson	2024-09-29
\| \| \| \| \| \| \| \|	expires. When tokens are revoked (logout or expiry), the server now publishes an internal event via the new `logins` event broadcaster. These events are used to guard the `/api/events` stream. When a token revocation event arrives for the token used to subscribe to the stream, the stream is cut short, disconnecting the client. In service of this, tokens now have IDs, which are non-confidential values that can be used to discuss tokens without their secrets being passed around unnecessarily. These IDs are not (at this time) exposed to clients, but they could be.
*	Wrap credential and credential-holding types to prevent `Debug` leaks.	Owen Jacobson	2024-09-28
\| \| \| \| \| \| \| \| \| \| \| \|	The following values are considered confidential, and should never be logged, even by accident: * `Password`, which is a durable bearer token for a specific Login; * `IdentitySecret`, which is an ephemeral but potentially long-lived bearer token for a specific Login; or * `IdentityToken`, which may hold cookies containing an `IdentitySecret`. These values are now wrapped in types whose `Debug` impls output opaque values, so that they can be included in structs that `#[derive(Debug)]` without requiring any additional care. The wrappers also avoid implementing `Display`, to prevent inadvertent `to_string()`s. We don't bother obfuscating `IdentitySecret`s in memory or in the `.hi` database. There's no point: we'd also need to store the information needed to de-obfuscate them, and they can be freely invalidated and replaced by blanking that table and asking everyone to log in again. Passwords _are_ obfuscated for storage, as they're intended to be durable.
*	Expire channels, too.	Owen Jacobson	2024-09-28
\|
*	Delete expired messages out of band.	Owen Jacobson	2024-09-28
\| \| \| \| \| \| \| \|	Trying to reliably do expiry mid-request was causing some anomalies: * Creating a channel with a dup name would fail, then succeed after listing channels. It was very hard to reason about which operations needed to trigger expiry, to fix this "correctly," so now expiry runs on every request.
*	Assign sequence numbers from a counter, not by scanning messages	Owen Jacobson	2024-09-28
\|
*	Send created events when channels are added.	Owen Jacobson	2024-09-28
\|
*	Use a vector of sequence numbers, not timestamps, to restart /api/events ↵	Owen Jacobson	2024-09-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	streams. The timestamp-based approach had some formal problems. In particular, it assumed that time always went forwards, which isn't necessarily the case: * Alice calls `/api/channels/Cfoo` to send a message. * The server assigns time T to the request. * The server stalls somewhere in send() for a while, before storing and broadcasting the message. If it helps, imagine blocking on `tx.begin().await?` for a while. * In this interval, Bob calls `/api/events?channel=Cfoo`, receives historical messages up to time U (after T), and disconnects. * The server resumes Alice's request and finishes it. * Bob reconnects, setting his Last-Event-Id header to timestamp U. In this scenario, Bob never sees Alice's message unless he starts over. It wasn't in the original stream, since it wasn't broadcast while Bob was subscribed, and it's not in the new stream, since Bob's resume point is after the timestamp on Alice's message. The new approach avoids this. Each message is assigned a _sequence number_ when it's stored. Bob can be sure that his stream included every event, since the resume point is identified by sequence number even if the server processes them out of chronological order: * Alice calls `/api/channels/Cfoo` to send a message. * The server assigns time T to the request. * The server stalls somewhere in send() for a while, before storing and broadcasting. * In this interval, Bob calls `/api/events?channel=Cfoo`, receives historical messages up to sequence Cfoo=N, and disconnects. * The server resumes Alice's request, assigns her message sequence M (after N), and finishes it. * Bob resumes his subscription at Cfoo=N. * Bob receives Alice's message at Cfoo=M. There's a natural mutual exclusion on sequence numbers, enforced by sqlite, which ensures that no two messages have the same sequence number. Since sqlite promises that transactions are serializable by default (and enforces this with a whole-DB write lock), we can be confident that sequence numbers are monotonic, as well. This scenario is, to put it mildly, contrived and unlikely - which is what motivated me to fix it. These kinds of bugs are fiendishly hard to identify, let alone reproduce or understand. I wonder how costly cloning a map is going to turn out to be… A note on database migrations: sqlite3 really, truly has no `alter table … alter column` statement. The only way to modify an existing column is to add the column to a new table. If `alter column` existed, I would create the new `sequence` column in `message` in a much less roundabout way. Fortunately, these migrations assume that they are being run _offline_, so operations like "replace the whole table" are reasonable.
*	Remove the HTML client, and expose a JSON API.	Owen Jacobson	2024-09-20
\| \| \| \| \| \| \| \| \| \| \| \| \|	This API structure fell out of a conversation with Kit. Described loosely: kit: ok kit: Here's what I'm picturing in a client kit: list channels, make-new-channel, zero to one active channels, post-to-active. kit: login/sign-up, logout owen: you will likely also want "am I logged in" here kit: sure, whoami
*	Expire messages after 90 days.	Owen Jacobson	2024-09-20
\| \| \| \| \| \| \| \| \| \|	This is intended to manage storage growth. A community with broadly steady traffic will now reach a steady state (ish) where the amount of storage in use stays within a steady band. The 90 day threshold is a spitball; this should be made configurable for the community's needs. I've also hoisted expiry out into the `app` classes, to reduce the amount of non-database work repo types are doing. This should make it easier to make expiry configurable later on. Includes incidental cleanup and style changes.
*	Express record dependencies through types.	Owen Jacobson	2024-09-17
\| \| \| \|	This provides a convenient place to _stick_ "not found" errors, though actually introducing them will come in a later commit.
*	Consolidate most repository types into a repo module.	Owen Jacobson	2024-09-16
\| \| \| \| \| \| \| \| \| \| \| \|	Having them contained in the individual endpoint groups conveyed an unintended sense that their intended scope was _only_ that endpoint group. It also made most repo-related import paths _quite_ long. This splits up the repos as follows: * "General applicability" repos - those that are only loosely connected to a single task, and are likely to be shared between tasks - go in crate::repo. * Specialized repos - those tightly connected to a specific task - go in the module for that task, under crate::PATH::repo. In both cases, each repo goes in its own submodule, to make it easier to use the module name as a namespace. Which category a repo goes in is a judgment call. `crate::channel::repo::broadcast` (formerly `channel::repo::messages`) is used outside of `crate::channel`, for example, but its main purpose is to support channel message broadcasts. It could arguably live under `crate::event::repo::channel`, but the resulting namespace is less legible to me.
*	Revoking a nonexistent token should fail	Owen Jacobson	2024-09-16
\|
*	Annotate channel events with channel ID at the router, not intrinsically.	Owen Jacobson	2024-09-15
\| \| \| \|	This bugged me aesthetically. At `app.channel().events(channel)`, the caller knows the channel ID; they don't need to be told. Having the same info come back out in the returned events felt bad.
*	Consolidate channel events into a single stream endpoint.	Owen Jacobson	2024-09-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	While reviewing [MDN], I noticed this note: > SSE suffers from a limitation to the maximum number of open connections, which can be specially painful when opening various tabs as the limit is per browser and set to a very low number (6). […] This limit is per browser + domain, so that means that you can open 6 SSE connections across all of the tabs to www.example1.com and another 6 SSE connections to www.example2.com. I tested it in Safari; this is true, and once six streams are open, _no_ more requests can be made - in any tab, even a fresh one. Since the design _was_ that each channel had its own events endpoint, this is an obvious operations risk. Any client that tries to read multiple channels' streams will hit this limit quickly. This change consolidates all channel events into a single endpoint: `/events`. This takes a list of channel IDs (as query parameters, one `channel=` param per channel), and streams back events from all listed channels. The previous `/:channel/events` endpoint has been removed. Clients can selectively request events for the channels they're interested in. [MDN]: https://developer.mozilla.org/en-US/docs/Web/API/EventSource
*	Placeholder UX, probably	Owen Jacobson	2024-09-14
\|
*	Support Last-Event-Id as a method of resuming channel events after a disconnect	Owen Jacobson	2024-09-13
\|
*	Generate the required structure for broadcasting from a join, not from O(n) ↵	Owen Jacobson	2024-09-13
\| \| \| \|	queries.
*	Embed the sender's whole login (id and name) in messages, drop the redundant ↵	Owen Jacobson	2024-09-13
\| \| \| \|	channel ID.
*	Transmit messages via `/:chan/send` and `/:chan/events`.	Owen Jacobson	2024-09-13
\|
*	Push most endpoint and extractor logic into functoins of `App`.	Owen Jacobson	2024-09-12
\| \| \| \| \| \| \| \|	This is, again, groundwork for logic that requires more than just a database connection. The login process has been changed to be more conventional, attempting login _before_ account creation rather than after it. This was not previously possible, because the data access methods used to perform these steps did not return enough information to carry out the workflow in that order. Separating storage from password validation and hashing forces the issue, and makes it clearer _at the App_ whether an account exists or not. This does introduce the possibility of two racing inserts trying to lay claim to the same username. Transaction isolation should ensure that only one of them "wins," which is what you get before this change anyways.
*	Expire tokens based on when they were last used, not based on when they were ↵	Owen Jacobson	2024-09-11
\| \| \| \| \| \| \| \|	issued. This lets us shorten the expiry interval - by quite a bit. Tokens in regular use will now live indefinitely, while tokens that go unused for _one week_ will be invalidated and deleted. This will reduce the number of "dead" tokens (still valid, but _de facto_ no longer in use) stored in the table, and limit the exposure period if a token is leaked and then not used immediately. It's also much less likely to produce surprise logouts three months after installation. You'll either stay logged in, or have to log in again much, much sooner, making it feel a lot more regular and less surprising.
*	Remove the notion of "channel members."	Owen Jacobson	2024-09-11
\| \| \| \| \| \|	This came out of a conversation with Kit. Their position, loosely, was that seeing scrollback when you look at a channel is useful, and since message delivery isn't meaningfully tied to membership (or at least doesn't have to be), what the hell is membership even doing? (I may have added that last part.) My take, on top of that, is that membership increases the amount of concepts we're committed to. We don't need that commitment yet.
*	Support joining channels.	Owen Jacobson	2024-09-04
\|
*	Allow any login to create channels.	Owen Jacobson	2024-09-04
\|
*	Expire sessions after 90 days.	Owen Jacobson	2024-09-04
\|
*	Display a different / page depending on whether the current identity is ↵	Owen Jacobson	2024-09-04
\| \| \| \| \| \| \| \|	valid or not. This is mostly a proof of concept for the implementation of form login implemented in previous commits, but it _is_ useful as it controls whether the / page shows login, or shows logout. From here, chat is next!
*	Add logout support.	Owen Jacobson	2024-09-03
\|
*	Allow login creation and authentication.	Owen Jacobson	2024-09-03
	This is a beefy change, as it adds a TON of smaller pieces needed to make this all function: * A database migration. * A ton of new crates for things like password validation, timekeeping, and HTML generation. * A first cut at a module structure for routes, templates, repositories. * A family of ID types, for identifying various kinds of domain thing. * AppError, which _doesn't_ implement Error but can be sent to clients.