| |
We were previously exporting root causes from one layer of abstraction to the next. For example, anything that called into the database could cause an `sqlx::Error`, so anything that transitively called into that logic exported a `Database(sqlx::Error)` error variant of its own, using `From` to map errors from inner type to outer type.
This had a couple of side effects. First, it required each layer of error handling to carry with it a `From` implementation unwrapping and rewrapping root causes from the next layer down. This was particularly apparent in the event and boot endpoints, which had separate error cases unique to crypto key processing errors solely because they happened to involve handling events that contained those keys. There were others, including the pervasive `Database(sqlx::Error)` error variants.
Separately, none of the error variants introduced for this purpose were being used for anything other than printing to stderr. All the complexity of From impls and all the structure of the error types was being thrown away at top-level error handlers.
This change replaces most of those error types with a generic `Failed` error. A `Failed` carries with it two pieces of information: a (boxed) underlying error, of any boxable `Error` type, and text meant to explain the context and cause of an error. Code which acts on errors can treat `Failed` as a catch-all case, while individually handling errors that signify important cases. Errors can be moved into or out of the `Failed` case by refactoring, as needed.
The design of `Failed` is heavily motivated by [anyhow's `context` system][context] as a way for the programmer to capture immediate intention as an explanation for some underlying error. However, instead of accepting the full breadth of types that implement `Display`, a `Failed` can only carry strings as explanation. We don't need the generality at this time, and the implementation underlying it is pretty complex for what it does.
[context]: https://docs.rs/anyhow/latest/anyhow/struct.Error.html#method.context
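The shape is roughly the following; this is an illustrative sketch rather than the literal implementation, and the field names and signatures may differ:
```rust
use std::error::Error;
use std::fmt;

/// A catch-all failure: a human-readable explanation plus the underlying cause.
#[derive(Debug)]
pub struct Failed {
    context: String,
    source: Box<dyn Error + Send + Sync>,
}

impl Failed {
    pub fn new(
        context: impl Into<String>,
        source: impl Error + Send + Sync + 'static,
    ) -> Self {
        Failed {
            context: context.into(),
            source: Box::new(source),
        }
    }
}

impl fmt::Display for Failed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.context)
    }
}

impl Error for Failed {
    // Exposing the cause is what lets a top-level handler walk and print the
    // full "Caused by:" chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(self.source.as_ref())
    }
}
```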
This change also means that the full source chain for an error is now available to top-level error handlers, allowing more complete error messages. For example, starting `pilcrow` with an invalid network listen address produces
    Failed to bind to www.google.com:64209
    Caused by:
        Can't assign requested address (os error 49)
instead of the previous
    Error: Io(Os { code: 49, kind: AddrNotAvailable, message: "Can't assign requested address" })
which previously captured the same _cause_, but without the formatting (see previous commit) and without the _context_ (this commit). Similar improvements are available for many of the error scenarios Pilcrow is designed to give up on.
When deciding which errors to use `Failed` with, I've used the heuristic that if something can fail for more than one underlying reason, and if the caller will only ever need to differentiate those reasons after substantial refactoring anyway, then the reasons should collapse into `Failed`. If there's either only a single underlying failure reason possible, or only errors arising out of the function body possible, then I've left error handling alone.
In the process I've refactored most request-handler-level error mappings to explicitly map `Failed` to `Internal`, rather than having a catch-all mapping for all unhandled errors, to make it easier to remember to add request-level error representations when adding app-level error cases.
This also includes helper traits for `Error` and `Result`, to make constructing `Failed` (and errors that include `Failed` as an alternative) easier to do, and some constants for the recurring error messages related to transaction demarcation. I'm not completely happy with the repetitive nature of those error cases, but this is the best I've arrived at so far.
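For illustration, the `Result` helper could be along these lines (hypothetical names and signatures, not necessarily the real trait):
```rust
// Hypothetical extension trait, shown only to illustrate the `fail` idiom.
pub trait FailExt<T> {
    /// Wrap any error in `Failed`, attaching `context` as the explanation.
    fn fail(self, context: &str) -> Result<T, Failed>;
}

impl<T, E> FailExt<T> for Result<T, E>
where
    E: std::error::Error + Send + Sync + 'static,
{
    fn fail(self, context: &str) -> Result<T, Failed> {
        self.map_err(|err| Failed::new(context, err))
    }
}

// Usage (hypothetical): bind_result.fail("Failed to bind to the listen address")?;
```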
As errors are no longer expected to be convertible up the call stack, the `NotFound` and `Duplicate` helper traits for database errors had to change a bit. Those previously assumed that they would be used in the context of an error type implementing `From<sqlx::Error>` (or from another error type with similar characteristics), and that's not the case any more. The resulting idiom for converting a missing value into a domain error is `foo.await.optional().fail(MESSAGE)?.ok_or(DOMAIN ERROR)?`, which is rather clunky, but I've opted not to go further with it. The `Duplicate` helper is just plain gone, as it's not easily generalizable in this structure and using `match` is more tractable for me.
Finally, I've changed the convention for error messages from `all lowercase messages in whatever tense i feel like at the moment` to `Sentence-case messages in the past tense`, frequently starting with `Failed to` and a short summary of the task at hand. This, as above, makes error message capitalization between Pilcrow's own messages and messages coming from other libraries/the Rust stdlib much more coherent and less jarring to read.
|
| |
| |
The semantics of this endpoint are somewhat complex, and are incompletely captured in the associated docs change. For posterity, the intended workflow is:
1. Obtain Pilcrow's current VAPID key by connecting (it's in the events, either from boot or from the event stream).
2. Use the browser push APIs to create a push subscription, using that VAPID key.
3. Send Pilcrow the push subscription endpoint and keys, plus the VAPID key the client used to create it so that the server can detect race conditions with key rotation.
4. Wait for messages to arrive.
This commit does not introduce any actual messages, just subscription management endpoints.
When the server's VAPID key is rotated, all existing subscriptions are discarded. Without the VAPID key, the server cannot service those subscriptions. We can't exactly notify the broker to stop processing messages on those subscriptions, so this is an incomplete solution to what to do if the key is being rotated due to a compromise, but it's better than nothing.
The shape of the API endpoint is heavily informed by the [JSON payload][web-push-json] provided by browser Web Push implementations, to ease client development in a browser-based context. The idea is that a client can take that JSON and send it to the server verbatim, without needing to transform it in any way, to submit the subscription to the server for use.
[web-push-json]: https://developer.mozilla.org/en-US/docs/Web/API/PushSubscription/toJSON
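For reference, the browser's `toJSON()` output is roughly the following shape, so a serde definition along these lines would accept it verbatim. This is illustrative only; Pilcrow's actual request type (including wherever the client's VAPID key rides along) may differ:
```rust
use serde::Deserialize;

// Roughly mirrors PushSubscription.toJSON(); not Pilcrow's real type.
#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct PushSubscriptionJson {
    endpoint: String,
    #[serde(default)]
    expiration_time: Option<f64>,
    keys: PushSubscriptionKeys,
}

#[derive(Deserialize)]
struct PushSubscriptionKeys {
    p256dh: String,
    auth: String,
}
```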
Push subscriptions are operationally associated with a specific _user agent_, and have no inherent relationship with a Pilcrow login or token (session). Taken as-is, a subscription created by user A could be reused by user B if they share a user agent, even if user A logs out before user B logs in. Pilcrow therefore _logically_ associates push subscriptions with specific tokens, and abandons those subscriptions when the token is invalidated by
* logging out,
* expiry, or
* changing passwords.
(There are no other token invalidation workflows at this time.)
Stored subscriptions are also abandoned when the server's VAPID key changes.
|
| |
|
|
| |
As with the `Setup` component, I've generalized the associated middleware across anything that can provide a `Tokens`, where possible.
|
| | |
|
| |
|
|
|
|
| |
This conversion, from an iterator of type-specific events (say, `user::Event` or `message::Event`) into a `Vec<event::Event>`, is pervasive, and it needs to be done each time. Having Broadcaster expose a support method for this cuts down on the repetition, at the cost of a slightly alarming amount of type-system nonsense in `broadcast_from`.
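Stripped of the surrounding machinery, the conversion reduces to something like this sketch (with stand-in types; the real method hangs off `Broadcaster` and feeds its internal queue):
```rust
// Minimal stand-ins so the sketch compiles on its own; the real types live in
// Pilcrow's `event` and `user` modules.
mod user {
    pub struct Event;
}
mod event {
    pub enum Event {
        User(super::user::Event),
    }
    impl From<super::user::Event> for Event {
        fn from(e: super::user::Event) -> Self {
            Event::User(e)
        }
    }
}

/// The conversion described above: any iterator of type-specific events,
/// collected into a `Vec<event::Event>` via `From`.
fn collect_events<I>(events: I) -> Vec<event::Event>
where
    I: IntoIterator,
    event::Event: From<I::Item>,
{
    events.into_iter().map(event::Event::from).collect()
}
```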
Historical footnote: the internal message structure is a Vec and not an individual message so that bulk operations, like expiring channels and messages, won't disconnect everyone if they happen to dispatch more than sixteen messages (current queue depth limit) at once. We trade allocation and memory pressure for keeping the connections alive. _Most_ event publishing is an iterator of one item, so the Vec allocation is redundant.
|
| |
| |
The taxonomy is now as follows:
* A _login_ is someone's identity for the purposes of authenticating to the service. Logins are not synchronized, and in fact are not published anywhere in the current API. They have a login ID, a name and a password.
* A _user_ is someone's identity for the purpose of participating in conversations. Users _are_ synchronized, as before. They have a user ID, a name, and a creation instant for the purposes of synchronization.
In practice, a user exists for every login - in fact, users' names are stored in the login table and are joined in, rather than being stored redundantly in the user table. A login ID and its corresponding user ID are always equal, and the user and login ID types support conversion and comparison to facilitate their use in this context.
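As a rough illustration of the conversion and comparison support (the real ID types are not literal `u64` newtypes):
```rust
// Illustrative newtypes; Pilcrow's actual ID types are richer than this.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct LoginId(u64);

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct UserId(u64);

// "Always equal" in practice: converting one to the other is just a relabel.
impl From<LoginId> for UserId {
    fn from(id: LoginId) -> Self {
        UserId(id.0)
    }
}

// Direct comparison across the two ID kinds.
impl PartialEq<UserId> for LoginId {
    fn eq(&self, other: &UserId) -> bool {
        self.0 == other.0
    }
}
```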
Tokens are now associated with logins, not users. The currently-acting identity is passed down into app types as a login, not a user, and then resolved to a user where appropriate within the app methods.
As a side effect, the `GET /api/boot` method now returns a `login` key instead of a `user` key. The structure of the nested value is unchanged.
|
| |
|
|
| |
This is the leading edge of a larger storage refactoring, where repo types stop doing things like generating secrets or deciding whether to carry out an operation. To make this work, there is now a `Token` type that holds the complete state of a token, in memory.
|
| |
|
|
| |
We'll be building separate entities around this in future commits, to better separate the authentication data (non-synchronized and indeed "not public") from the chat data (synchronized and public).
|
| |
|
|
|
|
| |
identity token.
This is a small refactoring that's been possible for a while, and we only just noticed.
|
| | |
|
| |
|
|
| |
As with `/api/setup`, the response was an ad-hoc choice, which we are not using and which constrains future development just by existing.
|
| |
|
|
| |
Having this buried under `crate::user` makes it hard to split up the roles `user` fulfils right now. Moving it out to its own module makes it a bit tidier to reuse it in a separate, authentication-only way.
|
| |
| |
Rust 1.89 added a new warning:
    warning: hiding a lifetime that's elided elsewhere is confusing
     --> src/setup/repo.rs:4:14
      |
    4 |     fn setup(&mut self) -> Setup;
      |              ^^^^^^^^^     ----- the same lifetime is hidden here
      |              |
      |              the lifetime is elided here
      |
      = help: the same lifetime is referred to in inconsistent ways, making the signature confusing
    help: use `'_` for type paths
      |
    4 |     fn setup(&mut self) -> Setup<'_>;
      |                                 ++++
I don't entirely agree with the style advice here, but lifetime elision style is an evolving area in Rust and I'd rather track the Rust team's recommendations than invent my own, so I've applied all of the suggested `'_` annotations.
|
| |
| |
This is based heavily on the work done for normalized strings, in `crate::normalize`. The key realization in that module is that the logic distinguishing one kind of thing (normalized strings in that case, IDs in this case) can be packaged up as a type token, and that doing so may reduce the overall complexity. This implementation for ID also borrows heavily from the implementation for normalized strings.
It's less flexible: an ID implemented this way can't expose _less_ of `crate::id::ID`'s interface, whereas newtype wrappers can, for example. However, our code doesn't use that flexibility on purpose anywhere and we're relatively unlikely to change that. In return, the individual ID types require substantially less code - they do not, for example, need to re-implement `Display` for themselves.
I very nearly made the trait `Prefix`:
```rust
pub trait Prefix {
const PREFIX: &str;
}
```
however, I think having an effectively-constant method is less surprising overall.
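The method-based version presumably looks something along these lines (illustrative; the actual trait name, signature, and prefix values may differ):
```rust
pub trait Prefix {
    /// Effectively a constant: the short prefix that identifies this kind of ID.
    fn prefix() -> &'static str;
}

// Hypothetical ID kind, shown only to demonstrate the shape of an impl.
pub struct Channel;

impl Prefix for Channel {
    fn prefix() -> &'static str {
        "channel"
    }
}
```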
|
| |
|
|
| |
Missed in 9f7f82dbd9adee8ae18ae7ff2600b3e1dc8fadbc.
|
| |\ |
|
| | |
| |
| |
| |
| |
| | |
357116366c1307bedaac6a3dfe9c5ed8e0e0c210 wasn't updated (and wasn't quite correct then, either).
I haven't found a way to derive it from the name of the type.
|
| | | |
|
| | | |
|
| |/ |
|
| | |
|
| |
| |
There are a couple of migration suggestions from `cargo fix --edition` that I have deliberately skipped, which are intended to make sure that the changes to `if let` scoping don't bite us. They don't, I'm pretty sure, and if I turn out to be wrong, I'd rather fix the scoping issues (as they arise) than use `match` (`cargo fix --edition`'s suggestion).
This change also includes a bulk reformat and a clippy cleanup.
NOTA BENE: As this requires a new Rust toolchain, you'll need to update Rust (`rustup update`, normally) or the server won't build. This also applies to the Debian builder Docker image; it'll need to be rebuilt (from scratch, pulling its base image again) as well.
|
| | |
|
| |
|
|
| |
The protocol here re-checks the caller's password, as an "I left myself logged in" anti-pranking check.
|
| |
| |
* A `cookie::Identity` (`IdentityCookie`) is a specialized CookieJar for working with identities.
* An `Identity` is a token/login pair.
I hope for this to be a bit more legible.
In service of this, `Login` is no longer extractable. You have to get an identity.
|
| |
| |
Canonicalization does two things:
* It prevents duplicate names that differ only by case or only by normalization/encoding sequence; and
* It makes certain name-based comparisons "case-insensitive" (generalizing via Unicode's case-folding rules).
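Concretely, storing both forms might look something like this rough sketch (the real canonicalization rules are more involved; `to_lowercase` here only approximates Unicode case folding):
```rust
use unicode_normalization::UnicodeNormalization;

/// Keep the name as entered for display, plus a canonical form used for
/// uniqueness checks and "case-insensitive" lookups.
struct StoredName {
    display: String,
    canonical: String,
}

impl StoredName {
    fn new(name: &str) -> Self {
        StoredName {
            display: name.to_string(),
            canonical: canonicalize(name),
        }
    }
}

/// Approximation only: normalize, then case-fold. `to_lowercase` stands in
/// for the full Unicode case-folding algorithm here.
fn canonicalize(name: &str) -> String {
    name.nfc().collect::<String>().to_lowercase()
}
```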
This change is complicated, as it means that every name now needs to be stored in two forms. Unfortunately, this is _very likely_ a breaking schema change. The migrations in this commit make a best-effort attempt to canonicalize existing channel and login names, but it's likely that any existing channels or logins with non-ASCII characters will not be canonicalized correctly. Since clients look at all channel names and all login names on boot, and since the code in this commit verifies canonicalization when reading from the database, this will effectively make the server unusable until any incorrectly-canonicalized values are either manually canonicalized or removed.
It might be possible to do better with [the `icu` sqlite3 extension][icu], but (a) I'm not convinced of that and (b) this commit is already huge; adding database extension support would make it far larger.
[icu]: https://sqlite.org/src/dir/ext/icu
For some references on why it's worth storing usernames this way, see <https://www.b-list.org/weblog/2018/nov/26/case/> and the referenced talk, as well as <https://www.b-list.org/weblog/2018/feb/11/usernames/>. Bennett's treatment of this issue is, to my eye, much more readable than the referenced Unicode technical reports, and I'm inclined to trust his opinion given that he maintains a widely-used, internet-facing user registration library for Django.
|
| |
| |
This normalizes the following values:
* login names
* passwords
* channel names
* message bodies, because why not
The goal here is to have a canonical representation of these values, so that, for example, the service does not inadvertently host two channels whose names are semantically identical but differ in the specifics of how diacritics are encoded, or two users whose names are identical.
Normalization is done on input from the wire, using Serde hooks, and when reading from the database. The `crate::nfc::String` type implements these normalizations (as well as normalizing whenever converted from a `std::string::String` generally).
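A minimal sketch of that idea, assuming the `unicode-normalization` crate (the real `crate::nfc::String` may differ in detail):
```rust
pub mod nfc {
    use unicode_normalization::{is_nfc, UnicodeNormalization};

    /// A string that is always in NFC form.
    #[derive(Clone, Debug, PartialEq, Eq)]
    pub struct String(std::string::String);

    impl From<std::string::String> for String {
        fn from(s: std::string::String) -> Self {
            // Skip re-normalizing when the input is already in NFC.
            if is_nfc(&s) {
                String(s)
            } else {
                String(s.nfc().collect())
            }
        }
    }

    impl std::ops::Deref for String {
        type Target = str;

        fn deref(&self) -> &str {
            &self.0
        }
    }
}
```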
This change does not cover:
* Trying to cope with passwords that were created as non-normalized strings, which are now non-verifiable as all the paths to verify passwords normalize the input.
* Trying to ensure that non-normalized data in the database compares reasonably to normalized data. Fortunately, we don't _do_ very many string comparisons (I think only login names), so this isn't a huge deal at this stage. Login names will probably have to Get Fixed later on, when we figure out how to handle case folding for login name verification.
|
| |
| |
In general:
* If the client can only assume the response is immediately valid (mostly, login creation, where the client cannot monitor the event stream), then 200 Okay, with data describing the server's view of the request.
* If the client can monitor for completion by watching the event stream, then 202 Accepted, with data describing the server's view of the request.
This comes on the heels of a comment I made on Discord:
> hrm
>
> creating a login: 204 No Content, no body
> sending a message: 202 Accepted, no body
> creating a channel: 200 Okay, has a body
>
> past me, what were you on
There wasn't any principled reason for this inconsistency; it happened as the endpoints were written at different times and with different states of mind.
|
| |
|
|
| |
MSRV is now 1.82.
|
| |
| |
Previously, when a channel (message) was deleted, `hi` would send events to all _connected_ clients to inform them of the deletion, then delete all memory of the channel (message). Any disconnected client, on reconnecting, would not receive the deletion event, and would de-synch with the service. The creation events were also immediately retconned out of the event stream.
With this change, `hi` keeps a record of deleted channels (messages). When replaying events, these records are used to replay the deletion event. After 7 days, the retained data is deleted, both to keep storage under control and to conform to users' expectations that deleted means gone.
To match users' likely intuitions about what deletion does, deleting a channel (message) _does_ immediately delete some of its associated data. Channels' names are blanked, and messages' bodies are also blanked. When the event stream is replayed, the original channel.created (message.sent) event is "tombstoned", with an additional `deleted_at` field to inform clients. The included client does not use this field, at least yet.
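As a rough illustration only (the field names here are guesses, not the actual event schema), a tombstoned creation event might serialize along these lines:
```rust
use serde::Serialize;

// Illustrative shape only; the real `channel.created` payload may differ.
#[derive(Serialize)]
struct ChannelCreated {
    id: String,
    // Blanked once the channel has been deleted.
    name: String,
    // Omitted for live channels; present on tombstoned replays.
    #[serde(skip_serializing_if = "Option::is_none")]
    deleted_at: Option<String>,
}
```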
The migration is, once again, screamingly complicated due to sqlite's limited ALTER TABLE … ALTER COLUMN support.
This change also contains capabilities that would allow the API to return 410 Gone for deleted channels or messages, instead of 404. I did experiment with this, but it's tricky to do pervasively, especially since most app-level interfaces return an `Option<Channel>` or `Option<Message>`. Redesigning these to return either `Ok(Channel)` (`Ok(Message)`) or `Err(Error::NotFound)` or `Err(Error::Deleted)` is more work than I wanted to take on for this change, and the utility of 410 Gone responses is not obvious to me. We have other, more pressing API design warts to address.
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
|
|
| |
We now (try to) use the identity cookie in `/ch/:channel`. This will not work, because the cookie's path doesn't include `/ch/`.
|
| | |
|
| |
|
|
|
|
| |
This separates the code that figures out what happened to an entity from the code that represents it to a user, and makes it easier to compute a snapshot at a point in time (for things like bootstrap). It also makes the internal logic a bit easier to follow, since it's easier to tell whether you're working with a point in time or with the whole recorded history.
This is hefty.
|
| |
|
|
| |
This helped me discover an organizational scheme I like more.
|
| | |
|
|
|
This is primarily renames and repackagings.
|