| Commit message (Collapse) | Author | Age |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Canonicalization does two things:
* It prevents duplicate names that differ only by case or only by normalization/encoding sequence; and
* It makes certain name-based comparisons "case-insensitive" (generalizing via Unicode's case-folding rules).
This change is complicated, as it means that every name now needs to be stored in two forms. Unfortunately, this is _very likely_ a breaking schema change. The migrations in this commit perform a best-effort attempt to canonicalize existing channel or login names, but it's likely any existing channels or logins with non-ASCII characters will not be canonicalize correctly. Since clients look at all channel names and all login names on boot, and since the code in this commit verifies canonicalization when reading from the database, this will effectively make the server un-usuable until any incorrectly-canonicalized values are either manually canonicalized, or removed
It might be possible to do better with [the `icu` sqlite3 extension][icu], but (a) I'm not convinced of that and (b) this commit is already huge; adding database extension support would make it far larger.
[icu]: https://sqlite.org/src/dir/ext/icu
For some references on why it's worth storing usernames this way, see <https://www.b-list.org/weblog/2018/nov/26/case/> and the refernced talk, as well as <https://www.b-list.org/weblog/2018/feb/11/usernames/>. Bennett's treatment of this issue is, to my eye, much more readable than the referenced Unicode technical reports, and I'm inclined to trust his opinion given that he maintains a widely-used, internet-facing user registration library for Django.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This normalizes the following values:
* login names
* passwords
* channel names
* message bodies, because why not
The goal here is to have a canonical representation of these values, so that, for example, the service does not inadvertently host two channels whose names are semantically identical but differ in the specifics of how diacritics are encoded, or two users whose names are identical.
Normalization is done on input from the wire, using Serde hooks, and when reading from the database. The `crate::nfc::String` type implements these normalizations (as well as normalizing whenever converted from a `std::string::String` generally).
This change does not cover:
* Trying to cope with passwords that were created as non-normalized strings, which are now non-verifiable as all the paths to verify passwords normalize the input.
* Trying to ensure that non-normalized data in the database compares reasonably to normalized data. Fortunately, we don't _do_ very many string comparisons (I think only login names), so this isn't a huge deal at this stage. Login names will probably have to Get Fixed later on, when we figure out how to handle case folding for login name verification.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In general:
* If the client can only assume the response is immediately valid (mostly, login creation, where the client cannot monitor the event stream), then 200 Okay, with data describing the server's view of the request.
* If the client can monitor for completion by watching the event stream, then 202 Accepted, with data describing the server's view of the request.
This comes on the heels of a comment I made on Discord:
> hrm
>
> creating a login: 204 No Content, no body
> sending a message: 202 Accepted, no body
> creating a channel: 200 Okay, has a body
>
> past me, what were you on
There wasn't any principled reason for this inconsistency; it happened as the endpoints were written at different times and with different states of mind.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Previously, when a channel (message) was deleted, `hi` would send events to all _connected_ clients to inform them of the deletion, then delete all memory of the channel (message). Any disconnected client, on reconnecting, would not receive the deletion event, and would de-synch with the service. The creation events were also immediately retconned out of the event stream, as well.
With this change, `hi` keeps a record of deleted channels (messages). When replaying events, these records are used to replay the deletion event. After 7 days, the retained data is deleted, both to keep storage under control and to conform to users' expectations that deleted means gone.
To match users' likely intuitions about what deletion does, deleting a channel (message) _does_ immediately delete some of its associated data. Channels' names are blanked, and messages' bodies are also blanked. When the event stream is replayed, the original channel.created (message.sent) event is "tombstoned", with an additional `deleted_at` field to inform clients. The included client does not use this field, at least yet.
The migration is, once again, screamingingly complicated due to sqlite's limited ALTER TABLE … ALTER COLUMN support.
This change also contains capabilities that would allow the API to return 410 Gone for deleted channels or messages, instead of 404. I did experiment with this, but it's tricky to do pervasively, especially since most app-level interfaces return an `Option<Channel>` or `Option<Message>`. Redesigning these to return either `Ok(Channel)` (`Ok(Message)`) or `Err(Error::NotFound)` or `Err(Error::Deleted)` is more work than I wanted to take on for this change, and the utility of 410 Gone responses is not obvious to me. We have other, more pressing API design warts to address.
|
| | |
|
| | |
|
| | |
|
| |
|
|
| |
This is a little excessive, as PasswordHash (which StoredHash converts to) _does_ derive Debug and exposes the hash, but I'll feel better if the hash never ends up in logs.
|
| | |
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
| |
The client now takes an initial snapshot from the response to `/api/boot`, then picks up the event stream at the immediately-successive event to the moment the snapshot was taken.
This commit removes the following unused endpoints:
* `/api/channels` (GET)
* `/api/channels/:channel/messages` (GET)
The information therein is now part of the boot response. We can always add 'em back, but I wanted to clear the deck for designing something more capable, for dealing with client needs.
|
| |
|
|
| |
This helped me discover an organizational scheme I like more.
|
| | |
|
| |
|
|
| |
This is primarily renames and repackagings.
|
| |
|
|
| |
(This is part of a larger reorganization.)
|
| |
|
|
| |
sequence.
|
| | |
|
| |
|
|
|
|
| |
This (a) reduces the amount of passing secrets around that's needed, and (b) allows tests to log out in a more straightforwards manner.
Ish. The fixtures are a mess, but so is the nomenclature. Fix the latter and the former will probably follow.
|
| |
|
|
|
|
|
|
| |
expires.
When tokens are revoked (logout or expiry), the server now publishes an internal event via the new `logins` event broadcaster. These events are used to guard the `/api/events` stream. When a token revocation event arrives for the token used to subscribe to the stream, the stream is cut short, disconnecting the client.
In service of this, tokens now have IDs, which are non-confidential values that can be used to discuss tokens without their secrets being passed around unnecessarily. These IDs are not (at this time) exposed to clients, but they could be.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The following values are considered confidential, and should never be logged, even by accident:
* `Password`, which is a durable bearer token for a specific Login;
* `IdentitySecret`, which is an ephemeral but potentially long-lived bearer token for a specific Login; or
* `IdentityToken`, which may hold cookies containing an `IdentitySecret`.
These values are now wrapped in types whose `Debug` impls output opaque values, so that they can be included in structs that `#[derive(Debug)]` without requiring any additional care. The wrappers also avoid implementing `Display`, to prevent inadvertent `to_string()`s.
We don't bother obfuscating `IdentitySecret`s in memory or in the `.hi` database. There's no point: we'd also need to store the information needed to de-obfuscate them, and they can be freely invalidated and replaced by blanking that table and asking everyone to log in again. Passwords _are_ obfuscated for storage, as they're intended to be durable.
|
| | |
|
| |
|
|
| |
This change makes the identity cookie available throughout `/api`.
|
| |
|
|
| |
I had no idea `std` included a `matches!` macro, and I feel we're better off using it.
|
| |
|
|
| |
This'll catch style issues, mostly.
|
| |
|
|
|
|
|
|
| |
This silences some `-Wclippy::pedantic` warning, and it's just a good thing to do.
I've made the choice to have the docs comment face programmers, and to provide `hi --help` and `hi -h` content via Clap attributes instead of inferring it from the docs comment.
Internal (private) "rustdoc" comments have been converted to regular comments until I learn how to write better rustdoc.
|
| | |
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This API structure fell out of a conversation with Kit. Described loosely:
kit: ok
kit: Here's what I'm picturing in a client
kit: list channels, make-new-channel, zero to one active channels, post-to-active.
kit: login/sign-up, logout
owen: you will likely also want "am I logged in" here
kit: sure, whoami
|
| |
|
|
|
|
|
|
|
|
| |
This is intended to manage storage growth. A community with broadly steady traffic will now reach a steady state (ish) where the amount of storage in use stays within a steady band.
The 90 day threshold is a spitball; this should be made configurable for the community's needs.
I've also hoisted expiry out into the `app` classes, to reduce the amount of non-database work repo types are doing. This should make it easier to make expiry configurable later on.
Includes incidental cleanup and style changes.
|
| | |
|
| | |
|
| |
|
|
| |
This is implemented by making the return values, in most cases, idiosyncratic ad-hoc types that then convert to the approprate error response. This also should make endpoints more testable, since the return values can now be inspected to check their properties without having to process or parse an HTTP response.
|
| | |
|
| |
|
|
| |
This provides a convenient place to _stick_ "not found" errors, though actually introducing them will come in a later commit.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Having them contained in the individual endpoint groups conveyed an unintended sense that their intended scope was _only_ that endpoint group. It also made most repo-related import paths _quite_ long. This splits up the repos as follows:
* "General applicability" repos - those that are only loosely connected to a single task, and are likely to be shared between tasks - go in crate::repo.
* Specialized repos - those tightly connected to a specific task - go in the module for that task, under crate::PATH::repo.
In both cases, each repo goes in its own submodule, to make it easier to use the module name as a namespace.
Which category a repo goes in is a judgment call. `crate::channel::repo::broadcast` (formerly `channel::repo::messages`) is used outside of `crate::channel`, for example, but its main purpose is to support channel message broadcasts. It could arguably live under `crate::event::repo::channel`, but the resulting namespace is less legible to me.
|
| | |
|
| |
|
|
| |
BoxedError conceals the exact nature of the error, which in turn prevents me from using sqlx::Error::RowNotFound to signal absences.
|
| | |
|
| |
|
|
| |
This is mostly a style thing, the type stays `Infallible`, we're just less coupled to the specifics.
|
| |
|
|
| |
queries.
|
| | |
|
| |
|
|
| |
channel ID.
|
| | |
|
| | |
|
| |
|
|
|
|
|
|
| |
This is, again, groundwork for logic that requires more than just a database connection.
The login process has been changed to be more conventional, attempting login _before_ account creation rather than after it. This was not previously possible, because the data access methods used to perform these steps did not return enough information to carry out the workflow in that order. Separating storage from password validation and hashing forces the issue, and makes it clearer _at the App_ whether an account exists or not.
This does introduce the possibility of two racing inserts trying to lay claim to the same username. Transaction isolation should ensure that only one of them "wins," which is what you get before this change anyways.
|
| |
|
|
|
|
| |
This is a jumping-off point for adding logic that needs more than just the DB for state, such as chat message handling.
The name sucks, but it's the best I've got.
|
| |
|
|
| |
the value in that record used to verify logins.
|
| |
|
|
|
|
|
|
| |
issued.
This lets us shorten the expiry interval - by quite a bit. Tokens in regular use will now live indefinitely, while tokens that go unused for _one week_ will be invalidated and deleted. This will reduce the number of "dead" tokens (still valid, but _de facto_ no longer in use) stored in the table, and limit the exposure period if a token is leaked and then not used immediately.
It's also much less likely to produce surprise logouts three months after installation. You'll either stay logged in, or have to log in again much, much sooner, making it feel a lot more regular and less surprising.
|
| | |
|
| | |
|