author Owen Jacobson <owen@grimoire.ca> 2024-10-21 00:36:44 -0400
committer Owen Jacobson <owen@grimoire.ca> 2024-10-22 10:58:11 -0400
commit 3f9648eed48cd8b6cd35d0ae2ee5bbe25fa735ac (patch)
tree 8ecdd86cd9e09d8a3bd55ec1f72888a81498cc38 /docs/api
parent 379e97c2cb145bc3a495aa14746273d83b508214 (diff)
Canonicalize login and channel names.
Canonicalization does two things:

* It prevents duplicate names that differ only by case or only by normalization/encoding sequence; and
* It makes certain name-based comparisons "case-insensitive" (generalizing via Unicode's case-folding rules).

This change is complicated, as it means that every name now needs to be stored in two forms. Unfortunately, this is _very likely_ a breaking schema change. The migrations in this commit perform a best-effort attempt to canonicalize existing channel or login names, but it's likely any existing channels or logins with non-ASCII characters will not be canonicalized correctly. Since clients look at all channel names and all login names on boot, and since the code in this commit verifies canonicalization when reading from the database, this will effectively make the server unusable until any incorrectly-canonicalized values are either manually canonicalized, or removed.

It might be possible to do better with [the `icu` sqlite3 extension][icu], but (a) I'm not convinced of that and (b) this commit is already huge; adding database extension support would make it far larger.

[icu]: https://sqlite.org/src/dir/ext/icu

For some references on why it's worth storing usernames this way, see <https://www.b-list.org/weblog/2018/nov/26/case/> and the referenced talk, as well as <https://www.b-list.org/weblog/2018/feb/11/usernames/>. Bennett's treatment of this issue is, to my eye, much more readable than the referenced Unicode technical reports, and I'm inclined to trust his opinion given that he maintains a widely-used, internet-facing user registration library for Django.
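
As an illustration of the two-form scheme described in the commit message, here is a minimal Python sketch. It is not the code from this commit. The choice of NFC for the preserved form and NFKC plus case folding for the canonical form is an assumption (the message does not say which normalization forms the service uses), and `display_form`, `canonical_form`, and `NameRegistry` are hypothetical names invented for this example.

```python
import unicodedata
from typing import Optional


def display_form(name: str) -> str:
    """Form shown to clients: normalized, but case and accents are kept.

    Assumption: NFC. The commit does not state the normalization form.
    """
    return unicodedata.normalize("NFC", name)


def canonical_form(name: str) -> str:
    """Form used for uniqueness and lookup: normalized and case-folded.

    Assumption: NFKC, then Unicode case folding, then NFKC again.
    """
    folded = unicodedata.normalize("NFKC", name).casefold()
    # Case folding can yield un-normalized text, so normalize once more.
    return unicodedata.normalize("NFKC", folded)


class NameRegistry:
    """Hypothetical in-memory stand-in for the two-column name storage."""

    def __init__(self) -> None:
        # Maps canonical form -> display form.
        self._by_canonical = {}

    def create(self, name: str) -> bool:
        """Register a name; refuse it if its canonical form is taken."""
        key = canonical_form(name)
        if key in self._by_canonical:
            return False
        self._by_canonical[key] = display_form(name)
        return True

    def lookup(self, name: str) -> Optional[str]:
        """Find the stored display form for any spelling of the name."""
        return self._by_canonical.get(canonical_form(name))
```

In this sketch, creating `"Café"` and then attempting `"CAFE\u0301"` (uppercase, with a combining acute accent) is refused as a duplicate, while looking up `"cafe\u0301"` returns the name as originally entered. Keying storage on the canonical form is what gives both the uniqueness guarantee and the case-insensitive matching.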
Diffstat (limited to 'docs/api')
-rw-r--r-- docs/api/authentication.md 19
-rw-r--r-- docs/api/channels-messages.md 12
2 files changed, 29 insertions, 2 deletions
diff --git a/docs/api/authentication.md b/docs/api/authentication.md
index 7e05443..135e91b 100644
--- a/docs/api/authentication.md
+++ b/docs/api/authentication.md
@@ -13,6 +13,23 @@ stateDiagram-v2
Authentication associates each authenticated request with a login.
+To create logins, see [initial setup](./initial-setup.md) and [invitations](./invitations.md).
+
+
+## Names
+
+<!-- This prose is duplicated in channels-messages.md. If you change it here, consider changing it there, too. -->
+The service handles login names using two separate forms.
+
+The first form is as given in the request used to create the login. This form of the login name is used throughout the API, and the service will preserve the name as entered (other than applying normalization), so that users' preferences around capitalization and accent marks are preserved.
+
+The second form is a "canonical" form, used internally by the service to control uniqueness and match names to logins. The canonical form is both case-folded and normalized.
+
+The canonical form is not available to API clients, but its use has practical consequences:
+
+* Names that differ only by case or only by code point sequence are treated as the same name. If the name is in use, changing the capitalization or changing the sequence of combining marks will not allow the creation of a second "identical" login.
+* The login API accepts any name that canonicalizes to the form stored in the database, making login names effectively case-insensitive.
+
## Identity tokens
@@ -32,8 +49,6 @@ Unless the endpoint's documentation says otherwise, all endpoints require authen
Authenticates the user using their login name and password. The login must exist before calling this endpoint.
-To create logins, see [initial setup](./initial-setup.md) and [invitations](./invitations.md).
-
**This endpoint does not require an `identity` cookie.**
### Request
diff --git a/docs/api/channels-messages.md b/docs/api/channels-messages.md
index a441f52..9ef4e66 100644
--- a/docs/api/channels-messages.md
+++ b/docs/api/channels-messages.md
@@ -27,6 +27,18 @@ Messages allow logins to communicate with one another. Channels are the conversa
Every channel has a unique name, chosen when the channel is created.
+## Names
+
+<!-- This prose is duplicated in authentication.md. If you change it here, consider changing it there, too. -->
+The service handles channel names using two separate forms.
+
+The first form is as given in the request used to create the channel. This form of the channel name is used throughout the API, and the service will preserve the name as entered (other than applying normalization), so that users' preferences around capitalization and accent marks are preserved.
+
+The second form is a "canonical" form, used internally by the service to control uniqueness and match names to channels. The canonical form is both case-folded and normalized.
+
+The canonical form is not available to API clients, but its use has practical consequences. Names that differ only by case or only by code point sequence are treated as the same name. If the name is in use, changing the capitalization or changing the sequence of combining marks will not allow the creation of a second "identical" channel.
+
+
## Expiry and purging
Both channels and messages expire after a time. Messages expire 90 days after being sent. Channels expire 90 days after the last message sent to them, or after creation if no messages are sent in that time.