From 379e97c2cb145bc3a495aa14746273d83b508214 Mon Sep 17 00:00:00 2001 From: Owen Jacobson Date: Sat, 19 Oct 2024 01:51:30 -0400 Subject: Unicode normalization on input. This normalizes the following values: * login names * passwords * channel names * message bodies, because why not The goal here is to have a canonical representation of these values, so that, for example, the service does not inadvertently host two channels whose names are semantically identical but differ in the specifics of how diacritics are encoded, or two users whose names are identical. Normalization is done on input from the wire, using Serde hooks, and when reading from the database. The `crate::nfc::String` type implements these normalizations (as well as normalizing whenever converted from a `std::string::String` generally). This change does not cover: * Trying to cope with passwords that were created as non-normalized strings, which are now non-verifiable as all the paths to verify passwords normalize the input. * Trying to ensure that non-normalized data in the database compares reasonably to normalized data. Fortunately, we don't _do_ very many string comparisons (I think only login names), so this isn't a huge deal at this stage. Login names will probably have to Get Fixed later on, when we figure out how to handle case folding for login name verification. --- docs/api/channels-messages.md | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'docs/api/channels-messages.md') diff --git a/docs/api/channels-messages.md b/docs/api/channels-messages.md index 1ff037d..a441f52 100644 --- a/docs/api/channels-messages.md +++ b/docs/api/channels-messages.md @@ -70,6 +70,8 @@ The response will have the following fields: | `id` | string | A unique identifier for the channel. This can be used to associate the channel with events, or to make API calls targeting the channel. | | `name` | string | The channel's name. | +The returned name may not be identical to the name requested, as the name will be converted to [normalization form C](http://www.unicode.org/reports/tr15/) automatically. The returned name will include this normalization; the service will use the normalized name elsewhere, and does not store the originally requested name. + When completed, the service will emit a [channel created](events.md#channel-created) event with the channel's ID. ### Duplicate channel name @@ -125,6 +127,8 @@ The response will have the following fields: | `id` | string | A unique identifier for the message. This can be used to associate the message with events, or to make API calls targeting the message. | | `body` | string | The message's body. | +The returned message body may not be identical to the body as sent, as the body will be converted to [normalization form C](http://www.unicode.org/reports/tr15/) automatically. The returned body will include this normalization; the service will use the normalized body elsewhere, and does not store the originally submitted body. + When completed, the service will emit a [message sent](events.md#message-sent) event with the message's ID. ### Invalid channel ID -- cgit v1.2.3