From e2cdb46c3f6707c1b01f8827d8ba491469b5679f Mon Sep 17 00:00:00 2001
From: Owen Jacobson
Date: Tue, 8 Apr 2025 19:40:32 -0400
Subject: Heartbeats are part of the event protocol.

A heartbeat is an event that the server synthesizes any time an event stream has been idle for longer than some timeout. They allow clients to detect disconnection and network problems, which would otherwise go unnoticed because event streams are a one-way channel. Most network problems only become clear when the offended party tries to _send_ something, and subscribing to an event stream only sends something during the request phase.

Technically, Pilcrow has always sent these, since we started using Axum's SSE support: it defaults to sending a dummy event after 15 seconds (consisting of `":\n\n"`, which is then ignored). I've built Pilcrow's heartbeat support out of that, by customizing the event sent back.

The results _mostly_ look like existing events, but there are two key differences:

* Heartbeats don't have `id` fields in the event stream. They're synthetic, and they don't participate in either the "resume at" sequence management, or the last-event-id header-based resumption management.

* Heartbeats have a `type` but no `event` field in the message body. There are no subtypes.

To make it less likely that clients will race with the server on expiring timeouts, heartbeats are sent about five seconds early. In this change, heartbeats are due after 20 seconds, but are sent after 15. If it takes longer than five seconds for a heartbeat to arrive, a client can and should treat that as a network problem and reconnect, but I'd really like to avoid that happening over differences smaller than a second, so I've left a margin.

I originally sketched this out in conversation with @wlonk as having each event carry a deadline for the next one. I ultimately opted not to do that for a few reasons.
First, Axum makes it hard - the built-in keep-alive support only works with a static event, and cannot make dynamic ones whose payloads might vary (for example if the deadline is variable). Second, it's complex, to no apparent gain, and adds deadline information to _every_ event type.

This implementation, instead, sends deadline information as part of boot, as a fixed interval in seconds. Clients are responsible for working out deadlines based on message arrivals. This is fine; heartbeat-based connection management is best effort at the best of times, so a few milliseconds of slop in either direction won't hurt anything.

The existing client ignores these events entirely, which is convenient.

The new heartbeat event type is defined alongside the main event type, to make it less likely that we'll inadvertently make changes to one but not the other. We can still do so advertently, I just don't want it to be an accident.
---
 docs/api/boot.md   |  2 ++
 docs/api/events.md | 25 ++++++++++++++++++++-----
 2 files changed, 22 insertions(+), 5 deletions(-)

(limited to 'docs')

diff --git a/docs/api/boot.md b/docs/api/boot.md
index 0c2dc08..46b972f 100644
--- a/docs/api/boot.md
+++ b/docs/api/boot.md
@@ -42,6 +42,7 @@ This endpoint will respond with a status of
     "id": "U1234abcd"
   },
   "resume_point": 1312,
+  "heartbeat": 30,
   "users": [
     {
       "id": "U1234abcd",
@@ -72,6 +73,7 @@ The response will include the following fields:
 |:---------------|:----------------|:-------------------------------------------------------------------------------------------------------------------------|
 | `user` | object | The details of the caller's identity. |
 | `resume_point` | integer | A resume point for [events](./events.md), such that the event stream will begin immediately after the included snapshot. |
+| `heartbeat` | integer | The [heartbeat timeout](./events.md#heartbeat-events), in seconds, for events. |
 | `users` | array of object | A snapshot of the users present in the service. |
 | `channels` | array of object | A snapshot of the channels present in the service. |
 | `messages` | array of object | A snapshot of the messages present in the service. |
diff --git a/docs/api/events.md b/docs/api/events.md
index 3347a26..7fc7d78 100644
--- a/docs/api/events.md
+++ b/docs/api/events.md
@@ -86,12 +86,27 @@ The service may terminate the connection at any time. Clients should reconnect a

 Each event's `data` consists of a JSON object describing one event. Every event includes the following fields:

-| Field | Type | Description |
-|:--------|:-------|:-------------------------------------------------------------------------------------------------------------|
-| `type` | string | The type of entity the event describes. Will be one of the types listed in the next section. |
-| `event` | string | The specific kind of event. Will be one of the events listed with the associated `type` in the next section. |
+| Field | Type | Description |
+|:--------|:-----------------|:-------------------------------------------------------------------------------------------------------------|
+| `type` | string | The type of entity the event describes. Will be one of the types listed in the next section. |
+| `event` | string, optional | The specific kind of event. Will be one of the events listed with the associated `type` in the next section. |

-The remaining fields depend on the `type` and `event` field.
+The remaining fields depend on the `type` and (if present) the `event` field.
+
+
+## Heartbeat events
+
+```json
+{
+  "type": "heartbeat"
+}
+```
+
+To help clients detect network interruptions, the service guarantees that it will deliver an event after a fixed interval called the "heartbeat interval." The specific interval length is given in seconds as part of the [boot response](./boot.md). If the service determines that the heartbeat interval is close to expiring, it will synthesize and deliver a heartbeat event.
+
+Clients should treat any period of time without events, longer than the heartbeat interval, as an indication that the event stream may have been interrupted. Clients may also use other techniques, such as [browser APIs](https://developer.mozilla.org/en-US/docs/Web/API/EventSource/error_event), to detect this condition and restart the connection.
+
+These events have the `type` field set to `"heartbeat"`. The `event` field is absent.


 ## User events
-- 
cgit v1.2.3
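
Outside the patch itself, the client-side rule in the new `events.md` text (treat any silence longer than the heartbeat interval as a possible interruption) might be sketched as follows in Python. This is a minimal sketch under stated assumptions, not Pilcrow's client: the `HeartbeatWatchdog` class and its method names are hypothetical. The only details taken from the patch are the `heartbeat` boot field, given in seconds, and the `{"type": "heartbeat"}` event body. Timestamps are passed in explicitly so the logic stays deterministic; a real client would likely read a monotonic clock instead.

```python
import json


class HeartbeatWatchdog:
    """Hypothetical helper: tracks event arrivals on one event stream."""

    def __init__(self, heartbeat_seconds, now):
        # `heartbeat_seconds` is the `heartbeat` value from the boot response.
        self.timeout = float(heartbeat_seconds)
        # The deadline is measured from the most recent arrival of any event.
        self.last_arrival = now

    def on_event(self, data, now):
        """Record an arrival; return the decoded event, or None for a heartbeat."""
        self.last_arrival = now
        event = json.loads(data)
        # Heartbeats carry only `type`; they have no `event` field and no payload.
        if event.get("type") == "heartbeat":
            return None
        return event

    def is_stalled(self, now):
        """True once the stream has been silent for longer than the interval."""
        return now - self.last_arrival > self.timeout
```

A client would call `on_event` for every message it receives, heartbeats included, and poll `is_stalled` on a timer, reconnecting when it returns true. Every arrival resets the deadline, so real events count as proof of life in the same way heartbeats do.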