diff --git a/spec/TOPICS.md b/spec/TOPICS.md index c29e61705..0489d79df 100644 --- a/spec/TOPICS.md +++ b/spec/TOPICS.md @@ -35,3 +35,35 @@ - **Queue rotation protocol**: Four agent messages (QADD → QKEY → QUSE → QTEST) on top of SMP commands, with asymmetric state machines on receiver side (`RcvSwitchStatus`: 4 states) and sender side (`SndSwitchStatus`: 2 states). Receiver initiates, creates new queue, sends QADD. Sender responds with QKEY. Receiver sends QUSE. Sender sends QTEST to complete. State types in Agent/Protocol.hs, orchestration in Agent.hs, queue creation/deletion in Agent/Client.hs. Protocol spec in agent-protocol.md. The fast variant (v9+ SMP with SKEY) skips the KEY command step. - **Outside-STM lookup pattern**: Multiple modules use the pattern of looking up TVar references outside STM (via readTVarIO/TM.lookupIO), then reading/modifying the TVar contents inside STM. This avoids transaction re-evaluation from unrelated map changes. Used in: Server.hs (serverThread client lookup, tryDeliverMessage subscriber lookup), Env/STM.hs (deleteSubcribedClient), Client/Agent.hs (removeClientAndSubs, reconnectSMPClient). The safety invariant is that the outer map entries (TVars) are never removed — only their contents change. + +- **NTF token lifecycle**: Token registration (TNEW) → verification push → NTConfirmed → TVFY → NTActive, with idempotent re-registration (DH secret check), TRPL (device token replacement reusing DH key), status repair for stuck tokens, and `PPApnsNull` test tokens suppressing stats. 
The lifecycle spans [Server.hs](modules/Simplex/Messaging/Notifications/Server.md) (command handling, verification push delivery), [Store/Postgres.hs](modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md) (conditional status updates, duplicate registration cleanup), [Types.hs](modules/Simplex/Messaging/Notifications/Types.md) (NtfTknStatus state machine), and [Env.hs](modules/Simplex/Messaging/Notifications/Server/Env.md) (push client lazy initialization). + +- **NTF push delivery pipeline**: Bounded TBQueue (`pushQ`) creates backpressure → `ntfPush` thread reads → `checkActiveTkn` gates PNMessage (but not PNVerification or PNCheckMessages) → APNS delivery with single retry on connection errors (new push client on retry) → PPTokenInvalid marks token NTInvalid. Spans [Server.hs](modules/Simplex/Messaging/Notifications/Server.md), [APNS.hs](modules/Simplex/Messaging/Notifications/Server/Push/APNS.md) (DER JWT signing, HTTP/2 serializing queue, fire-and-forget connection), [Env.hs](modules/Simplex/Messaging/Notifications/Server/Env.md) (push client caching with race tolerance). + +- **NTF service subscription model**: Service-level subscriptions (SUBS/NSUBS on SMP) vs individual queue subscriptions, with fallback from service to individual when `CAServiceUnavailable`. Service credentials are lazily generated per SMP server with 25h backdating and ~2700yr validity. XOR hash triggers on PostgreSQL maintain subscription aggregate counts. Subscription status tracking uses `ntf_service_assoc` flag to distinguish service-associated from individually-subscribed queues. Spans [Server.hs](modules/Simplex/Messaging/Notifications/Server.md) (subscriber thread, service fallback), [Env.hs](modules/Simplex/Messaging/Notifications/Server/Env.md) (lazy credential generation, Weak ThreadId subscriber cleanup), [Store/Postgres.hs](modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md) (XOR hash triggers, batch status updates, cursor-based pagination). 
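The token status machine described above can be sketched as a pure transition function. This is an illustrative toy only: constructor names follow `NtfTknStatus` where the text mentions them, but `TknEvent` and `tknStep` are invented for this example and simplify the real command handling.

```haskell
-- Toy sketch of the NTF token lifecycle transitions (invented helper names).
module Main where

data NtfTknStatus = NTNew | NTRegistered | NTConfirmed | NTActive | NTInvalid
  deriving (Eq, Show)

data TknEvent
  = TNew                   -- TNEW command (registration / idempotent re-registration)
  | VerificationDelivered  -- verification push reached the device
  | TVfy                   -- TVFY command with the verification code
  | TokenRejected          -- push provider reports the token as invalid
  deriving (Eq, Show)

tknStep :: NtfTknStatus -> TknEvent -> Maybe NtfTknStatus
tknStep s e = case (s, e) of
  (NTNew, TNew)                         -> Just NTRegistered
  (NTRegistered, TNew)                  -> Just NTRegistered -- idempotent re-registration
  (NTRegistered, VerificationDelivered) -> Just NTConfirmed
  (NTConfirmed, TVfy)                   -> Just NTActive
  (_, TokenRejected)                    -> Just NTInvalid
  _                                     -> Nothing           -- no transition defined

main :: IO ()
main = print $ foldl (\ms e -> ms >>= (`tknStep` e)) (Just NTNew)
  [TNew, TNew, VerificationDelivered, TVfy]  -- prints Just NTActive
```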
+ +- **NTF startup resubscription**: `resubscribe` runs as detached `forkIO` (not in `raceAny_` group), uses `mapConcurrently` across SMP servers, each with `subscribeLoop` using 100x database batch multiplier and cursor-based pagination. `ExitCode` exceptions from `exitFailure` on DB error propagate to main thread despite `forkIO`. `getServerNtfSubscriptions` claims subscriptions by batch-updating to `NSPending`. Spans [Server.hs](modules/Simplex/Messaging/Notifications/Server.md), [Store/Postgres.hs](modules/Simplex/Messaging/Notifications/Server/Store/Postgres.md). + +- **XFTP file upload pipeline**: Agent-side encryption (streaming 64KB blocks, fixed-size padding) → chunk size selection (75% threshold algorithm) → per-server chunk creation with ID collision retry (3 attempts) → recipient registration (recursive batching up to `maxRecipients` per FADD) → per-server upload (command + file body in single HTTP/2 streaming request) → file description generation (cross-product: M chunks × R replicas × N recipients → N descriptions). Spans [Agent.hs](modules/Simplex/FileTransfer/Agent.md) (worker orchestration, description generation), [Client.hs](modules/Simplex/FileTransfer/Client.md) (upload protocol), [Server.hs](modules/Simplex/FileTransfer/Server.md) (quota reservation with rollback, skipCommitted idempotency), [Crypto.hs](modules/Simplex/FileTransfer/Crypto.md) (streaming encryption with embedded header), [Description.hs](modules/Simplex/FileTransfer/Description.md) (validation, first-replica-only digest optimization). + +- **XFTP file download pipeline**: Description parsing (ValidFileDescription validation, YAML or web URI) → per-server chunk download with ephemeral DH key pair per download (forward secrecy) → size and digest verification before decryption → streaming decryption with auth tag verification (output deleted on failure) → redirect resolution (depth-1 chain: decrypt redirect YAML, validate size/digest, download actual file). 
Spans [Agent.hs](modules/Simplex/FileTransfer/Agent.md) (worker orchestration, redirect handling), [Client.hs](modules/Simplex/FileTransfer/Client.md) (ephemeral DH, chunk-proportional timeout), [Client/Main.hs](modules/Simplex/FileTransfer/Client/Main.md) (web URI decoding, parallel download with server grouping), [Crypto.hs](modules/Simplex/FileTransfer/Crypto.md) (dual decrypt paths, auth tag deletion), [Description.hs](modules/Simplex/FileTransfer/Description.md) (redirect file descriptions). + +- **XFTP handshake state machine**: Three-state session-cached handshake (`No entry` → `HandshakeSent` → `HandshakeAccepted`) per HTTP/2 session. Web clients use `xftp-web-hello` header and challenge-response identity proof; native clients use standard ALPN. SNI presence gates CORS headers, web serving, and SESSION error for unrecognized connections. Key reuse on re-hello preserves existing DH keys. Spans [Server.hs](modules/Simplex/FileTransfer/Server.md) (handshake logic, CORS, web serving), [Client.hs](modules/Simplex/FileTransfer/Client.md) (ALPN selection, cert chain validation), [Transport.hs](modules/Simplex/FileTransfer/Transport.md) (block size, version). + +- **XFTP storage lifecycle**: Quota reservation via atomic `stateTVar` before upload → rollback on failure (subtract + delete partial file) → physical file deleted before store cleanup (crash risk: store references missing file) → `RoundedSystemTime 3600` for privacy-preserving expiration timestamps → expiration with configurable throttling (100ms between files) → startup storage reconciliation (override stats from live store). Spans [Server.hs](modules/Simplex/FileTransfer/Server.md), [Server/Store.hs](modules/Simplex/FileTransfer/Server/Store.md), [Server/Env.hs](modules/Simplex/FileTransfer/Server/Env.md), [Server/StoreLog.hs](modules/Simplex/FileTransfer/Server/StoreLog.md) (error-resilient replay, compaction). 
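The quota reserve-then-rollback step described above can be sketched with `stateTVar` (a minimal standalone example; `Quota`, `reserve`, and `release` are invented names, not the server's actual API):

```haskell
-- Sketch of atomic quota reservation before upload, with rollback on failure.
import Control.Concurrent.STM

data Quota = Quota {used :: Int, limit :: Int} deriving (Show)

-- Atomically claim n bytes; returns False and leaves state unchanged
-- if the reservation would exceed the limit.
reserve :: TVar Quota -> Int -> STM Bool
reserve qv n = stateTVar qv $ \q ->
  if used q + n <= limit q then (True, q {used = used q + n}) else (False, q)

-- Rollback: return the reserved bytes after a failed upload.
release :: TVar Quota -> Int -> STM ()
release qv n = modifyTVar' qv $ \q -> q {used = max 0 (used q - n)}

main :: IO ()
main = do
  qv <- newTVarIO (Quota 0 100)
  ok1 <- atomically $ reserve qv 80 -- succeeds
  ok2 <- atomically $ reserve qv 40 -- refused: would exceed limit
  atomically $ release qv 80        -- upload failed: roll back
  ok3 <- atomically $ reserve qv 40 -- succeeds again
  print (ok1, ok2, ok3)             -- prints (True,False,True)
```

Doing the check and the update in one `stateTVar` transaction is what makes the reservation safe under concurrent uploads.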
+ +- **XFTP worker architecture**: Five worker types in three categories: rcv (per-server download + local decryption), snd (local prepare/encrypt + per-server upload), del (per-server delete). TMVar-based connection sharing with async retry on temporary errors, permanent error cleanup (put Left + delete from TMap). `withRetryIntervalLimit` caps consecutive retries; exhausted temporary errors silently abandon work cycle (chunk stays pending). `assertAgentForeground` dual check (throw if inactive + wait if backgrounded) gates every chunk operation. Spans [Agent.hs](modules/Simplex/FileTransfer/Agent.md), [Client/Agent.hs](modules/Simplex/FileTransfer/Client/Agent.md). + +- **SessionVar protocol client lifecycle**: Protocol client connections (SMP, NTF, XFTP) use a lazy singleton pattern: `getSessVar` atomically checks TMap → `newProtocolClient` fills TMVar on success/failure → `waitForProtocolClient` reads with timeout. Error caching via `persistErrorInterval` prevents connection storms (failed connections cache the error with expiry; callers receive cached error without reconnecting). `removeSessVar` uses monotonic `sessionVarId` compare-and-swap to prevent stale disconnect callbacks from removing newer clients. SMP has additional complexity: `SMPConnectedClient` wraps client with per-connection proxied relay map, `updateClientService` synchronizes service credentials post-connect, disconnect callback moves subscriptions to pending with session-ID matching. XFTP always uses `NRMBackground` timing regardless of caller request. Spans [Session.md](modules/Simplex/Messaging/Session.md), [Agent/Client.md](modules/Simplex/Messaging/Agent/Client.md) (lifecycle, disconnect callbacks, reconnection workers), [Agent.md](modules/Simplex/Messaging/Agent.md) (subscriber loop consuming events). + +- **Dual-backend agent store**: The agent store (~3700 lines in AgentStore.hs) compiles for both SQLite and PostgreSQL via `#if defined(dbPostgres)` CPP guards. 
Key behavioral differences: PostgreSQL uses `FOR UPDATE` row locking on reads preceding writes (SQLite relies on single-writer model); PostgreSQL uses `IN ?` with `In` wrapper for batch queries (SQLite falls back to per-row `forM` loops); PostgreSQL uses `constraintViolation` (SQLite checks `SQL.ErrorConstraint`); `createWithRandomId'` uses savepoints on PostgreSQL (failed statement aborts entire transaction without them). One known bug: `checkConfirmedSndQueueExists_` uses `#if defined(dpPostgres)` (typo: `dp` not `db`), so the `FOR UPDATE` clause is never included on any backend. Spans [AgentStore.md](modules/Simplex/Messaging/Agent/Store/AgentStore.md), [SQLite.md](modules/Simplex/Messaging/Agent/Store/SQLite.md). + +- **Deferred message encryption**: Message bodies are NOT encrypted at enqueue time. `enqueueMessageB` advances the ratchet header and validates padding, but stores only the body reference (`sndMsgBodyId`) and encryption key. Actual encryption (`rcEncryptMsg`) happens at delivery time in `runSmpQueueMsgDelivery`. This enables body deduplication via `VRValue`/`VRRef` — identical bodies (common for group messages) share one database row, but each connection's delivery encrypts independently with its own ratchet. Confirmation and ratchet key messages bypass deferred encryption (pre-encrypted at enqueue time). Spans [Agent.md](modules/Simplex/Messaging/Agent.md) (enqueue + delivery), [AgentStore.md](modules/Simplex/Messaging/Agent/Store/AgentStore.md) (`snd_message_bodies` storage). + +- **NTF agent subscription lifecycle**: The agent-side notification subscription system uses a supervisor-worker architecture with three worker pools (NTF server, SMP server, token deletion). `NSCCreate` triggers a four-way partition (`partitionQueueSubActions`): new sub, reset sub (credential mismatch or null action), continue SMP work, continue NTF work. 
Workers coordinate with the supervisor via `updated_by_supervisor` flag — workers only update local fields when the flag is set, preventing overwrite of supervisor decisions. The null-action sentinel (`workerErrors` sets action to NULL on permanent failure) bridges worker failure recovery to supervisor-driven re-creation. `retrySubActions` uses a shrinking TVar — each iteration only retries subs with temporary errors, so batches get smaller over time. `rescheduleWork` handles time-scheduled health checks by forking a sleep thread that re-signals `doWork`. Spans [NtfSubSupervisor.md](modules/Simplex/Messaging/Agent/NtfSubSupervisor.md) (supervisor, worker pools), [AgentStore.md](modules/Simplex/Messaging/Agent/Store/AgentStore.md) (updated_by_supervisor, null-action sentinel), [Agent/Client.md](modules/Simplex/Messaging/Agent/Client.md) (worker framework). + +- **Session-aware SMP subscription management**: SMP queue subscriptions are tracked per transport session with session-ID validation at multiple points. `subscribeQueues` groups queues by transport session, subscribes concurrently, then validates `activeClientSession` post-RPC — if the client was replaced during the RPC, results are discarded and converted to temporary errors for retry. `removeClientAndSubs` (disconnect cleanup) only demotes subscriptions whose session ID matches the disconnecting client. Batch UP notifications are accumulated across transmissions and deduplicated against already-active subscriptions. When ALL results are temporary errors and no connections were already active, the SMP client is closed to force fresh connection. `maxPending` throttles concurrent pending subscriptions with STM retry backpressure. Spans [Agent/Client.md](modules/Simplex/Messaging/Agent/Client.md) (subscription state, session validation), [Agent.md](modules/Simplex/Messaging/Agent.md) (subscriber loop, processSMPTransmissions, UP accumulation). 
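The post-RPC session validation described above can be reduced to a small STM pattern (a hedged sketch with invented names; the real code compares `activeClientSession` and converts stale results to temporary errors):

```haskell
-- Sketch: apply an RPC result only if the client session that served it
-- is still the active one; otherwise discard it for retry.
import Control.Concurrent.STM

type SessionId = Int

data Outcome = Applied | DiscardedStale deriving (Eq, Show)

applyIfCurrent :: TVar SessionId -> SessionId -> STM () -> STM Outcome
applyIfCurrent activeSess rpcSess apply = do
  cur <- readTVar activeSess
  if cur == rpcSess
    then apply >> pure Applied
    else pure DiscardedStale -- client replaced mid-RPC: caller retries

main :: IO ()
main = do
  activeSess <- newTVarIO 1
  subs <- newTVarIO (0 :: Int)
  r1 <- atomically $ applyIfCurrent activeSess 1 (modifyTVar' subs (+ 1))
  atomically $ writeTVar activeSess 2 -- client replaced during the RPC
  r2 <- atomically $ applyIfCurrent activeSess 1 (modifyTVar' subs (+ 1))
  n <- readTVarIO subs
  print (r1, r2, n) -- prints (Applied,DiscardedStale,1)
```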
+ +- **Agent message envelope**: Agent messages use a two-layer format — outer `AgentMsgEnvelope` (version + type tag C/M/I/R + payload) and inner `AgentMessage` (after double-ratchet decryption, tags I/D/R/M + AMessage). Tag characters deliberately overlap between layers (disambiguated by context). `AgentInvitation` uses only per-queue E2E encryption (no ratchet established yet); `AgentRatchetKey` uses per-queue E2E (can't use ratchet to renegotiate ratchet); `AgentConfirmation` uses double ratchet. PQ support *shrinks* message size budgets (ratchet header + reply link grow with SNTRUP761 keys). `AEvent` is a GADT indexed by `AEntity` — prevents file events on connection entities at the type level. Spans [Agent/Protocol.md](modules/Simplex/Messaging/Agent/Protocol.md) (types, encoding, size budgets), [Agent.md](modules/Simplex/Messaging/Agent.md) (dispatch over the four e2e key states, message processing). + +- **Ratchet synchronization protocol**: When the double ratchet gets out of sync (backup restoration, message loss), both parties exchange `AgentRatchetKey` messages with fresh DH keys. Role determination uses hash-ordering: `rkHash(k1, k2)` is computed by both sides — the party with the lower hash initializes the receiving ratchet, the other initializes sending and sends EREADY. This breaks symmetry when both parties simultaneously initiate. State machine: `RSOk`/`RSAllowed`/`RSRequired` → generate keys + reply; `RSStarted` → use stored keys; `RSAgreed` → error (reset to `RSRequired`). EREADY carries `lastExternalSndId` so the peer knows which messages used the old ratchet. `checkRatchetKeyHashExists` prevents processing the same key twice. Successful message decryption resets sync state to `RSOk` (the recovery signal). Spans [Agent.md](modules/Simplex/Messaging/Agent.md) (newRatchetKey, ereadyMsg, resetRatchetSync), [Agent/Protocol.md](modules/Simplex/Messaging/Agent/Protocol.md) (AgentRatchetKey type, cryptoErrToSyncState classification).
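The hash-ordering symmetry breaker can be illustrated with a toy stand-in hash (the real code applies a cryptographic hash to the concatenated DH keys; `rkHash` and `role` here are invented, simplified versions):

```haskell
-- Sketch of role determination by hash ordering. The stand-in "hash" is
-- just an order-sensitive base-256 fold, NOT a real hash.
import Data.Char (ord)

rkHash :: String -> String -> Integer
rkHash k1 k2 = foldl (\h c -> h * 256 + toInteger (ord c)) 0 (k1 ++ k2)

data Role = InitRcvRatchet | InitSndRatchet deriving (Eq, Show)

-- Each party evaluates this with (ownKey, peerKey). Both sides can compute
-- both hash orderings, so simultaneous initiators always reach
-- complementary roles (assuming the keys differ).
role :: String -> String -> Role
role own peer
  | rkHash own peer < rkHash peer own = InitRcvRatchet -- lower hash: receiving ratchet
  | otherwise = InitSndRatchet -- higher hash: sending ratchet, sends EREADY

main :: IO ()
main = do
  let a = role "keyA" "keyB" -- party A's decision
      b = role "keyB" "keyA" -- party B's decision
  print (a, b)
  print (a /= b) -- True: symmetry is broken
```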