simplexmq/spec/TOPICS.md
Evgeny @ SimpleX Chat 1cc4d98dd0 terms 2
2026-03-13 17:56:14 +00:00


Topic Candidates

Cross-cutting patterns noticed during module documentation. Each entry may become a topic doc in spec/ after all module docs are complete.

  • Exception handling strategy: catchOwn/catchAll/tryAllErrors pattern (defined in Util.hs) used across router, client, and agent modules. The three-category classification (synchronous, own-async, cancellation) and when to use which catch variant is not obvious from any single call site.
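A Python analog may make the three-way split concrete (the real pattern is Haskell's catchOwn/catchAll/tryAllErrors in Util.hs; the class names below are purely illustrative):

```python
# Illustrative three-category exception split. The point: cancellation must
# always propagate, the module's own async exceptions may be handled, and
# ordinary synchronous errors are handled at the call site.
class Cancellation(BaseException):
    """Analog of thread cancellation: never swallow, or shutdown hangs."""

class OwnAsync(Exception):
    """Analog of an exception the module throws to its own thread."""

def classify(e: BaseException) -> str:
    if isinstance(e, Cancellation):
        return "cancellation"
    if isinstance(e, OwnAsync):
        return "own-async"
    return "synchronous"
```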

  • Padding schemes: Three different padding formats across the codebase — Crypto.hs uses a 2-byte Word16 length prefix (max ~65KB), Crypto/Lazy.hs uses an 8-byte Int64 prefix (file-sized), and both use the '#' fill character. Ratchet header padding uses fixed sizes (88 or 2310 bytes). All use pad/unPad but with incompatible formats. The relationship between padding, encryption, and message size limits spans Crypto, Lazy, Ratchet, and the protocol layer.
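The length-prefix scheme can be sketched in a few lines (a sketch only: big-endian byte order is assumed here, and the real pad/unPad differ in error handling):

```python
import struct

FILL = b"#"  # fill character shared by both schemes

def pad(msg: bytes, padded_len: int) -> bytes:
    """2-byte length-prefix scheme; the 8-byte Int64 scheme would swap
    ">H" for ">Q". Byte order is an assumption in this sketch."""
    prefix = struct.pack(">H", len(msg))
    if len(prefix) + len(msg) > padded_len:
        raise ValueError("message does not fit padded size")
    return prefix + msg + FILL * (padded_len - len(prefix) - len(msg))

def un_pad(padded: bytes) -> bytes:
    (n,) = struct.unpack(">H", padded[:2])
    if 2 + n > len(padded):
        raise ValueError("invalid length prefix")
    return padded[2:2 + n]
```

The incompatibility is visible directly: a blob padded with one prefix width cannot be unpadded with the other.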

  • NaCl construction variants: crypto_box, secret_box, and KEM hybrid secret all use the same XSalsa20+Poly1305 core (Crypto.hs xSalsa20), but with different key sources (DH, symmetric, SHA3_256(DH||KEM)). The lazy streaming variant (Lazy.hs) adds prepend-tag vs tail-tag placement. File.hs wraps lazy streaming with handle-based I/O. Full picture requires reading Crypto.hs, Lazy.hs, File.hs, and SNTRUP761.hs together.
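The hybrid key source stated above (SHA3_256(DH||KEM)) reduces to a one-liner; the concatenation order and absence of extra framing are assumptions of this sketch:

```python
import hashlib

def hybrid_secret(dh_shared: bytes, kem_shared: bytes) -> bytes:
    # KEM-hybrid key source: SHA3-256 over the DH shared secret concatenated
    # with the KEM shared secret, yielding a 32-byte symmetric key for the
    # same XSalsa20+Poly1305 core used by crypto_box and secret_box.
    return hashlib.sha3_256(dh_shared + kem_shared).digest()
```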

  • Transport encryption layering: Three encryption layers overlap — TLS (Transport.hs), optional block encryption via sbcHkdf chains (Transport.hs tPutBlock/tGetBlock), and SMP protocol-level encryption. Block encryption is disabled for proxy connections (already encrypted), and absent for NTF protocol. The interaction of these layers with proxy version downgrade logic spans Transport.hs, Client.hs, and the SMP proxy module.

  • Certificate chain trust model: ChainCertificates (Shared.hs) defines the certificate chain semantics used by both Client.hs (validateCertificateChain) and Server.hs (validateClientCertificate, SNI credential switching). The 4-length case skipping index 2 (operator cert) and the FQHN-disabled x509validate are decisions that span the entire transport security model.

  • SMP proxy protocol flow: The PRXY/PFWD/RFWD proxy protocol involves Client.hs (proxySMPCommand with 10 error scenarios, forwardSMPTransmission with sessionSecret encryption), Protocol.hs (command types, version-dependent encoding), Transport.hs (proxiedSMPRelayVersion cap, proxyServer flag disabling block encryption). The double encryption (client-relay via PFWD + proxy-relay via RFWD), combined timeout (tcpConnect + tcpTimeout), nonce/reverseNonce pairing, and version downgrade logic are not visible from any single module.

  • Service certificate subscription model: Service subscriptions (SUBS/NSUBS) and per-queue subscriptions (SUB/NSUB) coexist with complex state transitions. Client/Agent.hs manages dual active/pending subscription maps with session-aware cleanup. Protocol.hs defines useServiceAuth (only NEW/SUB/NSUB). Client.hs implements authTransmission with dual signing (entity key over cert hash + transmission, service key over transmission only). Transport.hs handles the service certificate handshake extension (v16+). The full subscription lifecycle — from DBService credentials through handshake to service subscription to disconnect/reconnect — spans all four modules.

  • Two agent layers: Client/Agent.hs ("small agent") is used only in routers — SMP proxy and notification router — to manage client connections to other SMP routers. Agent.hs + Agent/Client.hs ("big agent") is used in client applications. Both manage SMP client connections with subscription tracking and reconnection, but the big agent adds the full messaging agent layer (connections, double ratchet, file transfer). When documenting Agent/Client.hs, Client/Agent.hs should be reviewed for shared patterns and differences.

  • Handshake protocol family: SMP (Transport.hs), NTF (Notifications/Transport.hs), and XFTP (FileTransfer/Transport.hs) all have handshake protocols with the same structure (version negotiation + session binding + key exchange) but different feature sets. NTF is a strict subset. XFTP doesn't use the TLS handshake at all (HTTP2 layer). The shared types (THandle, THandleParams, THandleAuth) mean changes to the handshake infrastructure affect all three protocols.

  • Router subscription architecture: The SMP router's subscription model spans Server.hs (serverThread split-STM lifecycle, tryDeliverMessage sync/async, ProhibitSub/ServerSub state machine), Env/STM.hs (SubscribedClients TVar-of-Maybe continuity, Client three-queue architecture), and Client/Agent.hs (small agent dual subscription model). The interaction between service subscriptions, direct queue subscriptions, notification subscriptions, and the serverThread subQ processing is not visible from any single module.

  • Duplex connection handshake: The SMP duplex connection procedure (standard 10-step and fast 7-step) spans Agent.hs (orchestration, state machine), Agent/Protocol.hs (message types: AgentConfirmation/AgentConnInfoReply/AgentInvitation/HELLO, queue status types), Client.hs (SMP command dispatch), Protocol.hs (SMP-level KEY/SKEY commands). The handshake involves two-layer encryption (per-queue E2E + double ratchet), version-dependent paths (v2+ duplex, v6+ sender auth key, v7+ ratchet on confirmation, v9+ fast handshake with SKEY), and the asymmetry between initiating and accepting parties (different message types, different confirmation processing). The protocol spec (agent-protocol.md) defines the procedure but the implementation details — error handling, state persistence across restarts, race conditions between confirmation and message delivery — are only visible by reading the code across these modules.

  • Connection links: Full connection links (URI format with #/? query parameters) and binary-encoded links (Encoding instances) serve different contexts — URIs for out-of-band sharing, binary for agent-to-agent messages. Each has independent version-conditional encoding with different backward-compat rules (URI parser adjusts agent version ranges for old contact links, binary parser patches queueMode for forward compat). The VersionI/VersionRangeI typeclasses convert between SMPQueueInfo (versioned, in confirmations) and SMPQueueUri (version-ranged, in links). Full picture requires Agent/Protocol.hs, Protocol.hs, and agent-protocol.md.

  • Short links: Short links are a compact representation for sharing via URLs, not a replacement for full connection links — both are used. Short links store encrypted link data on the router and encode only a router hostname, link type character, and key hash in the URL. The link data lifecycle (creation, encryption with key derivation, owner chain-of-trust validation, mutable user data updates) spans Agent/Protocol.hs (types, serialization, owner validation, router shortening/restoration), Agent.hs (link creation and resolution API), and the router-side link storage. The FixedLinkData/ConnLinkData split (immutable vs mutable), OwnerAuth chain validation, and PreparedLinkParams pre-computation are not visible from any single module.

  • Agent worker framework: getAgentWorker (lifecycle, restart rate limiting, crash recovery) + withWork/withWork_/withWorkItems (task retrieval with doWork flag atomics) defined in Agent/Client.hs, consumed by Agent.hs (async commands, message delivery), NtfSubSupervisor.hs (notification workers), FileTransfer/Agent.hs (XFTP workers), and simplex-chat. The framework separates two concerns: worker lifecycle (create-or-reuse, fork async, rate-limit restarts, escalate to CRITICAL) and task pattern (get next task, do task, as separate parameters). The doWork TMVar flag choreography (clear before query to prevent race) and the work-item-error vs store-error distinction are not obvious from any single consumer.
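The doWork flag choreography can be shown with a small Python analog (Worker, signal, and step are hypothetical names; the real framework uses a TMVar and STM):

```python
import threading

class Worker:
    """Sketch of the withWork pattern: the doWork flag is cleared *before*
    querying for the next task, so a signal that arrives mid-query re-raises
    the flag and the loop runs one more time instead of sleeping forever."""
    def __init__(self, get_task, run_task):
        self.do_work = threading.Event()
        self.get_task = get_task
        self.run_task = run_task

    def signal(self):
        """Producers call this after inserting a task."""
        self.do_work.set()

    def step(self) -> bool:
        """One pass; returns False only when there is provably no work."""
        self.do_work.clear()      # clear FIRST: a concurrent signal() is kept
        task = self.get_task()
        if task is None:
            return self.do_work.is_set()  # re-signalled during the query?
        self.run_task(task)
        return True
```

If the clear happened after the query instead, a task inserted between query and clear would be silently dropped until the next unrelated signal.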

  • Agent operation suspension: Five AgentOpState TVars (RcvNetwork, MsgDelivery, SndNetwork, Database, NtfNetwork) with a cascade ordering: ending RcvNetwork suspends MsgDelivery, ending MsgDelivery suspends SndNetwork + Database, ending SndNetwork suspends Database. beginAgentOperation retries if suspended, endAgentOperation decrements and cascades. All DB access goes through withStore which brackets with AODatabase. This ensures graceful shutdown propagates through dependent operations. Defined in Agent/Client.hs, used by Agent.hs subscriber and worker loops.
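A minimal sketch of the cascade (an assumed simplification: the real beginAgentOperation blocks via STM retry rather than throwing, and tracks richer state than a counter):

```python
class Ops:
    """Operation suspension with the cascade ordering described above:
    an operation that ends while suspended and idle suspends its dependents."""
    CASCADE = {"RcvNetwork": ("MsgDelivery",),
               "MsgDelivery": ("SndNetwork", "Database"),
               "SndNetwork": ("Database",)}
    ALL = ("RcvNetwork", "MsgDelivery", "SndNetwork", "Database", "NtfNetwork")

    def __init__(self):
        self.count = {op: 0 for op in self.ALL}
        self.suspended = {op: False for op in self.ALL}

    def begin(self, op):
        if self.suspended[op]:
            raise RuntimeError("suspended")  # real code: STM retry, not throw
        self.count[op] += 1

    def end(self, op):
        self.count[op] -= 1
        if self.count[op] == 0 and self.suspended[op]:
            for dep in self.CASCADE.get(op, ()):
                self.suspend(dep)

    def suspend(self, op):
        self.suspended[op] = True
        if self.count[op] == 0:              # already idle: cascade at once
            for dep in self.CASCADE.get(op, ()):
                self.suspend(dep)
```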

  • Queue rotation protocol: Four agent messages (QADD → QKEY → QUSE → QTEST) on top of SMP commands, with asymmetric state machines on receiver side (RcvSwitchStatus: 4 states) and sender side (SndSwitchStatus: 2 states). Receiver initiates, creates new queue, sends QADD. Sender responds with QKEY. Receiver sends QUSE. Sender sends QTEST to complete. State types in Agent/Protocol.hs, orchestration in Agent.hs, queue creation/deletion in Agent/Client.hs. Protocol spec in agent-protocol.md. The fast variant (v9+ SMP with SKEY) skips the KEY command step.
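The four-message exchange replays mechanically (message names are from the protocol; the driver below is an illustration, not the agent's state machine):

```python
# Who sends the next message after each rotation message, per the flow above:
# receiver initiates with QADD, parties alternate, sender's QTEST completes.
NEXT = {"QADD": ("sender", "QKEY"),
        "QKEY": ("receiver", "QUSE"),
        "QUSE": ("sender", "QTEST"),
        "QTEST": ("receiver", None)}

def run_rotation():
    """Replay the rotation handshake, returning (party, message) pairs."""
    log, party, msg = [], "receiver", "QADD"
    while msg is not None:
        log.append((party, msg))
        party, msg = NEXT[msg]
    return log
```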

  • Outside-STM lookup pattern: Multiple modules use the pattern of looking up TVar references outside STM (via readTVarIO/TM.lookupIO), then reading/modifying the TVar contents inside STM. This avoids transaction re-evaluation from unrelated map changes. Used in: Server.hs (serverThread client lookup, tryDeliverMessage subscriber lookup), Env/STM.hs (deleteSubcribedClient), Client/Agent.hs (removeClientAndSubs, reconnectSMPClient). The safety invariant is that the outer map entries (TVars) are never removed — only their contents change.
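A lock-based Python analog of the pattern (the real code uses STM; Cell stands in for a TVar, and the function names are hypothetical):

```python
import threading

class Cell:
    """A mutable slot with its own lock -- a Python stand-in for a TVar."""
    def __init__(self, value):
        self.lock = threading.Lock()
        self.value = value

# Safety invariant: entries are inserted but never removed;
# only a Cell's contents ever change.
clients: dict[str, Cell] = {}

def update_client(client_id: str, f):
    cell = clients.get(client_id)    # lookup OUTSIDE the critical section
    if cell is None:
        return None
    with cell.lock:                  # the "transaction" touches only this
        cell.value = f(cell.value)   # cell, so unrelated map changes cannot
        return cell.value            # invalidate or re-run it
```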

  • NTF token lifecycle: Token registration (TNEW) → verification push → NTConfirmed → TVFY → NTActive, with idempotent re-registration (DH secret check), TRPL (device token replacement reusing DH key), status repair for stuck tokens, and PPApnsNull test tokens suppressing stats. The lifecycle spans Server.hs (command handling, verification push delivery), Store/Postgres.hs (conditional status updates, duplicate registration cleanup), Types.hs (NtfTknStatus state machine), and Env.hs (push client lazy initialization).
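The happy path of the status machine can be tabulated (NTConfirmed/NTActive appear in the source; the NTNew/NTRegistered intermediate names and the event labels in this table are assumptions):

```python
# Happy-path transitions of the token status machine described above.
# Unknown (status, event) pairs leave the status unchanged, which also
# models idempotent re-registration at a coarse level.
TRANSITIONS = {
    ("NTNew", "TNEW"): "NTRegistered",        # registration accepted
    ("NTRegistered", "push"): "NTConfirmed",  # verification push delivered
    ("NTConfirmed", "TVFY"): "NTActive",      # client verifies the code
}

def advance(status: str, event: str) -> str:
    return TRANSITIONS.get((status, event), status)
```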

  • NTF push delivery pipeline: Bounded TBQueue (pushQ) creates backpressure → ntfPush thread reads → checkActiveTkn gates PNMessage (but not PNVerification or PNCheckMessages) → APNS delivery with single retry on connection errors (new push client on retry) → PPTokenInvalid marks token NTInvalid. Spans Server.hs, APNS.hs (DER JWT signing, HTTP/2 serializing queue, fire-and-forget connection), Env.hs (push client caching with race tolerance).
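The single-retry policy reduces to a small sketch (function and method names here are hypothetical, not the APNS.hs API):

```python
def push_with_single_retry(notification, get_client, new_client):
    """Connection-level failure gets exactly one more attempt, against a
    freshly built push client; a second failure propagates to the caller."""
    try:
        return get_client().push(notification)
    except ConnectionError:
        return new_client().push(notification)
```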

  • NTF service subscription model: Service-level subscriptions (SUBS/NSUBS on SMP) vs individual queue subscriptions, with fallback from service to individual when CAServiceUnavailable. Service credentials are lazily generated per SMP router with 25h backdating and ~2700yr validity. XOR hash triggers on PostgreSQL maintain subscription aggregate counts. Subscription status tracking uses ntf_service_assoc flag to distinguish service-associated from individually-subscribed queues. Spans Server.hs (subscriber thread, service fallback), Env.hs (lazy credential generation, Weak ThreadId subscriber cleanup), Store/Postgres.hs (XOR hash triggers, batch status updates, cursor-based pagination).
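The XOR-hash aggregate idea can be shown directly — XOR is its own inverse, so adding and removing an ID are the same operation and the aggregate is maintainable incrementally by triggers (the hash choice and ID hashing below are assumptions of this sketch):

```python
import hashlib

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toggle(aggregate: bytes, sub_id: bytes) -> bytes:
    """Insert OR delete a subscription ID: the same XOR either way."""
    return xor_bytes(aggregate, hashlib.sha256(sub_id).digest())
```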

  • NTF startup resubscription: resubscribe runs as detached forkIO (not in raceAny_ group), uses mapConcurrently across SMP routers, each with subscribeLoop using 100x database batch multiplier and cursor-based pagination. ExitCode exceptions from exitFailure on DB error propagate to main thread despite forkIO. getServerNtfSubscriptions claims subscriptions by batch-updating to NSPending. Spans Server.hs, Store/Postgres.hs.

  • XFTP file upload pipeline: Agent-side encryption (streaming 64KB blocks, fixed-size padding) → chunk size selection (75% threshold algorithm) → per-router data packet creation with ID collision retry (3 attempts) → recipient registration (recursive batching up to maxRecipients per FADD) → per-router data packet upload (command + data in single HTTP/2 streaming request) → file description generation (cross-product: M chunks × R replicas × N recipients → N descriptions). Spans Agent.hs (worker orchestration, description generation), Client.hs (upload protocol), Server.hs (quota reservation with rollback, skipCommitted idempotency), Crypto.hs (streaming encryption with embedded header), Description.hs (validation, first-replica-only digest optimization).
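One plausible reading of the 75% threshold (an assumption — the real chunk-size selection differs in detail, and the sizes below are illustrative):

```python
MB = 1024 * 1024
CHUNK_SIZES = [256 * 1024, 1 * MB, 4 * MB]   # illustrative tier sizes

def plan_chunks(size: int) -> list[int]:
    """Greedy sketch: use the biggest chunk size while the remainder fills
    at least 75% of it, then fall through to smaller tiers; any final
    remainder still costs one smallest chunk (padding fills the rest)."""
    chunks = []
    for c in reversed(CHUNK_SIZES):
        while size >= 0.75 * c:
            chunks.append(c)
            size -= min(size, c)
    if size > 0:
        chunks.append(CHUNK_SIZES[0])
    return chunks
```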

  • XFTP file download pipeline: Description parsing (ValidFileDescription validation, YAML or web URI) → per-router data packet download with ephemeral DH key pair per download (forward secrecy) → size and digest verification before decryption → streaming decryption with auth tag verification (output deleted on failure) → redirect resolution (depth-1 chain: decrypt redirect YAML, validate size/digest, download actual file). Spans Agent.hs (worker orchestration, redirect handling), Client.hs (ephemeral DH, size-proportional timeout), Client/Main.hs (web URI decoding, parallel download with router grouping), Crypto.hs (dual decrypt paths, auth tag deletion), Description.hs (redirect file descriptions).

  • XFTP handshake state machine: Three-state session-cached handshake (no entry → HandshakeSent → HandshakeAccepted) per HTTP/2 session. Web clients use xftp-web-hello header and challenge-response identity proof; native clients use standard ALPN. SNI presence gates CORS headers, web serving, and SESSION error for unrecognized connections. Key reuse on re-hello preserves existing DH keys. Spans Server.hs (handshake logic, CORS, web serving), Client.hs (ALPN selection, cert chain validation), Transport.hs (block size, version).

  • XFTP storage lifecycle: Quota reservation via atomic stateTVar before upload → rollback on failure (subtract + delete partial data packet) → stored data packet deleted before store cleanup (crash risk: store references missing data packet) → RoundedSystemTime 3600 for privacy-preserving expiration timestamps → expiration with configurable throttling (100ms between data packets) → startup storage reconciliation (override stats from live store). Spans Server.hs, Server/Store.hs, Server/Env.hs, Server/StoreLog.hs (error-resilient replay, compaction).
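The privacy-preserving timestamp is essentially truncation to a coarse precision (whether the real RoundedSystemTime truncates or rounds to nearest is an assumption of this sketch):

```python
import time

def rounded_system_time(precision: int = 3600) -> int:
    """Coarsen the current time to `precision` seconds (3600 in the source)
    so stored expiration timestamps don't reveal exact upload moments."""
    t = int(time.time())
    return t - t % precision
```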

  • XFTP worker architecture: Five worker types in three categories: rcv (per-router data packet download + local decryption), snd (local prepare/encrypt + per-router data packet upload), del (per-router data packet delete). TMVar-based connection sharing with async retry on temporary errors, permanent error cleanup (put Left + delete from TMap). withRetryIntervalLimit caps consecutive retries; exhausted temporary errors silently abandon work cycle (chunk stays pending). assertAgentForeground dual check (throw if inactive + wait if backgrounded) gates every data packet operation. Spans Agent.hs, Client/Agent.hs.

  • SessionVar protocol client lifecycle: Protocol client connections (SMP, NTF, XFTP) use a lazy singleton pattern: getSessVar atomically checks TMap → newProtocolClient fills TMVar on success/failure → waitForProtocolClient reads with timeout. Error caching via persistErrorInterval prevents connection storms (failed connections cache the error with expiry; callers receive cached error without reconnecting). removeSessVar uses monotonic sessionVarId compare-and-swap to prevent stale disconnect callbacks from removing newer clients. SMP has additional complexity: SMPConnectedClient wraps client with per-connection proxied relay map, updateClientService synchronizes service credentials post-connect, disconnect callback moves subscriptions to pending with session-ID matching. XFTP always uses NRMBackground timing regardless of caller request. Spans Session.md, Agent/Client.md (lifecycle, disconnect callbacks, reconnection workers), Agent.md (subscriber loop consuming events).
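The error-caching part of the lazy-singleton pattern can be sketched as follows (class and method names are hypothetical; the real getSessVar is STM-based and also handles the in-flight/waiting case):

```python
import time

class SessionVar:
    """At most one connection attempt per key; a failure is cached until
    an expiry, so concurrent callers get the cached error instead of
    each triggering a reconnect (the connection-storm prevention above)."""
    def __init__(self, connect, persist_error_interval=30.0):
        self._connect = connect
        self._interval = persist_error_interval
        self._slots = {}  # key -> ("ok", client) | ("err", exc, expires_at)

    def get(self, key):
        slot = self._slots.get(key)
        if slot:
            if slot[0] == "ok":
                return slot[1]
            if time.monotonic() < slot[2]:
                raise slot[1]            # cached error: no reconnect attempt
        try:
            client = self._connect(key)
        except Exception as e:
            self._slots[key] = ("err", e, time.monotonic() + self._interval)
            raise
        self._slots[key] = ("ok", client)
        return client
```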

  • Dual-backend agent store: The agent store (~3700 lines in AgentStore.hs) compiles for both SQLite and PostgreSQL via #if defined(dbPostgres) CPP guards. Key behavioral differences: PostgreSQL uses FOR UPDATE row locking on reads preceding writes (SQLite relies on single-writer model); PostgreSQL uses IN ? with In wrapper for batch queries (SQLite falls back to per-row forM loops); PostgreSQL uses constraintViolation (SQLite checks SQL.ErrorConstraint); createWithRandomId' uses savepoints on PostgreSQL (failed statement aborts entire transaction without them). One known bug: checkConfirmedSndQueueExists_ uses #if defined(dpPostgres) (typo: dp not db), so the FOR UPDATE clause is never included on any backend. Spans AgentStore.md, SQLite.md.

  • Deferred message encryption: Message bodies are NOT encrypted at enqueue time. enqueueMessageB advances the ratchet header and validates padding, but stores only the body reference (sndMsgBodyId) and encryption key. Actual encryption (rcEncryptMsg) happens at delivery time in runSmpQueueMsgDelivery. This enables body deduplication via VRValue/VRRef — identical bodies (common for group messages) share one database row, but each connection's delivery encrypts independently with its own ratchet. Confirmation and ratchet key messages bypass deferred encryption (pre-encrypted at enqueue time). Spans Agent.md (enqueue + delivery), AgentStore.md (snd_message_bodies storage).

  • NTF agent subscription lifecycle: The agent-side notification subscription system uses a supervisor-worker architecture with three worker pools (NTF router, SMP router, token deletion). NSCCreate triggers a four-way partition (partitionQueueSubActions): new sub, reset sub (credential mismatch or null action), continue SMP work, continue NTF work. Workers coordinate with the supervisor via updated_by_supervisor flag — workers only update local fields when the flag is set, preventing overwrite of supervisor decisions. The null-action sentinel (workerErrors sets action to NULL on permanent failure) bridges worker failure recovery to supervisor-driven re-creation. retrySubActions uses a shrinking TVar — each iteration only retries subs with temporary errors, so batches get smaller over time. rescheduleWork handles time-scheduled health checks by forking a sleep thread that re-signals doWork. Spans NtfSubSupervisor.md (supervisor, worker pools), AgentStore.md (updated_by_supervisor, null-action sentinel), Agent/Client.md (worker framework).

  • Session-aware SMP subscription management: SMP queue subscriptions are tracked per transport session with session-ID validation at multiple points. subscribeQueues groups queues by transport session, subscribes concurrently, then validates activeClientSession post-RPC — if the client was replaced during the RPC, results are discarded and converted to temporary errors for retry. removeClientAndSubs (disconnect cleanup) only demotes subscriptions whose session ID matches the disconnecting client. Batch UP notifications are accumulated across transmissions and deduplicated against already-active subscriptions. When ALL results are temporary errors and no connections were already active, the SMP client is closed to force fresh connection. maxPending throttles concurrent pending subscriptions with STM retry backpressure. Spans Agent/Client.md (subscription state, session validation), Agent.md (subscriber loop, processSMPTransmissions, UP accumulation).

  • Agent message envelope: Agent messages use a two-layer format — outer AgentMsgEnvelope (version + type tag C/M/I/R + payload) and inner AgentMessage (after double-ratchet decryption, tags I/D/R/M + AMessage). Tag characters deliberately overlap between layers (disambiguated by context). AgentInvitation uses only per-queue E2E encryption (no ratchet established yet); AgentRatchetKey uses per-queue E2E (can't use ratchet to renegotiate ratchet); AgentConfirmation uses double ratchet. PQ support shrinks message size budgets (ratchet header + reply link grow with SNTRUP761 keys). AEvent is a GADT indexed by AEntity — prevents file events on connection entities at the type level. Spans Agent/Protocol.md (types, encoding, size budgets), Agent.md (four e2e key states dispatch, message processing).

  • Ratchet synchronization protocol: When the double ratchet gets out of sync (backup restoration, message loss), both parties exchange AgentRatchetKey messages with fresh DH keys. Role determination uses hash-ordering: rkHash(k1, k2) is computed by both sides — the party with the lower hash initializes the receiving ratchet, the other initializes sending and sends EREADY. This breaks symmetry when both parties simultaneously initiate. State machine: RSOk/RSAllowed/RSRequired → generate keys + reply; RSStarted → use stored keys; RSAgreed → error (reset to RSRequired). EREADY carries lastExternalSndId so the peer knows which messages used the old ratchet. checkRatchetKeyHashExists prevents processing the same key twice. Successful message decryption resets sync state to RSOk (the recovery signal). Spans Agent.md (newRatchetKey, ereadyMsg, resetRatchetSync), Agent/Protocol.md (AgentRatchetKey type, cryptoErrToSyncState classification).
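The symmetry-breaking step reduces to a comparison both sides can compute independently (a sketch: the real rkHash construction is not SHA-256 over plain concatenation, but the ordering argument is the same):

```python
import hashlib

def rk_hash(k1: bytes, k2: bytes) -> bytes:
    # Placeholder for rkHash: any hash over both parties' fresh ratchet
    # keys works for the ordering argument, as long as both sides agree.
    return hashlib.sha256(k1 + k2).digest()

def my_role(my_key: bytes, peer_key: bytes) -> str:
    """Both sides compute both hash orderings; the side with the lower
    hash initializes the receiving ratchet, the other initializes sending
    (and sends EREADY) -- so simultaneous initiation cannot deadlock."""
    if rk_hash(my_key, peer_key) < rk_hash(peer_key, my_key):
        return "receiving"
    return "sending"
```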