diff --git a/spec/TOPICS.md b/spec/TOPICS.md index 977107652..6ef029e96 100644 --- a/spec/TOPICS.md +++ b/spec/TOPICS.md @@ -19,3 +19,7 @@ - **Two agent layers**: Client/Agent.hs ("small agent") is used only in servers — SMP proxy and notification server — to manage client connections to other SMP servers. Agent.hs + Agent/Client.hs ("big agent") is used in client applications. Both manage SMP client connections with subscription tracking and reconnection, but the big agent adds the full messaging agent layer (connections, double ratchet, file transfer). When documenting Agent/Client.hs, Client/Agent.hs should be reviewed for shared patterns and differences. - **Handshake protocol family**: SMP (Transport.hs), NTF (Notifications/Transport.hs), and XFTP (FileTransfer/Transport.hs) all have handshake protocols with the same structure (version negotiation + session binding + key exchange) but different feature sets. NTF is a strict subset. XFTP doesn't use the TLS handshake at all (HTTP2 layer). The shared types (THandle, THandleParams, THandleAuth) mean changes to the handshake infrastructure affect all three protocols. + +- **Server subscription architecture**: The SMP server's subscription model spans Server.hs (serverThread split-STM lifecycle, tryDeliverMessage sync/async, ProhibitSub/ServerSub state machine), Env/STM.hs (SubscribedClients TVar-of-Maybe continuity, Client three-queue architecture), and Client/Agent.hs (small agent dual subscription model). The interaction between service subscriptions, direct queue subscriptions, notification subscriptions, and the serverThread subQ processing is not visible from any single module. + +- **Outside-STM lookup pattern**: Multiple modules use the pattern of looking up TVar references outside STM (via readTVarIO/TM.lookupIO), then reading/modifying the TVar contents inside STM. This avoids transaction re-evaluation from unrelated map changes. Used in: Server.hs (serverThread client lookup, tryDeliverMessage subscriber lookup), Env/STM.hs (deleteSubcribedClient), Client/Agent.hs (removeClientAndSubs, reconnectSMPClient). The safety invariant is that the outer map entries (TVars) are never removed — only their contents change. diff --git a/spec/modules/Simplex/Messaging/Server.md b/spec/modules/Simplex/Messaging/Server.md new file mode 100644 index 000000000..0ed6e43e1 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server.md @@ -0,0 +1,106 @@ +# Simplex.Messaging.Server + +> SMP server: client handling, subscription lifecycle, message delivery, proxy forwarding, control port. + +**Source**: [`Server.hs`](../../../../src/Simplex/Messaging/Server.hs) + +**Protocol spec**: [`protocol/simplex-messaging.md`](../../../../protocol/simplex-messaging.md) — SimpleX Messaging Protocol. + +## Overview + +The server runs as `raceAny_` over many threads — any thread exit stops the entire server. The thread set includes: one `serverThread` per subscription type (SMP, NTF), a notification delivery thread, a pending events thread, a proxy agent receiver, a SIGINT handler, plus per-transport listener threads and optional expiration/stats/prometheus/control-port threads. `E.finally` ensures `stopServer` runs on any exit. + +## serverThread — subscription lifecycle with split STM + +See comment on `serverThread`. It reads the subscription request from `subQ`, then looks up the client **outside** STM (via `getServerClient`), then enters an STM transaction (`updateSubscribers`) to compute which old subscriptions to end, then runs `endPreviousSubscriptions` in IO. If the client disconnects between lookup and transaction, `updateSubscribers` handles `Nothing` by still sending END/DELD to other subscribed clients. + +`checkAnotherClient` ensures END messages are only sent to clients **other than** the subscribing client — if `clntId == clientId`, the action is skipped. + +`removeWhenNoSubs` removes a client from `subClients` only when **both** queue and service subscriptions are empty — not after each individual unsubscription. + +## SubscribedClients — TVar-of-Maybe pattern + +See comment on `SubscribedClients` in Env/STM.hs. Subscription entries store `TVar (Maybe (Client s))` — the TVar's contents change between `Just client` and `Nothing` on disconnect/reconnect, allowing STM transactions reading the TVar to automatically re-evaluate when the subscriber changes. Entries **are** removed via `lookupDeleteSubscribedClient` (when subscriptions end) and `deleteSubcribedClient` (on client disconnect), though the source comment describes the original intent of never cleaning them up. + +`upsertSubscribedClient` returns the previously subscribed client only if it's a **different** client (checked via `sameClientId`). Same client → returns `Nothing` (no END needed). + +## ProhibitSub / ServerSub state machine + +`Sub.subThread` is either `ProhibitSub` or `ServerSub (TVar SubscriptionThread)`. GET creates `ProhibitSub`, preventing subsequent SUB on the same queue (`CMD PROHIBITED`). SUB creates `ServerSub NoSub`, preventing subsequent GET (`CMD PROHIBITED`). This is enforced per-connection — the state tracks which access pattern the client chose. + +`SubscriptionThread` transitions: `NoSub` → `SubPending` (sndQ full during delivery) → `SubThread (Weak ThreadId)` (delivery thread spawned). `SubPending` is set **before** the thread is spawned; the thread atomically upgrades to `SubThread` after forking. If the thread exits before upgrading, the `modifyTVar'` is a no-op (checks for `SubPending` specifically). + +## tryDeliverMessage — sync/async split delivery + +See comment on `tryDeliverMessage`. When a SEND arrives and the queue was empty: + +1. Look up subscribed client **outside STM** (avoids transaction cost when no subscriber exists) +2. In STM: check `delivered` is Nothing, check sndQ not full → deliver synchronously, return Nothing +3. If sndQ is full: set `SubPending`, return the client/sub/stateVar triple +4. Fork a delivery thread that waits for sndQ space, verifies `sameClient` (prevents delivery to reconnected client), then delivers and sets state to `NoSub` + +The `sameClient` check inside the delivery thread prevents a race: if the client reconnected between fork and delivery, the new client will receive the message via its own SUB. + +`newServiceDeliverySub` creates a transient subscription **only** for service-associated queues during message delivery — this is separate from the SUB-created subscriptions. + +## Constant-time authorization — dummy keys + +See comment on `verifyCmdAuthorization`. Always runs verification regardless of whether the queue exists. When the queue is missing (AUTH error), `dummyVerifyCmd` runs verification with hardcoded dummy keys (Ed25519, Ed448, X25519) and the result is discarded via `seq`. The time depends only on the authorization type provided, not on queue existence. This mitigates timing side-channel attacks that could reveal whether a queue ID exists. + +When the signature algorithm doesn't match the queue key, verification runs with a dummy key of the **provided** algorithm and the result is forced then discarded (`seq False`). + +## Service subscription — hash-based drift detection + +See comment on `sharedSubscribeService`. The client sends expected `(count, idsHash)`. The server reads the actual values from storage, then computes `subsChange = subtractServiceSubs currSubs subs'` — the **difference** between what the client's session currently tracks and the new values. This difference (not the absolute values) is passed to `serverThread` via `CSService` to adjust `totalServiceSubs`. Using differences prevents double-counting when a service resubscribes. + +Stats classification: exactly one of `srvSubOk`/`srvSubMore`/`srvSubFewer`/`srvSubDiff` is incremented per subscription. `count == -1` is a special case for old NTF servers. + +## Proxy forwarding — single transmission, no service identity + +See comment on `processForwardedCommand`. Only single forwarded transmissions are allowed — batches are rejected with `BLOCK`. The synthetic `THandleAuth` has `peerClientService = Nothing`, preventing forwarded clients from claiming service identity. Only SEND, SKEY, LKEY, and LGET are allowed through `rejectOrVerify`. + +Double encryption: response is encrypted first to the client (with `C.cbEncrypt` using `reverseNonce clientNonce`), then wrapped and encrypted to the proxy (with `C.cbEncryptNoPad` using `reverseNonce proxyNonce`). Using reversed nonces ensures request and response directions use distinct nonces. + +## Proxy concurrency limiter + +See `wait`/`signal` around `forkProxiedCmd`. `procThreads` TVar implements a counting semaphore via STM `retry`. When `used >= serverClientConcurrency`, the transaction retries until another thread finishes. No bound on wait time — under sustained proxy load, commands queue indefinitely. + +## sendPendingEvtsThread — atomic swap + +`swapTVar pendingEvents IM.empty` atomically takes all pending events and clears the map. Events accumulated during processing are captured in the next interval. `tryWriteTBQueue` is tried first (non-blocking); if the sndQ is full, a forked thread does the blocking write. This prevents the pending events thread from stalling on one slow client. + +## deliverNtfsThread — throwSTM for control flow + +See `withSubscribed`. When a service client unsubscribes between the TVar read and the flush, `throwSTM (userError "service unsubscribed")` aborts the STM transaction. This is caught by `tryAny` and logged as "cancelled" — it's a successful path, not an error. The `flushSubscribedNtfs` function also cancels via `throwSTM` if the client is no longer current or sndQ is full. + +## Batch subscription responses — SOK grouped with MSG + +See comment on `processSubBatch`. When batched SUB commands produce SOK responses plus messages, the first message is appended to the SOK batch (up to 4 SOKs per block) in a single transmission. Remaining messages go to `msgQ` for separate delivery. This ensures the client receives at least one message quickly with its subscription acknowledgments. + +## send thread — MVar fair lock + +The TLS handle is wrapped in an `MVar` (`newMVar h`). Both `send` (command responses from `sndQ`) and `sendMsg` (messages from `msgQ`) acquire this lock via `withMVar`. This ensures fair interleaving between response batches and individual messages, preventing either from starving the other. + +## Queue creation — ID oracle prevention + +See comment on queue creation with client-supplied IDs. When `clntIds = True`, the ID must equal `B.take 24 (C.sha3_384 (bs corrId))`. This prevents ID oracle attacks where an attacker could probe for queue existence by attempting to create a queue with a specific ID and observing DUPLICATE vs AUTH errors. + +## disconnectTransport — subscription-aware idle timeout + +See `noSubscriptions`. The idle client disconnect thread only checks expiration when the client has **no** subscriptions (not in `subClients` for either SMP or NTF subscribers). Subscribed clients are kept alive indefinitely regardless of inactivity — they're waiting for messages, not idle. + +## clientDisconnected — ordered cleanup + +On disconnect: (1) set `connected = False`, (2) atomically swap out all subscriptions, (3) cancel subscription threads, (4) if server is still active: delete client from server map, update queue and service subscribers. Service subscription cleanup (`updateServiceSubs`) subtracts the client's accumulated `(count, idsHash)` from `totalServiceSubs`. End threads are swapped out and killed. + +## Control port — single auth, no downgrade + +See `controlPortAuth`. Role is set on first `CPAuth` only (`CPRNone` case). Subsequent AUTH commands print the current role but do not change it — the message says "start new session to change." This prevents role downgrade attacks within a session. + +## withQueue_ — updatedAt time check + +Every queue command calls `withQueue_` which checks if `updatedAt` matches today's date. If not, `updateQueueTime` is called to update it. This means `updatedAt` is a daily resolution activity timestamp, not a per-command timestamp. The SEND path passes `queueNotBlocked = False` to still update the time even for blocked queues (though SEND fails on blocked queues separately). + +## foldrM in client command processing + +`foldrM process ([], [])` processes a batch of verified commands right-to-left, accumulating responses and messages. The responses list is built with `(:)`, so the final order matches the original command order. Messages from SUB are collected separately and passed as the second element of the `sndQ` tuple. diff --git a/spec/modules/Simplex/Messaging/Server/CLI.md b/spec/modules/Simplex/Messaging/Server/CLI.md new file mode 100644 index 000000000..a369b6981 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/CLI.md @@ -0,0 +1,31 @@ +# Simplex.Messaging.Server.CLI + +> CLI argument parsing, INI configuration reading, X.509 certificate generation, and utility functions. + +**Source**: [`CLI.hs`](../../../../../src/Simplex/Messaging/Server/CLI.hs) + +## strictIni / iniOnOff — error semantics + +`strictIni` calls `error` on missing INI keys — no structured error, no recovery. `readStrictIni` chains this with `read`, so both "key missing" and "key present but unparseable" produce exceptions indistinguishable by callers. + +`iniOnOff` returns `Maybe Bool`: "on" → `Just True`, "off" → `Just False`, missing key → `Nothing`, any other value → `error` (not a parse failure). This tri-valued logic drives the implicit-default pattern in [Main.md](./Main.md#restore_messages--implicit-default-propagation). + +## iniTransports — port reuse prevention + +SMP ports are parsed first. When explicit WebSocket ports are provided, they are filtered to exclude already-used SMP ports (`ports ws \\ smpPorts`). However, when "websockets" is "on" with no explicit port, it defaults to `["80"]` without filtering against SMP ports. This means if SMP is also on port 80, the default WebSocket configuration would conflict. + +## iniDBOptions — schema creation disabled at CLI + +When reading database options from INI, `createSchema` is always set to `False` regardless of INI content. This enforces a security invariant: database schemas must be created manually or by migration, never automatically by the server. + +## createServerX509_ — external tool dependency + +Certificate generation shells out to `openssl` commands via `readCreateProcess`, which throws `IOError` on non-zero exit codes. Failures are thus detected but propagate as uncaught exceptions — no structured error handling wraps the certificate generation sequence. + +## checkSavedFingerprint — startup invariant + +Fingerprint is extracted from the CA certificate and saved during init. On every server start, the saved fingerprint is compared against the current certificate. Mismatch → startup failure. See [Main.md#initializeserver--fingerprint-invariant](./Main.md#initializeserver--fingerprint-invariant). + +## genOnline — existing certificate dependency + +When `signAlgorithm_` or `commonName_` are not provided, `genOnline` reads them from the existing certificate. This creates a hidden dependency on current certificate state that's not visible from the function signature. Expects exactly one certificate in the PEM file. diff --git a/spec/modules/Simplex/Messaging/Server/Control.md b/spec/modules/Simplex/Messaging/Server/Control.md new file mode 100644 index 000000000..644fb786a --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/Control.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Server.Control + +> Control port protocol types and encoding for server administration. + +**Source**: [`Control.hs`](../../../../../src/Simplex/Messaging/Server/Control.hs) + +No non-obvious behavior. See source. diff --git a/spec/modules/Simplex/Messaging/Server/Env/STM.md b/spec/modules/Simplex/Messaging/Server/Env/STM.md new file mode 100644 index 000000000..d0e948120 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/Env/STM.md @@ -0,0 +1,47 @@ +# Simplex.Messaging.Server.Env.STM + +> Server environment, configuration, client state, subscription types, and storage initialization. + +**Source**: [`Env/STM.hs`](../../../../../../src/Simplex/Messaging/Server/Env/STM.hs) + +## Overview + +This module defines the server's shared state (`Env`, `Server`, `Client`) and the subscription model types. Most non-obvious patterns are about concurrency safety — preventing STM contention while maintaining consistency. Key patterns are documented in [Server.md](../Server.md) where they're used; this doc covers patterns specific to the type definitions and initialization. + +## SubscribedClients — TVar-of-Maybe pattern + +See comment on `SubscribedClients`. Entries store `TVar (Maybe (Client s))` rather than the client directly. Three implications: + +1. STM transactions reading the TVar automatically re-evaluate when the subscriber changes (disconnect/reconnect) +2. IO lookups via `TM.lookupIO` can be done outside STM safely (the TVar reference itself is stable while it exists) +3. Reconnecting clients can reuse existing subscription slots without map-level contention + +Note: despite the source comment saying subscriptions "are not removed," the code does remove entries via `lookupDeleteSubscribedClient` (when subscriptions end) and `deleteSubcribedClient` (on client disconnect). The comment reflects the original design intent for mobile client continuity, but the current implementation does clean up. + +See also [Server.md#subscribedclients--tvar-of-maybe-pattern](../Server.md#subscribedclients--tvar-of-maybe-pattern). + +## deleteSubcribedClient — split transaction for contention avoidance + +See comment on `deleteSubcribedClient`. The TVar lookup is in a separate IO read from the client comparison and deletion. This is safe because the client is read in the same STM transaction as the deletion — if another client was inserted between lookup and delete, `sameClient` returns False and the delete is skipped. After setting the TVar to `Nothing`, the entry is also removed from the TMap. + +## insertServerClient — connected check + +`insertServerClient` checks `connected` inside the STM transaction before inserting. If the client was already marked disconnected (race with cleanup), the insert is skipped and returns `False`. This prevents resurrecting a disconnected client in the server map. + +## SupportedStore — compile-time storage validation + +Type family with `(Int ~ Bool, TypeError ...)` for invalid combinations. The unsatisfiable `Int ~ Bool` constraint forces GHC to emit the `TypeError` message. Valid: Memory+Memory, Memory+Journal, Postgres+Journal, Postgres+Postgres (with flag). Invalid: Memory+Postgres, Postgres+Memory. The `dbServerPostgres` CPP flag controls whether Postgres+Postgres is available. + +## newEnv — initialization order + +Store initialization order matters: (1) create message store (loads store log for STM backends), (2) create notification store (empty TMap), (3) generate TLS credentials, (4) compute server identity from fingerprint, (5) create stats, (6) create proxy agent. The store log load (`loadStoreLog`) calls `readWriteQueueStore` which reads the existing log, replays it to build state, then opens a new log for writing. `setStoreLog` attaches the write log to the store. + +HTTPS credentials are validated: must be at least 4096-bit RSA (`public_size >= 512` bytes). The check explicitly notes that Let's Encrypt ECDSA uses "insecure curve p256." + +## ServerSubscribers — dual subscriber tracking + +`ServerSubscribers` has two `SubscribedClients` maps: `queueSubscribers` (one entry per queue, for direct subscriptions) and `serviceSubscribers` (one entry per service, for service-certificate subscriptions). `totalServiceSubs` tracks the aggregate `(count, IdsHash)` across all services. `subClients` is an `IntSet` of all client IDs with any subscription (union of queue and service subscribers) — used for idle disconnect decisions. + +## endThreads — weak references with sequence counter + +See comment on `endThreads`. Forked client threads (delivery, proxy commands) are tracked in `IntMap (Weak ThreadId)` with a monotonically increasing `endThreadSeq`. On client disconnect, all threads are swapped out and killed. Weak references allow GC to collect threads that finished normally without explicit cleanup. diff --git a/spec/modules/Simplex/Messaging/Server/Expiration.md b/spec/modules/Simplex/Messaging/Server/Expiration.md new file mode 100644 index 000000000..a51e24c20 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/Expiration.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Server.Expiration + +> Expiration configuration and epoch calculation. + +**Source**: [`Expiration.hs`](../../../../../src/Simplex/Messaging/Server/Expiration.hs) + +No non-obvious behavior. See source. diff --git a/spec/modules/Simplex/Messaging/Server/Information.md b/spec/modules/Simplex/Messaging/Server/Information.md new file mode 100644 index 000000000..16f153154 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/Information.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Server.Information + +> Server public information types (config, operator, hosting) for the server info page. + +**Source**: [`Information.hs`](../../../../../src/Simplex/Messaging/Server/Information.hs) + +No non-obvious behavior. See source. diff --git a/spec/modules/Simplex/Messaging/Server/Main.md b/spec/modules/Simplex/Messaging/Server/Main.md new file mode 100644 index 000000000..aed538573 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/Main.md @@ -0,0 +1,37 @@ +# Simplex.Messaging.Server.Main + +> Server CLI entry point: dispatches Init, Start, Delete, Journal, and Database commands. + +**Source**: [`Main.hs`](../../../../../src/Simplex/Messaging/Server/Main.hs) + +## Overview + +This is the CLI dispatcher for the SMP server. It parses INI configuration, validates storage mode combinations, and dispatches to the appropriate command handler. The most complex logic is storage configuration validation and migration between storage modes. + +## Storage mode compatibility — state machine + +`checkMsgStoreMode` and `iniStoreCfg` implement a state machine of valid storage mode combinations. Valid: Memory+Memory, Memory+Journal (deprecated), Postgres+Journal, Postgres+Postgres (with flag). Invalid: Memory+Postgres (queue store doesn't support it), Postgres+Memory (messages can't be in-memory with DB queues). Error messages guide the user toward migration commands. The validity is also enforced at the type level via `SupportedStore` in [Env/STM.md](./Env/STM.md#supportedstore--compile-time-storage-validation). + +## INI parsing — error context loss + +`readIniFile` errors are coerced to `String` without structured error information. When INI keys are missing or unparseable, `strictIni` calls `error` (see [CLI.md](./CLI.md#strictini--inionoff--error-semantics)). No line numbers or parse context are preserved. + +## restore_messages — implicit default propagation + +The `restore_messages` INI setting has three-valued logic: explicit "on" → restore, explicit "off" → skip, missing → inherits from `enable_store_log`. This implicit default is not captured in the type system — callers see `Maybe Bool` that silently resolves against another setting. + +## serverPublicInfo — validation with field dependencies + +`sourceCode` is required if ANY `ServerPublicInfo` field is present (in line with AGPLv3 license). `operator_country` requires `operator` to be set. `hosting_country` requires `hosting`. These constraints are enforced at parse time, not by the type system — they can be violated by programmatic construction. + +## initializeServer — fingerprint invariant + +During init, the CA certificate fingerprint is saved to a file. On every subsequent start, `checkSavedFingerprint` (in CLI.hs) validates that the current CA certificate matches the saved fingerprint. If the certificate is replaced without updating the fingerprint file, startup fails. This prevents silent key rotation. + +## Database import — non-atomic migration + +`importStoreLogToDatabase` reads the store log into memory, writes to database, then renames the original file with `.bak` suffix. If the function fails after partial database writes, the original file is still present but the database has partial data. No transactional guarantee across the file→DB boundary. + +## Journal store deprecation warning + +`SSCMemoryJournal` initialization prints a deprecation warning (see `newEnv` in Env/STM.hs). Journal message stores will be removed — migration path is: journal export → database import. diff --git a/spec/modules/Simplex/Messaging/Server/Main/Init.md b/spec/modules/Simplex/Messaging/Server/Main/Init.md new file mode 100644 index 000000000..665938ae8 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/Main/Init.md @@ -0,0 +1,17 @@ +# Simplex.Messaging.Server.Main.Init + +> Server initialization: INI file content generation, default settings, and CLI option structures. + +**Source**: [`Main/Init.hs`](../../../../../../src/Simplex/Messaging/Server/Main/Init.hs) + +## iniFileContent — selective commenting + +`iniFileContent` uses `optDisabled`/`optDisabled'` to conditionally comment out INI settings. A setting appears commented when it was not explicitly provided or matches the default value. Consequence: regenerating the INI file after user changes will re-comment modified-to-default values, making it appear the user's change was reverted. + +## iniDbOpts — default fallback + +Database connection settings are uncommented only if they differ from `defaultDBOpts`. A custom connection string that matches the default will appear commented. + +## Control port passwords — independent generation + +Admin and user control port passwords are generated as two independent `randomBase64 18` calls during initialization. Despite `let pwd = ... in (,) <$> pwd <*> pwd` appearing to share a value, `pwd` is an IO action — applicative `<*>` executes it twice, producing two different random passwords. The INI template thus has distinct admin and user passwords. diff --git a/spec/modules/Simplex/Messaging/Server/MsgStore.md b/spec/modules/Simplex/Messaging/Server/MsgStore.md new file mode 100644 index 000000000..625649170 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/MsgStore.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Server.MsgStore + +> Message log record type for store log serialization. + +**Source**: [`MsgStore.hs`](../../../../../src/Simplex/Messaging/Server/MsgStore.hs) + +No non-obvious behavior. See source. diff --git a/spec/modules/Simplex/Messaging/Server/MsgStore/Journal.md b/spec/modules/Simplex/Messaging/Server/MsgStore/Journal.md new file mode 100644 index 000000000..3c0ab6afc --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/MsgStore/Journal.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Server.MsgStore.Journal + +> **Deprecated** — will be removed. Migration path: `journal export` → in-memory, then `database import` → PostgreSQL. See deprecation warning in [Env/STM.hs](../../../../../../src/Simplex/Messaging/Server/Env/STM.hs) `SSCMemoryJournal` initialization. + +**Source**: [`Journal.hs`](../../../../../../src/Simplex/Messaging/Server/MsgStore/Journal.hs) + +No further documentation — this module is deprecated. diff --git a/spec/modules/Simplex/Messaging/Server/MsgStore/Postgres.md b/spec/modules/Simplex/Messaging/Server/MsgStore/Postgres.md new file mode 100644 index 000000000..7262bde0a --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/MsgStore/Postgres.md @@ -0,0 +1,57 @@ +# Simplex.Messaging.Server.MsgStore.Postgres + +> PostgreSQL message store: server-side stored procedures for message operations, COPY protocol for bulk import. + +**Source**: [`Postgres.hs`](../../../../../../src/Simplex/Messaging/Server/MsgStore/Postgres.hs) + +## MsgQueue is unit type + +`type MsgQueue PostgresMsgStore = ()`. There is no message queue object for Postgres — all message operations go directly to the database via stored procedures. Functions like `getMsgQueue` return `pure ()`. + +## Partial interface — error stubs + +Multiple `MsgStoreClass` methods are `error "X not used"`: `withActiveMsgQueues`, `unsafeWithAllMsgQueues`, `logQueueStates`, `withIdleMsgQueue`, `getQueueMessages_`, `tryDeleteMsg_`, `setOverQuota_`, `getQueueSize_`, `unsafeRunStore`. These are required by the type class but not applicable to Postgres. Calling any at runtime crashes. Postgres overrides the default implementations of `tryPeekMsg`, `tryDelMsg`, `tryDelPeekMsg`, `deleteExpiredMsgs`, and `getQueueSize` with direct database calls. + +## writeMsg — quota logic in stored procedure + +`write_message(?,?,?,?,?,?,?)` is a PostgreSQL stored procedure that returns `(quota_written, was_empty)`. Quota enforcement happens in SQL, not in Haskell. This means quota logic is duplicated: STM store checks `canWrite` flag in Haskell, Postgres store checks in the database function. The two implementations must agree on quota semantics. + +## tryDelPeekMsg — variable row count + +The stored procedure `try_del_peek_msg` returns 0, 1, or 2 rows. For the 1-row case, the code checks whether the returned message's `messageId` matches the requested `msgId` to distinguish "deleted, no next message" from "delete failed, current message returned." This disambiguation is possible because the stored procedure always returns available messages even when deletion doesn't match. + +## uninterruptibleMask_ on all database operations + +All write operations (`writeMsg`, `tryDelMsg`, `tryDelPeekMsg`, `deleteExpiredMsgs`) and `isolateQueue` are wrapped in `E.uninterruptibleMask_`. This prevents async exceptions (e.g., client disconnect) from interrupting mid-transaction, which could leave database connections in an inconsistent state. + +## batchInsertMessages — COPY protocol + +Uses PostgreSQL's COPY FROM STDIN protocol (`DB.copy_` + `DB.putCopyData` + `DB.putCopyEnd`) for bulk message import, which is much faster than individual INSERTs. Messages are encoded to CSV format. Parse errors on individual records are logged and skipped — the import is error-tolerant. The entire operation runs in a single transaction (`withTransaction`). + +## exportDbMessages — batched I/O + +Accumulates rows in an `IORef` list (prepended for O(1) insert), flushing every 1000 records with `reverse` to restore order. Uses `DB.foldWithOptions_` with `fetchQuantity = Fixed 1000` to avoid loading all messages into memory. + +## updateQueueCounts — two-step reset + +Creates a temp table with aggregated message stats, then updates `msg_queues` in two steps: first zeros all queue counts, then applies actual stats from the temp table. The two-step approach handles queues with zero messages: they're reset by the first UPDATE but not touched by the second (no matching row in temp table). + +## toMessage — nanosecond precision lost + +`MkSystemTime ts 0` constructs timestamps with zero nanoseconds. Only whole seconds are stored in the database. Messages read from Postgres have coarser timestamps than messages in STM/Journal stores. + +## isolateQueue IS the transaction boundary + +`isolateQueue` for Postgres does `uninterruptibleMask_ $ withDB' op ... $ runReaderT a . DBTransaction`. Each `isolateQueue` call creates a fresh `DBTransaction` carrying the DB connection. This is how `tryPeekMsg_` (which uses `asks dbConn`) gets its connection. The `withQueueLock` is identity for Postgres, so `isolateQueue` provides no mutual exclusion — only the DB transaction provides isolation. + +## newMsgStore hardcodes useCache = False + +`newQueueStore @PostgresQueue (queueStoreCfg config, False)` — the Postgres message store always disables queue caching. All lookups go directly to the database. Contrast with the Journal+Postgres combination where caching is enabled. + +## deleteQueueSize — size before delete + +`deleteQueueSize` calls `getQueueSize` BEFORE `deleteStoreQueue`. The returned size is the count at query time — a concurrent `writeMsg` between the size query and the delete means the reported size is stale. This is acceptable because the size is used for statistics, not for correctness. + +## unsafeMaxLenBS + +`toMessage` uses `C.unsafeMaxLenBS` to bypass the `MaxLen` length check on message bodies read from the database. A TODO comment questions this choice. If the database contains oversized data, the length invariant is silently violated. diff --git a/spec/modules/Simplex/Messaging/Server/MsgStore/STM.md b/spec/modules/Simplex/Messaging/Server/MsgStore/STM.md new file mode 100644 index 000000000..95423cdf1 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/MsgStore/STM.md @@ -0,0 +1,29 @@ +# Simplex.Messaging.Server.MsgStore.STM + +> In-memory STM message store: TQueue-based message queues with quota enforcement. + +**Source**: [`STM.hs`](../../../../../../src/Simplex/Messaging/Server/MsgStore/STM.hs) + +## withQueueLock is identity + +`withQueueLock _ _ = id` — STM queues need no locking since STM provides atomicity. Journal.hs overrides this with a `TMVar`-based in-memory lock (via `withLockWaitShared`). Any code calling `withQueueLock` transparently gets the right concurrency control for the backend. + +## writeMsg — quota with empty-queue override + +When `canWrite` is `False` (over quota) but the queue is empty, writing is still allowed. This handles the case where all messages were deleted or expired but the `canWrite` flag was not reset. When the quota is exceeded, the actual message content is replaced with a `MessageQuota` (preserving only `msgId` and `msgTs`) — the client receives a quota notification instead of the message. + +## getMsgQueue — lazy initialization + +The message queue TVar (`msgQueue'`) starts as `Nothing`. The queue is created on first `getMsgQueue` call (lazy initialization). This means queues that are created but never receive messages don't allocate a TQueue. `getPeekMsgQueue` returns `Nothing` if no message queue exists — callers handle this as "queue is empty." + +## deleteQueue_ — atomic swap prevents post-delete operations + +`swapTVar (msgQueue' q) Nothing` atomically retrieves the old message queue and sets to `Nothing`. Any subsequent `getMsgQueue` call would create a fresh empty queue, but the deleted queue's `queueRec` TVar is also set to `Nothing` by `deleteStoreQueue`, so all operations would fail with `AUTH` first. + +## tryDeleteMsg_ — blind dequeue, no msgId check + +`tryDeleteMsg_` does `tryReadTQueue` — removes whatever is at the head without verifying the message ID. The msgId check lives in the default `tryDelMsg` / `tryDelPeekMsg` implementations in `Types.hs`, which always call `tryPeekMsg_` first to verify. Calling `tryDeleteMsg_` directly would silently delete the wrong message if the head changed between peek and delete. Safe only because `isolateQueue` serializes all operations on the same queue. + +## getQueueMessages_ snapshot — invisible gap + +`getQueueMessages_ False` implements non-destructive read by flushing TQueue then writing back. This runs inside `atomically` (via `isolateQueue`), so the temporarily-empty state is never visible to other transactions. diff --git a/spec/modules/Simplex/Messaging/Server/MsgStore/Types.md b/spec/modules/Simplex/Messaging/Server/MsgStore/Types.md new file mode 100644 index 000000000..2fd4c79bf --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/MsgStore/Types.md @@ -0,0 +1,29 @@ +# Simplex.Messaging.Server.MsgStore.Types + +> Type class for message stores with injective type families and polymorphic isolation. + +**Source**: [`Types.hs`](../../../../../../src/Simplex/Messaging/Server/MsgStore/Types.hs) + +## Injective type families + +All associated types (`StoreMonad`, `MsgQueue`, `StoreQueue`, `QueueStore`, `MsgStoreConfig`) use injective type families (`| m -> s`). This means each associated type uniquely determines the store type, avoiding ambiguity at call sites. Without injectivity, most call sites would need explicit type applications. + +## isolateQueue — polymorphic isolation + +`isolateQueue` abstracts the concurrency model: STM store implements it as `liftIO . atomically` (single STM transaction), while Journal store acquires a TMVar-based in-memory lock (not a filesystem lock). All message operations go through `isolateQueue` or `withPeekMsgQueue` (which calls `isolateQueue`). This means the atomicity guarantee varies by backend — STM gives true atomicity, Journal gives mutual exclusion via lock. + +## tryDelPeekMsg — atomic delete-and-peek + +Deletes the current message AND peeks the next one in a single `isolateQueue` call. This atomicity is critical for the ACK flow: the server needs to know if there's a next message to deliver immediately after acknowledging the current one, without a window where a concurrent SEND could interleave. + +## withIdleMsgQueue — journal-specific lifecycle + +For Journal store, the message queue file handle is closed after the action if it was initially closed or idle longer than the configured interval. For STM store, this is effectively a no-op (always open, never "idle"). The return tuple `(Maybe a, Int)` provides both the action result and the queue size — the `Maybe` is `Nothing` when no message queue exists (no messages ever written). + +## unsafeWithAllMsgQueues — CLI-only + +Explicitly unsafe: iterates all queues including those not in active memory. Only safe before server start or in CLI commands. During normal operation, Journal store may have queues on disk but not loaded — this function would load them, interfering with the lazy-loading lifecycle. + +## snapshotTQueue visibility gap + +`getQueueMessages_ False` (non-destructive read) flushes the TQueue then writes all messages back. Between flush and rewrite, concurrent STM transactions would see an empty queue. Since this runs inside `atomically` for STM store, the gap is invisible to other transactions. For Journal store (where `StoreMonad` is IO-based), this is not used. diff --git a/spec/modules/Simplex/Messaging/Server/NtfStore.md b/spec/modules/Simplex/Messaging/Server/NtfStore.md new file mode 100644 index 000000000..b58a44fad --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/NtfStore.md @@ -0,0 +1,15 @@ +# Simplex.Messaging.Server.NtfStore + +> In-memory notification store: per-notifier message notification lists with expiration. + +**Source**: [`NtfStore.hs`](../../../../../src/Simplex/Messaging/Server/NtfStore.hs) + +## storeNtf — outside-STM lookup with STM fallback + +`storeNtf` uses `TM.lookupIO` outside STM, then falls back to `TM.lookup` inside STM if the notifier entry doesn't exist. This is the same outside-STM lookup pattern used in Server.hs and Client/Agent.hs — avoids transaction re-evaluation from unrelated map changes. The double-check inside STM prevents races when two messages arrive concurrently for a new notifier. + +## deleteExpiredNtfs — last-is-earliest optimization + +Notifications are prepended (cons), so the last element in the list is the earliest. `deleteExpiredNtfs` checks `last ntfs` first — if the earliest notification is not expired, none are, and the entire list is skipped without filtering. This avoids traversing notification lists that have no expired entries. + +The outer `readTVarIO` check for empty list avoids entering an STM transaction at all for notifiers with no notifications. diff --git a/spec/modules/Simplex/Messaging/Server/Prometheus.md b/spec/modules/Simplex/Messaging/Server/Prometheus.md new file mode 100644 index 000000000..11610ee23 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/Prometheus.md @@ -0,0 +1,21 @@ +# Simplex.Messaging.Server.Prometheus + +> Prometheus text exposition format for server metrics, with histogram gap-filling and derived aggregations. + +**Source**: [`Prometheus.hs`](../../../../../src/Simplex/Messaging/Server/Prometheus.hs) + +## Histogram gap-filling + +`showTimeBuckets` uses `mapAccumL` over sorted bucket keys. When the gap between consecutive buckets exceeds 60 seconds, it inserts a synthetic bucket at `sec - 60` with the cumulative total up to that point. This fills sparse `TimeBuckets` maps into continuous Prometheus histograms. The 60-second gap threshold is hardcoded. + +## Bucket sum aggregation — filters by value, not key + +`showBucketSums` intends to aggregate buckets into fixed time periods: 0-60s, 60-300s, 300-1200s, 1200-3600s, 3600+s. However, `IM.filter` (from `Data.IntMap.Strict`) filters by **value** (count), not by key (time). The predicate `\sec -> minTime <= sec && sec < maxTime` is applied to count values, not to the IntMap keys that represent seconds. This means buckets are selected based on whether their count falls in the range, not based on their time boundary. The aggregation boundaries are also independent of the bucketing thresholds in `updateTimeBuckets` (Stats.hs), which uses 5s/10s/30s/60s quantization. + +## Non-standard Prometheus timestamp output + +The `mstr` function appends `tsEpoch ts` (millisecond-precision Unix timestamp) directly after metric values, which is valid Prometheus text exposition format. + +## Delivery histogram count/sum source + +`simplex_smp_delivery_ack_confirmed_time_count` is `_msgRecv + _msgRecvGet`. `simplex_smp_delivery_ack_confirmed_time_sum` is `sumTime` from `_msgRecvAckTimes`. The count is accumulated separately from the histogram — if there's a code path that increments `msgRecv` without calling `updateTimeBuckets`, count and sum diverge. diff --git a/spec/modules/Simplex/Messaging/Server/QueueStore.md b/spec/modules/Simplex/Messaging/Server/QueueStore.md new file mode 100644 index 000000000..c906c9ecc --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/QueueStore.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Server.QueueStore + +> Core record types for queue storage: QueueRec, NtfCreds, ServiceRec, ServerEntityStatus. + +**Source**: [`QueueStore.hs`](../../../../../src/Simplex/Messaging/Server/QueueStore.hs) + +No non-obvious behavior. See source. diff --git a/spec/modules/Simplex/Messaging/Server/QueueStore/Postgres.md b/spec/modules/Simplex/Messaging/Server/QueueStore/Postgres.md new file mode 100644 index 000000000..f97acaa2d --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/QueueStore/Postgres.md @@ -0,0 +1,97 @@ +# Simplex.Messaging.Server.QueueStore.Postgres + +> PostgreSQL queue store: cache-coherent TMap layer over database, double-checked locking, soft-delete lifecycle, COPY-based bulk import. + +**Source**: [`Postgres.hs`](../../../../../../src/Simplex/Messaging/Server/QueueStore/Postgres.hs) + +## addQueue_ — no in-memory duplicate check, relies on DB constraint + +See comment on `addQueue_`: "Not doing duplicate checks in maps as the probability of duplicates is very low." The STM implementation checks all four ID maps before insertion and returns `DUPLICATE_`. The Postgres implementation skips this and relies on `UniqueViolation` from the DB, which `handleDuplicate` maps to `AUTH`, not `DUPLICATE_`. The same logical error produces different error codes depending on the store backend. + +## addQueue_ — non-atomic cache updates + +After the successful SQL INSERT, each cache map (`queues`, `senders`, `notifiers`, `links`) is updated in its own `atomically` block. Between these updates, the cache is partially consistent — a concurrent `getQueue_` by sender ID could miss the queue during the window between the `queues` insert and the `senders` insert. The STM implementation updates all maps in a single `atomically` block. `E.uninterruptibleMask_` prevents async exceptions but not concurrent reads. + +## getQueue_ / SNotifier — one-shot cache eviction on read + +See comment on `getQueue_` for the SNotifier case. After a successful notifier lookup, the notifier ID is deleted from the `notifiers` TMap. This makes the notifier cache a one-shot cache: the first lookup uses the cache, subsequent lookups hit the database. Unique to SNotifier — SSender entries persist indefinitely. The batch path (`getQueues_` SNotifier) does NOT do this eviction, so single and batch paths have different cache side effects. + +## getQueue_ / loadNtfQueue — notifier lookups never cache the queue + +See comment on `loadNtfQueue`: "checking recipient map first, not creating lock in map, not caching queue." Notifier-initiated DB loads produce ephemeral queue objects created with `mkQ False` (no persistent lock). Two concurrent notifier lookups for the same queue create independent queue objects with separate `TVar`s. Contrast with `loadSndQueue_` which caches via `cacheQueue`. + +## cacheQueue — double-checked locking + +Classic pattern: (1) TMap lookup outside lock, (2) if miss, DB load + create queue + acquire `withQueueLock`, (3) second TMap check inside lock + `atomically`, (4) if another thread won the race, discard the freshly created queue. See comment on `cacheQueue` for the rationale about preventing duplicate file opens. For Journal storage, the losing thread's lock remains in `queueLocks` as a harmless orphan. For Postgres-only storage (`mkQueue` creates a TVar), no resource leak. + +## getQueues_ — snapshot-based cache with stale-read risk + +Both SRecipient and SNotifier paths start with `readTVarIO` snapshots of the relevant TMap(s), then partition requested IDs into "found" and "need DB load." Between snapshot and DB query, the cache can change. The `cacheRcvQueue` path handles this with a second check inside the lock. The SNotifier path does NOT cache — it uses the stale snapshot to decide `maybe (mkQ False rId qRec) pure (M.lookup rId qs)`, so concurrent loads can create duplicate ephemeral objects. + +## getQueues_ — error code asymmetry: INTERNAL vs AUTH + +When all IDs are found in cache but some map to `Left` (theoretically impossible), the error is `INTERNAL`. When some IDs needed DB loading and were missing, the error is `AUTH`. Same "not found" condition, different error codes depending on whether the DB was consulted. The `INTERNAL` branch is a defensive assertion against inconsistent TMap snapshots. + +## withDB — every operation runs in its own transaction + +`withDB` wraps each action in `withTransaction` (PostgreSQL `READ COMMITTED`). No multi-statement transactions in queue store operations (unlike `getEntityCounts` and `batchInsertQueues` which use `withTransaction` directly). SQL exceptions are caught, logged, and mapped to `STORE` with the exception text — which propagates to the SMP client over the wire. + +## withQueueRec — lock-mask-read pattern + +All mutating operations share: (1) `withQueueLock` (per-queue lock), (2) `E.uninterruptibleMask_` (no async exceptions mid-operation), (3) `readQueueRecIO` (check queue not deleted). If the TVar reads `Nothing`, the operation short-circuits with `AUTH` without touching the database. The TVar is the authoritative "is deleted" check; `assertUpdated` (zero rows → `AUTH`) catches cache-DB divergence as a secondary check. + +## deleteStoreQueue — two-phase soft delete + +Queue deletion is soft: `UPDATE ... SET deleted_at = ?`. The row remains in the database. `compactQueues` later does the hard delete: `DELETE ... WHERE deleted_at < ?` using the configurable `deletedTTL`. All queries include `AND deleted_at IS NULL` to exclude soft-deleted rows. The STM implementation has no equivalent — `compactQueues` returns `pure 0`. + +## deleteStoreQueue — non-atomic cache cleanup, links never cleaned + +The TVar is set to `Nothing` first, then secondary maps (`senders`, `notifiers`, `notifierLocks`) are cleaned in separate `atomically` blocks. Between these, secondary maps point to a dead queue (functionally correct — returns AUTH either way). The `links` map is never cleaned up here — link entries for deleted queues remain in memory indefinitely. + +## secureQueue — idempotency difference from STM + +Re-securing with the same key falls through the verify function to `pure ()`, then **still executes the SQL UPDATE and TVar write**. The STM implementation returns `Right ()` without TVar mutation when the same key is provided. Both implementations write a store log entry either way. The Postgres version performs an unnecessary DB round-trip, connection pool checkout, and TVar write that the STM version avoids. + +## addQueueNotifier — three-layer duplicate detection + +(1) **Cache check**: `checkCachedNotifier` acquires a per-notifier-ID lock via `notifierLocks`, then checks `TM.memberIO`. Returns `DUPLICATE_`. (2) **Queue lock**: Via `withQueueRec`, prevents concurrent modifications to the same queue. (3) **Database constraint**: `handleDuplicate` catches `UniqueViolation`, returns `AUTH`. Same duplicate, different error codes depending on whether cache was warm. The `notifierLocks` map grows unboundedly — locks are never removed except when the queue is deleted. + +## addQueueNotifier — always clears notification service + +The SQL UPDATE always sets `ntf_service_id = NULL` when adding/replacing a notifier. The previous notifier's service association is silently lost. The STM implementation additionally calls `removeServiceQueue` to update service-level tracking; the Postgres version does not. + +## rowToQueueRec — link data replaced with empty stubs + +The standard `queueRecQuery` does NOT select `fixed_data` and `user_data` columns. When converting to `QueueRec`, link data is stubbed: `(,(EncDataBytes "", EncDataBytes "")) <$> linkId_`. Actual link data is loaded on demand via `getQueueLinkData`. Any code reading `queueData` from a cached `QueueRec` without going through `getQueueLinkData` sees empty bytes. The separate `rowToQueueRecWithData` (used by `foldQueueRecs` with `withData = True`) includes real data. + +## getCreateService — serialization via serviceLocks + +Entire operation wrapped in `withLockMap (serviceLocks st) fp`, serializing all creation/lookup for the same certificate fingerprint. Inside the lock: SELECT by `service_cert_hash`, if not found attempt INSERT catching `UniqueViolation`. The `serviceLocks` map grows unboundedly — no cleanup mechanism. + +## batchInsertQueues — COPY protocol with manual CSV serialization + +Uses PostgreSQL's `COPY FROM STDIN WITH (FORMAT CSV)` for bulk import. Queue records manually serialized via `queueRecToText`/`renderField`. This must stay in sync with `insertQueueQuery` column order — a mismatch causes silent data corruption. The `renderField` function does not escape CSV metacharacters, which is safe only because field values (entity IDs, keys, DH secrets) are binary data without commas/quotes/newlines. Runs in a single transaction; row count queried in a separate transaction afterward. + +## withLog_ — fire-and-forget store log writes + +`withLog_` catches all exceptions via `catchAny` and logs a warning, but does not fail the operation. Store log writes are best-effort. Contrast with the STM `withLog'` where log failures can propagate as `STORE` errors. In the Postgres implementation, the store log can fall behind the database state since the DB is the authoritative persistence layer. + +## useCache flag — behavioral bifurcation + +`useCache :: Bool` creates two distinct code paths. When `False`: `addQueue_` skips all TMap updates, `getQueue_` always loads from DB, `addQueueNotifier` skips cache duplicate check, `deleteStoreQueue` skips cache cleanup. Notably, `loadQueueNoCache` still creates queues with `mkQ True` (persistent lock) even though caching is disabled — the lock is needed for `withQueueRec`'s `withQueueLock`. + +## getServiceQueueCountHash — behavioral divergence from STM + +Postgres returns `Right (0, mempty)` when the service is not found (via `maybeFirstRow'` default). STM returns `Left AUTH`. Same logical condition, different error handling. Callers that expect AUTH on missing service will silently get a zero count from Postgres. + +## deleteStoreQueue — cross-module lock contract + +See comment on `deleteStoreQueue`: "this method is called from JournalMsgStore deleteQueue that already locks the queue." Unlike other mutations that go through `withQueueRec` (which acquires the lock), `deleteStoreQueue` uses `E.uninterruptibleMask_ $ runExceptT` directly — no `withQueueLock`. The caller must hold the lock. + +## addQueueLinkData — immutable data protection + +When link data already exists with the same `lnkId`, the SQL UPDATE adds `AND (fixed_data IS NULL OR fixed_data = ?)` to prevent overwriting immutable (fixed) data. If the immutable portion doesn't match, `assertUpdated` triggers AUTH. This enforces the invariant that `fixed_data` can only be set once. + +## assertUpdated — AUTH is overloaded + +`assertUpdated` checks that non-zero rows were affected. Zero rows → `AUTH`. This is the same error code returned for "not found" (via `readQueueRecIO`) and "duplicate" (via `handleDuplicate`). The actual cause — stale cache, deleted queue, or constraint violation — is indistinguishable in logs. diff --git a/spec/modules/Simplex/Messaging/Server/QueueStore/QueueInfo.md b/spec/modules/Simplex/Messaging/Server/QueueStore/QueueInfo.md new file mode 100644 index 000000000..b0ca64877 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/QueueStore/QueueInfo.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Server.QueueStore.QueueInfo + +> Data types for queue info display (control port), with JSON encoding. + +**Source**: [`QueueInfo.hs`](../../../../../../src/Simplex/Messaging/Server/QueueStore/QueueInfo.hs) + +No non-obvious behavior. See source. diff --git a/spec/modules/Simplex/Messaging/Server/QueueStore/STM.md b/spec/modules/Simplex/Messaging/Server/QueueStore/STM.md new file mode 100644 index 000000000..6ff8da3b5 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/QueueStore/STM.md @@ -0,0 +1,37 @@ +# Simplex.Messaging.Server.QueueStore.STM + +> In-memory STM queue store: queue CRUD with store log journaling and service tracking. + +**Source**: [`STM.hs`](../../../../../../src/Simplex/Messaging/Server/QueueStore/STM.hs) + +## addQueue_ — atomic multi-ID DUPLICATE check + +`addQueue_` checks ALL entity IDs (recipient, sender, notifier, link) for existence in a single STM transaction. If ANY already exist, returns `DUPLICATE_` without inserting anything. This prevents partial state where some IDs were inserted before the duplicate was detected on another. The `mkQ` callback runs outside STM before the check — the queue object is created optimistically and discarded if the check fails. + +## getCreateService — outside-STM with role validation + +`getCreateService` uses the outside-STM lookup pattern (`TM.lookupIO` then STM fallback). When a service cert already exists, `checkService` validates the role matches — a cert attempting to register with a different `SMPServiceRole` gets `SERVICE` error. A new service is only created if the ID is not already in `services` (prevents DUPLICATE). The `(serviceId, True/False)` return indicates whether the log should be written (only for new services). + +## IdsHash XOR in setServiceQueues_ + +Both `addServiceQueue` and `removeServiceQueue` use `setServiceQueues_`, which unconditionally XORs `queueIdHash qId` into `idsHash`. Since XOR is self-inverse, removal cancels addition. However, the XOR is applied blindly — there is no `S.member` guard. If `addServiceQueue` were called twice for the same `qId`, the XOR would self-cancel while the `Set` (via `S.insert` idempotency) retains the element, making hash and Set inconsistent. Similarly, `removeServiceQueue` on a non-member XORs a phantom ID into the hash. Correctness relies on callers maintaining the invariant: each `qId` is added exactly once and removed at most once per service. + +## withLog — uninterruptibleMask_ for log integrity + +Store log writes are wrapped in `E.uninterruptibleMask_` — cannot be interrupted by async exceptions during the write. This prevents partial log records that would corrupt the store log file during replay. Synchronous exceptions are caught by `E.try` and converted to `STORE` error (logged, not crashed). + +## secureQueue — idempotent replay + +If `senderKey` already matches the provided key, returns `Right ()`. A different key returns `Left AUTH`. This idempotency is essential for store log replay where the same `SecureQueue` record may be applied multiple times. + +## getQueues_ — map snapshot for batch consistency + +Batch queue lookups (`getQueues_`) read the entire TVar map once with `readTVarIO`, then look up each queue ID in the pure `Map`. This provides a consistent snapshot (all lookups see the same map state) and is more efficient than per-queue IO lookups for large batches. + +## closeQueueStore — non-atomic shutdown + +`closeQueueStore` clears TMaps in separate `atomically` calls, not one transaction. Concurrent operations during shutdown could see partially cleared state. This is acceptable because the store log is closed first, and the server should not be processing new requests during shutdown. + +## addQueueLinkData — conditional idempotency + +Re-adding link data with the same `lnkId` and matching first component of `QueueLinkData` succeeds (idempotent replay). Different `lnkId` or mismatched data returns `AUTH`. This handles store log replay where the same `CreateLink` may be applied multiple times. diff --git a/spec/modules/Simplex/Messaging/Server/QueueStore/Types.md b/spec/modules/Simplex/Messaging/Server/QueueStore/Types.md new file mode 100644 index 000000000..173cbe967 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/QueueStore/Types.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Server.QueueStore.Types + +> Type classes for queue store and stored queue operations. + +**Source**: [`Types.hs`](../../../../../../src/Simplex/Messaging/Server/QueueStore/Types.hs) + +No non-obvious behavior. See source. diff --git a/spec/modules/Simplex/Messaging/Server/Stats.md b/spec/modules/Simplex/Messaging/Server/Stats.md new file mode 100644 index 000000000..056dc4a88 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/Stats.md @@ -0,0 +1,39 @@ +# Simplex.Messaging.Server.Stats + +> Server statistics: counters, rolling period tracking, delivery time histograms, proxy stats, service stats. + +**Source**: [`Stats.hs`](../../../../../src/Simplex/Messaging/Server/Stats.hs) + +## Overview + +All stats are `IORef`-based, not STM — individual increments are atomic (`atomicModifyIORef'_`) but multi-field reads are not transactional. `getServerStatsData` reads 30+ IORefs sequentially — the resulting snapshot is temporally smeared, not a point-in-time atomic view. + +## PeriodStats — rolling window with boundary-only reset + +`PeriodStats` maintains three `IORef IntSet` (day, week, month). `updatePeriodStats` hashes the entity ID and inserts into all three periods. `periodStatCounts` resets a period's IntSet **only** when the period boundary is reached (day 1 of that period). At non-boundary times, it returns `""` (empty string) — the data is kept accumulating but not reported. + +See comment on `periodCount`. At day boundary (`periodCount 1 ref`), the day set is atomically swapped to empty and its size returned. Week resets on Monday (day 1 of week), month on the 1st. Periods are independent — day reset does NOT affect week/month accumulation. Each period counts unique queue hashes that were active during that period. + +## Disabled metrics — performance trade-offs + +See comments on `qSubNoMsg` and `subscribedQueues` in the source. `qSubNoMsg` is disabled because counting "subscription with no message" creates too many STM transactions. `subscribedQueues` is disabled because maintaining PeriodStats-style IntSets for all subscribed queues uses too much memory. Both fields are omitted from the stats output entirely. The parser handles old log files that contain these fields: `qSubNoMsg` is silently skipped via `skipInt`, and `subscribedQueues` is parsed but replaced with empty data. + +## TimeBuckets — ceil-aligned bucketing with precision loss + +`updateTimeBuckets` quantizes delivery-to-acknowledgment times into sparse buckets. Exact for 0-5s, then ceil-aligned: 6-30s → 5s buckets, 31-60s → 10s, 61-180s → 30s, 180+s → 60s. The `toBucket` formula uses `- ((- n) \`div\` m) * m` for ceiling division. `sumTime` and `maxTime` preserve exact values; only the histogram is lossy. + +## Serialization backward compatibility — silent data coercion + +The `strP` parser for `ServerStatsData` handles multiple format generations. Old format `qDeleted=` is read as `(value, 0, 0)` — `qDeletedNew` and `qDeletedSecured` default to 0. `qSubNoMsg` is parsed and silently discarded (`skipInt`). `subscribedQueues` is parsed but replaced with empty data. Data loaded from old formats is coerced, not reconstructed — precision is permanently lost. + +## Serialization typo — internally consistent + +The field `_srvAssocUpdated` is serialized as `"assocUpdatedt="` (extra 't') in `ServiceStatsData` encoding. The parser expects the same misspelling. Both sides are consistent, so it works — but external systems expecting `assocUpdated=` will fail to parse. + +## atomicSwapIORef for stats logging + +In `logServerStats` (Server.hs), each counter is read and reset via `atomicSwapIORef ref 0`. This is lock-free but means counters are zeroed after each logging interval — values represent delta since last log, not cumulative totals. `qCount` and `msgCount` are exceptions: they're read-only (via `readIORef`) because they track absolute current values, not deltas. + +## setPeriodStats — not thread safe + +See comment on `setPeriodStats`. Uses `writeIORef` (not atomic). Only safe during server startup when no other threads are running. If called concurrently, period data could be corrupted. diff --git a/spec/modules/Simplex/Messaging/Server/StoreLog.md b/spec/modules/Simplex/Messaging/Server/StoreLog.md new file mode 100644 index 000000000..cef1bdfb2 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/StoreLog.md @@ -0,0 +1,36 @@ +# Simplex.Messaging.Server.StoreLog + +> Append-only log for queue state changes: write, read/replay, compaction, crash recovery, backup retention. + +**Source**: [`StoreLog.hs`](../../../../../src/Simplex/Messaging/Server/StoreLog.hs) + +## writeStoreLogRecord — atomicity via manual write + +See comment in `writeStoreLogRecord`. `hPutStrLn` breaks writes larger than 1024 bytes into multiple system calls on `LineBuffered` handles, which could interleave with concurrent writes. The solution is manual `B.hPut` (single call for the complete record + newline) plus `hFlush`. `E.uninterruptibleMask_` prevents async exceptions between write and flush — ensures a complete record is always written. + +## readWriteStoreLog — crash recovery state machine + +The `.start` temp backup file provides crash recovery during compaction. The sequence: + +1. Read existing log, replay into memory +2. Rename log to `.start` (atomic rename = backup point) +3. Write compacted state to new file +4. Rename `.start` to timestamped backup, remove old backups + +If the server crashes during step 3, the next startup detects `.start` and restores from it instead of the incomplete new file. Any partially-written current file is preserved as `.bak`. The comment says "do not terminate" during compaction — there is no safe interrupt point between steps 2 and 4. + +## removeStoreLogBackups — layered retention policy + +Backup retention is layered: (1) keep all backups newer than 24 hours, (2) of the rest, keep at least 3, (3) of those eligible for deletion, only delete backups older than 21 days. This means a server with infrequent restarts accumulates many backups (only cleaned on startup), while a frequently-restarting server keeps a rolling window. Backup timestamps come from ISO 8601 suffixes parsed from filenames. + +## QueueRec StrEncoding — backward-compatible parsing + +The `strP` parser handles two field name generations: old format `sndSecure=` (boolean, mapping `True` → `QMMessaging`, `False` → `QMContact`) and new format `queue_mode=`. Missing queue mode defaults to `Nothing` with the comment "unknown queue mode, we cannot imply that it is contact address." `EntityActive` status is implicit — not written to the log, and parsed as default when `status=` is absent. + +## openReadStoreLog — creates file if missing + +`openReadStoreLog` creates an empty file if it doesn't exist. Callers never need to handle "file not found." + +## foldLogLines — EOF flag for batching + +The `action` callback receives a `Bool` indicating whether the current line is the last one. This allows consumers (like `readQueueStore`) to batch operations and flush only on the final line. diff --git a/spec/modules/Simplex/Messaging/Server/StoreLog/ReadWrite.md b/spec/modules/Simplex/Messaging/Server/StoreLog/ReadWrite.md new file mode 100644 index 000000000..c6fd7e745 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/StoreLog/ReadWrite.md @@ -0,0 +1,17 @@ +# Simplex.Messaging.Server.StoreLog.ReadWrite + +> Store log replay (read) and snapshot (write) for STM queue store. + +**Source**: [`ReadWrite.hs`](../../../../../../src/Simplex/Messaging/Server/StoreLog/ReadWrite.hs) + +## readQueueStore — error-tolerant replay + +Log replay (`readQueueStore`) processes each line independently. Parse errors are printed to stdout and skipped. Operation errors (e.g., queue not found during `SecureQueue` replay) are logged and skipped. A deleted queue encountered during replay (`queueRec` is `Nothing`) logs a warning but does not fail. This means a corrupted log line only loses that single operation, not the entire store. + +## NewService ID validation + +During replay, `getCreateService` may return a different `serviceId` than the one stored in the log (if the service cert already exists with a different ID). This is logged as an error but does not abort replay — the store continues with the ID it assigned. This handles the case where a store log was manually edited or partially corrupted. + +## writeQueueStore — services before queues + +`writeQueueStore` writes services first, then queues. Order matters: when the log is replayed, service IDs must already exist before queues reference them via `rcvServiceId`/`ntfServiceId`. diff --git a/spec/modules/Simplex/Messaging/Server/StoreLog/Types.md b/spec/modules/Simplex/Messaging/Server/StoreLog/Types.md new file mode 100644 index 000000000..491815282 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/StoreLog/Types.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Server.StoreLog.Types + +> GADT wrapper for file handles with type-level IOMode enforcement. + +**Source**: [`Types.hs`](../../../../../../src/Simplex/Messaging/Server/StoreLog/Types.hs) + +No non-obvious behavior. See source. Constructors are intentionally not exported — callers must use `openWriteStoreLog`/`openReadStoreLog`. diff --git a/spec/modules/Simplex/Messaging/Server/Web.md b/spec/modules/Simplex/Messaging/Server/Web.md new file mode 100644 index 000000000..716845aa5 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Server/Web.md @@ -0,0 +1,21 @@ +# Simplex.Messaging.Server.Web + +> Static site generation, serving (HTTP, HTTPS, HTTP/2), and template rendering for the server info page. + +**Source**: [`Web.hs`](../../../../../src/Simplex/Messaging/Server/Web.hs) + +## attachStaticFiles — reusing Warp internals for TLS connections + +`attachStaticFiles` receives already-established TLS connections (which passed TLS handshake and ALPN check in the SMP transport layer) and runs Warp's HTTP handler on them. It manually calls `WI.withII`, `WT.attachConn`, `WI.registerKillThread`, and `WI.serveConnection` — internal Warp APIs. This couples the server to Warp internals and could break on Warp library updates. + +## serveStaticPageH2 — path traversal protection + +The H2 static file server uses `canonicalizePath` to resolve symlinks and `..` components, then checks the resolved path is a prefix of `canonicalRoot`. The caller must pre-compute `canonicalRoot` via `canonicalizePath` for the check to work. Without pre-canonicalization, a symlink in the root itself could defeat the protection. + +## .well-known path rewriting + +Both WAI (`changeWellKnownPath`) and H2 (`rewriteWellKnownH2`) rewrite `/.well-known/` to `/well-known/` because `staticApp` does not serve hidden directories (dot-prefixed). The generated site uses `well-known/` as the physical directory. If one rewrite path is updated without the other, the served files diverge between HTTP/1.1 and HTTP/2. + +## section_ / item_ — template rendering + +`render` applies substitutions to HTML templates using `...` section markers and `${label}` item markers. When a substitution value is `Nothing`, the entire section (including content between markers) is removed. `section_` recurses to handle multiple occurrences of the same section. `item_` is a simple find-and-replace. The section end marker is mandatory — a missing end marker calls `error` (crashes). diff --git a/src/Simplex/Messaging/Server.hs b/src/Simplex/Messaging/Server.hs index 3d977dc8c..bce73d338 100644 --- a/src/Simplex/Messaging/Server.hs +++ b/src/Simplex/Messaging/Server.hs @@ -247,6 +247,7 @@ smpServer started cfg@ServerConfig {transports, transportConfig = tCfg, startOpt closeServer :: M s () closeServer = asks (smpAgent . proxyAgent) >>= liftIO . closeSMPClientAgent + -- spec: spec/modules/Simplex/Messaging/Server.md#serverthread--subscription-lifecycle-with-split-stm serverThread :: forall sub. String -> Server s -> @@ -1223,6 +1224,7 @@ disconnectTransport THandle {connection, params = THandleParams {sessionId}} rcv data VerificationResult s = VRVerified (Maybe (StoreQueue s, QueueRec)) | VRFailed ErrorType +-- spec: spec/modules/Simplex/Messaging/Server.md#constant-time-authorization--dummy-keys -- This function verifies queue command authorization, with the objective to have constant time between the three AUTH error scenarios: -- - the queue and party key exist, and the provided authorization has type matching queue key, but it is made with the different key. -- - the queue and party key exist, but the provided authorization has incorrect type. @@ -1982,6 +1984,7 @@ client -- If the queue is not full, then the thread is created where these checks are made: -- - it is the same subscribed client (in case it was reconnected it would receive message via SUB command) -- - nothing was delivered to this subscription (to avoid race conditions with the recipient). + -- spec: spec/modules/Simplex/Messaging/Server.md#trydelivermessage--syncasync-split-delivery tryDeliverMessage :: Message -> IO () tryDeliverMessage msg = -- the subscribed client var is read outside of STM to avoid transaction cost @@ -2063,6 +2066,7 @@ client encNMsgMeta = C.cbEncrypt rcvNtfDhSecret ntfNonce (smpEncode msgMeta) 128 pure $ MsgNtf {ntfMsgId = msgId, ntfTs = msgTs, ntfNonce, ntfEncMeta = fromRight "" encNMsgMeta} + -- spec: spec/modules/Simplex/Messaging/Server.md#proxy-forwarding--single-transmission-no-service-identity processForwardedCommand :: EncFwdTransmission -> M s BrokerMsg processForwardedCommand (EncFwdTransmission s) = fmap (either ERR RRES) . runExceptT $ do THAuthServer {serverPrivKey, sessSecret'} <- maybe (throwE $ transportErr TENoServerAuth) pure (thAuth thParams') diff --git a/src/Simplex/Messaging/Server/Env/STM.hs b/src/Simplex/Messaging/Server/Env/STM.hs index 574111c15..b4a275922 100644 --- a/src/Simplex/Messaging/Server/Env/STM.hs +++ b/src/Simplex/Messaging/Server/Env/STM.hs @@ -368,6 +368,7 @@ data ServerSubscribers s = ServerSubscribers pendingEvents :: TVar (IntMap (NonEmpty (EntityId, BrokerMsg))) } +-- spec: spec/modules/Simplex/Messaging/Server/Env/STM.md#subscribedclients--tvar-of-maybe-pattern -- not exported, to prevent accidental concurrent Map lookups inside STM transactions. -- Map stores TVars with pointers to the clients rather than client ID to allow reading the same TVar -- inside transactions to ensure that transaction is re-evaluated in case subscriber changes. diff --git a/src/Simplex/Messaging/Server/MsgStore/Postgres.hs b/src/Simplex/Messaging/Server/MsgStore/Postgres.hs index edf7f481c..1617c1c91 100644 --- a/src/Simplex/Messaging/Server/MsgStore/Postgres.hs +++ b/src/Simplex/Messaging/Server/MsgStore/Postgres.hs @@ -76,6 +76,7 @@ data PostgresQueue = PostgresQueue queueRec' :: TVar (Maybe QueueRec) } +-- spec: spec/modules/Simplex/Messaging/Server/MsgStore/Postgres.md#msgqueue-is-unit-type instance StoreQueueClass PostgresQueue where recipientId = recipientId' {-# INLINE recipientId #-} diff --git a/src/Simplex/Messaging/Server/MsgStore/Types.hs b/src/Simplex/Messaging/Server/MsgStore/Types.hs index acb661a40..12566ec2f 100644 --- a/src/Simplex/Messaging/Server/MsgStore/Types.hs +++ b/src/Simplex/Messaging/Server/MsgStore/Types.hs @@ -49,6 +49,7 @@ import Simplex.Messaging.Server.QueueStore import Simplex.Messaging.Server.QueueStore.Types import Simplex.Messaging.Util ((<$$>), ($>>=)) +-- spec: spec/modules/Simplex/Messaging/Server/MsgStore/Types.md#injective-type-families--unambiguous-type-resolution class (Monad (StoreMonad s), QueueStoreClass (StoreQueue s) (QueueStore s)) => MsgStoreClass s where type StoreMonad s = (m :: Type -> Type) | m -> s type MsgStoreConfig s = c | c -> s diff --git a/src/Simplex/Messaging/Server/QueueStore/Postgres.hs b/src/Simplex/Messaging/Server/QueueStore/Postgres.hs index a8c8c040a..bba58e35b 100644 --- a/src/Simplex/Messaging/Server/QueueStore/Postgres.hs +++ b/src/Simplex/Messaging/Server/QueueStore/Postgres.hs @@ -169,6 +169,7 @@ instance StoreQueueClass q => QueueStoreClass q (PostgresQueueStore q) where (SRMessaging, SRNotifier) pure EntityCounts {queueCount, notifierCount, rcvServiceCount, ntfServiceCount, rcvServiceQueuesCount, ntfServiceQueuesCount} + -- spec: spec/modules/Simplex/Messaging/Server/QueueStore/Postgres.md#addqueue_--no-in-memory-duplicate-check-relies-on-db-constraint -- this implementation assumes that the lock is already taken by addQueue -- and relies on unique constraints in the database to prevent duplicate IDs. addQueue_ :: PostgresQueueStore q -> (RecipientId -> QueueRec -> IO q) -> RecipientId -> QueueRec -> IO (Either ErrorType q) diff --git a/src/Simplex/Messaging/Server/QueueStore/STM.hs b/src/Simplex/Messaging/Server/QueueStore/STM.hs index 3a236076c..110a9cd33 100644 --- a/src/Simplex/Messaging/Server/QueueStore/STM.hs +++ b/src/Simplex/Messaging/Server/QueueStore/STM.hs @@ -116,6 +116,7 @@ instance StoreQueueClass q => QueueStoreClass q (STMQueueStore q) where serviceCount role = M.foldl' (\ !n s -> if serviceRole (serviceRec s) == role then n + 1 else n) 0 serviceQueuesCount serviceSel = foldM (\n s -> (n +) . S.size . fst <$> readTVarIO (serviceSel s)) 0 + -- spec: spec/modules/Simplex/Messaging/Server/QueueStore/STM.md#addqueue_--atomic-multi-id-duplicate-check addQueue_ :: STMQueueStore q -> (RecipientId -> QueueRec -> IO q) -> RecipientId -> QueueRec -> IO (Either ErrorType q) addQueue_ st mkQ rId qr@QueueRec {senderId = sId, notifier, queueData, rcvServiceId} = do sq <- mkQ rId qr diff --git a/src/Simplex/Messaging/Server/StoreLog.hs b/src/Simplex/Messaging/Server/StoreLog.hs index 4ceb3cddd..8c69b4063 100644 --- a/src/Simplex/Messaging/Server/StoreLog.hs +++ b/src/Simplex/Messaging/Server/StoreLog.hs @@ -96,6 +96,7 @@ data SLRTag | NewService_ | QueueService_ +-- spec: spec/modules/Simplex/Messaging/Server/StoreLog.md#queuerec-strencoding--backward-compatible-parsing instance StrEncoding QueueRec where strEncode QueueRec {recipientKeys, rcvDhSecret, rcvServiceId, senderId, senderKey, queueMode, queueData, notifier, status, updatedAt} = B.concat @@ -242,6 +243,7 @@ closeStoreLog = \case where close_ h = hClose h `catchAny` \e -> logError ("STORE: closeStoreLog, error closing, " <> tshow e) +-- spec: spec/modules/Simplex/Messaging/Server/StoreLog.md#writestorelogrecord--atomicity-via-manual-write writeStoreLogRecord :: StrEncoding r => StoreLog 'WriteMode -> r -> IO () writeStoreLogRecord (WriteStoreLog _ h) r = E.uninterruptibleMask_ $ do B.hPut h $ strEncode r `B.snoc` '\n' -- hPutStrLn makes write non-atomic for length > 1024 @@ -289,6 +291,7 @@ logNewService s = writeStoreLogRecord s . NewService logQueueService :: (PartyI p, ServiceParty p) => StoreLog 'WriteMode -> RecipientId -> SParty p -> Maybe ServiceId -> IO () logQueueService s rId party = writeStoreLogRecord s . QueueService rId (ASP party) +-- spec: spec/modules/Simplex/Messaging/Server/StoreLog.md#readwritestorelog--crash-recovery-state-machine readWriteStoreLog :: (FilePath -> s -> IO ()) -> (StoreLog 'WriteMode -> s -> IO ()) -> FilePath -> s -> IO (StoreLog 'WriteMode) readWriteStoreLog readStore writeStore f st = ifM