From 66d7efa61ea03a771e555dce4216f9d8caf6191d Mon Sep 17 00:00:00 2001
From: "Evgeny @ SimpleX Chat" <259188159+evgeny-simplex@users.noreply.github.com>
Date: Wed, 11 Mar 2026 08:53:57 +0000
Subject: [PATCH] some modules documented

---
 spec/README.md | 97 +++---
 spec/TOPICS.md | 5 +
 spec/encoding.md | 290 +++++++++---------
 spec/modules/README.md | 155 ++++++++++
 spec/modules/Simplex/Messaging/Compression.md | 17 +
 spec/modules/Simplex/Messaging/Encoding.md | 41 +++
 .../Simplex/Messaging/Encoding/String.md | 40 +++
 spec/modules/Simplex/Messaging/Parsers.md | 21 ++
 .../Simplex/Messaging/ServiceScheme.md | 7 +
 spec/modules/Simplex/Messaging/Session.md | 15 +
 spec/modules/Simplex/Messaging/SystemTime.md | 13 +
 spec/modules/Simplex/Messaging/TMap.md | 17 +
 spec/modules/Simplex/Messaging/Util.md | 52 ++++
 spec/modules/Simplex/Messaging/Version.md | 27 ++
 .../Simplex/Messaging/Version/Internal.md | 7 +
 spec/rcv-services.md | 42 +--
 spec/version.md | 62 ++--
 17 files changed, 631 insertions(+), 277 deletions(-)
 create mode 100644 spec/TOPICS.md
 create mode 100644 spec/modules/README.md
 create mode 100644 spec/modules/Simplex/Messaging/Compression.md
 create mode 100644 spec/modules/Simplex/Messaging/Encoding.md
 create mode 100644 spec/modules/Simplex/Messaging/Encoding/String.md
 create mode 100644 spec/modules/Simplex/Messaging/Parsers.md
 create mode 100644 spec/modules/Simplex/Messaging/ServiceScheme.md
 create mode 100644 spec/modules/Simplex/Messaging/Session.md
 create mode 100644 spec/modules/Simplex/Messaging/SystemTime.md
 create mode 100644 spec/modules/Simplex/Messaging/TMap.md
 create mode 100644 spec/modules/Simplex/Messaging/Util.md
 create mode 100644 spec/modules/Simplex/Messaging/Version.md
 create mode 100644 spec/modules/Simplex/Messaging/Version/Internal.md

diff --git a/spec/README.md b/spec/README.md
index 7154aa957..c993f108d 100644
--- a/spec/README.md
+++ b/spec/README.md
@@ -2,66 +2,73 @@
 > How does the code work? What does each function do? 
What are the security invariants? +## Structure + +Spec has two levels: + +### `spec/modules/` — Per-module documentation + +Mirrors the `src/Simplex/` directory structure exactly. Each `.hs` file has a corresponding `.md` file at the same relative path. Contains only information that is **not obvious from reading the code** and cannot fit in a one-line source comment: + +- Non-obvious behavior (subtle invariants, ordering dependencies, concurrency assumptions) +- Usage considerations (when to use X vs Y, common mistakes, caller obligations) +- Relationships to other modules not visible from imports +- Security notes specific to this module + +**Not included**: type signatures, code snippets, function-by-function prose that restates the source. If reading the code tells you everything, the module doc says so briefly. + +Function references use fully qualified names with markdown links: +``` +[Simplex.Messaging.Server.subscribeServiceMessages](./modules/Simplex/Messaging/Server.md#subscribeServiceMessages) +``` + +Source code links back via comments: +```haskell +-- spec: spec/modules/Simplex/Messaging/Server.md#subscribeServiceMessages +subscribeServiceMessages :: ... +``` + +### `spec/` root — Topic documentation + +Cross-module documentation that follows a feature, mechanism, or concern across the entire stack. Topics answer "how does X work end-to-end?" rather than "what does this file do?" + +Topics reference module docs rather than restating implementation details. They focus on: +- End-to-end data flow across modules +- Cross-cutting security analysis and invariants +- Design rationale, risks, test gaps +- Version gates and compatibility concerns + +Some topics may migrate to `product/` if they are primarily about user-visible behavior and guarantees rather than implementation mechanics. + +### `spec/security-invariants.md` — All security invariants + +Cross-referenced from both module docs and topic docs. + ## Conventions -Each spec file documents: -1. 
**Purpose** — What this component does -2. **Protocol reference** — Link to `protocol/` file (where applicable) -3. **Types** — Key data types with field descriptions -4. **Functions** — Every exported function with call graph -5. **Security notes** — Trust assumptions, validation requirements - -Function documentation format: +Module doc entry format: ``` -### Module.functionName +## functionName **Purpose**: ... -**Calls**: Module.a, Module.b -**Called by**: Module.c +**Calls**: [Module.a](./modules/path.md#a), [Module.b](./modules/path.md#b) +**Called by**: [Module.c](./modules/path.md#c) **Invariant**: SI-XX **Security**: ... ``` ## Index -### Protocol Implementation -- [smp-protocol.md](smp-protocol.md) — SMP commands, types, encoding -- [xftp-protocol.md](xftp-protocol.md) — XFTP commands, chunk operations -- [ntf-protocol.md](ntf-protocol.md) — NTF commands, token/subscription lifecycle -- [xrcp-protocol.md](xrcp-protocol.md) — XRCP session handshake, commands -- [agent-protocol.md](agent-protocol.md) — Agent connection procedures, queue rotation +### Topics -### Cryptography -- [crypto.md](crypto.md) — All primitives: Ed25519, X25519, NaCl, AES-GCM, SHA, HKDF -- [crypto-ratchet.md](crypto-ratchet.md) — Double ratchet + PQDR -- [crypto-tls.md](crypto-tls.md) — TLS setup, certificate chains, validation - -### Transport -- [transport.md](transport.md) — Transport abstraction, handshake, block padding -- [transport-http2.md](transport-http2.md) — HTTP/2 framing, file streaming -- [transport-websocket.md](transport-websocket.md) — WebSocket adapter - -### Server Implementations -- [smp-server.md](smp-server.md) — SMP server -- [xftp-server.md](xftp-server.md) — XFTP server -- [ntf-server.md](ntf-server.md) — Notification server - -### Client Implementations -- [smp-client.md](smp-client.md) — SMP client, proxy relay -- [xftp-client.md](xftp-client.md) — XFTP client -- [agent.md](agent.md) — SMP agent, duplex connections - -### Storage -- 
[storage-server.md](storage-server.md) — Server storage backends -- [storage-agent.md](storage-agent.md) — Agent storage backends - -### Auxiliary +- [rcv-services.md](rcv-services.md) — Service certificates for high-volume SMP clients (bulk subscription) - [encoding.md](encoding.md) — Binary and string encoding - [version.md](version.md) — Version ranges and negotiation -- [remote-control.md](remote-control.md) — XRCP implementation - [compression.md](compression.md) — Zstd compression -### Cross-cutting Features -- [rcv-services.md](rcv-services.md) — Service certificates for high-volume SMP clients (bulk subscription) +### Modules + +See `spec/modules/` — mirrors `src/Simplex/` structure. ### Security + - [security-invariants.md](security-invariants.md) — All security invariants diff --git a/spec/TOPICS.md b/spec/TOPICS.md new file mode 100644 index 000000000..a0c1f4eaf --- /dev/null +++ b/spec/TOPICS.md @@ -0,0 +1,5 @@ +# Topic Candidates + +> Cross-cutting patterns noticed during module documentation. Each entry may become a topic doc in `spec/` after all module docs are complete. + +- **Exception handling strategy**: `catchOwn`/`catchAll`/`tryAllErrors` pattern (defined in Util.hs) used across server, client, and agent modules. The three-category classification (synchronous, own-async, cancellation) and when to use which catch variant is not obvious from any single call site. 
diff --git a/spec/encoding.md b/spec/encoding.md index 3a4fdcd27..f5501cfab 100644 --- a/spec/encoding.md +++ b/spec/encoding.md @@ -15,8 +15,6 @@ Both are typeclasses with `MINIMAL` pragmas requiring `encode` + (`decode` | `pa ## Binary Encoding (`Encoding` class) -**Source**: `Encoding.hs:38-52` - ```haskell class Encoding a where smpEncode :: a -> ByteString @@ -26,31 +24,29 @@ class Encoding a where ### Length-prefix conventions -| Type | Prefix | Max size | Source | -|------|--------|----------|--------| -| `ByteString` | 1-byte length (Word8 as Char) | 255 bytes | `Encoding.hs:102-106` | -| `Large` (newtype) | 2-byte length (Word16 big-endian) | 65535 bytes | `Encoding.hs:135-143` | -| `Tail` (newtype) | None — consumes rest of input | Unlimited | `Encoding.hs:126-132` | -| Lists (`smpEncodeList`) | 1-byte count prefix, then concatenated items | 255 items | `Encoding.hs:155-159` | -| `NonEmpty` | Same as list (fails on count=0) | 255 items | `Encoding.hs:173-178` | +| Type | Prefix | Max size | +|------|--------|----------| +| `ByteString` | 1-byte length (Word8 as Char) | 255 bytes | +| `Large` (newtype) | 2-byte length (Word16 big-endian) | 65535 bytes | +| `Tail` (newtype) | None — consumes rest of input | Unlimited | +| Lists (`smpEncodeList`) | 1-byte count prefix, then concatenated items | 255 items | +| `NonEmpty` | Same as list (fails on count=0) | 255 items | ### Scalar types -| Type | Encoding | Bytes | Source | -|------|----------|-------|--------| -| `Char` | Raw byte | 1 | `Encoding.hs:54-58` | -| `Bool` | `'T'` / `'F'` (0x54 / 0x46) | 1 | `Encoding.hs:60-70` | -| `Word16` | Big-endian | 2 | `Encoding.hs:72-76` | -| `Word32` | Big-endian | 4 | `Encoding.hs:78-82` | -| `Int64` | Two big-endian Word32s (high then low) | 8 | `Encoding.hs:84-99` | -| `SystemTime` | `systemSeconds` as Int64 (nanoseconds dropped) | 8 | `Encoding.hs:145-149` | -| `Text` | UTF-8 then ByteString encoding (1-byte length prefix) | 1 + len | `Encoding.hs:161-165` | -| 
`String` | `B.pack` then ByteString encoding | 1 + len | `Encoding.hs:167-171` | +| Type | Encoding | Bytes | +|------|----------|-------| +| `Char` | Raw byte | 1 | +| `Bool` | `'T'` / `'F'` (0x54 / 0x46) | 1 | +| `Word16` | Big-endian | 2 | +| `Word32` | Big-endian | 4 | +| `Int64` | Two big-endian Word32s (high then low) | 8 | +| `SystemTime` | `systemSeconds` as Int64 (nanoseconds dropped) | 8 | +| `Text` | UTF-8 then ByteString encoding (1-byte length prefix) | 1 + len | +| `String` | `B.pack` then ByteString encoding | 1 + len | ### `Maybe a` -**Source**: `Encoding.hs:116-124` - ``` Nothing → '0' (0x30) Just x → '1' (0x31) ++ smpEncode x @@ -60,23 +56,19 @@ Tags are ASCII characters `'0'`/`'1'`, not binary 0x00/0x01. ### Tuples -**Source**: `Encoding.hs:180-220` - Tuples (2 through 8) encode as simple concatenation — no length prefix, no separator. Fields are parsed sequentially using each component's `smpP`. This works because each component's parser knows how many bytes to consume (via its own length prefix or fixed size). 
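The `Maybe` and tuple rules above can be illustrated with a small standalone sketch. It does not use the real `Encoding` class: `encodeShort`, `encodeMaybe`, `encodePair`, `decodeShort`, and `decodePair` are hypothetical names introduced here only to show why plain concatenation stays parseable when each field carries its own length prefix.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (chr, ord)

-- Hypothetical stand-in for the 1-byte length-prefixed ByteString encoding.
-- Assumes the input is at most 255 bytes long.
encodeShort :: B.ByteString -> B.ByteString
encodeShort s = B.cons (chr (B.length s)) s

-- Hypothetical stand-in for the Maybe encoding: ASCII '0' or '1' tag.
encodeMaybe :: (a -> B.ByteString) -> Maybe a -> B.ByteString
encodeMaybe _ Nothing = B.singleton '0'
encodeMaybe f (Just x) = B.cons '1' (f x)

-- A pair is just the two encodings concatenated, with no separator.
encodePair :: (a -> B.ByteString) -> (b -> B.ByteString) -> (a, b) -> B.ByteString
encodePair f g (a, b) = f a <> g b

-- Read one length-prefixed field and return it with the remaining input.
decodeShort :: B.ByteString -> Maybe (B.ByteString, B.ByteString)
decodeShort bs = do
  (c, rest) <- B.uncons bs
  let n = ord c
  if B.length rest < n then Nothing else Just (B.splitAt n rest)

-- Fields are read sequentially; each parser knows how much to consume.
decodePair :: B.ByteString -> Maybe ((B.ByteString, B.ByteString), B.ByteString)
decodePair bs = do
  (a, rest) <- decodeShort bs
  (b, rest') <- decodeShort rest
  Just ((a, b), rest')

main :: IO ()
main = do
  let wire = encodePair encodeShort encodeShort (B.pack "ab", B.pack "cde")
  print wire
  print (decodePair wire)
  print (encodeMaybe encodeShort (Just (B.pack "ab")))
  print (encodeMaybe encodeShort (Nothing :: Maybe B.ByteString))
```

The sketch only covers length-prefixed fields; fixed-size fields such as `Word16` work the same way because their parser consumes a known number of bytes.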
### Combinators -| Function | Signature | Purpose | Source | -|----------|-----------|---------|--------| -| `_smpP` | `Parser a` | Space-prefixed parser (`A.space *> smpP`) | `Encoding.hs:151-152` | -| `smpEncodeList` | `[a] -> ByteString` | 1-byte count + concatenated items | `Encoding.hs:155-156` | -| `smpListP` | `Parser [a]` | Parse count then that many items | `Encoding.hs:158-159` | -| `lenEncode` | `Int -> Char` | Int to single-byte length char | `Encoding.hs:108-110` | +| Function | Signature | Purpose | +|----------|-----------|---------| +| `_smpP` | `Parser a` | Space-prefixed parser (`A.space *> smpP`) | +| `smpEncodeList` | `[a] -> ByteString` | 1-byte count + concatenated items | +| `smpListP` | `Parser [a]` | Parse count then that many items | +| `lenEncode` | `Int -> Char` | Int to single-byte length char | ## String Encoding (`StrEncoding` class) -**Source**: `Encoding/String.hs:56-67` - ```haskell class StrEncoding a where strEncode :: a -> ByteString @@ -88,39 +80,35 @@ Key difference from `Encoding`: the default `strP` parses base64url input first, ### Instance conventions -| Type | Encoding | Source | -|------|----------|--------| -| `ByteString` | base64url (non-empty required) | `String.hs:70-76` | -| `Word16`, `Word32` | Decimal string | `String.hs:114-124` | -| `Int`, `Int64` | Signed decimal | `String.hs:138-148` | -| `Char`, `Bool` | Delegates to `Encoding` (`smpEncode`/`smpP`) | `String.hs:126-136` | -| `Maybe a` | Empty string = `Nothing`, otherwise `strEncode a` | `String.hs:108-112` | -| `Text` | UTF-8 bytes, parsed until space/newline | `String.hs:97-99` | -| `SystemTime` | `systemSeconds` as Int64 (decimal) | `String.hs:150-152` | -| `UTCTime` | ISO 8601 string | `String.hs:154-156` | -| `CertificateChain` | Comma-separated base64url blobs | `String.hs:158-162` | -| `Fingerprint` | base64url of fingerprint bytes | `String.hs:164-168` | +| Type | Encoding | +|------|----------| +| `ByteString` | base64url (non-empty required) | +| 
`Word16`, `Word32` | Decimal string | +| `Int`, `Int64` | Signed decimal | +| `Char`, `Bool` | Delegates to `Encoding` (`smpEncode`/`smpP`) | +| `Maybe a` | Empty string = `Nothing`, otherwise `strEncode a` | +| `Text` | UTF-8 bytes, parsed until space/newline | +| `SystemTime` | `systemSeconds` as Int64 (decimal) | +| `UTCTime` | ISO 8601 string | +| `CertificateChain` | Comma-separated base64url blobs | +| `Fingerprint` | base64url of fingerprint bytes | ### Collection encoding -| Type | Separator | Source | -|------|-----------|--------| -| Lists (`strEncodeList`) | Comma `,` | `String.hs:171-175` | -| `NonEmpty` | Comma (fails on empty) | `String.hs:178-180` | -| `Set a` | Comma | `String.hs:182-184` | -| `IntSet` | Comma | `String.hs:186-188` | -| Tuples (2-6) | Space (` `) | `String.hs:193-221` | +| Type | Separator | +|------|-----------| +| Lists (`strEncodeList`) | Comma `,` | +| `NonEmpty` | Comma (fails on empty) | +| `Set a` | Comma | +| `IntSet` | Comma | +| Tuples (2-6) | Space (` `) | ### `Str` newtype -**Source**: `String.hs:84-89` - Raw string (not base64url-encoded). Parses until space, consumes trailing space. Used for string-valued protocol fields that should not be base64-encoded. ### `TextEncoding` class -**Source**: `String.hs:51-53` - ```haskell class TextEncoding a where textEncode :: a -> Text @@ -131,14 +119,14 @@ Separate from `StrEncoding` — operates on `Text` rather than `ByteString`. Use ### JSON bridge functions -| Function | Purpose | Source | -|----------|---------|--------| -| `strToJSON` | `StrEncoding a => a -> J.Value` via `decodeLatin1 . 
strEncode` | `String.hs:229-231` | -| `strToJEncoding` | Same, for Aeson encoding | `String.hs:233-235` | -| `strParseJSON` | `StrEncoding a => String -> J.Value -> JT.Parser a` — parse JSON string via `strP` | `String.hs:237-238` | -| `textToJSON` | `TextEncoding a => a -> J.Value` | `String.hs:240-242` | -| `textToEncoding` | Same, for Aeson encoding | `String.hs:244-246` | -| `textParseJSON` | `TextEncoding a => String -> J.Value -> JT.Parser a` | `String.hs:248-249` | +| Function | Purpose | +|----------|---------| +| `strToJSON` | `StrEncoding a => a -> J.Value` via `decodeLatin1 . strEncode` | +| `strToJEncoding` | Same, for Aeson encoding | +| `strParseJSON` | `StrEncoding a => String -> J.Value -> JT.Parser a` — parse JSON string via `strP` | +| `textToJSON` | `TextEncoding a => a -> J.Value` | +| `textToEncoding` | Same, for Aeson encoding | +| `textParseJSON` | `TextEncoding a => String -> J.Value -> JT.Parser a` | ## Parsers @@ -146,45 +134,43 @@ Separate from `StrEncoding` — operates on `Text` rather than `ByteString`. 
Use ### Core parsing functions -| Function | Signature | Purpose | Source | -|----------|-----------|---------|--------| -| `parseAll` | `Parser a -> ByteString -> Either String a` | Parse consuming all input (fails if bytes remain) | `Parsers.hs:64-65` | -| `parse` | `Parser a -> e -> ByteString -> Either e a` | `parseAll` with custom error type (discards error string) | `Parsers.hs:61-62` | -| `parseE` | `(String -> e) -> Parser a -> ByteString -> ExceptT e IO a` | `parseAll` lifted into `ExceptT` | `Parsers.hs:67-68` | -| `parseE'` | `(String -> e) -> Parser a -> ByteString -> ExceptT e IO a` | Like `parseE` but allows trailing input | `Parsers.hs:70-71` | -| `parseRead1` | `Read a => Parser a` | Parse a word then `readMaybe` it | `Parsers.hs:76-77` | -| `parseString` | `(ByteString -> Either String a) -> String -> a` | Parse from `String` (errors with `error`) | `Parsers.hs:89-90` | +| Function | Signature | Purpose | +|----------|-----------|---------| +| `parseAll` | `Parser a -> ByteString -> Either String a` | Parse consuming all input (fails if bytes remain) | +| `parse` | `Parser a -> e -> ByteString -> Either e a` | `parseAll` with custom error type (discards error string) | +| `parseE` | `(String -> e) -> Parser a -> ByteString -> ExceptT e IO a` | `parseAll` lifted into `ExceptT` | +| `parseE'` | `(String -> e) -> Parser a -> ByteString -> ExceptT e IO a` | Like `parseE` but allows trailing input | +| `parseRead1` | `Read a => Parser a` | Parse a word then `readMaybe` it | +| `parseString` | `(ByteString -> Either String a) -> String -> a` | Parse from `String` (errors with `error`) | ### `base64P` -**Source**: `Parsers.hs:44-53` - Standard base64 parser (not base64url — uses `+`/`/` alphabet). Takes alphanumeric + `+`/`/` characters, optional `=` padding, then decodes. Contrast with `base64urlP` in `Encoding/String.hs` which uses `-`/`_` alphabet. 
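The difference between the two alphabets is only in the last two symbols, which the following sketch makes concrete. It is an illustration under stated assumptions, not the code in `Parsers.hs` or `Encoding/String.hs`: `isStdChar`, `isUrlChar`, and `stdToUrl` are hypothetical helpers, and no actual base64 decoding is performed.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (isAlphaNum, isAscii)

-- Character classes for the two alphabets; '=' padding is handled separately.
isStdChar, isUrlChar :: Char -> Bool
isStdChar c = (isAscii c && isAlphaNum c) || c == '+' || c == '/'
isUrlChar c = (isAscii c && isAlphaNum c) || c == '-' || c == '_'

-- Hypothetical helper: re-spell standard base64 text in the URL-safe alphabet.
stdToUrl :: B.ByteString -> B.ByteString
stdToUrl = B.map swap
  where
    swap '+' = '-'
    swap '/' = '_'
    swap c = c

main :: IO ()
main = do
  -- Standard base64 spelling of the two bytes 0xFB 0xFF.
  let std = B.pack "+/8="
      body = B.filter (/= '=') std
  B.putStrLn (stdToUrl std)
  print (B.all isStdChar body)
  print (B.all isUrlChar body)
```

Input that is valid for one parser can therefore be rejected by the other, which is why the choice between `base64P` and `base64urlP` matters at each call site.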
### JSON options helpers Platform-conditional JSON encoding for cross-platform compatibility (Haskell ↔ Swift). -| Function | Purpose | Source | -|----------|---------|--------| -| `enumJSON` | All-nullary constructors as strings, with tag modifier | `Parsers.hs:101-106` | -| `sumTypeJSON` | Platform-conditional: `taggedObjectJSON` on non-Darwin, `singleFieldJSON` on Darwin | `Parsers.hs:109-114` | -| `taggedObjectJSON` | `{"type": "Tag", "data": {...}}` format | `Parsers.hs:119-128` | -| `singleFieldJSON` | `{"Tag": value}` format | `Parsers.hs:137-149` | -| `defaultJSON` | Default options with `omitNothingFields = True` | `Parsers.hs:151-152` | +| Function | Purpose | +|----------|---------| +| `enumJSON` | All-nullary constructors as strings, with tag modifier | +| `sumTypeJSON` | Platform-conditional: `taggedObjectJSON` on non-Darwin, `singleFieldJSON` on Darwin | +| `taggedObjectJSON` | `{"type": "Tag", "data": {...}}` format | +| `singleFieldJSON` | `{"Tag": value}` format | +| `defaultJSON` | Default options with `omitNothingFields = True` | Pattern synonyms for JSON field names: -- `TaggedObjectJSONTag = "type"` (`Parsers.hs:131`) -- `TaggedObjectJSONData = "data"` (`Parsers.hs:134`) -- `SingleFieldJSONTag = "_owsf"` (`Parsers.hs:117`) +- `TaggedObjectJSONTag = "type"` +- `TaggedObjectJSONData = "data"` +- `SingleFieldJSONTag = "_owsf"` ### String helpers -| Function | Purpose | Source | -|----------|---------|--------| -| `fstToLower` | Lowercase first character | `Parsers.hs:92-94` | -| `dropPrefix` | Remove prefix string, lowercase remainder | `Parsers.hs:96-99` | -| `textP` | Parse rest of input as UTF-8 `String` | `Parsers.hs:154-155` | +| Function | Purpose | +|----------|---------| +| `fstToLower` | Lowercase first character | +| `dropPrefix` | Remove prefix string, lowercase remainder | +| `textP` | Parse rest of input as UTF-8 `String` | ## Auxiliary Types and Utilities @@ -198,19 +184,19 @@ type TMap k a = TVar (Map k a) STM-based concurrent map. 
Wraps `Data.Map.Strict` in a `TVar`. All mutations use `modifyTVar'` (strict) to prevent thunk accumulation. -| Function | Notes | Source | -|----------|-------|--------| -| `emptyIO` | IO allocation (`newTVarIO`) | `TMap.hs:32-34` | -| `singleton` | STM allocation | `TMap.hs:36-38` | -| `clear` | Reset to empty | `TMap.hs:40-42` | -| `lookup` / `lookupIO` | STM / non-transactional IO read | `TMap.hs:48-54` | -| `member` / `memberIO` | STM / non-transactional IO membership | `TMap.hs:56-62` | -| `insert` / `insertM` | Insert value / insert from STM action | `TMap.hs:64-70` | -| `delete` | Remove key | `TMap.hs:72-74` | -| `lookupInsert` | Atomic lookup-then-insert (returns old value) | `TMap.hs:76-78` | -| `lookupDelete` | Atomic lookup-then-delete | `TMap.hs:80-82` | -| `adjust` / `update` / `alter` / `alterF` | Standard Map operations lifted to STM | `TMap.hs:84-100` | -| `union` | Merge `Map` into `TMap` | `TMap.hs:102-104` | +| Function | Notes | +|----------|-------| +| `emptyIO` | IO allocation (`newTVarIO`) | +| `singleton` | STM allocation | +| `clear` | Reset to empty | +| `lookup` / `lookupIO` | STM / non-transactional IO read | +| `member` / `memberIO` | STM / non-transactional IO membership | +| `insert` / `insertM` | Insert value / insert from STM action | +| `delete` | Remove key | +| `lookupInsert` | Atomic lookup-then-insert (returns old value) | +| `lookupDelete` | Atomic lookup-then-delete | +| `adjust` / `update` / `alter` / `alterF` | Standard Map operations lifted to STM | +| `union` | Merge `Map` into `TMap` | `lookupIO`/`memberIO` use `readTVarIO` — single-read outside STM transaction, useful when you need a snapshot without composing with other STM operations. @@ -228,13 +214,13 @@ data SessionVar a = SessionVar } ``` -| Function | Purpose | Source | -|----------|---------|--------| -| `getSessVar` | Lookup or create session. 
Returns `Left new` or `Right existing` | `Session.hs:24-33` | -| `removeSessVar` | Delete session only if ID matches (prevents removing a replacement) | `Session.hs:35-39` | -| `tryReadSessVar` | Non-blocking read of session result | `Session.hs:41-42` | +| Function | Purpose | +|----------|---------| +| `getSessVar` | Lookup or create session. Returns `Left new` or `Right existing` | +| `removeSessVar` | Delete session only if ID matches (prevents removing a replacement) | +| `tryReadSessVar` | Non-blocking read of session result | -The ID-match check in `removeSessVar` (`sessionVarId v == sessionVarId v'`) prevents a race where: +The ID-match check in `removeSessVar` prevents a race where: 1. Thread A creates session #5, starts work 2. Thread B creates session #6 (replacing #5 in TMap) 3. Thread A finishes, tries to remove — ID mismatch, removal blocked @@ -250,7 +236,7 @@ data SrvLoc = SrvLoc HostName ServiceName URI scheme for SimpleX service addresses. `SSSimplex` encodes as `"simplex:"`, `SSAppServer` as `"https://host:port"`. -`simplexChat :: ServiceScheme` is the constant `SSAppServer (SrvLoc "simplex.chat" "")` (`ServiceScheme.hs:38-39`). +`simplexChat` is the constant `SSAppServer (SrvLoc "simplex.chat" "")`. ### SystemTime @@ -264,12 +250,12 @@ type SystemSeconds = RoundedSystemTime 1 -- second precision Phantom-typed time rounding. The `Nat` type parameter specifies rounding granularity in seconds. 
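A minimal sketch of the phantom-typed rounding idea follows. It assumes rounding down to a multiple of the granularity, which this document does not confirm, and it uses a hypothetical `Rounded` newtype and `roundTo` function rather than the real `RoundedSystemTime` API.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Int (Int64)
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Nat, natVal)

-- Hypothetical stand-in: seconds since epoch, tagged with a granularity.
newtype Rounded (t :: Nat) = Rounded Int64
  deriving (Eq, Show)

-- Round down to a multiple of the type-level granularity (assumed behavior).
-- The granularity is read from the type, so callers cannot pass a wrong one.
roundTo :: forall t. KnownNat t => Int64 -> Rounded t
roundTo secs = Rounded (secs - secs `mod` step)
  where
    step = fromIntegral (natVal (Proxy :: Proxy t))

main :: IO ()
main = do
  print (roundTo 100000 :: Rounded 86400) -- day granularity
  print (roundTo 100000 :: Rounded 1)     -- second granularity
```

Because the granularity lives in the type, a day-rounded value and a second-rounded value cannot be compared or mixed by accident.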
-| Function | Purpose | Source | -|----------|---------|--------| -| `getRoundedSystemTime` | Get current time rounded to `t` seconds | `SystemTime.hs:40-43` | -| `getSystemDate` | Alias for day-rounded time | `SystemTime.hs:45-47` | -| `getSystemSeconds` | Second-precision (no rounding needed, just drops nanoseconds) | `SystemTime.hs:49-51` | -| `roundedToUTCTime` | Convert back to `UTCTime` | `SystemTime.hs:53-55` | +| Function | Purpose | +|----------|---------| +| `getRoundedSystemTime` | Get current time rounded to `t` seconds | +| `getSystemDate` | Alias for day-rounded time | +| `getSystemSeconds` | Second-precision (no rounding needed, just drops nanoseconds) | +| `roundedToUTCTime` | Convert back to `UTCTime` | `RoundedSystemTime` derives `FromField`/`ToField` for SQLite storage and `FromJSON`/`ToJSON` for API serialization. @@ -281,62 +267,62 @@ Selected utilities used across the codebase: **Monadic combinators**: -| Function | Signature | Purpose | Source | -|----------|-----------|---------|--------| -| `<$?>` | `MonadFail m => (a -> Either String b) -> m a -> m b` | Lift fallible function into parser | `Util.hs:119-121` | -| `$>>=` | `(Monad m, Monad f, Traversable f) => m (f a) -> (a -> m (f b)) -> m (f b)` | Monadic bind through nested monad | `Util.hs:165-167` | -| `ifM` / `whenM` / `unlessM` | Monadic conditionals | `Util.hs:147-157` | -| `anyM` | Short-circuit `any` for monadic predicates (strict) | `Util.hs:159-161` | +| Function | Signature | Purpose | +|----------|-----------|---------| +| `<$?>` | `MonadFail m => (a -> Either String b) -> m a -> m b` | Lift fallible function into parser | +| `$>>=` | `(Monad m, Monad f, Traversable f) => m (f a) -> (a -> m (f b)) -> m (f b)` | Monadic bind through nested monad | +| `ifM` / `whenM` / `unlessM` | Monadic conditionals | | +| `anyM` | Short-circuit `any` for monadic predicates (strict) | | **Error handling**: -| Function | Purpose | Source | -|----------|---------|--------| -| `tryAllErrors` | 
Catch all exceptions (including async) into `ExceptT` | `Util.hs:273-275` | -| `catchAllErrors` | Same with handler | `Util.hs:281-283` | -| `tryAllOwnErrors` | Catch only "own" exceptions (re-throws async cancellation) | `Util.hs:322-324` | -| `catchAllOwnErrors` | Same with handler | `Util.hs:330-332` | -| `isOwnException` | `StackOverflow`, `HeapOverflow`, `AllocationLimitExceeded` | `Util.hs:297-304` | -| `isAsyncCancellation` | Any `SomeAsyncException` except own exceptions | `Util.hs:306-310` | -| `catchThrow` | Catch exceptions, wrap in Left | `Util.hs:289-291` | -| `allFinally` | `tryAllErrors` + `final` + `except` (like `finally` for ExceptT) | `Util.hs:293-295` | +| Function | Purpose | +|----------|---------| +| `tryAllErrors` | Catch all exceptions (including async) into `ExceptT` | +| `catchAllErrors` | Same with handler | +| `tryAllOwnErrors` | Catch only "own" exceptions (re-throws async cancellation) | +| `catchAllOwnErrors` | Same with handler | +| `isOwnException` | `StackOverflow`, `HeapOverflow`, `AllocationLimitExceeded` | +| `isAsyncCancellation` | Any `SomeAsyncException` except own exceptions | +| `catchThrow` | Catch exceptions, wrap in Left | +| `allFinally` | `tryAllErrors` + `final` + `except` (like `finally` for ExceptT) | The own-vs-async distinction is critical: `catchOwn`/`tryAllOwnErrors` never swallow async cancellation (`ThreadKilled`, `UserInterrupt`, etc.), only synchronous exceptions and resource exhaustion (`StackOverflow`, `HeapOverflow`, `AllocationLimitExceeded`). 
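The three-way classification can be sketched with `Control.Exception` from `base` alone. This is reconstructed from the descriptions in the table above, not copied from `Util.hs`: `isOwn`, `isCancellation`, `classify`, and `tryNonCancellation` are hypothetical names, and the real functions work in `ExceptT` rather than plain `IO`.

```haskell
import Control.Exception

-- Resource exhaustion raised in the current thread ("own" exceptions).
isOwn :: SomeException -> Bool
isOwn e = case fromException e of
  Just StackOverflow -> True
  Just HeapOverflow -> True
  _ -> case (fromException e :: Maybe AllocationLimitExceeded) of
    Just _ -> True
    Nothing -> False

-- Any asynchronous exception that is not an "own" exception.
isCancellation :: SomeException -> Bool
isCancellation e = case (fromException e :: Maybe SomeAsyncException) of
  Just _ -> not (isOwn e)
  Nothing -> False

classify :: SomeException -> String
classify e
  | isOwn e = "own"
  | isCancellation e = "cancellation"
  | otherwise = "synchronous"

-- Catch synchronous and own exceptions, but re-throw cancellation.
tryNonCancellation :: IO a -> IO (Either SomeException a)
tryNonCancellation action = do
  r <- try action
  case r of
    Left e | isCancellation e -> throwIO e
    _ -> pure r

main :: IO ()
main = do
  mapM_
    (putStrLn . classify)
    [ toException StackOverflow
    , toException ThreadKilled
    , toException (ErrorCall "boom")
    ]
  r <- tryNonCancellation (throwIO (ErrorCall "boom") :: IO ())
  putStrLn (either (const "caught") (const "ok") r)
```

Re-throwing cancellation keeps `killThread` and timeouts effective even inside error-handling wrappers, which is the property the paragraph above describes.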
**STM**: -| Function | Purpose | Source | -|----------|---------|--------| -| `tryWriteTBQueue` | Non-blocking bounded queue write, returns success | `Util.hs:256-261` | +| Function | Purpose | +|----------|---------| +| `tryWriteTBQueue` | Non-blocking bounded queue write, returns success | **Database result helpers**: -| Function | Purpose | Source | -|----------|---------|--------| -| `firstRow` | Extract first row with transform, or Left error | `Util.hs:346-347` | -| `maybeFirstRow` | Extract first row as Maybe | `Util.hs:349-350` | -| `firstRow'` | Like `firstRow` but transform can also fail | `Util.hs:355-356` | +| Function | Purpose | +|----------|---------| +| `firstRow` | Extract first row with transform, or Left error | +| `maybeFirstRow` | Extract first row as Maybe | +| `firstRow'` | Like `firstRow` but transform can also fail | **Collection utilities**: -| Function | Purpose | Source | -|----------|---------|--------| -| `groupOn` | `groupBy` using equality on projected key | `Util.hs:358-359` | -| `groupAllOn` | `groupOn` after `sortOn` (groups non-adjacent elements) | `Util.hs:372-373` | -| `toChunks` | Split list into `NonEmpty` chunks of size n | `Util.hs:376-380` | -| `packZipWith` | Optimized ByteString zipWith (direct memory access) | `Util.hs:236-254` | +| Function | Purpose | +|----------|---------| +| `groupOn` | `groupBy` using equality on projected key | +| `groupAllOn` | `groupOn` after `sortOn` (groups non-adjacent elements) | +| `toChunks` | Split list into `NonEmpty` chunks of size n | +| `packZipWith` | Optimized ByteString zipWith (direct memory access) | **Miscellaneous**: -| Function | Purpose | Source | -|----------|---------|--------| -| `safeDecodeUtf8` | Decode UTF-8 replacing errors with `'?'` | `Util.hs:382-386` | -| `bshow` / `tshow` | `show` to `ByteString` / `Text` | `Util.hs:123-129` | -| `threadDelay'` | `Int64` delay (handles overflow by looping) | `Util.hs:391-399` | -| `diffToMicroseconds` / `diffToMilliseconds` | 
`NominalDiffTime` conversion | `Util.hs:401-407` | -| `labelMyThread` | Label current thread for debugging | `Util.hs:409-410` | -| `encodeJSON` / `decodeJSON` | `ToJSON a => a -> Text` / `FromJSON a => Text -> Maybe a` | `Util.hs:415-421` | -| `traverseWithKey_` | `Map` traversal discarding results | `Util.hs:423-425` | +| Function | Purpose | +|----------|---------| +| `safeDecodeUtf8` | Decode UTF-8 replacing errors with `'?'` | +| `bshow` / `tshow` | `show` to `ByteString` / `Text` | +| `threadDelay'` | `Int64` delay (handles overflow by looping) | +| `diffToMicroseconds` / `diffToMilliseconds` | `NominalDiffTime` conversion | +| `labelMyThread` | Label current thread for debugging | +| `encodeJSON` / `decodeJSON` | `ToJSON a => a -> Text` / `FromJSON a => Text -> Maybe a` | +| `traverseWithKey_` | `Map` traversal discarding results | ## Security notes diff --git a/spec/modules/README.md b/spec/modules/README.md new file mode 100644 index 000000000..1d18b32e3 --- /dev/null +++ b/spec/modules/README.md @@ -0,0 +1,155 @@ +# How to Document a Module + +> Read this before writing any module doc. It defines what goes in, what stays out, and why. + +## Purpose + +Module docs exist for one reason: to capture knowledge that **cannot be obtained by reading the source code**. If reading the `.hs` file tells you everything you need to know, the module doc should be brief or empty. + +These docs are an investment — their value compounds over time as multiple people (and LLMs) work on the code. Optimize for long-term value, not for looking thorough today. + +## Process + +**Read every line of the source file.** The non-obvious filter applies to what you *write*, not to what you *read*. Without reading each line, you will produce documentation from inferences rather than facts. Many non-obvious behaviors only become visible when you see a specific line of code and recognize that its implications would surprise a reader who doesn't have the surrounding context. 
+ +## File structure + +Module docs mirror `src/Simplex/` exactly. Same subfolder structure, `.hs` replaced with `.md`: + +``` +src/Simplex/Messaging/Server.hs → spec/modules/Simplex/Messaging/Server.md +src/Simplex/Messaging/Crypto.hs → spec/modules/Simplex/Messaging/Crypto.md +src/Simplex/FileTransfer/Agent.hs → spec/modules/Simplex/FileTransfer/Agent.md +``` + +## What to include + +### 1. Non-obvious behavior +Things that would surprise a competent Haskell developer reading the code for the first time: +- Subtle invariants maintained across function calls +- Ordering dependencies ("must call X before Y because...") +- Concurrency assumptions ("this TVar is only written from thread Z") +- Implicit contracts between caller and callee not captured by types + +### 2. Usage considerations +- When to use function X vs function Y +- Common mistakes callers make +- Caller obligations not enforced by the type system +- Performance characteristics that affect usage decisions + +### 3. Cross-module relationships +- Dependencies on other modules' behavior not visible from import lists +- Assumptions about how other modules use this one +- Coordination patterns (e.g., "Server.hs reads this TVar, Agent.hs writes it") + +### 4. Security notes +- Trust boundaries this module enforces or relies on +- What happens if inputs are malicious +- Which functions are security-critical and why (reference SI-XX invariants) + +### 5. 
Design rationale +- Why the code is structured this way (when not obvious) +- Alternatives considered and rejected +- Known limitations and their justification + +## What NOT to include + +- **Type signatures** — the code has them +- **Code snippets** — if you're pasting code, you're making a stale copy +- **Function-by-function prose that restates the implementation** — "this function takes X and returns Y by doing Z" adds nothing +- **Line numbers** — they're brittle and break on every edit +- **Comments that fit in one line in source** — put those in the source file instead as `-- spec:` comments + +## Format + +Each module doc has a header, then entries for functions/types that need documentation. + +```markdown +# Module.Name + +> One-line description of what this module does. + +**Source**: [`Path/To/Module.hs`](relative link to source) + +## Overview + +[Only if the module's purpose or architecture is non-obvious. +Skip for simple modules.] + +## functionName + +**Purpose**: [What this does that isn't obvious from the name and type] +**Calls**: [Qualified.Name.a](link), [Qualified.Name.b](link) +**Called by**: [Qualified.Name.c](link) +**Invariant**: SI-XX +**Security**: [What this function ensures for the threat model] + +[Free-form notes about non-obvious behavior, gotchas, etc.] + +## anotherFunction + +... +``` + +**For trivial modules** (< 100 LOC, no non-obvious behavior): + +```markdown +# Module.Name + +> One-line description. + +**Source**: [`Path/To/Module.hs`](relative link to source) + +No non-obvious behavior. See source. +``` + +This is valuable — it confirms someone looked and found nothing to document. 
+ +## Linking conventions + +### Module doc → other module docs +Use fully qualified names as link text: +```markdown +[Simplex.Messaging.Server.subscribeServiceMessages](./Simplex/Messaging/Server.md#subscribeServiceMessages) +``` + +### Module doc → topic docs +```markdown +See [rcv-services](../rcv-services.md) for the end-to-end service subscription flow. +``` + +### Source → module doc +Comment above function in source: +```haskell +-- spec: spec/modules/Simplex/Messaging/Server.md#subscribeServiceMessages +-- Delivers buffered messages for all service queues after SUBS (SI-SVC-07) +subscribeServiceMessages :: ... +``` + +Only add `-- spec:` comments where the module doc actually has something to say. Don't add links to "No non-obvious behavior" docs. + +## Topic candidate tracking + +While documenting modules, you will notice cross-cutting patterns — behaviors that span multiple modules and can't be understood from any single one. Note these in `spec/TOPICS.md` for later. Don't write the topic doc during module work; just record: + +```markdown +- **Queue rotation**: Agent.hs initiates, Client.hs sends commands, Server.hs processes, + Protocol.hs defines types. End-to-end flow not obvious from any single module. +``` + +## Quality bar + +Before finishing a module doc, ask: +1. Does every entry document something NOT in the source code? +2. Would removing any entry lose information? If not, remove it. +3. Are cross-module relationships captured that imports alone don't reveal? +4. Are security-critical functions flagged with invariant IDs? +5. Is this doc short enough that someone will actually read it? + +If any answer reveals a problem, fix it and repeat from question 1. Only finish when a full pass produces no changes. + +## Exclusions + +- **Individual migration files** (M20XXXXXX_*.hs): Self-describing SQL. No per-migration docs. +- **Auto-generated files** (GitCommit.hs): Skip. 
+- **Pure boilerplate** (Prometheus.hs metrics, Web/Embedded.hs static files): Document only if non-obvious. diff --git a/spec/modules/Simplex/Messaging/Compression.md b/spec/modules/Simplex/Messaging/Compression.md new file mode 100644 index 000000000..67c7317da --- /dev/null +++ b/spec/modules/Simplex/Messaging/Compression.md @@ -0,0 +1,17 @@ +# Simplex.Messaging.Compression + +> Zstd compression with passthrough for short messages. + +**Source**: [`Compression.hs`](../../../../src/Simplex/Messaging/Compression.hs) + +## compress1 + +Messages <= 180 bytes are wrapped as `Passthrough` (no compression). The threshold is empirically derived from real client data — messages above 180 bytes rapidly gain compression ratio. + +## decompress1 + +**Security**: decompression bomb protection. Requires `decompressedSize` to be present in the zstd frame header AND within the caller-specified `limit`. If the compressed data doesn't declare its decompressed size (non-standard zstd frames), decompression is refused entirely. This prevents memory exhaustion from malicious compressed payloads. + +## Wire format + +Tag byte `'0'` (0x30) = passthrough (1-byte length prefix, raw data). Tag byte `'1'` (0x31) = compressed (2-byte `Large` length prefix, zstd data). The passthrough path uses the standard `ByteString` encoding (255-byte limit); the compressed path uses `Large` (65535-byte limit). diff --git a/spec/modules/Simplex/Messaging/Encoding.md b/spec/modules/Simplex/Messaging/Encoding.md new file mode 100644 index 000000000..f485aeaa4 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Encoding.md @@ -0,0 +1,41 @@ +# Simplex.Messaging.Encoding + +> Binary wire-format encoding for SMP protocol transmission. + +**Source**: [`Encoding.hs`](../../../../src/Simplex/Messaging/Encoding.hs) + +## Overview + +`Encoding` is the binary wire format — fixed-size or length-prefixed, no delimiters between fields. 
Contrast with [Simplex.Messaging.Encoding.String](./Encoding/String.md), which is the human-readable, space-delimited, base64url format used in URIs and logs. + +The two encoding classes share some instances (`Char`, `Bool`, `SystemTime`) but differ fundamentally: `Encoding` is self-delimiting via length prefixes, `StrEncoding` is delimiter-based (spaces, commas). + +## ByteString instance + +**Length prefix is 1 byte.** Maximum encodable length is 255 bytes. If a ByteString exceeds 255 bytes, the length silently wraps via `w2c . fromIntegral` — a 300-byte string encodes length as 44 (300 mod 256). Callers must ensure ByteStrings fit in 255 bytes, or use `Large` for longer values. + +**Security**: the silent length wrap-around means a caller encoding untrusted input without length validation could produce a malformed message where the decoder reads fewer bytes than were written, then misparses the remainder as the next field. + +## Large + +2-byte length prefix (`Word16`). Use for ByteStrings that may exceed 255 bytes. Maximum 65535 bytes. + +## Maybe instance + +Tags are ASCII characters `'0'` (0x30) and `'1'` (0x31), not bytes 0x00/0x01. `Nothing` encodes as the single byte 0x30; `Just x` encodes as 0x31 followed by `smpEncode x`. + +## Tail + +Consumes all remaining input. Must be the last field in any composite encoding — placing it elsewhere silently eats subsequent fields. + +## Tuple instances + +Sequential concatenation with no separators. Works because each element's encoding is self-delimiting (length-prefixed ByteString, fixed-size Word16/Word32/Int64/Char, etc.). If an element type isn't self-delimiting, the tuple won't round-trip. + +## SystemTime + +Only seconds are encoded (as Int64); nanoseconds are discarded on encode and set to 0 on decode. + +## smpEncodeList / smpListP + +1-byte length prefix for lists — a 255-item limit, analogous to ByteString's 255-byte limit.
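The 1-byte length wrap-around described for the `ByteString` instance can be illustrated with a minimal sketch. The helpers `encodeUnchecked` and `encodeChecked` are hypothetical names written for this example; they are not functions from `Simplex.Messaging.Encoding` and only mimic the described behaviour of taking the length modulo 256.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- Hypothetical stand-in for a 1-byte length-prefixed encoding.
-- The Int length is narrowed to Word8, so it wraps modulo 256.
encodeUnchecked :: B.ByteString -> B.ByteString
encodeUnchecked s = B.cons (fromIntegral (B.length s) :: Word8) s

-- Hypothetical checked variant: refuses inputs the prefix cannot describe.
encodeChecked :: B.ByteString -> Maybe B.ByteString
encodeChecked s
  | B.length s <= 255 = Just (encodeUnchecked s)
  | otherwise = Nothing

main :: IO ()
main = do
  let long = B.replicate 300 0x61 -- 300 bytes of 'a'
  -- The prefix byte is 300 mod 256 = 44, although 300 bytes follow it.
  print (B.head (encodeUnchecked long))
  -- The checked variant rejects the same input instead of wrapping.
  print (encodeChecked long == Nothing)
```

A decoder that trusts the prefix would read 44 bytes and treat the remaining 256 as the next field, which is the misparse the security note warns about.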
diff --git a/spec/modules/Simplex/Messaging/Encoding/String.md b/spec/modules/Simplex/Messaging/Encoding/String.md new file mode 100644 index 000000000..60ac9e496 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Encoding/String.md @@ -0,0 +1,40 @@ +# Simplex.Messaging.Encoding.String + +> Human-readable, URI-friendly string encoding for SMP and agent protocols. + +**Source**: [`Encoding/String.hs`](../../../../../src/Simplex/Messaging/Encoding/String.hs) + +## Overview + +`StrEncoding` is the human-readable counterpart to [Simplex.Messaging.Encoding](../Encoding.md)'s binary `Encoding`. Key differences: + +| Aspect | `Encoding` (binary) | `StrEncoding` (string) | +|--------|---------------------|------------------------| +| ByteString | 1-byte length prefix, raw bytes | base64url encoded | +| Tuple separator | none (self-delimiting) | space-delimited | +| List separator | 1-byte count prefix | comma-separated | +| Default parser fallback | `smpP` via `parseAll` | `strP` via `base64urlP` | + +## ByteString instance + +Encodes as base64url. The parser (`strP`) only accepts non-empty strings — empty base64url input fails. + +## String instance + +Inherits from ByteString via `B.pack` / `B.unpack`. Only Char8 (Latin-1) characters round-trip; `B.pack` truncates Unicode code points above 255 to their low 8 bits. The source comment warns about this. + +## strToJSON / strParseJSON + +`strToJSON` uses `decodeLatin1`, not `decodeUtf8'`. This preserves arbitrary byte sequences (e.g., base64url-encoded binary data) as JSON strings without UTF-8 validation errors, but means the JSON representation is Latin-1, not UTF-8. + +## Default strP fallback + +If only `strDecode` is defined (no custom `strP`), the default parser runs `base64urlP` first, then passes the decoded bytes to `strDecode`. This means the type's own `strDecode` receives raw bytes, not the base64url text. Easy to confuse when implementing a new instance. + +## listItem + +Items are delimited by `,`, ` `, or `\n`.
List items cannot contain these characters in their `strEncode` output. No escaping mechanism exists. + +## Str newtype + +Plain text (no base64). Delimited by spaces. `strP` consumes the trailing space — this is unusual and means `Str` parsing has a side effect on the input position that other `StrEncoding` parsers don't. diff --git a/spec/modules/Simplex/Messaging/Parsers.md b/spec/modules/Simplex/Messaging/Parsers.md new file mode 100644 index 000000000..d6b054378 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Parsers.md @@ -0,0 +1,21 @@ +# Simplex.Messaging.Parsers + +> Attoparsec helpers and Aeson JSON encoding options. + +**Source**: [`Parsers.hs`](../../../../src/Simplex/Messaging/Parsers.hs) + +## sumTypeJSON (platform-dependent JSON encoding) + +On Darwin with the `swiftJSON` CPP flag, `sumTypeJSON` uses `ObjectWithSingleField` encoding with tag `"_owsf"`. On all other platforms, it uses `TaggedObject` encoding with `"type"` / `"data"` keys. + +This means the same Haskell type produces **different JSON** on macOS/iOS vs Linux. Cross-platform JSON interchange must use `taggedObjectJSON` or `singleFieldJSON` directly, not `sumTypeJSON`. + +The `_owsf` tag enables Swift clients to convert between the two encodings — it's a marker that the value was encoded as ObjectWithSingleField rather than TaggedObject. + +## parseE vs parseE' + +`parseE` requires full input consumption (`endOfInput`). `parseE'` does not — it succeeds if the parser matches a prefix. Using `parseE'` where `parseE` is needed silently ignores trailing input. + +## base64P + +Parses standard base64 (`+` and `/`), not base64url (`-` and `_`). Contrast with `base64urlP` in [Simplex.Messaging.Encoding.String](./Encoding/String.md) which parses URL-safe base64. 
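The `parseE` / `parseE'` distinction described above comes down to whether the parser is followed by `endOfInput`. The sketch below shows the difference with attoparsec directly. `parseFull` and `parsePrefix` are hypothetical stand-ins for this example, not the functions exported by `Simplex.Messaging.Parsers`, and the snippet assumes the `attoparsec` package is available.

```haskell
import Data.Attoparsec.ByteString.Char8 (Parser, decimal, endOfInput, parseOnly)
import qualified Data.ByteString.Char8 as B

-- Hypothetical analogue of a full-consumption parser runner:
-- fails unless the parser consumes the whole input.
parseFull :: Parser a -> B.ByteString -> Either String a
parseFull p = parseOnly (p <* endOfInput)

-- Hypothetical analogue of a prefix parser runner:
-- succeeds as soon as the parser matches a prefix.
parsePrefix :: Parser a -> B.ByteString -> Either String a
parsePrefix = parseOnly

main :: IO ()
main = do
  let input = B.pack "42 trailing"
      number = decimal :: Parser Int
  -- The prefix runner accepts the input and drops " trailing".
  print (parsePrefix number input)
  -- The full runner rejects the same input because of the leftover text.
  print (either (const "rejected") show (parseFull number input))
```

Choosing the prefix variant where full consumption is expected hides malformed or padded input, so the stricter runner is the safer default when in doubt.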
diff --git a/spec/modules/Simplex/Messaging/ServiceScheme.md b/spec/modules/Simplex/Messaging/ServiceScheme.md new file mode 100644 index 000000000..409e8854d --- /dev/null +++ b/spec/modules/Simplex/Messaging/ServiceScheme.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.ServiceScheme + +> URI scheme for SimpleX service addresses. + +**Source**: [`ServiceScheme.hs`](../../../../src/Simplex/Messaging/ServiceScheme.hs) + +No non-obvious behavior. See source. diff --git a/spec/modules/Simplex/Messaging/Session.md b/spec/modules/Simplex/Messaging/Session.md new file mode 100644 index 000000000..22c5c90ca --- /dev/null +++ b/spec/modules/Simplex/Messaging/Session.md @@ -0,0 +1,15 @@ +# Simplex.Messaging.Session + +> Atomic get-or-create session variables with identity-safe removal. + +**Source**: [`Session.hs`](../../../../src/Simplex/Messaging/Session.hs) + +## getSessVar + +Returns `Left newVar` if the key was absent (variable created), `Right existingVar` if already present. The new variable gets an atomically incremented `sessionVarId` from the shared counter, and its `sessionVar` TMVar starts empty. + +The caller uses the `Left`/`Right` distinction to decide whether to populate the TMVar (new session) or wait on the existing one. + +## removeSessVar + +Only removes if the stored variable's `sessionVarId` matches the one being removed. This is a compare-and-swap pattern: between the time a caller obtained a `SessionVar` and the time it tries to remove it, another thread may have replaced it with a new session (via `getSessVar`). Without the ID check, the stale caller would remove the new session. diff --git a/spec/modules/Simplex/Messaging/SystemTime.md b/spec/modules/Simplex/Messaging/SystemTime.md new file mode 100644 index 000000000..92bf8e546 --- /dev/null +++ b/spec/modules/Simplex/Messaging/SystemTime.md @@ -0,0 +1,13 @@ +# Simplex.Messaging.SystemTime + +> Type-level precision timestamps for date bucketing and expiration. 
+ +**Source**: [`SystemTime.hs`](../../../../src/Simplex/Messaging/SystemTime.hs) + +## getRoundedSystemTime + +Rounds **down** (truncation): `(seconds / precision) * precision`. A timestamp at 23:59:59 with `SystemDate` (precision 86400) rounds to the start of the current day, not the nearest day. + +## roundedToUTCTime + +Sets nanoseconds to 0. Any `RoundedSystemTime` converted to `UTCTime` and back to `SystemTime` will differ from the original `getSystemTime` value. diff --git a/spec/modules/Simplex/Messaging/TMap.md b/spec/modules/Simplex/Messaging/TMap.md new file mode 100644 index 000000000..f994adab1 --- /dev/null +++ b/spec/modules/Simplex/Messaging/TMap.md @@ -0,0 +1,17 @@ +# Simplex.Messaging.TMap + +> STM-safe concurrent map (`TVar (Map k a)`). + +**Source**: [`TMap.hs`](../../../../src/Simplex/Messaging/TMap.hs) + +## lookupInsert / lookupDelete + +Atomic swap operations using `stateTVar` + `alterF`. `lookupInsert` returns the previous value (if any) while inserting the new one; `lookupDelete` returns the value while removing it. Both are single STM operations — no window between lookup and modification. + +## union + +Left-biased: the passed-in `Map` wins on key conflicts. `union additions tmap` overwrites existing keys in `tmap` with values from `additions`. + +## alterF + +The STM action `f` runs inside the same STM transaction. If `f` retries, the entire `alterF` retries. If `f` has side effects via other TVars, they compose atomically with the map modification. diff --git a/spec/modules/Simplex/Messaging/Util.md b/spec/modules/Simplex/Messaging/Util.md new file mode 100644 index 000000000..3b9fd3777 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Util.md @@ -0,0 +1,52 @@ +# Simplex.Messaging.Util + +> Shared utility functions: exception handling, monadic combinators, data helpers. + +**Source**: [`Util.hs`](../../../../src/Simplex/Messaging/Util.hs) + +## Overview + +Most of this module is straightforward. 
The exception handling scheme is the part that warrants documentation — the naming is misleading and the semantics are subtle. + +## Exception handling scheme + +Three categories of exceptions, two catch strategies: + +| Category | Examples | `catchAll` | `catchOwn` | +|----------|----------|------------|------------| +| Synchronous | IOError, protocol errors | caught | caught | +| "Own" async | StackOverflow, HeapOverflow, AllocationLimitExceeded | caught | caught | +| Async cancellation | ThreadKilled, all other SomeAsyncException | caught | **re-thrown** | + +### isOwnException + +Classifies `StackOverflow`, `HeapOverflow`, and `AllocationLimitExceeded` as "own" — exceptions caused by this thread's resource usage, not by external cancellation. Despite being `AsyncException`, these should be caught like synchronous exceptions because they reflect the thread's own failure. + +### isAsyncCancellation + +True for any `SomeAsyncException` that is NOT an own exception. These represent external cancellation (e.g., `cancel`, `killThread`) and must be re-thrown to preserve structured concurrency guarantees. + +### catchOwn / catchOwn' + +Despite the name, these catch **all exceptions except async cancellations** — including synchronous exceptions. The name suggests "catch only own exceptions" but the actual semantics are "catch non-cancellation exceptions." This is the standard pattern for exception-safe cleanup in concurrent Haskell. + +### tryAllErrors vs tryAllOwnErrors + +- `tryAllErrors` / `catchAllErrors`: catch everything including async cancellations. Use when you need to convert any failure into an error value (e.g., returning error responses on a connection). +- `tryAllOwnErrors` / `catchAllOwnErrors`: catch everything except async cancellations. Use in normal business logic where cancellation should propagate. + +### AnyError typeclass + +Bridges `SomeException` into application error types via `fromSomeException`. 
All the `tryAll*` / `catchAll*` functions require this constraint. + +## raceAny_ + +Runs all actions concurrently, waits for any one to complete, then cancels all others. Uses nested `withAsync` — earlier-launched actions are canceled last (LIFO unwinding). + +## threadDelay' + +Handles `Int64` delays exceeding `maxBound :: Int` (~2147 seconds on 32-bit, since delays are in microseconds) by looping in chunks. Necessary because `threadDelay` takes `Int`, not `Int64`. + +## toChunks + +Precondition: `n > 0` (comment-only, not enforced). Passing `n = 0` causes an infinite loop. diff --git a/spec/modules/Simplex/Messaging/Version.md b/spec/modules/Simplex/Messaging/Version.md new file mode 100644 index 000000000..67bbf1b4f --- /dev/null +++ b/spec/modules/Simplex/Messaging/Version.md @@ -0,0 +1,27 @@ +# Simplex.Messaging.Version + +> Version negotiation with proof-carrying compatibility checks. + +**Source**: [`Version.hs`](../../../../src/Simplex/Messaging/Version.hs) + +## Overview + +The module's central design: `Compatible` and `VRange` constructors are not exported. The only way to obtain a `Compatible` value is through the negotiation functions, and the only way to construct a `VersionRange` is through `mkVersionRange` (which validates) or parsing. This makes "compatibility was checked" a compile-time guarantee — code that holds a `Compatible a` has proof that negotiation succeeded. + +See [Simplex.Messaging.Version.Internal](./Version/Internal.md) for why the `Version` constructor is separated. + +## mkVersionRange + +Uses `error` if `min > max`. Safe only for compile-time constants. Runtime construction must use `safeVersionRange`, which returns `Nothing` on invalid input. + +## compatibleVersion vs compatibleVRange + +`compatibleVersion` selects a single version: `min(max1, max2)` — the highest mutually-supported version. `compatibleVRange` returns the full intersection range: `(max(min1,min2), min(max1,max2))`.
The intersection is used when both sides need to remember the agreed range for future version-gated behavior, not just the single negotiated version. + +## compatibleVRange' + +Different from `compatibleVRange`: caps the range's *maximum* at a given version, rather than intersecting two ranges. Returns `Nothing` if the cap is below the range's minimum. Used when a peer reports a specific version and you need to constrain your range accordingly. + +## VersionI / VersionRangeI typeclasses + +Allow extension types that wrap `Version` or `VersionRange` (e.g., types carrying additional handshake parameters alongside the version) to participate in negotiation without unwrapping. The associated types (`VersionT`, `VersionRangeT`) map between the version and range forms of the extension type. diff --git a/spec/modules/Simplex/Messaging/Version/Internal.md b/spec/modules/Simplex/Messaging/Version/Internal.md new file mode 100644 index 000000000..9fe8cffe9 --- /dev/null +++ b/spec/modules/Simplex/Messaging/Version/Internal.md @@ -0,0 +1,7 @@ +# Simplex.Messaging.Version.Internal + +> Exports the `Version` constructor for internal use. + +**Source**: [`Version/Internal.hs`](../../../../../src/Simplex/Messaging/Version/Internal.hs) + +This module exists solely to split the `Version` constructor export. `Version.hs` exports `Version` as an opaque type (no constructor); `Version/Internal.hs` exports the `Version` constructor for modules that need to fabricate version values (protocol constants, parsers, tests). Application code should not import this module. 
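The negotiation rules documented above for `Simplex.Messaging.Version` (single version, range intersection, and capping) can be sketched with plain types. Everything in this block is hypothetical: `Range`, `intersectRanges`, `negotiate`, and `capAt` are illustration-only names, the real module hides its constructors and wraps results in `Compatible`, and the assumption that capping never raises the maximum is mine.

```haskell
import Data.Word (Word16)

-- Hypothetical plain range type; assumes minV <= maxV.
data Range = Range {minV :: Word16, maxV :: Word16}
  deriving (Eq, Show)

-- Intersection (max of minimums, min of maximums); Nothing if empty.
intersectRanges :: Range -> Range -> Maybe Range
intersectRanges (Range a b) (Range c d)
  | lo <= hi = Just (Range lo hi)
  | otherwise = Nothing
  where
    lo = max a c
    hi = min b d

-- Single negotiated version: the highest mutually supported one.
negotiate :: Range -> Range -> Maybe Word16
negotiate r1 r2 = maxV <$> intersectRanges r1 r2

-- Cap the maximum at a given version; Nothing if the cap is below the minimum.
-- Assumption: the cap never raises the existing maximum.
capAt :: Range -> Word16 -> Maybe Range
capAt (Range a b) v
  | v >= a = Just (Range a (min b v))
  | otherwise = Nothing

main :: IO ()
main = do
  print (negotiate (Range 1 5) (Range 3 8))       -- Just 5
  print (negotiate (Range 1 2) (Range 3 8))       -- Nothing
  print (intersectRanges (Range 1 5) (Range 3 8)) -- the range 3..5
  print (capAt (Range 4 9) 2)                     -- Nothing
```

Keeping the whole intersection rather than only the negotiated version matters when later behavior is gated on versions inside the agreed range.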
diff --git a/spec/rcv-services.md b/spec/rcv-services.md index b0d97d9f7..6518059f2 100644 --- a/spec/rcv-services.md +++ b/spec/rcv-services.md @@ -33,10 +33,10 @@ Service client SMP Server ## Version gates -| Constant | Value | Gate | Source | -|----------|-------|------|--------| -| `serviceCertsSMPVersion` | 16 | Service handshake, `SOK`, `useServiceAuth` | Transport.hs:214 | -| `rcvServiceSMPVersion` | 19 | `SUBS`/`NSUBS` parameters, `SOKS`/`ENDS` idsHash, messaging service role in handshake | Transport.hs:223 | +| Constant | Value | Gate | +|----------|-------|------| +| `serviceCertsSMPVersion` | 16 | Service handshake, `SOK`, `useServiceAuth` | +| `rcvServiceSMPVersion` | 19 | `SUBS`/`NSUBS` parameters, `SOKS`/`ENDS` idsHash, messaging service role in handshake | The two-version split means: - v16-18 servers accept service certificates and per-queue `SUB` with service auth, but `SUBS`/`NSUBS` send no count/hash parameters (bare command tag only). @@ -55,14 +55,12 @@ The two-version split means: data SMPServiceRole = SRMessaging | SRNotifier | SRProxy -- Wire: "M" | "N" | "P" ``` -Source: Transport.hs:594 ### Party (service-related constructors) ```haskell data Party = ... | RecipientService | NotifierService | ... ``` -Source: Protocol.hs:335-346 The `ServiceParty` type family constrains to `RecipientService | NotifierService` only: ```haskell @@ -71,7 +69,6 @@ type family ServiceParty (p :: Party) :: Constraint where ServiceParty NotifierService = () ServiceParty p = (Int ~ Bool, TypeError ...) -- compile-time error ``` -Source: Protocol.hs:430-434 ### IdsHash @@ -89,7 +86,6 @@ instance Monoid IdsHash where queueIdHash :: QueueId -> IdsHash queueIdHash = IdsHash . C.md5Hash . 
unEntityId ``` -Source: Protocol.hs:1501-1526 **Key property**: XOR is self-inverse, so `addServiceSubs` and `subtractServiceSubs` both use `<>` (XOR) for the hash component: ```haskell @@ -98,7 +94,6 @@ subtractServiceSubs (n', idsHash') (n, idsHash) | n > n' = (n - n', idsHash <> idsHash') | otherwise = (0, mempty) ``` -Source: Protocol.hs:1528-1534 ### ServiceSub / ServiceSubResult / ServiceSubError @@ -116,7 +111,6 @@ data ServiceSubError | SSErrorQueueCount {expectedQueueCount, subscribedQueueCount :: Int64} | SSErrorQueueIdsHash {expectedQueueIdsHash, subscribedQueueIdsHash :: IdsHash} ``` -Source: Protocol.hs:1476-1499 `serviceSubResult` compares expected vs actual, returning the first mismatch (priority: serviceId > count > idsHash). @@ -128,7 +122,6 @@ data STMService = STMService serviceRcvQueues :: TVar (Set RecipientId, IdsHash), serviceNtfQueues :: TVar (Set NotifierId, IdsHash) } ``` -Source: QueueStore/STM.hs:64-68 Tracks the set of queue IDs and their cumulative XOR hash per service, per role (receive vs notify). @@ -142,8 +135,6 @@ Standard SMP handshake is two messages: server sends `SMPServerHandshake`, clien 2. **Client -> Server**: `SMPClientHandshake` with `clientService :: Maybe SMPClientHandshakeService` 3. **Server -> Client**: `SMPServerHandshakeResponse {serviceId}` or `SMPServerHandshakeError {handshakeError}` -Source: Transport.hs:752-791 (server), Transport.hs:796-848 (client) - ### SMPClientHandshakeService ```haskell @@ -151,7 +142,6 @@ data SMPClientHandshakeService = SMPClientHandshakeService { serviceRole :: SMPServiceRole, serviceCertKey :: CertChainPubKey } ``` -Source: Transport.hs:582-585 The `serviceCertKey` contains the TLS client certificate chain and a proof-of-possession: the service's Ed25519 session key signed by the service's X.509 signing key (`C.signX509 serviceSignKey $ C.publicToX509 k`). @@ -164,14 +154,10 @@ The `serviceCertKey` contains the TLS client certificate chain and a proof-of-po 5. 
Call `getService` callback (QueueStore.getCreateService) to get/create ServiceId 6. Send `SMPServerHandshakeResponse {serviceId}` back to client -Source: Transport.hs:775-791 - ### Client-side reception (`getClientService`) Client receives either `SMPServerHandshakeResponse {serviceId}` (success) or `SMPServerHandshakeError {handshakeError}` (failure). On success, stores `THClientService {serviceId, serviceRole, serviceCertHash, serviceKey}`. -Source: Transport.hs:843-847 - ### Version-gated service role filtering (`mkClientService`) ```haskell @@ -179,7 +165,6 @@ mkClientService v (ServiceCredentials {serviceRole, ...}, (k, _)) | serviceRole == SRMessaging && v < rcvServiceSMPVersion = Nothing | otherwise = Just SMPClientHandshakeService {..} ``` -Source: Transport.hs:838-842 Messaging services are suppressed below v19. Notifier services are sent at v16+. @@ -192,7 +177,6 @@ data ServiceCredentials = ServiceCredentials serviceCertHash :: XV.Fingerprint, serviceSignKey :: C.APrivateSignKey } ``` -Source: Transport.hs:587-592 ## Protocol layer: commands and messages @@ -216,7 +200,6 @@ useServiceAuth = \case Cmd _ NSUB -> True _ -> False ``` -Source: Protocol.hs:1737-1742 For these commands, `tEncodeAuth` appends both the primary queue key signature and an optional service Ed25519 signature. `SUBS`/`NSUBS` use the ServiceId as entity and are signed only by the service session key. 
@@ -237,21 +220,18 @@ For these commands, `tEncodeAuth` appends both the primary queue key signature a v >= 19: tag SP count idsHash v < 19: tag (bare, no parameters) ``` -Source: Protocol.hs:1769-1771, 1787-1789 **SOKS/ENDS encoding:** ``` v >= 19: tag SP count idsHash v < 19: tag SP count (no idsHash) ``` -Source: Protocol.hs:1951-1953 **SOKS/ENDS decoding:** ``` v >= 19: tag -> resp <$> _smpP <*> smpP (count + idsHash) v < 19: tag -> resp <$> _smpP <*> pure mempty (count only, mempty hash) ``` -Source: Protocol.hs:1996-1998 ## Server layer @@ -267,7 +247,6 @@ data Client s = Client ntfServiceSubsCount :: TVar (Int64, IdsHash), -- running (count, hash) for notifier queues ... } ``` -Source: Env/STM.hs:437-456 Server-global state: ```haskell @@ -279,7 +258,6 @@ data ServerSubscribers s = ServerSubscribers subClients :: TVar IntSet, pendingEvents :: TVar (IntMap (NonEmpty (EntityId, BrokerMsg))) } ``` -Source: Env/STM.hs:362-369 ### ClientSub events @@ -289,7 +267,6 @@ data ClientSub | CSDeleted QueueId (Maybe ServiceId) -- prev service ID | CSService ServiceId (Int64, IdsHash) -- service subscription change ``` -Source: Env/STM.hs:426-429 These are enqueued into `subQ` and processed by `serverThread` (the subscription event loop). @@ -526,7 +503,6 @@ subscribeService c party n idsHash = case smpClientService c of SNotifierService -> NSUBS n idsHash Nothing -> throwE PCEServiceUnavailable ``` -Source: Client.hs:921-934 Entity is `serviceId`, auth key is the service session key (Ed25519). The client passes its expected count and hash; the server returns its own. @@ -551,7 +527,6 @@ This prevents MITM service substitution inside TLS: an attacker cannot replace t (fp <> t, Just $ C.sign' serviceKey t) _ -> (t, Nothing) ``` -Source: Client.hs:1398-1401 ### Service runtime accessors @@ -562,7 +537,6 @@ smpClientService = thAuth . 
thParams >=> clientService smpClientServiceId :: SMPClient -> Maybe ServiceId smpClientServiceId = fmap (\THClientService {serviceId} -> serviceId) . smpClientService ``` -Source: Client.hs:936-942 ### Configuration @@ -632,8 +606,6 @@ data SessSubs = SessSubs activeServiceSub :: TVar (Maybe ServiceSub), pendingServiceSub :: TVar (Maybe ServiceSub) } ``` -Source: TSessionSubs.hs:59-65 - Key operations: - `setPendingServiceSub`: stores expected ServiceSub before SUBS is sent - `setActiveServiceSub`: promotes to active after SOKS, validates session ID @@ -657,8 +629,6 @@ CREATE TABLE client_services( service_queue_ids_hash BLOB NOT NULL DEFAULT x'00000000000000000000000000000000' ); ``` -Source: Agent/Store/SQLite/Migrations/M20260115_service_certs.hs:11-23 - ### `rcv_queues.rcv_service_assoc` Boolean column added to `rcv_queues`. When set, the queue is associated with the service for this server. SQLite triggers automatically maintain `service_queue_count` and `service_queue_ids_hash` on insert/delete/update of `rcv_queues` rows. @@ -676,8 +646,6 @@ Triggers: `tr_rcv_queue_insert`, `tr_rcv_queue_delete`, `tr_rcv_queue_update_rem | `removeRcvServiceAssocs` | Remove service association for all queues on a server | | `unassocUserServerRcvQueueSubs` | Remove association and return queues for re-subscription | -Source: AgentStore.hs:419-494, 2378-2414 - ### Service ID nullification on cert change `INSERT ... ON CONFLICT DO UPDATE SET ... service_id = NULL` (AgentStore.hs:429) — when service credentials are updated (new cert), the stored `service_id` is cleared, forcing a new handshake to get a fresh ServiceId. 
@@ -709,8 +677,6 @@ On first use per SMP server, `mkDbService` (Env.hs:126-142) generates a self-sig | `CAServiceSubError` | Log error (non-fatal; fatal errors go to `CAServiceUnavailable`) | | `CAServiceUnavailable` | **Critical recovery path**: calls `removeServiceAndAssociations`, wipes service creds, resubscribes all queues individually | -Source: Server.hs:567-602 - ### `removeServiceAndAssociations` (Store/Postgres.hs:620-652) Nuclear recovery: clears `ntf_service_id`, `ntf_service_cert*`, resets `smp_notifier_count`/`smp_notifier_ids_hash`, and removes all `ntf_service_assoc` flags from subscriptions. Used when the service subscription is irrecoverably broken (e.g., ServiceId mismatch after cert rotation). diff --git a/spec/version.md b/spec/version.md index 6d9a23c09..19ad786fe 100644 --- a/spec/version.md +++ b/spec/version.md @@ -14,8 +14,6 @@ The `Compatible` newtype can only be constructed internally (constructor is not ### `Version v` -**Source**: `Version/Internal.hs:11-12` - ```haskell newtype Version v = Version Word16 ``` @@ -31,8 +29,6 @@ The constructor is exported from `Version.Internal` but not from `Version`, so a ### `VersionRange v` -**Source**: `Version.hs:46-50` - ```haskell data VersionRange v = VRange { minVersion :: Version v @@ -42,16 +38,14 @@ data VersionRange v = VRange Invariant: `minVersion <= maxVersion` (enforced by smart constructors). -The `VRange` constructor is not exported — only the pattern synonym `VersionRange` (read-only, `Version.hs:41-44`) is public. +The `VRange` constructor is not exported — only the pattern synonym `VersionRange` (read-only) is public. 
-- `Encoding`: two Word16s concatenated (4 bytes total, `Version.hs:80-84`) -- `StrEncoding`: `"min-max"` or `"v"` if min == max (`Version.hs:86-93`) +- `Encoding`: two Word16s concatenated (4 bytes total) +- `StrEncoding`: `"min-max"` or `"v"` if min == max - JSON: `{"minVersion": n, "maxVersion": n}` ### `VersionScope v` -**Source**: `Version.hs:64` - ```haskell class VersionScope v ``` @@ -67,8 +61,6 @@ This prevents accidentally mixing version ranges from different protocols in neg ### `Compatible a` -**Source**: `Version.hs:117-122` - ```haskell newtype Compatible a = Compatible_ a @@ -80,8 +72,6 @@ Proof that compatibility was checked. The `Compatible_` constructor is not expor ### `VersionI` / `VersionRangeI` type classes -**Source**: `Version.hs:95-115` - Multi-param typeclasses with functional dependencies for generic version/range operations. Allow extension types that wrap `Version` or `VersionRange` to participate in negotiation: ```haskell @@ -103,76 +93,64 @@ Identity instances exist for `Version v` and `VersionRange v` themselves. 
### Construction -| Function | Signature | Purpose | Source | -|----------|-----------|---------|--------| -| `mkVersionRange` | `Version v -> Version v -> VersionRange v` | Construct range, `error` if min > max | `Version.hs:67-70` | -| `safeVersionRange` | `Version v -> Version v -> Maybe (VersionRange v)` | Safe construction, `Nothing` if invalid | `Version.hs:72-75` | -| `versionToRange` | `Version v -> VersionRange v` | Singleton range (min == max) | `Version.hs:77-78` | +| Function | Signature | Purpose | +|----------|-----------|---------| +| `mkVersionRange` | `Version v -> Version v -> VersionRange v` | Construct range, `error` if min > max | +| `safeVersionRange` | `Version v -> Version v -> Maybe (VersionRange v)` | Safe construction, `Nothing` if invalid | +| `versionToRange` | `Version v -> VersionRange v` | Singleton range (min == max) | ### Compatibility checking -#### `isCompatible` +### isCompatible -**Source**: `Version.hs:124-125` +**Purpose**: Check if a single version falls within a range. ```haskell isCompatible :: VersionI v a => a -> VersionRange v -> Bool ``` -Check if a single version falls within a range. +### isCompatibleRange -#### `isCompatibleRange` - -**Source**: `Version.hs:127-130` +**Purpose**: Check if two version ranges overlap: `min1 <= max2 && min2 <= max1`. ```haskell isCompatibleRange :: VersionRangeI v a => a -> VersionRange v -> Bool ``` -Check if two version ranges overlap: `min1 <= max2 && min2 <= max1`. +### proveCompatible -#### `proveCompatible` - -**Source**: `Version.hs:132-133` +**Purpose**: If version is compatible, wrap in `Compatible` proof. Returns `Nothing` if out of range. ```haskell proveCompatible :: VersionI v a => a -> VersionRange v -> Maybe (Compatible a) ``` -If version is compatible, wrap in `Compatible` proof. Returns `Nothing` if out of range. - ### Negotiation -#### `compatibleVersion` +### compatibleVersion -**Source**: `Version.hs:135-140` +**Purpose**: Negotiate a single version from two ranges. 
Returns `min(max1, max2)` — the highest mutually-supported version. Returns `Nothing` if ranges don't overlap. ```haskell compatibleVersion :: VersionRangeI v a => a -> VersionRange v -> Maybe (Compatible (VersionT v a)) ``` -Negotiate a single version from two ranges. Returns `min(max1, max2)` — the highest mutually-supported version. Returns `Nothing` if ranges don't overlap. +### compatibleVRange -#### `compatibleVRange` - -**Source**: `Version.hs:143-148` +**Purpose**: Compute the intersection of two version ranges: `(max(min1,min2), min(max1,max2))`. Returns `Nothing` if the intersection is empty. ```haskell compatibleVRange :: VersionRangeI v a => a -> VersionRange v -> Maybe (Compatible a) ``` -Compute the intersection of two version ranges: `(max(min1,min2), min(max1,max2))`. Returns `Nothing` if the intersection is empty (i.e., ranges don't overlap). +### compatibleVRange' -#### `compatibleVRange'` - -**Source**: `Version.hs:151-156` +**Purpose**: Cap a version range's maximum at a given version. Returns `Nothing` if the cap is below the range's minimum. ```haskell compatibleVRange' :: VersionRangeI v a => a -> Version v -> Maybe (Compatible a) ``` -Cap a version range's maximum at a given version. Returns `Nothing` if the cap is below the range's minimum. - ## Protocol version constants Version constants for each protocol are defined in their respective Transport modules. For SMP, key gates include: