Files
simplexmq/spec/modules/README.md
T
Evgeny @ SimpleX Chat f7be44981a SMP router specs
2026-03-12 18:13:12 +00:00

198 lines
9.9 KiB
Markdown

# How to Document a Module
> Read this before writing any module doc. It defines what goes in, what stays out, and why.
## Purpose
Module docs exist for one reason: to capture knowledge that **cannot be obtained by reading the source code**. If reading the `.hs` file tells you everything you need to know, the module doc should be brief or empty.
These docs are an investment — their value compounds over time as multiple people (and LLMs) work on the code. Optimize for long-term value, not for looking thorough today.
## Process
**Read every line of the source file.** The non-obvious filter applies to what you *write*, not to what you *read*. Without reading each line, you will produce documentation from inferences rather than facts. Many non-obvious behaviors only become visible when you see a specific line of code and recognize that its implications would surprise a reader who doesn't have the surrounding context.
## File structure
Module docs mirror `src/Simplex/` exactly. Same subfolder structure, `.hs` replaced with `.md`:
```
src/Simplex/Messaging/Server.hs → spec/modules/Simplex/Messaging/Server.md
src/Simplex/Messaging/Crypto.hs → spec/modules/Simplex/Messaging/Crypto.md
src/Simplex/FileTransfer/Agent.hs → spec/modules/Simplex/FileTransfer/Agent.md
```
## What to include
### 1. Non-obvious behavior
Things that would surprise a competent Haskell developer reading the code for the first time:
- Subtle invariants maintained across function calls
- Ordering dependencies ("must call X before Y because...")
- Concurrency assumptions ("this TVar is only written from thread Z")
- Implicit contracts between caller and callee not captured by types
### 2. Usage considerations
- When to use function X vs function Y
- Common mistakes callers make
- Caller obligations not enforced by the type system
- Performance characteristics that affect usage decisions
### 3. Cross-module relationships
- Dependencies on other modules' behavior not visible from import lists
- Assumptions about how other modules use this one
- Coordination patterns (e.g., "Server.hs reads this TVar, Agent.hs writes it")
### 4. Security notes
- Trust boundaries this module enforces or relies on
- What happens if inputs are malicious
- Which functions are security-critical and why (reference SI-XX invariants)
### 5. Design rationale
- Why the code is structured this way (when not obvious)
- Alternatives considered and rejected
- Known limitations and their justification
## Non-obvious threshold
The guiding principle: **non-obvious state machines and flows require documentation; standard things don't.**
Document:
- Multi-step protocols and negotiation flows (e.g., KEM propose/accept round-trips)
- Monotonic or irreversible state transitions (e.g., PQ support can only be enabled, never disabled)
- Silent error behaviors (e.g., `verify` returns `False` on algorithm mismatch instead of an error)
- Design rationale for non-standard choices (e.g., why byte-reverse a nonce, why hash-then-encrypt for authenticators)
Do NOT document:
- Standard algorithm properties (e.g., Ed25519 public key derivable from private key)
- Well-known protocol mechanics (e.g., HKDF usage per RFC 5869, deterministic nonce derivation in double ratchet)
- Implementation details that follow directly from the type signatures
## What NOT to include
- **Type signatures** — the code has them
- **Code snippets** — if you're pasting code, you're making a stale copy
- **Function-by-function prose that restates the implementation** — "this function takes X and returns Y by doing Z" adds nothing
- **Line numbers** — they're brittle and break on every edit
- **Comments that fit in one line in source** — put those in the source file instead as `-- spec:` comments
- **Verbatim quotes of source comments** — reference them instead: "See comment on `functionName`." Then add only what the comment doesn't cover (cross-module implications, what breaks if violated). If the source comment says everything, the function doesn't need a doc entry.
- **Tables that reproduce code structure** — if the information is self-evident from reading the code's pattern matching or type definitions, it doesn't belong in the doc (e.g., per-command credential requirements, version-conditional encoding branches)
## Format
Each module doc has a header, then entries for functions/types that need documentation.
```markdown
# Module.Name
> One-line description of what this module does.
**Source**: [`Path/To/Module.hs`](relative link to source)
## Overview
[Only if the module's purpose or architecture is non-obvious.
Skip for simple modules.]
## functionName
**Purpose**: [What this does that isn't obvious from the name and type]
**Calls**: [Qualified.Name.a](link), [Qualified.Name.b](link)
**Called by**: [Qualified.Name.c](link)
**Invariant**: SI-XX
**Security**: [What this function ensures for the threat model]
[Free-form notes about non-obvious behavior, gotchas, etc.]
## anotherFunction
...
```
**For trivial modules** (< 100 LOC, no non-obvious behavior):
```markdown
# Module.Name
> One-line description.
**Source**: [`Path/To/Module.hs`](relative link to source)
No non-obvious behavior. See source.
```
This is valuable — it confirms someone looked and found nothing to document.
## Linking conventions
### Module doc → protocol docs
When a module implements or is governed by a protocol specification in `protocol/`, link to it near the top of the module doc (after the overview). Do not duplicate protocol content — just reference it:
```markdown
**Protocol spec**: [`protocol/pqdr.md`](../../../../protocol/pqdr.md) — Post-quantum resistant augmented double ratchet algorithm.
```
This is especially important for modules in transport, protocol, client, server, and agent layers where behavior is defined by the protocol spec rather than being self-evident from the code.
### Module doc → other module docs
Use fully qualified names as link text:
```markdown
[Simplex.Messaging.Server.subscribeServiceMessages](./Simplex/Messaging/Server.md#subscribeServiceMessages)
```
### Module doc → topic docs
```markdown
See [rcv-services](../rcv-services.md) for the end-to-end service subscription flow.
```
### Source → module doc
Add `-- spec:` comments as part of the module documentation work — when you document something non-obvious, add the link in source at the same time. Two levels:
**Module-level** (below the module declaration): when the Overview section has value.
```haskell
module Simplex.Messaging.Util (...) where
-- spec: spec/modules/Simplex/Messaging/Util.md
```
**Function-level** (above the function): when that function has a doc entry worth pointing to.
```haskell
-- spec: spec/modules/Simplex/Messaging/Util.md#catchOwn
-- Catches all exceptions except async cancellations (misleading name)
catchOwn :: ...
```
Only add `-- spec:` comments where the module doc actually says something the code doesn't. Don't add links to "No non-obvious behavior" docs or to entries that merely restate the source.
## Topic candidate tracking
While documenting modules, you will notice cross-cutting patterns — behaviors that span multiple modules and can't be understood from any single one. Note these in `spec/TOPICS.md` for later. Don't write the topic doc during module work; just record:
```markdown
- **Queue rotation**: Agent.hs initiates, Client.hs sends commands, Server.hs processes,
Protocol.hs defines types. End-to-end flow not obvious from any single module.
```
## Quality bar
Before finishing a module doc, ask:
1. Does every entry document something NOT in the source code?
2. Would removing any entry lose information? If not, remove it.
3. Are cross-module relationships captured that imports alone don't reveal?
4. Are security-critical functions flagged with invariant IDs?
5. Is this doc short enough that someone will actually read it?
If any answer reveals a problem, fix it and repeat from question 1. Only finish when a full pass produces no changes.
## Terminology — the spec as translation boundary
The protocol documents (`protocol/overview-tjr.md`, `protocol/simplex-messaging.md`, `protocol/agent-protocol.md`) define the canonical terminology. Code uses different names for some of the same concepts. The spec is where the translation happens.
The most important distinction: SimpleX protocol routers are referred to as "servers" in code. The term "server" was adopted historically because SimpleX routers were implemented as Linux-based software that is deployed in the same way as servers. But the similarity is entirely formal. Functionally, servers serve responses to the requests of their users - that is why the term "server" was adopted for computers and software that provide Internet services. SimpleX protocol routers don't serve responses - they route packets between endpoints, and they have no concept of a user. Functionally they are similar to Internet Protocol routers, but with a resource-based addressing scheme. Further, SimpleX protocol routers are hardware and software agnostic. SimpleX protocols are open and documented, so they can be implemented in any language and run on a different architecture. For example, [SimpleGo](https://simplego.dev) is a prototype implementation of the SimpleX protocol stack in C for a microcontroller architecture.
**The rule**: use protocol terms for concepts, code terms for identifiers. Write "router" when describing the network node's role, `SMPServer` or `Server.hs` when referencing code. Similarly, "router identity" for the concept (called "server key hash" or "fingerprint" in code). When the distinction matters, bridge explicitly: "the SMP router (implemented by the `Server` module)" or "the `SMPServer` type (representing a router address)."
## Exclusions
- **Individual migration files** (M20XXXXXX_*.hs): Self-describing SQL. No per-migration docs.
- **Auto-generated files** (GitCommit.hs): Skip.
- **Pure boilerplate** (Prometheus.hs metrics, Web/Embedded.hs static files): Document only if non-obvious.