mirror of
https://github.com/simplex-chat/simplexmq.git
synced 2026-05-12 10:14:47 +00:00
198 lines
9.9 KiB
Markdown
198 lines
9.9 KiB
Markdown
# How to Document a Module
|
|
|
|
> Read this before writing any module doc. It defines what goes in, what stays out, and why.
|
|
|
|
## Purpose
|
|
|
|
Module docs exist for one reason: to capture knowledge that **cannot be obtained by reading the source code**. If reading the `.hs` file tells you everything you need to know, the module doc should be brief or empty.
|
|
|
|
These docs are an investment — their value compounds over time as multiple people (and LLMs) work on the code. Optimize for long-term value, not for looking thorough today.
|
|
|
|
## Process
|
|
|
|
**Read every line of the source file.** The non-obvious filter applies to what you *write*, not to what you *read*. Without reading each line, you will produce documentation from inferences rather than facts. Many non-obvious behaviors only become visible when you see a specific line of code and recognize that its implications would surprise a reader who doesn't have the surrounding context.
|
|
|
|
## File structure
|
|
|
|
Module docs mirror `src/Simplex/` exactly. Same subfolder structure, `.hs` replaced with `.md`:
|
|
|
|
```
|
|
src/Simplex/Messaging/Server.hs → spec/modules/Simplex/Messaging/Server.md
|
|
src/Simplex/Messaging/Crypto.hs → spec/modules/Simplex/Messaging/Crypto.md
|
|
src/Simplex/FileTransfer/Agent.hs → spec/modules/Simplex/FileTransfer/Agent.md
|
|
```
|
|
|
|
## What to include
|
|
|
|
### 1. Non-obvious behavior
|
|
Things that would surprise a competent Haskell developer reading the code for the first time:
|
|
- Subtle invariants maintained across function calls
|
|
- Ordering dependencies ("must call X before Y because...")
|
|
- Concurrency assumptions ("this TVar is only written from thread Z")
|
|
- Implicit contracts between caller and callee not captured by types
|
|
|
|
### 2. Usage considerations
|
|
- When to use function X vs function Y
|
|
- Common mistakes callers make
|
|
- Caller obligations not enforced by the type system
|
|
- Performance characteristics that affect usage decisions
|
|
|
|
### 3. Cross-module relationships
|
|
- Dependencies on other modules' behavior not visible from import lists
|
|
- Assumptions about how other modules use this one
|
|
- Coordination patterns (e.g., "Server.hs reads this TVar, Agent.hs writes it")
|
|
|
|
### 4. Security notes
|
|
- Trust boundaries this module enforces or relies on
|
|
- What happens if inputs are malicious
|
|
- Which functions are security-critical and why (reference SI-XX invariants)
|
|
|
|
### 5. Design rationale
|
|
- Why the code is structured this way (when not obvious)
|
|
- Alternatives considered and rejected
|
|
- Known limitations and their justification
|
|
|
|
## Non-obvious threshold
|
|
|
|
The guiding principle: **non-obvious state machines and flows require documentation; standard things don't.**
|
|
|
|
Document:
|
|
- Multi-step protocols and negotiation flows (e.g., KEM propose/accept round-trips)
|
|
- Monotonic or irreversible state transitions (e.g., PQ support can only be enabled, never disabled)
|
|
- Silent error behaviors (e.g., `verify` returns `False` on algorithm mismatch instead of an error)
|
|
- Design rationale for non-standard choices (e.g., why byte-reverse a nonce, why hash-then-encrypt for authenticators)
|
|
|
|
Do NOT document:
|
|
- Standard algorithm properties (e.g., Ed25519 public key derivable from private key)
|
|
- Well-known protocol mechanics (e.g., HKDF usage per RFC 5869, deterministic nonce derivation in double ratchet)
|
|
- Implementation details that follow directly from the type signatures
|
|
|
|
## What NOT to include
|
|
|
|
- **Type signatures** — the code has them
|
|
- **Code snippets** — if you're pasting code, you're making a stale copy
|
|
- **Function-by-function prose that restates the implementation** — "this function takes X and returns Y by doing Z" adds nothing
|
|
- **Line numbers** — they're brittle and break on every edit
|
|
- **Comments that fit in one line in source** — put those in the source file instead as `-- spec:` comments
|
|
- **Verbatim quotes of source comments** — reference them instead: "See comment on `functionName`." Then add only what the comment doesn't cover (cross-module implications, what breaks if violated). If the source comment says everything, the function doesn't need a doc entry.
|
|
- **Tables that reproduce code structure** — if the information is self-evident from reading the code's pattern matching or type definitions, it doesn't belong in the doc (e.g., per-command credential requirements, version-conditional encoding branches)
|
|
|
|
## Format
|
|
|
|
Each module doc has a header, then entries for functions/types that need documentation.
|
|
|
|
```markdown
|
|
# Module.Name
|
|
|
|
> One-line description of what this module does.
|
|
|
|
**Source**: [`Path/To/Module.hs`](relative link to source)
|
|
|
|
## Overview
|
|
|
|
[Only if the module's purpose or architecture is non-obvious.
|
|
Skip for simple modules.]
|
|
|
|
## functionName
|
|
|
|
**Purpose**: [What this does that isn't obvious from the name and type]
|
|
**Calls**: [Qualified.Name.a](link), [Qualified.Name.b](link)
|
|
**Called by**: [Qualified.Name.c](link)
|
|
**Invariant**: SI-XX
|
|
**Security**: [What this function ensures for the threat model]
|
|
|
|
[Free-form notes about non-obvious behavior, gotchas, etc.]
|
|
|
|
## anotherFunction
|
|
|
|
...
|
|
```
|
|
|
|
**For trivial modules** (< 100 LOC, no non-obvious behavior):
|
|
|
|
```markdown
|
|
# Module.Name
|
|
|
|
> One-line description.
|
|
|
|
**Source**: [`Path/To/Module.hs`](relative link to source)
|
|
|
|
No non-obvious behavior. See source.
|
|
```
|
|
|
|
This is valuable — it confirms someone looked and found nothing to document.
|
|
|
|
## Linking conventions
|
|
|
|
### Module doc → protocol docs
|
|
When a module implements or is governed by a protocol specification in `protocol/`, link to it near the top of the module doc (after the overview). Do not duplicate protocol content — just reference it:
|
|
```markdown
|
|
**Protocol spec**: [`protocol/pqdr.md`](../../../../protocol/pqdr.md) — Post-quantum resistant augmented double ratchet algorithm.
|
|
```
|
|
|
|
This is especially important for modules in transport, protocol, client, server, and agent layers where behavior is defined by the protocol spec rather than being self-evident from the code.
|
|
|
|
### Module doc → other module docs
|
|
Use fully qualified names as link text:
|
|
```markdown
|
|
[Simplex.Messaging.Server.subscribeServiceMessages](./Simplex/Messaging/Server.md#subscribeServiceMessages)
|
|
```
|
|
|
|
### Module doc → topic docs
|
|
```markdown
|
|
See [rcv-services](../rcv-services.md) for the end-to-end service subscription flow.
|
|
```
|
|
|
|
### Source → module doc
|
|
|
|
Add `-- spec:` comments as part of the module documentation work — when you document something non-obvious, add the link in source at the same time. Two levels:
|
|
|
|
**Module-level** (below the module declaration): when the Overview section has value.
|
|
```haskell
|
|
module Simplex.Messaging.Util (...) where
|
|
-- spec: spec/modules/Simplex/Messaging/Util.md
|
|
```
|
|
|
|
**Function-level** (above the function): when that function has a doc entry worth pointing to.
|
|
```haskell
|
|
-- spec: spec/modules/Simplex/Messaging/Util.md#catchOwn
|
|
-- Catches all exceptions except async cancellations (misleading name)
|
|
catchOwn :: ...
|
|
```
|
|
|
|
Only add `-- spec:` comments where the module doc actually says something the code doesn't. Don't add links to "No non-obvious behavior" docs or to entries that merely restate the source.
|
|
|
|
## Topic candidate tracking
|
|
|
|
While documenting modules, you will notice cross-cutting patterns — behaviors that span multiple modules and can't be understood from any single one. Note these in `spec/TOPICS.md` for later. Don't write the topic doc during module work; just record:
|
|
|
|
```markdown
|
|
- **Queue rotation**: Agent.hs initiates, Client.hs sends commands, Server.hs processes,
|
|
Protocol.hs defines types. End-to-end flow not obvious from any single module.
|
|
```
|
|
|
|
## Quality bar
|
|
|
|
Before finishing a module doc, ask:
|
|
1. Does every entry document something NOT in the source code?
|
|
2. Would removing any entry lose information? If not, remove it.
|
|
3. Are cross-module relationships captured that imports alone don't reveal?
|
|
4. Are security-critical functions flagged with invariant IDs?
|
|
5. Is this doc short enough that someone will actually read it?
|
|
|
|
If any answer reveals a problem, fix it and repeat from question 1. Only finish when a full pass produces no changes.
|
|
|
|
## Terminology — the spec as translation boundary
|
|
|
|
The protocol documents (`protocol/overview-tjr.md`, `protocol/simplex-messaging.md`, `protocol/agent-protocol.md`) define the canonical terminology. Code uses different names for some of the same concepts. The spec is where the translation happens.
|
|
|
|
The most important distinction: SimpleX protocol routers are referred to as "servers" in code. The term "server" was adopted historically because SimpleX routers were implemented as Linux-based software that is deployed in the same way as servers. But the similarity is entirely formal. Functionally, servers serve responses to the requests of their users - that is why the term "server" was adopted for computers and software that provide Internet services. SimpleX protocol routers don't serve responses - they route packets between endpoints, and they have no concept of a user. Functionally they are similar to Internet Protocol routers, but with a resource-based addressing scheme. Further, SimpleX protocol routers are hardware and software agnostic. SimpleX protocols are open and documented, so they can be implemented in any language and run on a different architecture. For example, [SimpleGo](https://simplego.dev) is a prototype implementation of the SimpleX protocol stack in C for a microcontroller architecture.
|
|
|
|
**The rule**: use protocol terms for concepts, code terms for identifiers. Write "router" when describing the network node's role, `SMPServer` or `Server.hs` when referencing code. Similarly, "router identity" for the concept (called "server key hash" or "fingerprint" in code). When the distinction matters, bridge explicitly: "the SMP router (implemented by the `Server` module)" or "the `SMPServer` type (representing a router address)."
|
|
|
|
## Exclusions
|
|
|
|
- **Individual migration files** (M20XXXXXX_*.hs): Self-describing SQL. No per-migration docs.
|
|
- **Auto-generated files** (GitCommit.hs): Skip.
|
|
- **Pure boilerplate** (Prometheus.hs metrics, Web/Embedded.hs static files): Document only if non-obvious.
|