dandri/simplexmq

Fork 0

mirror of https://github.com/simplex-chat/simplexmq.git synced 2026-07-10 03:32:02 +00:00

Files

T

History

Evgeny @ SimpleX Chat c940f16f37 update agent specs

2026-03-13 10:14:47 +00:00

Simplex

update agent specs

2026-03-13 10:14:47 +00:00

README.md

SMP router specs

2026-03-12 18:13:12 +00:00

README.md

How to Document a Module

Read this before writing any module doc. It defines what goes in, what stays out, and why.

Purpose

Module docs exist for one reason: to capture knowledge that cannot be obtained by reading the source code. If reading the .hs file tells you everything you need to know, the module doc should be brief or empty.

These docs are an investment — their value compounds over time as multiple people (and LLMs) work on the code. Optimize for long-term value, not for looking thorough today.

Process

Read every line of the source file. The non-obvious filter applies to what you write, not to what you read. Without reading each line, you will produce documentation from inferences rather than facts. Many non-obvious behaviors only become visible when you see a specific line of code and recognize that its implications would surprise a reader who doesn't have the surrounding context.

File structure

Module docs mirror src/Simplex/ exactly. Same subfolder structure, .hs replaced with .md:

src/Simplex/Messaging/Server.hs    →  spec/modules/Simplex/Messaging/Server.md
src/Simplex/Messaging/Crypto.hs    →  spec/modules/Simplex/Messaging/Crypto.md
src/Simplex/FileTransfer/Agent.hs  →  spec/modules/Simplex/FileTransfer/Agent.md

What to include

1. Non-obvious behavior

Things that would surprise a competent Haskell developer reading the code for the first time:

Subtle invariants maintained across function calls
Ordering dependencies ("must call X before Y because...")
Concurrency assumptions ("this TVar is only written from thread Z")
Implicit contracts between caller and callee not captured by types

2. Usage considerations

When to use function X vs function Y
Common mistakes callers make
Caller obligations not enforced by the type system
Performance characteristics that affect usage decisions

3. Cross-module relationships

Dependencies on other modules' behavior not visible from import lists
Assumptions about how other modules use this one
Coordination patterns (e.g., "Server.hs reads this TVar, Agent.hs writes it")

4. Security notes

Trust boundaries this module enforces or relies on
What happens if inputs are malicious
Which functions are security-critical and why (reference SI-XX invariants)

5. Design rationale

Why the code is structured this way (when not obvious)
Alternatives considered and rejected
Known limitations and their justification

Non-obvious threshold

The guiding principle: non-obvious state machines and flows require documentation; standard things don't.

Document:

Multi-step protocols and negotiation flows (e.g., KEM propose/accept round-trips)
Monotonic or irreversible state transitions (e.g., PQ support can only be enabled, never disabled)
Silent error behaviors (e.g., verify returns False on algorithm mismatch instead of an error)
Design rationale for non-standard choices (e.g., why byte-reverse a nonce, why hash-then-encrypt for authenticators)

Do NOT document:

Standard algorithm properties (e.g., Ed25519 public key derivable from private key)
Well-known protocol mechanics (e.g., HKDF usage per RFC 5869, deterministic nonce derivation in double ratchet)
Implementation details that follow directly from the type signatures

What NOT to include

Type signatures — the code has them
Code snippets — if you're pasting code, you're making a stale copy
Function-by-function prose that restates the implementation — "this function takes X and returns Y by doing Z" adds nothing
Line numbers — they're brittle and break on every edit
Comments that fit in one line in source — put those in the source file instead as -- spec: comments
Verbatim quotes of source comments — reference them instead: "See comment on functionName." Then add only what the comment doesn't cover (cross-module implications, what breaks if violated). If the source comment says everything, the function doesn't need a doc entry.
Tables that reproduce code structure — if the information is self-evident from reading the code's pattern matching or type definitions, it doesn't belong in the doc (e.g., per-command credential requirements, version-conditional encoding branches)

Format

Each module doc has a header, then entries for functions/types that need documentation.

# Module.Name

> One-line description of what this module does.

**Source**: [`Path/To/Module.hs`](relative link to source)

## Overview

[Only if the module's purpose or architecture is non-obvious.
Skip for simple modules.]

## functionName

**Purpose**: [What this does that isn't obvious from the name and type]
**Calls**: [Qualified.Name.a](link), [Qualified.Name.b](link)
**Called by**: [Qualified.Name.c](link)
**Invariant**: SI-XX
**Security**: [What this function ensures for the threat model]

[Free-form notes about non-obvious behavior, gotchas, etc.]

## anotherFunction

...

For trivial modules (< 100 LOC, no non-obvious behavior):

# Module.Name

> One-line description.

**Source**: [`Path/To/Module.hs`](relative link to source)

No non-obvious behavior. See source.

This is valuable — it confirms someone looked and found nothing to document.

Linking conventions

Module doc → protocol docs

When a module implements or is governed by a protocol specification in protocol/, link to it near the top of the module doc (after the overview). Do not duplicate protocol content — just reference it:

**Protocol spec**: [`protocol/pqdr.md`](../../../../protocol/pqdr.md) — Post-quantum resistant augmented double ratchet algorithm.

This is especially important for modules in transport, protocol, client, server, and agent layers where behavior is defined by the protocol spec rather than being self-evident from the code.

Module doc → other module docs

Use fully qualified names as link text:

[Simplex.Messaging.Server.subscribeServiceMessages](./Simplex/Messaging/Server.md#subscribeServiceMessages)

Module doc → topic docs

See [rcv-services](../rcv-services.md) for the end-to-end service subscription flow.

Source → module doc

Add -- spec: comments as part of the module documentation work — when you document something non-obvious, add the link in source at the same time. Two levels:

Module-level (below the module declaration): when the Overview section has value.

module Simplex.Messaging.Util (...) where
-- spec: spec/modules/Simplex/Messaging/Util.md

Function-level (above the function): when that function has a doc entry worth pointing to.

-- spec: spec/modules/Simplex/Messaging/Util.md#catchOwn
-- Catches all exceptions except async cancellations (misleading name)
catchOwn :: ...

Only add -- spec: comments where the module doc actually says something the code doesn't. Don't add links to "No non-obvious behavior" docs or to entries that merely restate the source.

Topic candidate tracking

While documenting modules, you will notice cross-cutting patterns — behaviors that span multiple modules and can't be understood from any single one. Note these in spec/TOPICS.md for later. Don't write the topic doc during module work; just record:

- **Queue rotation**: Agent.hs initiates, Client.hs sends commands, Server.hs processes,
  Protocol.hs defines types. End-to-end flow not obvious from any single module.

Quality bar

Before finishing a module doc, ask:

Does every entry document something NOT in the source code?
Would removing any entry lose information? If not, remove it.
Are cross-module relationships captured that imports alone don't reveal?
Are security-critical functions flagged with invariant IDs?
Is this doc short enough that someone will actually read it?

If any answer reveals a problem, fix it and repeat from question 1. Only finish when a full pass produces no changes.

Terminology — the spec as translation boundary

The protocol documents (protocol/overview-tjr.md, protocol/simplex-messaging.md, protocol/agent-protocol.md) define the canonical terminology. Code uses different names for some of the same concepts. The spec is where the translation happens.

The most important distinction: SimpleX protocol routers are referred to as "servers" in code. The term "server" was adopted historically because SimpleX routers were implemented as Linux-based software that is deployed in the same way as servers. But the similarity is entirely formal. Functionally, servers serve responses to the requests of their users - that is why the term "server" was adopted for computers and software that provide Internet services. SimpleX protocol routers don't serve responses - they route packets between endpoints, and they have no concept of a user. Functionally they are similar to Internet Protocol routers, but with a resource-based addressing scheme. Further, SimpleX protocol routers are hardware and software agnostic. SimpleX protocols are open and documented, so they can be implemented in any language and run on a different architecture. For example, SimpleGo is a prototype implementation of the SimpleX protocol stack in C for a microcontroller architecture.

The rule: use protocol terms for concepts, code terms for identifiers. Write "router" when describing the network node's role, SMPServer or Server.hs when referencing code. Similarly, "router identity" for the concept (called "server key hash" or "fingerprint" in code). When the distinction matters, bridge explicitly: "the SMP router (implemented by the Server module)" or "the SMPServer type (representing a router address)."

Exclusions

Individual migration files (M20XXXXXX_*.hs): Self-describing SQL. No per-migration docs.
Auto-generated files (GitCommit.hs): Skip.
Pure boilerplate (Prometheus.hs metrics, Web/Embedded.hs static files): Document only if non-obvious.