# Encoding

> Binary and string encoding used across all SimpleX protocols.

**Source files**: [`Encoding.hs`](../src/Simplex/Messaging/Encoding.hs), [`Encoding/String.hs`](../src/Simplex/Messaging/Encoding/String.hs), [`Parsers.hs`](../src/Simplex/Messaging/Parsers.hs)

## Overview

Two encoding layers serve different purposes:

- **`Encoding`**: binary wire format for SMP protocol transmissions. Compact, with no delimiters between fields. Used in all on-the-wire protocol messages.
- **`StrEncoding`**: human-readable string format for configuration, URIs, logs, and JSON serialization. Uses base64url for binary data, decimal for numbers, comma-separated lists, and space-separated tuples.

Both are typeclasses whose `MINIMAL` pragmas require the encoder plus either the decoder or the parser; whichever of the two is missing is derived from the other.
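## Binary Encoding (`Encoding` class)

```haskell
class Encoding a where
  smpEncode :: a -> ByteString
  smpDecode :: ByteString -> Either String a   -- default: parseAll smpP
  smpP :: Parser a                             -- default: smpDecode <$?> smpP
```

In the default `smpP`, the inner `smpP` is necessarily the `ByteString` parser (that is the only type `smpDecode` accepts), so a type that defines only `smpDecode` is read as a length-prefixed byte string and then decoded.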
### Length-prefix conventions

| Type | Prefix | Max size |
|------|--------|----------|
| `ByteString` | 1-byte length (Word8 as Char) | 255 bytes |
| `Large` (newtype) | 2-byte length (Word16 big-endian) | 65535 bytes |
| `Tail` (newtype) | None (consumes rest of input) | Unlimited |
| Lists (`smpEncodeList`) | 1-byte count prefix, then concatenated items | 255 items |
| `NonEmpty` | Same as list (fails on count=0) | 255 items |
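
The two length-prefix rows can be illustrated with a minimal sketch. It is not the library code: `encodeShort`, `decodeShort`, and `encodeLarge` are hypothetical names, and rejecting oversized input with `Nothing` is an assumption made here for safety rather than documented library behavior.

```haskell
import Data.Bits (shiftR)
import qualified Data.ByteString as B
import Data.ByteString (ByteString)
import Data.Word (Word8)

-- Hypothetical helper: 1-byte length prefix, rejecting inputs over 255 bytes.
encodeShort :: ByteString -> Maybe ByteString
encodeShort s
  | B.length s > 255 = Nothing
  | otherwise = Just (B.cons (fromIntegral (B.length s)) s)

-- Read one length-prefixed field; return it with the unconsumed input.
decodeShort :: ByteString -> Maybe (ByteString, ByteString)
decodeShort bs = do
  (n, rest) <- B.uncons bs
  let len = fromIntegral n
  if B.length rest < len then Nothing else Just (B.splitAt len rest)

-- Hypothetical helper: 2-byte big-endian length prefix (the `Large` row).
encodeLarge :: ByteString -> Maybe ByteString
encodeLarge s
  | len > 65535 = Nothing
  | otherwise = Just (B.pack [hi, lo] <> s)
  where
    len = B.length s
    hi = fromIntegral (len `shiftR` 8) :: Word8 -- high byte of the length
    lo = fromIntegral len :: Word8              -- low byte (narrowing keeps the low 8 bits)
```

Because `decodeShort` returns the unconsumed input, fields can be read one after another without separators.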
### Scalar types

| Type | Encoding | Bytes |
|------|----------|-------|
| `Char` | Raw byte | 1 |
| `Bool` | `'T'` / `'F'` (0x54 / 0x46) | 1 |
| `Word16` | Big-endian | 2 |
| `Word32` | Big-endian | 4 |
| `Int64` | Two big-endian Word32s (high then low) | 8 |
| `SystemTime` | `systemSeconds` as Int64 (nanoseconds dropped) | 8 |
| `Text` | UTF-8 then ByteString encoding (1-byte length prefix) | 1 + len |
| `String` | `B.pack` then ByteString encoding | 1 + len |
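
As an illustration of the `Word32` and `Int64` rows, the sketch below splits an `Int64` into a high and a low big-endian `Word32`. The function names are hypothetical and the code is written against `bytestring` only; it shows the byte layout, not the library's implementation.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import qualified Data.ByteString as B
import Data.ByteString (ByteString)
import Data.Int (Int64)
import Data.Word (Word32)

-- Big-endian Word32: most significant byte first.
encodeWord32 :: Word32 -> ByteString
encodeWord32 w = B.pack [fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0]]

decodeWord32 :: ByteString -> Maybe Word32
decodeWord32 bs
  | B.length bs /= 4 = Nothing
  | otherwise = Just (B.foldl' (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0 bs)

-- Int64 as two big-endian Word32s: high half, then low half.
encodeInt64 :: Int64 -> ByteString
encodeInt64 i = encodeWord32 (fromIntegral (i `shiftR` 32)) <> encodeWord32 (fromIntegral i)

decodeInt64 :: ByteString -> Maybe Int64
decodeInt64 bs = do
  let (h, l) = B.splitAt 4 bs
  hi <- decodeWord32 h
  lo <- decodeWord32 l
  Just ((fromIntegral hi `shiftL` 32) .|. fromIntegral lo)
```

Negative values round-trip because narrowing to `Word32` keeps the two's-complement bit pattern of each half.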
### `Maybe a`

```
Nothing → '0' (0x30)
Just x  → '1' (0x31) ++ smpEncode x
```

Tags are ASCII characters `'0'`/`'1'`, not binary 0x00/0x01.
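
A minimal sketch of the tag scheme, parameterized over the payload codec. `encodeMaybe` and `decodeMaybe` are hypothetical names, and the decoder is simplified to consume the whole input rather than act as an incremental parser.

```haskell
import qualified Data.ByteString.Char8 as BC
import Data.ByteString (ByteString)

-- Encode with ASCII tags: '0' for Nothing, '1' followed by the payload for Just.
encodeMaybe :: (a -> ByteString) -> Maybe a -> ByteString
encodeMaybe _ Nothing = BC.singleton '0'
encodeMaybe enc (Just x) = BC.cons '1' (enc x)

-- Decode the tag, then hand the remaining bytes to the payload decoder.
decodeMaybe :: (ByteString -> Maybe a) -> ByteString -> Maybe (Maybe a)
decodeMaybe dec bs = case BC.uncons bs of
  Just ('0', rest) | BC.null rest -> Just Nothing
  Just ('1', rest) -> Just <$> dec rest
  _ -> Nothing -- includes a binary 0x00/0x01 tag, which is not accepted
```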
### Tuples

Tuples (2 through 8) encode as simple concatenation, with no length prefix and no separator. Fields are parsed sequentially using each component's `smpP`. This works because each component's parser knows how many bytes to consume (via its own length prefix or fixed size).
### Combinators

| Function | Signature | Purpose |
|----------|-----------|---------|
| `_smpP` | `Parser a` | Space-prefixed parser (`A.space *> smpP`) |
| `smpEncodeList` | `[a] -> ByteString` | 1-byte count + concatenated items |
| `smpListP` | `Parser [a]` | Parse count then that many items |
| `lenEncode` | `Int -> Char` | Int to single-byte length char |
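
The count-prefixed list format can be sketched as follows. This is an illustration under assumptions, not the library's `smpEncodeList`/`smpListP`: the names are hypothetical, the item decoder is modeled as a plain function returning the unconsumed input, and lists over 255 items are rejected explicitly.

```haskell
import qualified Data.ByteString as B
import Data.ByteString (ByteString)

-- 1-byte item count, then the encoded items back to back.
encodeList :: (a -> ByteString) -> [a] -> Maybe ByteString
encodeList enc xs
  | n > 255 = Nothing
  | otherwise = Just (B.cons (fromIntegral n) (B.concat (map enc xs)))
  where
    n = length xs

-- Read the count, then run the item decoder that many times.
decodeList :: (ByteString -> Maybe (a, ByteString)) -> ByteString -> Maybe ([a], ByteString)
decodeList item bs = do
  (n, rest) <- B.uncons bs
  go (fromIntegral n :: Int) rest
  where
    go 0 r = Just ([], r)
    go k r = do
      (x, r') <- item r
      (xs, r'') <- go (k - 1) r'
      Just (x : xs, r'')
```

For example, with single-byte items (`B.singleton` to encode, `B.uncons` to decode), the list `[7, 8, 9]` becomes the four bytes `3 7 8 9`.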
## String Encoding (`StrEncoding` class)

```haskell
class StrEncoding a where
  strEncode :: a -> ByteString
  strDecode :: ByteString -> Either String a   -- default: parseAll strP
  strP :: Parser a                             -- default: strDecode <$?> base64urlP
```

Key difference from `Encoding`: the default `strP` parses base64url input first, then applies `strDecode`. This means types that only implement `strDecode` will automatically accept base64url-encoded input.
### Instance conventions

| Type | Encoding |
|------|----------|
| `ByteString` | base64url (non-empty required) |
| `Word16`, `Word32` | Decimal string |
| `Int`, `Int64` | Signed decimal |
| `Char`, `Bool` | Delegates to `Encoding` (`smpEncode`/`smpP`) |
| `Maybe a` | Empty string = `Nothing`, otherwise `strEncode a` |
| `Text` | UTF-8 bytes, parsed until space/newline |
| `SystemTime` | `systemSeconds` as Int64 (decimal) |
| `UTCTime` | ISO 8601 string |
| `CertificateChain` | Comma-separated base64url blobs |
| `Fingerprint` | base64url of fingerprint bytes |
### Collection encoding

| Type | Separator |
|------|-----------|
| Lists (`strEncodeList`) | Comma `,` |
| `NonEmpty` | Comma (fails on empty) |
| `Set a` | Comma |
| `IntSet` | Comma |
| Tuples (2-6) | Space (` `) |
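
To make the separators concrete, here is a small sketch that encodes a pair of an `Int` and an `Int` list as decimal numbers, with commas inside the list and a space between the tuple components. All function names are hypothetical, and the decoders work on whole strings rather than as incremental parsers.

```haskell
import qualified Data.ByteString.Char8 as BC
import Data.ByteString (ByteString)

-- Decimal rendering for numbers.
encodeInt :: Int -> ByteString
encodeInt = BC.pack . show

decodeInt :: ByteString -> Maybe Int
decodeInt bs = case BC.readInt bs of
  Just (n, rest) | BC.null rest -> Just n
  _ -> Nothing

-- Lists: items joined with commas.
encodeIntList :: [Int] -> ByteString
encodeIntList = BC.intercalate (BC.singleton ',') . map encodeInt

decodeIntList :: ByteString -> Maybe [Int]
decodeIntList = traverse decodeInt . BC.split ','

-- Tuples: components joined with a single space.
encodePair :: (Int, [Int]) -> ByteString
encodePair (a, bs) = encodeInt a <> BC.singleton ' ' <> encodeIntList bs

decodePair :: ByteString -> Maybe (Int, [Int])
decodePair s = case BC.break (== ' ') s of
  (a, rest) | Just (' ', b) <- BC.uncons rest -> (,) <$> decodeInt a <*> decodeIntList b
  _ -> Nothing
```

With these definitions, `encodePair (5, [1, 2, 3])` yields the string `5 1,2,3`.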
### `Str` newtype

Raw string (not base64url-encoded). Parses until space, consumes trailing space. Used for string-valued protocol fields that should not be base64-encoded.
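
The parsing rule can be sketched on a plain `ByteString`. `takeStr` is a hypothetical name, and treating the trailing space as optional at end of input is an assumption of this sketch.

```haskell
import qualified Data.ByteString.Char8 as BC
import Data.ByteString (ByteString)

-- Take bytes up to the first space, then drop that one space if present.
-- Returns the raw field and the unconsumed input.
takeStr :: ByteString -> (ByteString, ByteString)
takeStr bs =
  let (field, rest) = BC.break (== ' ') bs
   in (field, BC.drop 1 rest)
```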
### `TextEncoding` class

```haskell
class TextEncoding a where
  textEncode :: a -> Text
  textDecode :: Text -> Maybe a
```

Separate from `StrEncoding`: it operates on `Text` rather than `ByteString`. Used for types that need a `Text` representation (e.g., enum display names).
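
An instance for an enum-like type might look like the sketch below. The class is restated so the block compiles on its own, and `Color` is a hypothetical type that does not exist in the library.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

class TextEncoding a where
  textEncode :: a -> Text
  textDecode :: Text -> Maybe a

-- Hypothetical enum, used only for illustration.
data Color = Red | Green
  deriving (Eq, Show)

instance TextEncoding Color where
  textEncode Red = "red"
  textEncode Green = "green"
  textDecode "red" = Just Red
  textDecode "green" = Just Green
  textDecode _ = Nothing -- unknown names are rejected rather than defaulted
```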
### JSON bridge functions

| Function | Purpose |
|----------|---------|
| `strToJSON` | `StrEncoding a => a -> J.Value` via `decodeLatin1 . strEncode` |
| `strToJEncoding` | Same, for Aeson encoding |
| `strParseJSON` | `StrEncoding a => String -> J.Value -> JT.Parser a`: parses a JSON string via `strP` |
| `textToJSON` | `TextEncoding a => a -> J.Value` |
| `textToEncoding` | Same, for Aeson encoding |
| `textParseJSON` | `TextEncoding a => String -> J.Value -> JT.Parser a` |
## Parsers

**Source**: [`Parsers.hs`](../src/Simplex/Messaging/Parsers.hs)

### Core parsing functions

| Function | Signature | Purpose |
|----------|-----------|---------|
| `parseAll` | `Parser a -> ByteString -> Either String a` | Parse consuming all input (fails if bytes remain) |
| `parse` | `Parser a -> e -> ByteString -> Either e a` | `parseAll` with custom error type (discards error string) |
| `parseE` | `(String -> e) -> Parser a -> ByteString -> ExceptT e IO a` | `parseAll` lifted into `ExceptT` |
| `parseE'` | `(String -> e) -> Parser a -> ByteString -> ExceptT e IO a` | Like `parseE` but allows trailing input |
| `parseRead1` | `Read a => Parser a` | Parse a word then `readMaybe` it |
| `parseString` | `(ByteString -> Either String a) -> String -> a` | Parse from `String` (errors with `error`) |
### `base64P`

Standard base64 parser (not base64url; it uses the `+`/`/` alphabet). Takes alphanumeric and `+`/`/` characters, then optional `=` padding, then decodes. Contrast with `base64urlP` in `Encoding/String.hs`, which uses the `-`/`_` alphabet.
### JSON options helpers

Platform-conditional JSON encoding for cross-platform compatibility (Haskell ↔ Swift).

| Function | Purpose |
|----------|---------|
| `enumJSON` | All-nullary constructors as strings, with tag modifier |
| `sumTypeJSON` | Platform-conditional: `taggedObjectJSON` on non-Darwin, `singleFieldJSON` on Darwin |
| `taggedObjectJSON` | `{"type": "Tag", "data": {...}}` format |
| `singleFieldJSON` | `{"Tag": value}` format |
| `defaultJSON` | Default options with `omitNothingFields = True` |

Pattern synonyms for JSON field names:

- `TaggedObjectJSONTag = "type"`
- `TaggedObjectJSONData = "data"`
- `SingleFieldJSONTag = "_owsf"`
### String helpers

| Function | Purpose |
|----------|---------|
| `fstToLower` | Lowercase first character |
| `dropPrefix` | Remove prefix string, lowercase remainder |
| `textP` | Parse rest of input as UTF-8 `String` |
## Auxiliary Types and Utilities

### TMap

**Source**: [`TMap.hs`](../src/Simplex/Messaging/TMap.hs)

```haskell
type TMap k a = TVar (Map k a)
```

STM-based concurrent map. Wraps `Data.Map.Strict` in a `TVar`. All mutations use `modifyTVar'` (strict) to prevent thunk accumulation.

| Function | Notes |
|----------|-------|
| `emptyIO` | IO allocation (`newTVarIO`) |
| `singleton` | STM allocation |
| `clear` | Reset to empty |
| `lookup` / `lookupIO` | STM / non-transactional IO read |
| `member` / `memberIO` | STM / non-transactional IO membership |
| `insert` / `insertM` | Insert value / insert from STM action |
| `delete` | Remove key |
| `lookupInsert` | Atomic lookup-then-insert (returns old value) |
| `lookupDelete` | Atomic lookup-then-delete |
| `adjust` / `update` / `alter` / `alterF` | Standard Map operations lifted to STM |
| `union` | Merge `Map` into `TMap` |

`lookupIO`/`memberIO` use `readTVarIO`, a single read outside any STM transaction. This is useful when you need a snapshot and do not have to compose the read with other STM operations.
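
The atomic lookup-and-modify operations can be sketched with `stateTVar` from the `stm` package and the lookup-returning functions in `containers`. This is one possible way to write them, not necessarily how `TMap.hs` does it; the argument order is also an assumption.

```haskell
import Control.Concurrent.STM
import qualified Data.Map.Strict as M

type TMap k a = TVar (M.Map k a)

-- Insert with a strict update of the underlying map.
insert :: Ord k => k -> a -> TMap k a -> STM ()
insert k v m = modifyTVar' m (M.insert k v)

-- Return the previous value (if any) and store the new one, in one transaction.
lookupInsert :: Ord k => k -> a -> TMap k a -> STM (Maybe a)
lookupInsert k v m = stateTVar m (M.insertLookupWithKey (\_ new _ -> new) k v)

-- Return the value (if any) and remove the key, in one transaction.
lookupDelete :: Ord k => k -> TMap k a -> STM (Maybe a)
lookupDelete k m = stateTVar m (M.updateLookupWithKey (\_ _ -> Nothing) k)
```

Because both steps happen inside a single `STM` action, no other thread can observe the map between the lookup and the modification.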
### SessionVar

**Source**: [`Session.hs`](../src/Simplex/Messaging/Session.hs)

Race-safe session management using a `TMVar` and a monotonic ID.

```haskell
data SessionVar a = SessionVar
  { sessionVar :: TMVar a      -- result slot
  , sessionVarId :: Int        -- monotonic ID from TVar counter
  , sessionVarTs :: UTCTime    -- creation timestamp
  }
```

| Function | Purpose |
|----------|---------|
| `getSessVar` | Lookup or create session. Returns `Left new` or `Right existing` |
| `removeSessVar` | Delete session only if ID matches (prevents removing a replacement) |
| `tryReadSessVar` | Non-blocking read of session result |

The ID-match check in `removeSessVar` prevents a race where:

1. Thread A creates session #5 and starts work.
2. Thread B creates session #6, replacing #5 in the `TMap`.
3. Thread A finishes and tries to remove its session. The IDs do not match, so the removal is skipped and session #6 stays in place.
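
The ID check from the steps above can be sketched as follows. `Sess`, `Sessions`, and `removeIfSame` are simplified, hypothetical stand-ins for `SessionVar`, the session `TMap`, and `removeSessVar`; only the ID comparison is modeled.

```haskell
import Control.Concurrent.STM
import qualified Data.Map.Strict as M

-- Simplified stand-in for SessionVar: only the ID matters for this check.
newtype Sess = Sess {sessId :: Int}

type Sessions k = TVar (M.Map k Sess)

-- Remove the entry only if it is still the session the caller created.
removeIfSame :: Ord k => Sess -> k -> Sessions k -> STM ()
removeIfSame s k sessions = modifyTVar' sessions (M.update keepOther k)
  where
    keepOther cur
      | sessId cur == sessId s = Nothing -- same session: delete it
      | otherwise = Just cur             -- replaced in the meantime: keep it
```

In the race described above, thread A would call `removeIfSame` with session #5 while the map holds #6, so the map is left unchanged.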
### ServiceScheme

**Source**: [`ServiceScheme.hs`](../src/Simplex/Messaging/ServiceScheme.hs)

```haskell
data ServiceScheme = SSSimplex | SSAppServer SrvLoc
data SrvLoc = SrvLoc HostName ServiceName
```

URI scheme for SimpleX service addresses. `SSSimplex` encodes as `"simplex:"`, `SSAppServer` as `"https://host:port"`.

`simplexChat` is the constant `SSAppServer (SrvLoc "simplex.chat" "")`.
### SystemTime

**Source**: [`SystemTime.hs`](../src/Simplex/Messaging/SystemTime.hs)

```haskell
newtype RoundedSystemTime (t :: Nat) = RoundedSystemTime { roundedSeconds :: Int64 }
type SystemDate = RoundedSystemTime 86400   -- day precision
type SystemSeconds = RoundedSystemTime 1    -- second precision
```

Phantom-typed time rounding. The `Nat` type parameter specifies rounding granularity in seconds.

| Function | Purpose |
|----------|---------|
| `getRoundedSystemTime` | Get current time rounded to `t` seconds |
| `getSystemDate` | Alias for day-rounded time |
| `getSystemSeconds` | Second-precision (no rounding needed, just drops nanoseconds) |
| `roundedToUTCTime` | Convert back to `UTCTime` |

`RoundedSystemTime` derives `FromField`/`ToField` for SQLite storage and `FromJSON`/`ToJSON` for API serialization.
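
The phantom `Nat` can drive the rounding through `KnownNat`, as in this pure sketch. `roundSeconds` is a hypothetical helper that works on a raw seconds count, and rounding down to the nearest multiple is an assumption about how the granularity is applied.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Int (Int64)
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Nat, natVal)

newtype RoundedSystemTime (t :: Nat) = RoundedSystemTime {roundedSeconds :: Int64}
  deriving (Eq, Show)

type SystemDate = RoundedSystemTime 86400

-- Hypothetical pure helper: round seconds down to a multiple of `t`.
roundSeconds :: forall t. KnownNat t => Int64 -> RoundedSystemTime t
roundSeconds s = RoundedSystemTime ((s `div` prec) * prec)
  where
    prec = fromIntegral (natVal (Proxy :: Proxy t)) -- granularity read from the type
```

The type annotation selects the granularity: `roundSeconds 100000 :: SystemDate` keeps whole days only, while the same call at `RoundedSystemTime 1` leaves the value unchanged.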
### Util

**Source**: [`Util.hs`](../src/Simplex/Messaging/Util.hs)

Selected utilities used across the codebase:

**Monadic combinators**:

| Function | Signature | Purpose |
|----------|-----------|---------|
| `<$?>` | `MonadFail m => (a -> Either String b) -> m a -> m b` | Lift fallible function into parser |
| `$>>=` | `(Monad m, Monad f, Traversable f) => m (f a) -> (a -> m (f b)) -> m (f b)` | Monadic bind through nested monad |
| `ifM` / `whenM` / `unlessM` | | Monadic conditionals |
| `anyM` | | Short-circuit `any` for monadic predicates (strict) |

**Error handling**:

| Function | Purpose |
|----------|---------|
| `tryAllErrors` | Catch all exceptions (including async) into `ExceptT` |
| `catchAllErrors` | Same with handler |
| `tryAllOwnErrors` | Catch only "own" exceptions (re-throws async cancellation) |
| `catchAllOwnErrors` | Same with handler |
| `isOwnException` | `StackOverflow`, `HeapOverflow`, `AllocationLimitExceeded` |
| `isAsyncCancellation` | Any `SomeAsyncException` except own exceptions |
| `catchThrow` | Catch exceptions, wrap in Left |
| `allFinally` | `tryAllErrors` + `final` + `except` (like `finally` for ExceptT) |

The own-vs-async distinction is critical: `catchOwn`/`tryAllOwnErrors` never swallow async cancellation (`ThreadKilled`, `UserInterrupt`, etc.). They catch only synchronous exceptions and resource-exhaustion exceptions (`StackOverflow`, `HeapOverflow`, `AllocationLimitExceeded`).

**STM**:

| Function | Purpose |
|----------|---------|
| `tryWriteTBQueue` | Non-blocking bounded queue write, returns success |

**Database result helpers**:

| Function | Purpose |
|----------|---------|
| `firstRow` | Extract first row with transform, or Left error |
| `maybeFirstRow` | Extract first row as Maybe |
| `firstRow'` | Like `firstRow` but transform can also fail |

**Collection utilities**:

| Function | Purpose |
|----------|---------|
| `groupOn` | `groupBy` using equality on projected key |
| `groupAllOn` | `groupOn` after `sortOn` (groups non-adjacent elements) |
| `toChunks` | Split list into `NonEmpty` chunks of size n |
| `packZipWith` | Optimized ByteString zipWith (direct memory access) |

**Miscellaneous**:

| Function | Purpose |
|----------|---------|
| `safeDecodeUtf8` | Decode UTF-8 replacing errors with `'?'` |
| `bshow` / `tshow` | `show` to `ByteString` / `Text` |
| `threadDelay'` | `Int64` delay (handles overflow by looping) |
| `diffToMicroseconds` / `diffToMilliseconds` | `NominalDiffTime` conversion |
| `labelMyThread` | Label current thread for debugging |
| `encodeJSON` / `decodeJSON` | `ToJSON a => a -> Text` / `FromJSON a => Text -> Maybe a` |
| `traverseWithKey_` | `Map` traversal discarding results |
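
The `<$?>` combinator from the monadic combinators table is what the default `smpP` and `strP` rely on, so a short sketch may help. The definition below is written from the signature in the table and may differ from the one in `Util.hs` (its fixity, for instance, is not specified here); `parsePort` is a hypothetical validator, and `Maybe` stands in for a parser monad because it is also a `MonadFail`.

```haskell
import Text.Read (readMaybe)

-- Apply a fallible function inside a MonadFail context; Left becomes `fail`.
(<$?>) :: MonadFail m => (a -> Either String b) -> m a -> m b
f <$?> m = m >>= either fail pure . f

-- Hypothetical validator used only for this example.
parsePort :: String -> Either String Int
parsePort s = case readMaybe s of
  Just n | n > 0 && n < 65536 -> Right n
  _ -> Left ("bad port: " <> s)
```

In the `Maybe` monad, `parsePort <$?> Just "443"` succeeds, while an out-of-range or non-numeric string turns into `Nothing` through `fail`.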
## Security notes

- **Length prefix overflow**: `ByteString` encoding uses a 1-byte length and silently truncates strings longer than 255 bytes. Callers must ensure size bounds before encoding. `Large` extends the limit to 65535 bytes via a Word16 prefix.
- **`Tail` unbounded**: `Tail` consumes all remaining input with no size check. Only safe when total message size is already bounded (e.g., within a padded SMP block).
- **base64 vs base64url**: `Parsers.base64P` uses the standard alphabet (`+`/`/`), while `String.base64urlP` uses the URL-safe alphabet (`-`/`_`). Mixing them causes silent decode failures.
- **`safeDecodeUtf8`**: Replaces invalid UTF-8 with `'?'` rather than failing. Suitable for logging/display, not for security-critical string comparison.
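
The last point can be demonstrated with a lenient decoder built on `decodeUtf8With` from the `text` package. `decodeOrQuestion` is a hypothetical name, and this is only one plausible way to get the replace-with-`'?'` behavior, not necessarily how `safeDecodeUtf8` is written.

```haskell
import qualified Data.ByteString as B
import Data.ByteString (ByteString)
import qualified Data.Text as T
import Data.Text (Text)
import Data.Text.Encoding (decodeUtf8With)

-- Lenient decoder: invalid input is replaced with '?' instead of raising an error.
decodeOrQuestion :: ByteString -> Text
decodeOrQuestion = decodeUtf8With (\_ _ -> Just '?')

-- Two different byte strings, each ending in a byte that is never valid UTF-8.
inputA, inputB :: ByteString
inputA = B.pack [104, 0xFF]
inputB = B.pack [104, 0xFE]
```

`inputA` and `inputB` differ as bytes but decode to the same `Text`, which is why comparing decoded strings is not a safe substitute for comparing the original bytes.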