simplexmq/spec/encoding.md
Evgeny @ SimpleX Chat 66d7efa61e some modules documented
2026-03-11 08:53:57 +00:00

Encoding

Binary and string encoding used across all SimpleX protocols.

Source files: Encoding.hs, Encoding/String.hs, Parsers.hs

Overview

Two encoding layers serve different purposes:

  • Encoding — Binary wire format for SMP protocol transmissions. Compact, no delimiters between fields. Used in all on-the-wire protocol messages.
  • StrEncoding — Human-readable string format for configuration, URIs, logs, and JSON serialization. Uses base64url for binary data, decimal for numbers, comma-separated lists, space-separated tuples.

Both are typeclasses whose MINIMAL pragmas require the encoder plus either the decoder or the parser. Whichever of the two is omitted gets a default defined in terms of the other.

Binary Encoding (Encoding class)

class Encoding a where
  smpEncode :: a -> ByteString
  smpDecode :: ByteString -> Either String a  -- default: parseAll smpP
  smpP :: Parser a                            -- default: smpDecode <$?> smpP, where the inner smpP is the (length-prefixed) ByteString parser
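The MINIMAL/default mechanism is easiest to see in a small self-contained sketch. It does not depend on attoparsec. Instead it uses a hypothetical state-passing Parser newtype and a hypothetical bytesP for the 1-byte length-prefixed ByteString. The Name type and its non-empty rule are also illustrative and are not part of simplexmq.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import qualified Data.ByteString as B

-- Hypothetical stand-in for attoparsec's Parser: returns the value and the rest of the input.
newtype Parser a = Parser {runParser :: ByteString -> Either String (a, ByteString)}

-- Succeed only if all input is consumed (the role parseAll plays in the docs).
parseAll :: Parser a -> ByteString -> Either String a
parseAll p s = case runParser p s of
  Left e -> Left e
  Right (a, rest)
    | B.null rest -> Right a
    | otherwise -> Left "unconsumed input"

-- Apply a fallible function to a parser's result (the role of <$?>).
(<$?>) :: (a -> Either String b) -> Parser a -> Parser b
f <$?> p = Parser $ \s -> do
  (a, rest) <- runParser p s
  b <- f a
  pure (b, rest)

-- Hypothetical ByteString parser: one length byte, then that many bytes.
bytesP :: Parser ByteString
bytesP = Parser $ \s -> case B.uncons s of
  Nothing -> Left "missing length byte"
  Just (n, rest)
    | B.length rest >= fromIntegral n -> Right (B.splitAt (fromIntegral n) rest)
    | otherwise -> Left "input shorter than length prefix"

class Encoding a where
  {-# MINIMAL smpEncode, (smpDecode | smpP) #-}
  smpEncode :: a -> ByteString
  smpDecode :: ByteString -> Either String a
  smpDecode = parseAll smpP
  smpP :: Parser a
  -- Default: read a length-prefixed ByteString, then decode its contents.
  smpP = smpDecode <$?> bytesP

-- Illustrative type that defines only smpEncode and smpDecode.
newtype Name = Name ByteString deriving (Eq, Show)

instance Encoding Name where
  -- Length-prefixed, so the default smpP can find the field boundary.
  smpEncode (Name s) = B.cons (fromIntegral (B.length s)) s
  -- Validates the raw, unprefixed bytes.
  smpDecode s
    | B.null s = Left "empty name"
    | otherwise = Right (Name s)
```

Name defines only smpEncode and smpDecode, so its smpP comes from the default. That default frames the field by its length prefix and then validates the inner bytes.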

Length-prefix conventions

| Type | Prefix | Max size |
|------|--------|----------|
| ByteString | 1-byte length (Word8 as Char) | 255 bytes |
| Large (newtype) | 2-byte length (Word16 big-endian) | 65535 bytes |
| Tail (newtype) | None; consumes the rest of the input | Unlimited |
| Lists (smpEncodeList) | 1-byte count prefix, then concatenated items | 255 items |
| NonEmpty | Same as list (fails on count = 0) | 255 items |
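Here is a minimal sketch of the three length conventions, using only Data.ByteString. The function names are hypothetical. Each decoder returns the field together with the input that follows it, which is how a sequence of fields can be read without delimiters.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Bits (shiftL, shiftR, (.&.))
import Data.ByteString (ByteString)
import qualified Data.ByteString as B

-- ByteString framing: 1 length byte, then the bytes (caller keeps length <= 255).
encodeShort :: ByteString -> ByteString
encodeShort s = B.cons (fromIntegral (B.length s)) s

decodeShort :: ByteString -> Maybe (ByteString, ByteString)
decodeShort s = do
  (n, rest) <- B.uncons s
  let len = fromIntegral n
  if B.length rest >= len then Just (B.splitAt len rest) else Nothing

-- Large framing: Word16 big-endian length, then the bytes (caller keeps length <= 65535).
encodeLarge :: ByteString -> ByteString
encodeLarge s = B.pack [fromIntegral (n `shiftR` 8), fromIntegral (n .&. 0xff)] <> s
  where
    n = B.length s

decodeLarge :: ByteString -> Maybe (ByteString, ByteString)
decodeLarge s = case B.unpack (B.take 2 s) of
  [hi, lo] ->
    let len = (fromIntegral hi `shiftL` 8) + fromIntegral lo
        rest = B.drop 2 s
     in if B.length rest >= len then Just (B.splitAt len rest) else Nothing
  _ -> Nothing

-- Tail framing: no prefix; the field is everything that is left.
decodeTail :: ByteString -> (ByteString, ByteString)
decodeTail s = (s, B.empty)
```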

Scalar types

| Type | Encoding | Bytes |
|------|----------|-------|
| Char | Raw byte | 1 |
| Bool | 'T' / 'F' (0x54 / 0x46) | 1 |
| Word16 | Big-endian | 2 |
| Word32 | Big-endian | 4 |
| Int64 | Two big-endian Word32s (high then low) | 8 |
| SystemTime | systemSeconds as Int64 (nanoseconds dropped) | 8 |
| Text | UTF-8, then ByteString encoding (1-byte length prefix) | 1 + len |
| String | B.pack, then ByteString encoding | 1 + len |
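The fixed-size scalars can be sketched the same way, assuming the big-endian layout given in the table. The function names are illustrative rather than the library's.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Bits (shiftL, shiftR, (.|.))
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Int (Int64)
import Data.Word (Word16, Word32)

-- Bool is the ASCII letter 'T' or 'F'.
encodeBool :: Bool -> ByteString
encodeBool b = BC.singleton (if b then 'T' else 'F')

-- Big-endian: most significant byte first.
encodeWord16 :: Word16 -> ByteString
encodeWord16 w = B.pack [fromIntegral (w `shiftR` 8), fromIntegral w]

encodeWord32 :: Word32 -> ByteString
encodeWord32 w = B.pack [fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0]]

decodeWord32 :: ByteString -> Word32
decodeWord32 = B.foldl' (\acc b -> acc `shiftL` 8 .|. fromIntegral b) 0

-- Int64: high Word32, then low Word32; the bit pattern survives for negative values.
encodeInt64 :: Int64 -> ByteString
encodeInt64 i = encodeWord32 (fromIntegral (i `shiftR` 32)) <> encodeWord32 (fromIntegral i)

decodeInt64 :: ByteString -> Int64
decodeInt64 s =
  let (h, l) = B.splitAt 4 s
   in fromIntegral (decodeWord32 h) `shiftL` 32 .|. fromIntegral (decodeWord32 l)
```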

Maybe a

Nothing → '0' (0x30)
Just x  → '1' (0x31) ++ smpEncode x

Tags are ASCII characters '0'/'1', not binary 0x00/0x01.
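The following sketches the Maybe tag scheme. It is parameterised by an encoder and a parser for the inner value. The names encodeMaybe and decodeMaybe are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC

-- Nothing is the ASCII byte '0'; Just x is '1' followed by x's own encoding.
encodeMaybe :: (a -> ByteString) -> Maybe a -> ByteString
encodeMaybe _ Nothing = BC.singleton '0'
encodeMaybe enc (Just x) = BC.cons '1' (enc x)

-- Decode with a parser for the inner value; returns the value and the remaining input.
decodeMaybe :: (ByteString -> Either String (a, ByteString)) -> ByteString -> Either String (Maybe a, ByteString)
decodeMaybe dec s = case BC.uncons s of
  Just ('0', rest) -> Right (Nothing, rest)
  Just ('1', rest) -> do
    (x, rest') <- dec rest
    Right (Just x, rest')
  _ -> Left "invalid Maybe tag"
```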

Tuples

Tuples (2 through 8) encode as simple concatenation — no length prefix, no separator. Fields are parsed sequentially using each component's smpP. This works because each component's parser knows how many bytes to consume (via its own length prefix or fixed size).
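This sketch shows why plain concatenation is enough. It uses a hypothetical state-passing field parser type, P. The ByteString field carries its own length and the Bool field is exactly one byte, so each parser knows where the next field starts.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- A field parser returns the value and the input that follows it.
type P a = ByteString -> Maybe (a, ByteString)

-- Self-delimiting via its 1-byte length prefix.
bytesP :: P ByteString
bytesP s = do
  (n, rest) <- B.uncons s
  let len = fromIntegral n
  if B.length rest >= len then Just (B.splitAt len rest) else Nothing

-- Self-delimiting via its fixed size of one byte.
boolP :: P Bool
boolP s = case BC.uncons s of
  Just ('T', rest) -> Just (True, rest)
  Just ('F', rest) -> Just (False, rest)
  _ -> Nothing

-- A pair is its fields' encodings back to back, parsed in order.
pairP :: P a -> P b -> P (a, b)
pairP pa pb s = do
  (a, rest) <- pa s
  (b, rest') <- pb rest
  Just ((a, b), rest')

encodePair :: (ByteString, Bool) -> ByteString
encodePair (bs, b) = B.cons (fromIntegral (B.length bs)) bs <> BC.singleton (if b then 'T' else 'F')
```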

Combinators

| Function | Signature | Purpose |
|----------|-----------|---------|
| _smpP | Parser a | Space-prefixed parser (A.space *> smpP) |
| smpEncodeList | [a] -> ByteString | 1-byte count + concatenated items |
| smpListP | Parser [a] | Parse count, then that many items |
| lenEncode | Int -> Char | Int to single-byte length char |
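smpEncodeList and smpListP follow a count-then-items pattern. The sketch below uses hypothetical names (encodeList, listP) and a length-prefixed ByteString as the item type. A NonEmpty decoder would also reject a count of 0.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import qualified Data.ByteString as B

-- Hypothetical item codec: a 1-byte length-prefixed ByteString.
encodeItem :: ByteString -> ByteString
encodeItem s = B.cons (fromIntegral (B.length s)) s

itemP :: ByteString -> Maybe (ByteString, ByteString)
itemP s = do
  (n, rest) <- B.uncons s
  let len = fromIntegral n
  if B.length rest >= len then Just (B.splitAt len rest) else Nothing

-- Count byte, then the items back to back (caller keeps the count <= 255).
encodeList :: (a -> ByteString) -> [a] -> ByteString
encodeList enc xs = B.cons (fromIntegral (length xs)) (B.concat (map enc xs))

-- Read the count, then run the item parser exactly that many times.
listP :: (ByteString -> Maybe (a, ByteString)) -> ByteString -> Maybe ([a], ByteString)
listP p s = do
  (n, rest) <- B.uncons s
  go (fromIntegral n :: Int) rest
  where
    go 0 r = Just ([], r)
    go k r = do
      (x, r') <- p r
      (xs, r'') <- go (k - 1) r'
      Just (x : xs, r'')
```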

String Encoding (StrEncoding class)

class StrEncoding a where
  strEncode :: a -> ByteString
  strDecode :: ByteString -> Either String a  -- default: parseAll strP
  strP :: Parser a                            -- default: strDecode <$?> base64urlP

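Key difference from Encoding: the default strP first parses (and decodes) a base64url token, then applies strDecode to the resulting bytes. The default smpP instead reads a 1-byte length-prefixed ByteString. As a result, types that implement only strDecode automatically accept base64url-encoded input.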

Instance conventions

| Type | Encoding |
|------|----------|
| ByteString | base64url (non-empty required) |
| Word16, Word32 | Decimal string |
| Int, Int64 | Signed decimal |
| Char, Bool | Delegates to Encoding (smpEncode/smpP) |
| Maybe a | Empty string = Nothing, otherwise strEncode a |
| Text | UTF-8 bytes, parsed until space/newline |
| SystemTime | systemSeconds as Int64 (decimal) |
| UTCTime | ISO 8601 string |
| CertificateChain | Comma-separated base64url blobs |
| Fingerprint | base64url of fingerprint bytes |

Collection encoding

| Type | Separator |
|------|-----------|
| Lists (strEncodeList) | Comma (,) |
| NonEmpty | Comma (fails on empty) |
| Set a | Comma |
| IntSet | Comma |
| Tuples (2-6) | Space |
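The sketch below shows the string-form separators: decimal Word16 values, comma-joined lists and space-joined pairs. All helper names are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word16)

-- Word16 in string form: a decimal number.
encodeWord16Dec :: Word16 -> ByteString
encodeWord16Dec = BC.pack . show

-- Lists: items joined with ','.
encodeCommaList :: (a -> ByteString) -> [a] -> ByteString
encodeCommaList enc = BC.intercalate "," . map enc

-- Tuples: fields joined with ' '.
encodeSpacePair :: (a -> ByteString) -> (b -> ByteString) -> (a, b) -> ByteString
encodeSpacePair ea eb (a, b) = ea a <> " " <> eb b

-- Decode a comma-separated list of Word16 values; any bad item fails the whole list.
decodeWord16List :: ByteString -> Maybe [Word16]
decodeWord16List = mapM one . BC.split ','
  where
    one t = case BC.readInt t of
      Just (n, rest) | BC.null rest && n >= 0 && n <= 65535 -> Just (fromIntegral n)
      _ -> Nothing
```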

Str newtype

Raw string (not base64url-encoded). Parses until space, consumes trailing space. Used for string-valued protocol fields that should not be base64-encoded.
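Here is a sketch of the Str token rule. The function name is hypothetical. Consuming exactly one trailing space is an assumption about what "consumes trailing space" means.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC

-- Raw token: bytes up to the first space, not base64-decoded.
-- Dropping exactly one trailing space is an assumption of this sketch.
strTokenP :: ByteString -> (ByteString, ByteString)
strTokenP s = (tok, dropOneSpace rest)
  where
    (tok, rest) = BC.break (== ' ') s
    dropOneSpace r = case BC.uncons r of
      Just (' ', r') -> r'
      _ -> r
```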

TextEncoding class

class TextEncoding a where
  textEncode :: a -> Text
  textDecode :: Text -> Maybe a

Separate from StrEncoding — operates on Text rather than ByteString. Used for types that need Text representation (e.g., enum display names).
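A minimal sketch of the class with one instance follows. Theme and its display names are made up for illustration.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

class TextEncoding a where
  textEncode :: a -> Text
  textDecode :: Text -> Maybe a

-- Hypothetical enum, used only to show an instance.
data Theme = Light | Dark deriving (Eq, Show, Enum, Bounded)

instance TextEncoding Theme where
  textEncode Light = T.pack "light"
  textEncode Dark = T.pack "dark"
  -- Decode by searching all constructors for a matching display name.
  textDecode t = lookup t [(textEncode x, x) | x <- [minBound .. maxBound]]
```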

JSON bridge functions

| Function | Purpose |
|----------|---------|
| strToJSON | StrEncoding a => a -> J.Value, via decodeLatin1 . strEncode |
| strToJEncoding | Same, for Aeson encoding |
| strParseJSON | StrEncoding a => String -> J.Value -> JT.Parser a; parses a JSON string via strP |
| textToJSON | TextEncoding a => a -> J.Value |
| textToEncoding | Same, for Aeson encoding |
| textParseJSON | TextEncoding a => String -> J.Value -> JT.Parser a |

Parsers

Source: Parsers.hs

Core parsing functions

| Function | Signature | Purpose |
|----------|-----------|---------|
| parseAll | Parser a -> ByteString -> Either String a | Parse consuming all input (fails if bytes remain) |
| parse | Parser a -> e -> ByteString -> Either e a | parseAll with a custom error value (discards the error string) |
| parseE | (String -> e) -> Parser a -> ByteString -> ExceptT e IO a | parseAll lifted into ExceptT |
| parseE' | (String -> e) -> Parser a -> ByteString -> ExceptT e IO a | Like parseE, but allows trailing input |
| parseRead1 | Read a => Parser a | Parse a word, then readMaybe it |
| parseString | (ByteString -> Either String a) -> String -> a | Parse from a String (calls error on failure) |
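The differences between parseAll, parse and the trailing-input-tolerant parseE' can be sketched with a hypothetical minimal Parser type standing in for attoparsec's. The sketch uses plain Either in place of ExceptT.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Char (isDigit)

-- Hypothetical minimal parser type standing in for attoparsec's Parser.
newtype Parser a = Parser {runParser :: ByteString -> Either String (a, ByteString)}

-- parseAll semantics: leftover input is an error.
parseAll :: Parser a -> ByteString -> Either String a
parseAll p s = case runParser p s of
  Left e -> Left e
  Right (a, rest)
    | B.null rest -> Right a
    | otherwise -> Left "unconsumed input"

-- parse semantics: like parseAll, but the error string is replaced by a caller-supplied value.
parse :: Parser a -> e -> ByteString -> Either e a
parse p err = either (const (Left err)) Right . parseAll p

-- parseE' semantics (without ExceptT): trailing input is ignored.
parsePrefix :: Parser a -> ByteString -> Either String a
parsePrefix p = fmap fst . runParser p

digitP :: Parser Char
digitP = Parser $ \s -> case BC.uncons s of
  Just (c, rest) | isDigit c -> Right (c, rest)
  _ -> Left "expected digit"
```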

base64P

Standard base64 parser (not base64url): it uses the + and / alphabet. It takes alphanumeric characters plus + and /, then optional = padding, then decodes. Contrast this with base64urlP in Encoding/String.hs, which uses the - and _ alphabet.
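The sketch below shows the two alphabets and the character run each parser would collect before decoding. The names are hypothetical, and restricting "alphanumeric" to ASCII is an assumption of this sketch.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC
import Data.Char (isAsciiLower, isAsciiUpper, isDigit)

isAsciiAlnum :: Char -> Bool
isAsciiAlnum c = isAsciiUpper c || isAsciiLower c || isDigit c

-- base64P alphabet: alphanumerics plus '+' and '/'.
isBase64Char :: Char -> Bool
isBase64Char c = isAsciiAlnum c || c == '+' || c == '/'

-- base64urlP alphabet: alphanumerics plus '-' and '_'.
isBase64UrlChar :: Char -> Bool
isBase64UrlChar c = isAsciiAlnum c || c == '-' || c == '_'

-- The run of characters each parser would hand to its decoder.
base64Token, base64UrlToken :: ByteString -> ByteString
base64Token = BC.takeWhile isBase64Char
base64UrlToken = BC.takeWhile isBase64UrlChar
```

A standard-alphabet string stops a base64url parser at its first + or /, and vice versa.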

JSON options helpers

Platform-conditional JSON encoding for cross-platform compatibility (Haskell ↔ Swift).

| Function | Purpose |
|----------|---------|
| enumJSON | All-nullary constructors as strings, with tag modifier |
| sumTypeJSON | Platform-conditional: taggedObjectJSON on non-Darwin, singleFieldJSON on Darwin |
| taggedObjectJSON | {"type": "Tag", "data": {...}} format |
| singleFieldJSON | {"Tag": value} format |
| defaultJSON | Default options with omitNothingFields = True |

Pattern synonyms for JSON field names:

  • TaggedObjectJSONTag = "type"
  • TaggedObjectJSONData = "data"
  • SingleFieldJSONTag = "_owsf"

String helpers

| Function | Purpose |
|----------|---------|
| fstToLower | Lowercase first character |
| dropPrefix | Remove prefix string, lowercase remainder |
| textP | Parse rest of input as UTF-8 String |
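Here is a sketch of the two case helpers. The exact behaviour of dropPrefix (lowercasing only the first remaining character) is an assumption, based on the fstToLower helper beside it.

```haskell
import Data.Char (toLower)
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe)

fstToLower :: String -> String
fstToLower "" = ""
fstToLower (c : cs) = toLower c : cs

-- Assumed behaviour: strip the prefix when present, then lowercase the first remaining character.
dropPrefix :: String -> String -> String
dropPrefix pfx s = fstToLower (fromMaybe s (stripPrefix pfx s))
```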

Auxiliary Types and Utilities

TMap

Source: TMap.hs

type TMap k a = TVar (Map k a)

STM-based concurrent map. Wraps Data.Map.Strict in a TVar. All mutations use modifyTVar' (strict) to prevent thunk accumulation.

| Function | Notes |
|----------|-------|
| emptyIO | IO allocation (newTVarIO) |
| singleton | STM allocation |
| clear | Reset to empty |
| lookup / lookupIO | STM / non-transactional IO read |
| member / memberIO | STM / non-transactional IO membership |
| insert / insertM | Insert value / insert from STM action |
| delete | Remove key |
| lookupInsert | Atomic lookup-then-insert (returns old value) |
| lookupDelete | Atomic lookup-then-delete |
| adjust / update / alter / alterF | Standard Map operations lifted to STM |
| union | Merge Map into TMap |

lookupIO and memberIO use readTVarIO: a single read outside any STM transaction. This is useful when you need a snapshot and do not need to compose the read with other STM operations.
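The following is a reduced sketch of the TMap pattern using the stm and containers packages. Only a few operations are shown, and the argument order (key first, map last) is an assumption.

```haskell
import Control.Concurrent.STM (STM, TVar, atomically, modifyTVar', newTVarIO, readTVarIO, stateTVar)
import qualified Data.Map.Strict as M

type TMap k a = TVar (M.Map k a)

emptyIO :: IO (TMap k a)
emptyIO = newTVarIO M.empty

-- Strict modification, so no thunks build up inside the TVar.
insert :: Ord k => k -> a -> TMap k a -> STM ()
insert k v m = modifyTVar' m (M.insert k v)

-- Read and write in one transaction; returns the previous value, if any.
lookupInsert :: Ord k => k -> a -> TMap k a -> STM (Maybe a)
lookupInsert k v m = stateTVar m (M.insertLookupWithKey (\_ new _ -> new) k v)

lookupDelete :: Ord k => k -> TMap k a -> STM (Maybe a)
lookupDelete k m = stateTVar m (\mp -> (M.lookup k mp, M.delete k mp))

-- Snapshot read outside any transaction.
lookupIO :: Ord k => k -> TMap k a -> IO (Maybe a)
lookupIO k m = M.lookup k <$> readTVarIO m
```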

SessionVar

Source: Session.hs

Race-safe session management using TMVar + monotonic ID.

data SessionVar a = SessionVar
  { sessionVar   :: TMVar a    -- result slot
  , sessionVarId :: Int        -- monotonic ID from TVar counter
  , sessionVarTs :: UTCTime    -- creation timestamp
  }

| Function | Purpose |
|----------|---------|
| getSessVar | Look up or create a session; returns Left new or Right existing |
| removeSessVar | Delete the session only if its ID matches (prevents removing a replacement) |
| tryReadSessVar | Non-blocking read of the session result |

The ID check in removeSessVar guards against this race:

  1. Thread A creates session #5 and starts work.
  2. Thread B creates session #6, replacing #5 in the TMap.
  3. Thread A finishes and tries to remove its session. The IDs differ (#5 vs #6), so removal is skipped and B's session survives.
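The race above can be reproduced against a simplified sketch. This is not the simplexmq implementation. It assumes the argument order shown, keeps the ID counter in a TVar Int, and omits the sessionVarTs field.

```haskell
import Control.Concurrent.STM
  (STM, TMVar, TVar, atomically, modifyTVar', newEmptyTMVar, newTVarIO, readTVar, readTVarIO, stateTVar, tryReadTMVar)
import qualified Data.Map.Strict as M

-- Simplified record: the creation timestamp (sessionVarTs) is omitted.
data SessionVar a = SessionVar {sessionVar :: TMVar a, sessionVarId :: Int}

type Sessions k a = TVar (M.Map k (SessionVar a))

-- Left: newly created session; Right: the existing one.
getSessVar :: Ord k => TVar Int -> k -> Sessions k a -> STM (Either (SessionVar a) (SessionVar a))
getSessVar counter k vs = do
  existing <- M.lookup k <$> readTVar vs
  case existing of
    Just v -> pure (Right v)
    Nothing -> do
      i <- stateTVar counter (\n -> (n + 1, n + 1))
      tm <- newEmptyTMVar
      let v = SessionVar tm i
      modifyTVar' vs (M.insert k v)
      pure (Left v)

-- Remove the entry only if it is still the session we were given.
removeSessVar :: Ord k => SessionVar a -> k -> Sessions k a -> STM ()
removeSessVar v k vs = modifyTVar' vs (M.update keepOther k)
  where
    keepOther cur
      | sessionVarId cur == sessionVarId v = Nothing
      | otherwise = Just cur

tryReadSessVar :: Ord k => k -> Sessions k a -> STM (Maybe a)
tryReadSessVar k vs = do
  m <- readTVar vs
  maybe (pure Nothing) (tryReadTMVar . sessionVar) (M.lookup k m)
```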

ServiceScheme

Source: ServiceScheme.hs

data ServiceScheme = SSSimplex | SSAppServer SrvLoc
data SrvLoc = SrvLoc HostName ServiceName

URI scheme for SimpleX service addresses. SSSimplex encodes as "simplex:", SSAppServer as "https://host:port".

simplexChat is the constant SSAppServer (SrvLoc "simplex.chat" "").
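Here is a sketch of the two string forms. HostName and ServiceName are redefined as String (as they are in the network package) to keep the example dependency-free. Omitting :port when the port is empty, as in simplexChat, is an assumption.

```haskell
-- In the network package both are type synonyms for String; redefined here to stay dependency-free.
type HostName = String
type ServiceName = String

data SrvLoc = SrvLoc HostName ServiceName

data ServiceScheme = SSSimplex | SSAppServer SrvLoc

-- Hypothetical rendering; dropping ":port" when the port is empty is an assumption.
serviceSchemeStr :: ServiceScheme -> String
serviceSchemeStr SSSimplex = "simplex:"
serviceSchemeStr (SSAppServer (SrvLoc host port))
  | null port = "https://" ++ host
  | otherwise = "https://" ++ host ++ ":" ++ port

simplexChat :: ServiceScheme
simplexChat = SSAppServer (SrvLoc "simplex.chat" "")
```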

SystemTime

Source: SystemTime.hs

newtype RoundedSystemTime (t :: Nat) = RoundedSystemTime { roundedSeconds :: Int64 }
type SystemDate = RoundedSystemTime 86400    -- day precision
type SystemSeconds = RoundedSystemTime 1     -- second precision

Phantom-typed time rounding. The Nat type parameter specifies rounding granularity in seconds.

| Function | Purpose |
|----------|---------|
| getRoundedSystemTime | Get current time rounded to t seconds |
| getSystemDate | Alias for day-rounded time |
| getSystemSeconds | Second precision (no rounding needed; just drops nanoseconds) |
| roundedToUTCTime | Convert back to UTCTime |
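The following sketches the phantom-typed rounding, assuming that rounding means flooring to a multiple of t seconds. roundSystemTime is a hypothetical helper. SystemTime and getSystemTime come from the time package.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}

import Data.Int (Int64)
import Data.Proxy (Proxy (..))
import Data.Time.Clock.System (SystemTime (..), getSystemTime)
import GHC.TypeLits (KnownNat, Nat, natVal)

newtype RoundedSystemTime (t :: Nat) = RoundedSystemTime {roundedSeconds :: Int64}
  deriving (Eq, Show)

type SystemDate = RoundedSystemTime 86400
type SystemSeconds = RoundedSystemTime 1

-- Floor to a multiple of t seconds; the granularity is read from the type-level Nat.
roundSystemTime :: forall t. KnownNat t => SystemTime -> RoundedSystemTime t
roundSystemTime st =
  let t = fromIntegral (natVal (Proxy :: Proxy t))
   in RoundedSystemTime ((systemSeconds st `div` t) * t)

getRoundedSystemTime :: KnownNat t => IO (RoundedSystemTime t)
getRoundedSystemTime = roundSystemTime <$> getSystemTime

getSystemDate :: IO SystemDate
getSystemDate = getRoundedSystemTime
```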

RoundedSystemTime derives FromField/ToField for SQLite storage and FromJSON/ToJSON for API serialization.

Util

Source: Util.hs

Selected utilities used across the codebase:

Monadic combinators:

| Function | Signature | Purpose |
|----------|-----------|---------|
| <$?> | MonadFail m => (a -> Either String b) -> m a -> m b | Lift fallible function into parser |
| $>>= | (Monad m, Monad f, Traversable f) => m (f a) -> (a -> m (f b)) -> m (f b) | Monadic bind through nested monad |
| ifM / whenM / unlessM | | Monadic conditionals |
| anyM | | Short-circuit any for monadic predicates (strict) |
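Here are sketches of the two operators, using the signatures from the table. parsePositive is a hypothetical fallible conversion. With Maybe as the MonadFail, a Left turns into Nothing.

```haskell
import Control.Monad (join)

-- Run the action, then fail (in the MonadFail sense) on a Left.
(<$?>) :: MonadFail m => (a -> Either String b) -> m a -> m b
f <$?> m = m >>= either fail pure . f

-- Bind through an inner Traversable: traverse with f, then flatten the inner layer.
($>>=) :: (Monad m, Monad f, Traversable f) => m (f a) -> (a -> m (f b)) -> m (f b)
m $>>= f = m >>= fmap join . mapM f

-- Hypothetical fallible conversion used in the examples.
parsePositive :: Int -> Either String Int
parsePositive n = if n > 0 then Right n else Left "not positive"
```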

Error handling:

| Function | Purpose |
|----------|---------|
| tryAllErrors | Catch all exceptions (including async) into ExceptT |
| catchAllErrors | Same, with handler |
| tryAllOwnErrors | Catch all exceptions except async cancellation, which is re-thrown |
| catchAllOwnErrors | Same, with handler |
| isOwnException | True for StackOverflow, HeapOverflow, AllocationLimitExceeded |
| isAsyncCancellation | True for any SomeAsyncException that is not an own exception |
| catchThrow | Catch exceptions, wrap in Left |
| allFinally | tryAllErrors + final + except (like finally for ExceptT) |

The own-vs-async distinction is critical. catchOwn and tryAllOwnErrors never swallow async cancellation (ThreadKilled, UserInterrupt, etc.). They catch only synchronous exceptions and resource exhaustion (StackOverflow, HeapOverflow, AllocationLimitExceeded).
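Here is a sketch of the classification using Control.Exception. tryOwn is a hypothetical plain-IO analogue of tryAllOwnErrors, not the library's ExceptT-based function.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Exception

-- Resource exhaustion counts as "own": StackOverflow, HeapOverflow, AllocationLimitExceeded.
isOwnException :: SomeException -> Bool
isOwnException e = case fromException e of
  Just StackOverflow -> True
  Just HeapOverflow -> True
  _ -> case fromException e of
    Just AllocationLimitExceeded -> True
    Nothing -> False

-- Every other asynchronous exception (ThreadKilled, UserInterrupt, ...) is cancellation.
isAsyncCancellation :: SomeException -> Bool
isAsyncCancellation e = case fromException e of
  Just (_ :: SomeAsyncException) -> not (isOwnException e)
  Nothing -> False

-- Hypothetical tryAllOwnErrors-style helper in plain IO: catch, but re-throw cancellation.
tryOwn :: IO a -> IO (Either SomeException a)
tryOwn action = do
  r <- try action
  case r of
    Left e | isAsyncCancellation e -> throwIO e
    _ -> pure r
```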

STM:

| Function | Purpose |
|----------|---------|
| tryWriteTBQueue | Non-blocking bounded-queue write; returns whether the item was written |
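Here is a sketch of a non-blocking bounded write built from stm's isFullTBQueue and writeTBQueue. The real implementation may differ.

```haskell
import Control.Concurrent.STM (STM, TBQueue, atomically, isFullTBQueue, newTBQueueIO, writeTBQueue)

-- Write only when the queue has room; report whether the item was written.
tryWriteTBQueue :: TBQueue a -> a -> STM Bool
tryWriteTBQueue q x = do
  full <- isFullTBQueue q
  if full
    then pure False
    else writeTBQueue q x >> pure True
```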

Database result helpers:

| Function | Purpose |
|----------|---------|
| firstRow | Extract first row with transform, or Left error |
| maybeFirstRow | Extract first row as Maybe |
| firstRow' | Like firstRow, but the transform can also fail |
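The following sketches the three helpers. It assumes signatures in which each helper takes an IO action that returns rows. The real functions may be typed differently.

```haskell
import Data.Maybe (listToMaybe)

-- Transform the first row, or return the given error when there are no rows.
firstRow :: (a -> b) -> e -> IO [a] -> IO (Either e b)
firstRow f err rows = maybe (Left err) (Right . f) . listToMaybe <$> rows

-- Transform the first row if there is one.
maybeFirstRow :: Functor f => (a -> b) -> f [a] -> f (Maybe b)
maybeFirstRow f = fmap (fmap f . listToMaybe)

-- Like firstRow, but the transform itself may fail.
firstRow' :: (a -> Either e b) -> e -> IO [a] -> IO (Either e b)
firstRow' f err rows = maybe (Left err) f . listToMaybe <$> rows
```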

Collection utilities:

| Function | Purpose |
|----------|---------|
| groupOn | groupBy using equality on projected key |
| groupAllOn | groupOn after sortOn (also groups non-adjacent elements) |
| toChunks | Split list into NonEmpty chunks of size n |
| packZipWith | Optimized ByteString zipWith (direct memory access) |
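These are sketches of three of the collection helpers. The behaviour of toChunks for n < 1 is not covered and is assumed out of scope.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)
import Data.List.NonEmpty (NonEmpty (..))

-- Group adjacent elements with equal keys.
groupOn :: Eq k => (a -> k) -> [a] -> [[a]]
groupOn f = groupBy ((==) `on` f)

-- Sort by key first, so equal keys end up in one group even if they were not adjacent.
groupAllOn :: Ord k => (a -> k) -> [a] -> [[a]]
groupAllOn f = groupOn f . sortOn f

-- Chunks of at most n elements (n >= 1 assumed); empty input yields no chunks.
toChunks :: Int -> [a] -> [NonEmpty a]
toChunks _ [] = []
toChunks n (x : xs) =
  let (chunk, rest) = splitAt (n - 1) xs
   in (x :| chunk) : toChunks n rest
```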

Miscellaneous:

| Function | Purpose |
|----------|---------|
| safeDecodeUtf8 | Decode UTF-8, replacing errors with '?' |
| bshow / tshow | show to ByteString / Text |
| threadDelay' | Int64 delay (handles overflow by looping) |
| diffToMicroseconds / diffToMilliseconds | NominalDiffTime conversion |
| labelMyThread | Label current thread for debugging |
| encodeJSON / decodeJSON | ToJSON a => a -> Text / FromJSON a => Text -> Maybe a |
| traverseWithKey_ | Map traversal discarding results |
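Here are sketches of two of the helpers. threadDelay' splits an Int64 delay into Int-sized threadDelay calls, which only matters where Int is narrower than 64 bits. safeDecodeUtf8 uses decodeUtf8With from the text package with a handler that substitutes '?'.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Concurrent (threadDelay)
import Data.ByteString (ByteString)
import Data.Int (Int64)
import Data.Text (Text)
import Data.Text.Encoding (decodeUtf8With)

-- Sleep for an Int64 number of microseconds; threadDelay takes Int, so long delays are split.
threadDelay' :: Int64 -> IO ()
threadDelay' t
  | t <= 0 = pure ()
  | t > maxInt = threadDelay maxBound >> threadDelay' (t - maxInt)
  | otherwise = threadDelay (fromIntegral t)
  where
    maxInt = fromIntegral (maxBound :: Int) :: Int64

-- Replace each invalid UTF-8 sequence with '?' instead of throwing.
safeDecodeUtf8 :: ByteString -> Text
safeDecodeUtf8 = decodeUtf8With (\_ _ -> Just '?')
```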

Security notes

  • Length prefix overflow: ByteString encoding uses a 1-byte length. A string longer than 255 bytes is not rejected; its length does not fit in the prefix byte and is silently truncated. Callers must ensure size bounds before encoding. Large extends the limit to 65535 bytes via a Word16 prefix.
  • Tail unbounded: Tail consumes all remaining input with no size check. Only safe when total message size is already bounded (e.g., within a padded SMP block).
  • base64 vs base64url: Parsers.base64P uses the standard alphabet (+ and /), while String.base64urlP uses the URL-safe alphabet (- and _). Each parser stops at the first character outside its own alphabet, so mixing them causes parse failures or truncated decodes.
  • safeDecodeUtf8: Replaces invalid UTF-8 with '?' rather than failing. Suitable for logging/display, not for security-critical string comparison.
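Here is a sketch of the bounds check that callers are expected to perform. The checked encoder names are hypothetical. wrappedLength shows what an unchecked Int-to-Word8 conversion does to an oversized length.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Data.Word (Word8)

-- What an unchecked Int -> Word8 length conversion does: it wraps modulo 256.
wrappedLength :: Int -> Word8
wrappedLength = fromIntegral

-- Hypothetical checked encoders: refuse lengths the prefix cannot represent.
encodeShortChecked :: ByteString -> Either String ByteString
encodeShortChecked s
  | n > 255 = Left ("too long for a 1-byte prefix: " ++ show n)
  | otherwise = Right (B.cons (fromIntegral n) s)
  where
    n = B.length s

encodeLargeChecked :: ByteString -> Either String ByteString
encodeLargeChecked s
  | n > 65535 = Left ("too long for a 2-byte prefix: " ++ show n)
  | otherwise = Right (B.pack [fromIntegral (n `div` 256), fromIntegral (n `mod` 256)] <> s)
  where
    n = B.length s
```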