13 KiB
Encoding
Binary and string encoding used across all SimpleX protocols.
Source files: Encoding.hs, Encoding/String.hs, Parsers.hs
Overview
Two encoding layers serve different purposes:
Encoding— Binary wire format for SMP protocol transmissions. Compact, no delimiters between fields. Used in all on-the-wire protocol messages.StrEncoding— Human-readable string format for configuration, URIs, logs, and JSON serialization. Uses base64url for binary data, decimal for numbers, comma-separated lists, space-separated tuples.
Both are typeclasses with MINIMAL pragmas requiring encode + (decode | parser), with the missing one derived from the other.
Binary Encoding (Encoding class)
class Encoding a where
smpEncode :: a -> ByteString
smpDecode :: ByteString -> Either String a -- default: parseAll smpP
smpP :: Parser a -- default: smpDecode <$?> smpP
Length-prefix conventions
| Type | Prefix | Max size |
|---|---|---|
ByteString |
1-byte length (Word8 as Char) | 255 bytes |
Large (newtype) |
2-byte length (Word16 big-endian) | 65535 bytes |
Tail (newtype) |
None — consumes rest of input | Unlimited |
Lists (smpEncodeList) |
1-byte count prefix, then concatenated items | 255 items |
NonEmpty |
Same as list (fails on count=0) | 255 items |
Scalar types
| Type | Encoding | Bytes |
|---|---|---|
Char |
Raw byte | 1 |
Bool |
'T' / 'F' (0x54 / 0x46) |
1 |
Word16 |
Big-endian | 2 |
Word32 |
Big-endian | 4 |
Int64 |
Two big-endian Word32s (high then low) | 8 |
SystemTime |
systemSeconds as Int64 (nanoseconds dropped) |
8 |
Text |
UTF-8 then ByteString encoding (1-byte length prefix) | 1 + len |
String |
B.pack then ByteString encoding |
1 + len |
Maybe a
Nothing → '0' (0x30)
Just x → '1' (0x31) ++ smpEncode x
Tags are ASCII characters '0'/'1', not binary 0x00/0x01.
Tuples
Tuples (2 through 8) encode as simple concatenation — no length prefix, no separator. Fields are parsed sequentially using each component's smpP. This works because each component's parser knows how many bytes to consume (via its own length prefix or fixed size).
Combinators
| Function | Signature | Purpose |
|---|---|---|
_smpP |
Parser a |
Space-prefixed parser (A.space *> smpP) |
smpEncodeList |
[a] -> ByteString |
1-byte count + concatenated items |
smpListP |
Parser [a] |
Parse count then that many items |
lenEncode |
Int -> Char |
Int to single-byte length char |
String Encoding (StrEncoding class)
class StrEncoding a where
strEncode :: a -> ByteString
strDecode :: ByteString -> Either String a -- default: parseAll strP
strP :: Parser a -- default: strDecode <$?> base64urlP
Key difference from Encoding: the default strP parses base64url input first, then applies strDecode. This means types that only implement strDecode will automatically accept base64url-encoded input.
Instance conventions
| Type | Encoding |
|---|---|
ByteString |
base64url (non-empty required) |
Word16, Word32 |
Decimal string |
Int, Int64 |
Signed decimal |
Char, Bool |
Delegates to Encoding (smpEncode/smpP) |
Maybe a |
Empty string = Nothing, otherwise strEncode a |
Text |
UTF-8 bytes, parsed until space/newline |
SystemTime |
systemSeconds as Int64 (decimal) |
UTCTime |
ISO 8601 string |
CertificateChain |
Comma-separated base64url blobs |
Fingerprint |
base64url of fingerprint bytes |
Collection encoding
| Type | Separator |
|---|---|
Lists (strEncodeList) |
Comma , |
NonEmpty |
Comma (fails on empty) |
Set a |
Comma |
IntSet |
Comma |
| Tuples (2-6) | Space ( ) |
Str newtype
Raw string (not base64url-encoded). Parses until space, consumes trailing space. Used for string-valued protocol fields that should not be base64-encoded.
TextEncoding class
class TextEncoding a where
textEncode :: a -> Text
textDecode :: Text -> Maybe a
Separate from StrEncoding — operates on Text rather than ByteString. Used for types that need Text representation (e.g., enum display names).
JSON bridge functions
| Function | Purpose |
|---|---|
strToJSON |
StrEncoding a => a -> J.Value via decodeLatin1 . strEncode |
strToJEncoding |
Same, for Aeson encoding |
strParseJSON |
StrEncoding a => String -> J.Value -> JT.Parser a — parse JSON string via strP |
textToJSON |
TextEncoding a => a -> J.Value |
textToEncoding |
Same, for Aeson encoding |
textParseJSON |
TextEncoding a => String -> J.Value -> JT.Parser a |
Parsers
Source: Parsers.hs
Core parsing functions
| Function | Signature | Purpose |
|---|---|---|
parseAll |
Parser a -> ByteString -> Either String a |
Parse consuming all input (fails if bytes remain) |
parse |
Parser a -> e -> ByteString -> Either e a |
parseAll with custom error type (discards error string) |
parseE |
(String -> e) -> Parser a -> ByteString -> ExceptT e IO a |
parseAll lifted into ExceptT |
parseE' |
(String -> e) -> Parser a -> ByteString -> ExceptT e IO a |
Like parseE but allows trailing input |
parseRead1 |
Read a => Parser a |
Parse a word then readMaybe it |
parseString |
(ByteString -> Either String a) -> String -> a |
Parse from String (errors with error) |
base64P
Standard base64 parser (not base64url — uses +// alphabet). Takes alphanumeric + +// characters, optional = padding, then decodes. Contrast with base64urlP in Encoding/String.hs which uses -/_ alphabet.
JSON options helpers
Platform-conditional JSON encoding for cross-platform compatibility (Haskell ↔ Swift).
| Function | Purpose |
|---|---|
enumJSON |
All-nullary constructors as strings, with tag modifier |
sumTypeJSON |
Platform-conditional: taggedObjectJSON on non-Darwin, singleFieldJSON on Darwin |
taggedObjectJSON |
{"type": "Tag", "data": {...}} format |
singleFieldJSON |
{"Tag": value} format |
defaultJSON |
Default options with omitNothingFields = True |
Pattern synonyms for JSON field names:
TaggedObjectJSONTag = "type"TaggedObjectJSONData = "data"SingleFieldJSONTag = "_owsf"
String helpers
| Function | Purpose |
|---|---|
fstToLower |
Lowercase first character |
dropPrefix |
Remove prefix string, lowercase remainder |
textP |
Parse rest of input as UTF-8 String |
Auxiliary Types and Utilities
TMap
Source: TMap.hs
type TMap k a = TVar (Map k a)
STM-based concurrent map. Wraps Data.Map.Strict in a TVar. All mutations use modifyTVar' (strict) to prevent thunk accumulation.
| Function | Notes |
|---|---|
emptyIO |
IO allocation (newTVarIO) |
singleton |
STM allocation |
clear |
Reset to empty |
lookup / lookupIO |
STM / non-transactional IO read |
member / memberIO |
STM / non-transactional IO membership |
insert / insertM |
Insert value / insert from STM action |
delete |
Remove key |
lookupInsert |
Atomic lookup-then-insert (returns old value) |
lookupDelete |
Atomic lookup-then-delete |
adjust / update / alter / alterF |
Standard Map operations lifted to STM |
union |
Merge Map into TMap |
lookupIO/memberIO use readTVarIO — single-read outside STM transaction, useful when you need a snapshot without composing with other STM operations.
SessionVar
Source: Session.hs
Race-safe session management using TMVar + monotonic ID.
data SessionVar a = SessionVar
{ sessionVar :: TMVar a -- result slot
, sessionVarId :: Int -- monotonic ID from TVar counter
, sessionVarTs :: UTCTime -- creation timestamp
}
| Function | Purpose |
|---|---|
getSessVar |
Lookup or create session. Returns Left new or Right existing |
removeSessVar |
Delete session only if ID matches (prevents removing a replacement) |
tryReadSessVar |
Non-blocking read of session result |
The ID-match check in removeSessVar prevents a race where:
- Thread A creates session #5, starts work
- Thread B creates session #6 (replacing #5 in TMap)
- Thread A finishes, tries to remove — ID mismatch, removal blocked
ServiceScheme
Source: ServiceScheme.hs
data ServiceScheme = SSSimplex | SSAppServer SrvLoc
data SrvLoc = SrvLoc HostName ServiceName
URI scheme for SimpleX service addresses. SSSimplex encodes as "simplex:", SSAppServer as "https://host:port".
simplexChat is the constant SSAppServer (SrvLoc "simplex.chat" "").
SystemTime
Source: SystemTime.hs
newtype RoundedSystemTime (t :: Nat) = RoundedSystemTime { roundedSeconds :: Int64 }
type SystemDate = RoundedSystemTime 86400 -- day precision
type SystemSeconds = RoundedSystemTime 1 -- second precision
Phantom-typed time rounding. The Nat type parameter specifies rounding granularity in seconds.
| Function | Purpose |
|---|---|
getRoundedSystemTime |
Get current time rounded to t seconds |
getSystemDate |
Alias for day-rounded time |
getSystemSeconds |
Second-precision (no rounding needed, just drops nanoseconds) |
roundedToUTCTime |
Convert back to UTCTime |
RoundedSystemTime derives FromField/ToField for SQLite storage and FromJSON/ToJSON for API serialization.
Util
Source: Util.hs
Selected utilities used across the codebase:
Monadic combinators:
| Function | Signature | Purpose |
|---|---|---|
<$?> |
MonadFail m => (a -> Either String b) -> m a -> m b |
Lift fallible function into parser |
$>>= |
(Monad m, Monad f, Traversable f) => m (f a) -> (a -> m (f b)) -> m (f b) |
Monadic bind through nested monad |
ifM / whenM / unlessM |
Monadic conditionals | |
anyM |
Short-circuit any for monadic predicates (strict) |
Error handling:
| Function | Purpose |
|---|---|
tryAllErrors |
Catch all exceptions (including async) into ExceptT |
catchAllErrors |
Same with handler |
tryAllOwnErrors |
Catch only "own" exceptions (re-throws async cancellation) |
catchAllOwnErrors |
Same with handler |
isOwnException |
StackOverflow, HeapOverflow, AllocationLimitExceeded |
isAsyncCancellation |
Any SomeAsyncException except own exceptions |
catchThrow |
Catch exceptions, wrap in Left |
allFinally |
tryAllErrors + final + except (like finally for ExceptT) |
The own-vs-async distinction is critical: catchOwn/tryAllOwnErrors never swallow async cancellation (ThreadKilled, UserInterrupt, etc.), only synchronous exceptions and resource exhaustion (StackOverflow, HeapOverflow, AllocationLimitExceeded).
STM:
| Function | Purpose |
|---|---|
tryWriteTBQueue |
Non-blocking bounded queue write, returns success |
Database result helpers:
| Function | Purpose |
|---|---|
firstRow |
Extract first row with transform, or Left error |
maybeFirstRow |
Extract first row as Maybe |
firstRow' |
Like firstRow but transform can also fail |
Collection utilities:
| Function | Purpose |
|---|---|
groupOn |
groupBy using equality on projected key |
groupAllOn |
groupOn after sortOn (groups non-adjacent elements) |
toChunks |
Split list into NonEmpty chunks of size n |
packZipWith |
Optimized ByteString zipWith (direct memory access) |
Miscellaneous:
| Function | Purpose |
|---|---|
safeDecodeUtf8 |
Decode UTF-8 replacing errors with '?' |
bshow / tshow |
show to ByteString / Text |
threadDelay' |
Int64 delay (handles overflow by looping) |
diffToMicroseconds / diffToMilliseconds |
NominalDiffTime conversion |
labelMyThread |
Label current thread for debugging |
encodeJSON / decodeJSON |
ToJSON a => a -> Text / FromJSON a => Text -> Maybe a |
traverseWithKey_ |
Map traversal discarding results |
Security notes
- Length prefix overflow:
ByteStringencoding uses 1-byte length — silently truncates strings > 255 bytes. Callers must ensure size bounds before encoding.Largeextends to 65535 bytes via Word16 prefix. Tailunbounded:Tailconsumes all remaining input with no size check. Only safe when total message size is already bounded (e.g., within a padded SMP block).- base64 vs base64url:
Parsers.base64Puses standard alphabet (+//), whileString.base64urlPuses URL-safe alphabet (-/_). Mixing them causes silent decode failures. safeDecodeUtf8: Replaces invalid UTF-8 with'?'rather than failing. Suitable for logging/display, not for security-critical string comparison.