6 Commits

Author SHA1 Message Date
sh 209f7826cb smp-server: support namespaces (#1784)
* smp-server: namespaces resolver scaffolding

* smp-server: Names resolver hardening + cleanup

* smp-server: fuse parallel dispatchers

* smp-server: JSON wire format for NameRecord + Names.hs restructure

* smp-server: redact RpcAuth in Show

* smp-server: JSON wire fixups + spec rewrite + small cleanups

* plan: prepend implementation-diverged banner

* move SimplexName into shared module

* smp-server: name + contract whitelist on RSLV

* smp-server: address audit findings (canonical JSON, INI guards, SSRF, TLD case, shutdown)

* smp-server: round 2 audit fixes (label case, response cap, ipv6 link-local)

* smp-server: round 3 audit fixes (SSRF coverage, drop noop closeManager, CSV order)

* smp-server: round 4 audit fixes (0X-hex host, expanded IPv6 forms, pingEndpoint timeout)

* smp-server: hardcode TldRegistries (drop registry_tld_* INI keys)

* smp-server: round 6 audit fixes (IPv6 SSRF, redirects, ASCII labels)

- Reject IPv6 aliases of 169.254.169.254 (IPv4-compatible / IPv4-mapped /
  6to4 / NAT64) via numeric range check on parsed IPv6.
- Disable HTTP redirects on the Eth RPC request.
- Restrict SimplexName labels to ASCII (Cyrillic/Greek/full-width otherwise
  hash to different on-chain records and diverge from UTS-46 registrars).
- pingEndpoint: only JsonRpcErr means "reachable"; transport/decode failures
  fail startup. boundedIniInt: readMaybe over partial read.
- Add 127.0.0.0/8 and 0.0.0.0 to isLoopback.
- Replace hand-rolled hex helpers with Data.ByteArray.Encoding; raise
  managerConnCount to match rpcMaxConcurrency; hex Show for NameOwner.
- Fuse parallel http/https when into unless+case; drop reverse/re-reverse
  in mkDomain TLDWeb; first AbiInvariantViolated; Nothing <$ decodeAddress;
  forM_ (eitherToMaybe ...); >>= chain in NameOwner FromJSON.
- Drop dead imports/exports/pragmas and two restating comments.
- Tests: factor unsafeOwner/unsafeLink, addr1/2/3, testNamesConfig; add
  non-ASCII label rejection coverage.

* namespace: bound parser input to 253 bytes (DoS defense)

The bare-name fallback and bareDomain parser would otherwise consume
arbitrarily many non-space bytes via takeWhile1 before any validation
or length check. A crafted multi-megabyte token would be decoded as
UTF-8 and re-parsed in full before being rejected.

Introduce `boundedNonSpace` (scan with 253-byte cap) at the two
takeWhile1 sites. Inputs longer than 253 bytes leave residue that
parseOnly's implicit endOfInput rejects, so the parser fails fast
without ever allocating the full input.

The bound is the DNS full-domain limit, chosen for being a familiar
ceiling generous enough to cover any realistic SimpleX name (longest
plausible @user.subdomain.simplex stays well under 100 bytes). No
per-label cap — SimpleX names don't go through DNS label resolution
and there's no semantic reason to constrain individual labels.

* namespace: switch to Python HTTP resolver + agent plumbing (#1796)

* namespace: relax resolver_endpoint validation (path prefix, http without auth)

validateUrl gains two operator-friendly relaxations and a regression test:

- Allow a path prefix (e.g. https://gw.example.com:443/snrc) for a resolver
  behind a reverse-proxy sub-path; /resolve/<name> and /health are appended
  (HttpResolver already strips one trailing slash, so root and sub-path
  behave identically). Query/fragment/userinfo stay rejected.

- Off-loopback, reject only http WITH resolver_auth (the Authorization header
  would travel in cleartext). http without auth is now allowed (no secret to
  leak; resolver data is public — also lets dev setups reach a host resolver
  via http://host.docker.internal). https is always allowed, with or without
  auth. Plain http has no response integrity; intended for trusted/local
  networks only.

Exports validateUrl and adds validateUrlSpec (11 cases) to SMPNamesTests.

* namespace: NameRecord links as arrays (multi-link, cap 5)

* namespace: distinct RSLV error responses

RSLV collapsed every non-hit (no resolver, malformed name, not found,
backing-store failure) to ERR AUTH, so a client iterating its configured
servers could not tell "this router has no resolver, try the next" from
"name not registered, stop", and a transient backend error read as an
authoritative miss.

Names capability is runtime config, orthogonal to the linear SMP version
(a future v21 router without [NAMES] must still advertise v21), so it is
signalled by a command-time error like allowSMPProxy, not by the version
range:

  no resolver configured -> ERR CMD PROHIBITED  (client skips, tries next)
  backing-store failure   -> ERR INTERNAL        (transient: retry/surface)
  not found / malformed   -> ERR AUTH            (authoritative "no such name")

Update the protocol spec error table and add agent tests for the
no-resolver (CMD PROHIBITED) and backend-failure (INTERNAL) paths.

* refactor(names): server role + one error type

Addresses epoberezkin's review (PR #1784). Name resolution becomes a
server role like proxy; the agent owns resolution + server selection;
one error type flows through the whole stack.

- ServerRoles gains `names`; UserServers gains `nameSrvs` (opt-in list);
  resolveSimplexName drops the explicit server arg and picks a
  names-capable server via getNextServer.
- RSLV carries SimplexNameDomain (was RslvRequest): no JSON on the wire,
  contract dropped, name validated at parse (invalid -> CMD SYNTAX).
- Version check moves from the encoder to Client.hs (no ERR to server).
- ErrorType.NAME {nameErr :: NameErrorType} (+ AgentErrorType.NAME),
  wire- and JSON-encoded; resolver errors surface with diagnostics.
  Success response renamed NAME -> RNAME to free the collision.
- NameOwner -> EthAddress (record selector); NameRecord derives FromJSON
  and gains field-ordered Encoding; per-field caps removed.
- Remove newEnvWithNames / runSMPServerBlockingWithNames test seams;
  stub resolver folded into ServerConfig.namesResolverCall_.

* test(server): update stats backup line count

NameResolverStatsData adds 6 lines to the server stats backup (the
"rslvStats:" header plus the reqs/succ/notFound/resolverErrs/disabled
fields), so testRestoreMessages' expected stats-backup line count is
95 -> 101.

* feat(names): public-namespace resolution via RSLV/RNAME

SNRC names resolver role: RSLV command -> HTTP resolver -> RNAME record.
Agent owns server selection (ServerRoles.names); NAME error family; async,
concurrency-bounded resolution; length-prefixed extensible wire; spec.

* remove comments

Co-authored-by: Evgeny <evgeny@poberezkin.com>

* simplify

* move tests name

* simplify: text addresses, Tail JSON, drop admitRslv

* fix

* remove spaghetti

* reduce diff

* async again, refactor

* different threads limit for name resolutions

* remove comment

* FromField instance for SimplexNameInfo

* remove comments

* unStrJSON

* add sameConnShortLink

* remove scheme prefix

* remove unused import

* remove connecttarget tests

* remove comment

* comment

---------

Co-authored-by: Evgeny Poberezkin <evgeny@poberezkin.com>
Co-authored-by: Evgeny @ SimpleX Chat <259188159+evgeny-simplex@users.noreply.github.com>
2026-06-30 22:54:55 +01:00
sh be58967a86 smp-server: fix subscriptions memory leak (#1820)
* docs: plan for service subscriptions memory leak

* fix: remove leaked service delivery subscriptions

The CSAEndServiceSub handler decremented subscription counters but did
not remove the per-queue delivery Sub from the service client's
subscriptions map. Over queue churn a long-lived service connection
accumulated orphaned Sub entries until disconnect, leaking memory.

Mirror CSAEndSub via endServiceQueueSub (reusing endSub) so the entry
is removed and its delivery thread cancelled.

Add a regression test with white-box access to the server Env via
runSMPServerBlocking_; verified failing without the fix.

* test: remove service subs leak regression test

Remove testServiceSubsRemovedOnQueueDelete and the test-only Env
exposure (runSMPServerBlocking_, withSmpServerConfigEnvOn,
serviceSubsMapSize), leaving only the server fix.

* refactor: simplify unsubPrev with applicative

Express the cancel-if-present logic as sequence_ (unsub_ <*> s_).
2026-06-30 15:38:45 +00:00
Evgeny Poberezkin e250a9ec9d Merge branch 'stable' 2026-06-16 06:50:14 +01:00
Evgeny 9f9b6c8e88 crypto: BBS scheme for anonymous credentials with multiple presentations (#1794)
* crypto: BBS scheme for anonymous credentials with multiple presentations

* verify

* add files to sources

* more files

* more files, use cabal 3.0

* fix path

* extensions

* switch libbbs to fork

* return either from keygen

* use only secret key to sign

* improve FFI

* simplify

* update libbbs to support iOS

* add commoncrypto flag

* bump libbbs

* reject input of wrong length

* ci: get submodules

---------

Co-authored-by: Evgeny @ SimpleX Chat <259188159+evgeny-simplex@users.noreply.github.com>
2026-06-15 09:44:11 +01:00
Evgeny f0b7a4be73 messaging services (#1667)
* smp server: messaging services (#1565)

* smp server: refactor message delivery to always respond SOK to subscriptions

* refactor ntf subscribe

* cancel subscription thread and reduce service subscription count when queue is deleted

* subscribe rcv service, deliver sent messages to subscribed service

* subscribe rcv service to messages (TODO delivery on subscription)

* WIP

* efficient initial delivery of messages to subscribed service

* test: delivery to client with service certificate

* test: upgrade/downgrade to/from service subscriptions

* remove service association from agent API, add per-user flag to use the service

* agent client (WIP)

* service certificates in the client

* rfc about drift detection, and SALL to mark end of message delivery

* fix test

* fix test

* add function for postgresql message storage

* update migration

* servers: maintain xor-hash of all associated queue IDs in PostgreSQL (#1668)

* servers: maintain xor-hash of all associated queue IDs in PostgreSQL (#1615)

* ntf server: maintain xor-hash of all associated queue IDs via PostgreSQL triggers

* smp server: xor hash with triggers

* fix sql and using pgcrypto extension in tests

* track counts and hashes in smp/ntf servers via triggers, smp server stats for service subscription, update SMP protocol to pass expected count and hash in SSUB/NSSUB commands

* agent migrations with functions/triggers

* remove agent triggers

* try tracking service subs in the agent (WIP, does not compile)

* Revert "try tracking service subs in the agent (WIP, does not compile)"

This reverts commit 59e908100d.

* comment

* agent database triggers

* service subscriptions in the client

* test / fix client services

* update schema

* fix postgres migration

* update schema

* move schema test to the end

* use static function with SQLite to avoid dynamic wrapper

* agent: fail when per-connection transport isolation is used with services (#1670)

* agent: service subscription events (#1671)

* agent: use server keyhash when loading service record

* agent: process queue/service associations with delayed subscription results

* agent: service subscription events

* agent: finalize initial service subscriptions, remove associations on service ID changes (#1672)

* agent: remove service/queue associations when service ID changes

* agent: check that service ID in NEW response matches session ID in transport session

* agent subscription WIP

* test

* comment

* enable tests

* update queries

* agent: option to add SQLite aggregates to DB connection  (#1673)

* agent: add build_relations_vector function to sqlite

* update aggregate

* use static aggregate

* remove relations

---------

Co-authored-by: Evgeny Poberezkin <evgeny@poberezkin.com>

* add test, treat BAD_SERVICE as temp error, only remove queue associations on service errors

* add packZipWith for backward compatibility with GHC 8.10.7

---------

Co-authored-by: spaced4ndy <8711996+spaced4ndy@users.noreply.github.com>

* servers: service stats and logging, allow services without option (removed), report errors during service message delivery, remove threads when service subscription ended (#1676)

* smp server: always allow services without option

* smp server: maintain IDs hash in session subscription states

* smp server: service message delivery error handling

* ntf server: log subscription count and hash differences

* smp server: remove delivery threads when service subscription ended/client disconnected

* agent: remove service queue association when service ID changed, process ENDS event, test migrating to/from service (#1677)

* agent: remove service queue association when service ID changed

* agent: process ENDS event

* agent: send service subscription error event

* agent: test migrating to/from service subscriptions, fixes

* agent: always remove service when disabled, fix service subscriptions

* ntf server: use different client certs for each SMP server, remove support for store log (#1681)

* ntf server: remove support for store log

* ntf server: use different client certificates for each SMP server

* smp protocol: fix encoding for SOKS/ENDS responses (#1683)

* agent: create user with option to enable client service (#1684)

* agent: create user with option to enable client service

* handle HTTP2 errors

* do not catch async exceptions

* agent: minor fixes

* docs: update protocol (#1705)

* docs: agent threat model

* update protocol docs

* update RFCs (#1730)

* update RFCs

* update

* update overview

* update terminology

* original language in threat model

---------

Co-authored-by: Evgeny @ SimpleX Chat <259188159+evgeny-simplex@users.noreply.github.com>

* docs: fix minor issues in protocols

* docs: add e2e encrypted message wire encoding to PQDR spec

* docs: add missing encodings and other protocol corrections

* docs: move implemented rfcs

* smp: service fixes (#1737)

* smp: deliver service subscription to correct client

* tests: more resilient to concurrency

* optimize PostgreSQL query

* fix service re-association after server "downgrade"

* correctly handle service removed from server (and ID changed)

* remove unused

---------

Co-authored-by: Evgeny @ SimpleX Chat <259188159+evgeny-simplex@users.noreply.github.com>

* prometheus: fix metrics names (#1747)

* test: rcv service re-association on restart (#1746)

* agent: correct log message

* docs: update whitepaper

* smp: fix messaging client service issues (#1751)

* services: fix minor issues

* fix accounting for subscribed service queues, add prometheus stats

* fix uncorrelated subquery

* fix potential race condition when inserting service defensively, as it is also prevented by how client is created

---------

Co-authored-by: Evgeny @ SimpleX Chat <259188159+evgeny-simplex@users.noreply.github.com>

* agent: refactor cleanup if no pending subs (#1757)

* smp server: batch processing of subscription messages (#1753)

* smp server: batch processing of subscription messages

* refactor

* empty line

* fix

---------

Co-authored-by: Evgeny @ SimpleX Chat <259188159+evgeny-simplex@users.noreply.github.com>

* smp: batch queue association updates on subscriptions (#1760)

* smp: batch queue association updates on subscriptions

* refactor to fused batching

* simpler

* batch assoc functions

* clean up

* fix

---------

Co-authored-by: Evgeny @ SimpleX Chat <259188159+evgeny-simplex@users.noreply.github.com>

* agent: use primary key index in setRcvServiceAssocs (#1783)

* agent: use primary key index in setRcvServiceAssocs

Previous WHERE rcv_id = ? did not match the (host, port, rcv_id)
primary key prefix and fell back to a table scan via
idx_rcv_queues_client_notice_id. With ~390k rows per queue, each
update in a 1350-row batch scanned the whole table, yielding ~290s
per batch and a multi-hour rcv-services migration.

* agent: pass SMPServer explicitly to setRcvServiceAssocs

Avoid extracting host/port from the first queue inside setRcvServiceAssocs.
The caller already has SMPServer in scope (from tSess) and the call chain
is short, so threading it through is simpler than inspecting the list.
Removes the empty-list guard from setRcvServiceAssocs (it remains in
processRcvServiceAssocs).

---------

Co-authored-by: spaced4ndy <8711996+spaced4ndy@users.noreply.github.com>
Co-authored-by: Evgeny @ SimpleX Chat <259188159+evgeny-simplex@users.noreply.github.com>
Co-authored-by: sh <37271604+shumvgolove@users.noreply.github.com>
2026-05-21 14:14:03 +01:00
sh 8833e5c1b5 xftp-server: support postgresql backend (#1755)
* xftp: add PostgreSQL backend design spec

* update doc

* adjust styling

* add implementation plan

* refactor: move usedStorage from FileStore to XFTPEnv

* refactor: add getUsedStorage, getFileCount, expiredFiles store functions

* refactor: change file store operations from STM to IO

* refactor: extract FileStoreClass typeclass, move STM impl to Store.STM

* refactor: make XFTPEnv and server polymorphic over FileStoreClass

* feat: add PostgreSQL store skeleton with schema migration

* feat: implement PostgresFileStore operations

* feat: add PostgreSQL INI config, store dispatch, startup validation

* feat: add database import/export CLI commands

* test: add PostgreSQL backend tests

* fix: map ForeignKeyViolation to AUTH in addRecipient

When a file is concurrently deleted while addRecipient runs, the FK
constraint on recipients.sender_id raises ForeignKeyViolation. Previously
this propagated as INTERNAL; now it returns AUTH (file not found).

* fix: only decrement usedStorage for uploaded files on expiration

expireServerFiles unconditionally subtracted file_size from usedStorage
for every expired file, including files that were never uploaded (no
file_path). Since reserve only increments usedStorage during upload,
expiring never-uploaded files caused usedStorage to drift negative.

* fix: handle setFilePath error in receiveServerFile

setFilePath result was discarded with void. If it failed (file deleted
concurrently, or double-upload where file_path IS NULL guard rejected
the second write), the server still reported FROk, incremented stats,
and left usedStorage permanently inflated. Now the error is checked:
on failure, reserved storage is released and AUTH is returned.

* fix: escape double quotes in COPY CSV status field

The status field (e.g. "blocked,reason=spam,notice={...}") is quoted in
CSV for COPY protocol, but embedded double quotes from BlockingInfo
notice (JSON) were not escaped. This could break CSV parsing during
import. Now double quotes are escaped as "" per CSV spec.

* fix: reject upload to blocked file in Postgres setFilePath

In Postgres mode, getFile returns a snapshot TVar for fileStatus. If a
file is blocked between getFile and setFilePath, the stale status check
passes but the upload should be rejected. Added status = 'active' to
the UPDATE WHERE clause so blocked files cannot receive uploads.

* fix: add CHECK constraint on file_size > 0

Prevents negative or zero file_size values at the database level.
Without this, corrupted data from import or direct DB access could
cause incorrect storage accounting (getUsedStorage sums file_size,
and expiredFiles casts to Word32 which wraps negative values).

* fix: check for existing data before database import

importFileStore now checks if the target database already contains
files and aborts with an error. Previously, importing into a non-empty
database would fail mid-COPY on duplicate primary keys, leaving the
database in a partially imported state.

* fix: clean up disk file when setFilePath fails in receiveServerFile

When setFilePath fails (file deleted or blocked concurrently, or
duplicate upload), the uploaded file was left orphaned on disk with
no DB record pointing to it. Now the file is removed on failure,
matching the cleanup in the receiveChunk error path.

* fix: check storeAction result in deleteOrBlockServerFile_

The store action result (deleteFile/blockFile) was discarded with void.
If the DB row was already deleted by a concurrent operation, the
function still decremented usedStorage, causing drift. Now the error
propagates via ExceptT, skipping the usedStorage adjustment.

* fix: check deleteFile result in expireServerFiles

deleteFile result was discarded with void. If a concurrent delete
already removed the file, deleteFile returned AUTH but usedStorage
was still decremented — causing double-decrement drift. Now the
usedStorage adjustment and filesExpired stat only run on success.

* refactor: merge STM store into Store.hs, parameterize server tests

- Move STMFileStore and its FileStoreClass instance from Store/STM.hs
  back into Store.hs — the separate file was unnecessary indirection
  for the always-present default implementation.

- Parameterize xftpFileTests over store backend using HSpec SpecWith
  pattern (following SMP's serverTests approach). The same 11 tests
  now run against both memory and PostgreSQL backends via a bracket
  parameter, eliminating all *Pg test duplicates.

- Extract shared run* functions (runTestFileChunkDeliveryAddRecipients,
  runTestWrongChunkSize, runTestFileChunkExpiration, runTestFileStorageQuota)
  from inlined test bodies.

* refactor: clean up per good-code review

- Remove internal helpers from Postgres.hs export list (withDB, withDB',
  handleDuplicate, assertUpdated, withLog are not imported externally)
- Replace local isNothing_ with Data.Maybe.isNothing in Env.hs
- Consolidate duplicate/unused imports in XFTPStoreTests.hs
- Add file_path IS NULL and status guards to STM setFilePath, matching
  the Postgres implementation semantics

* test: parameterize XFTP server, agent and CLI tests over store backend

- xftpTest/xftpTest2/xftpTest4/xftpTestN now take XFTPTestBracket as
  first argument, enabling the same test to run against both memory
  and PostgreSQL backends.

- xftpFileTests (server tests), xftpAgentFileTests (agent tests), and
  xftpCLIFileTests (CLI tests) are SpecWith-parameterized suites that
  receive the bracket from HSpec's before combinator.

- Test.hs runs each parameterized suite twice: once with
  xftpMemoryBracket, once with xftpPostgresBracket (CPP-guarded).

- STM-specific tests (store log restore/replay) stay in memory-only
  xftpAgentTests. SNI/CORS tests stay in memory-only xftpServerTests.

* refactor: remove dead test wrappers after parameterization

Remove old non-parameterized test wrapper functions that were
superseded by the store-backend-parameterized test suites.
All test bodies (run* and _ functions) are preserved and called
from the parameterized specs. Clean up unused imports.

* feat: add manual tests and guide

* refactor: merge file_size CHECK into initial migration

* refactor: extract rowToFileRec shared by getFile sender/recipient paths

* refactor: parameterize XFTPServerConfig over store type

Embed XFTPStoreConfig s as serverStoreCfg field, matching SMP's
ServerConfig. runXFTPServer and newXFTPServerEnv now take a single
XFTPServerConfig s. Restore verifyCmd local helper structure.

* refactor: minimize diff in tests

Restore xftpServerTests and xftpAgentTests bodies to match master
byte-for-byte (only type signatures change for XFTPTestBracket
parameterization); inline the runTestXXX helpers that were split
on this branch.

* refactor: restore getFile position to match master

* refactor: rename withSTMFile back to withFile

* refactor: close store log inside closeFileStore for STM backend

Move STM store log close responsibility into closeFileStore to
match PostgresFileStore, removing the asymmetry where only PG's
close was self-contained.

STMFileStore holds the log in a TVar populated by newXFTPServerEnv
after readWriteFileStore; stopServer no longer needs the explicit
withFileLog closeStoreLog call. Writes still go through XFTPEnv.storeLog
via withFileLog (unchanged).

* refactor: rename XFTPTestBracket to XFTPTestServer

* fix: move file_size check from PG schema to store log import

* refactor: use SQL-standard type names in XFTP schema

* perf: batch expired file deletions with deleteFiles

* refactor: stream export instead of loading recipients into memory

* refactor: parameterize XFTP store with FSType singleton dispatch

* refactor: minimize diff per review feedback

* refactor: use types over strings, deduplicate parser

* refactor: always parse database store type, fail at startup

* fix compilation without postgresql

* refactor: always parse database store type, fail at startup
2026-04-16 09:06:04 +01:00