Client Services
How service certificates enable bulk queue subscriptions: identity lifecycle, queue association, service subscription flow, tracking, reconnection, and notification server usage. This is the cross-cutting view spanning transport, protocol, server, client, agent, and store layers.
For agent-internal subscription tracking (TSessionSubs service state, active/pending promotion), see agent/infrastructure.md. For the router subscription model and delivery mechanics, see subscriptions.md. For the full implementation reference with types, wire encoding, test gaps, security invariants, and risk analysis, see rcv-services.md.
- Overview
- Service identity lifecycle
- Queue-service association
- Service subscription flow
- Service tracking in TSessionSubs
- Reconnection and graceful degradation
- Notification server usage
Overview
Source: Server.hs, Client.hs, Agent/Client.hs
A service client is a high-volume SMP client (notification router, chat relay, directory service) that presents a TLS client certificate during handshake. The router assigns it a persistent ServiceId derived from the certificate fingerprint. Individual queues are then associated with this ServiceId via per-queue SUB commands carrying a service signature. Once associated, the service client can bulk-subscribe all its queues with a single SUBS command instead of O(n) individual SUB commands on each reconnection.
Service client            SMP Router
|                                  |
|---- TLS + service cert --------->| Three-way handshake
|<--- ServiceId -------------------| (Transport layer)
|                                  |
|---- SUB + service sig ---------->| Per-queue association
|<--- SOK(ServiceId) --------------| (one-time per queue)
|                                  |
|---- SUBS count idsHash --------->| Bulk subscribe
|<--- SOKS count' idsHash' --------| (server's actual state)
|<--- MSG ... MSG ... MSG ---------| Buffered messages
|<--- ALLS ------------------------| All delivered
Two version gates control feature availability: serviceCertsSMPVersion (v16) enables the service handshake, SOK, and dual signatures; rcvServiceSMPVersion (v19) adds count+hash parameters to SUBS/NSUBS/SOKS/ENDS and enables the messaging service role (SRMessaging). Below v19, SUBS/NSUBS exist but are sent without parameters.
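The two gates can be read as a small decision function. The sketch below is illustrative only: the constant names and the values 16 and 19 come from the text, while SubsForm and subsForm are hypothetical names, and treating everything below v16 as "no service features" is an assumption drawn from the statement that v16 enables the service handshake.

```haskell
-- Only the two constant names and their values come from the text above;
-- SubsForm and subsForm are hypothetical names used for illustration.
serviceCertsSMPVersion, rcvServiceSMPVersion :: Int
serviceCertsSMPVersion = 16
rcvServiceSMPVersion = 19

-- | How a service subscription command can be sent at a negotiated version.
data SubsForm
  = ServiceUnavailable -- assumed: below v16 there is no service handshake
  | SubsWithoutParams  -- v16 to v18: SUBS/NSUBS are sent without parameters
  | SubsWithCountHash  -- v19 and above: count and idsHash are included
  deriving (Eq, Show)

subsForm :: Int -> SubsForm
subsForm v
  | v < serviceCertsSMPVersion = ServiceUnavailable
  | v < rcvServiceSMPVersion = SubsWithoutParams
  | otherwise = SubsWithCountHash
```

In GHCi, subsForm 18 evaluates to SubsWithoutParams and subsForm 19 to SubsWithCountHash.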
Service identity lifecycle
Source: Transport.hs, Agent/Client.hs, Agent/Store/AgentStore.hs
Credential generation
The agent generates a self-signed X.509 certificate per (userId, server) pair on first use via getServiceCredentials. The certificate is generated with genCredentials using a long validity period and is stored in the client_services table along with the private signing key and certificate fingerprint. The ServiceId column is NULL until the first successful handshake.
Three-way handshake
Standard SMP handshake is two messages (server sends SMPServerHandshake, client sends SMPClientHandshake). When the client includes service credentials, an optional third message is added:
- Router -> Client: standard SMPServerHandshake
- Client -> Router: SMPClientHandshake with SMPClientHandshakeService {serviceRole, serviceCertKey}. The serviceCertKey contains the TLS client certificate chain plus a proof-of-possession - a fresh per-session Ed25519 key pair signed by the X.509 signing key.
- Router -> Client: SMPServerHandshakeResponse {serviceId}. The router verifies the certificate chain matches the TLS peer certificate, extracts the fingerprint, and calls getCreateService to find or create a ServiceId for that fingerprint.
The per-session Ed25519 key (not the X.509 key) is used to sign SUBS/NSUBS commands. This limits exposure - compromising a session key does not compromise the long-term service identity.
Dual signature scheme
When the TLS handshake established a service identity (the client has a THClientService) and the command is NEW, SUB, or NSUB (per useServiceAuth), authTransmission appends two signatures:
- The entity key signs over serviceCertHash || transmission - binding the service identity to the queue operation
- The service session key signs over transmission alone
This prevents MITM service substitution within TLS: an attacker cannot replace the service certificate hash without invalidating the entity key signature.
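To make the binding concrete, the following sketch builds only the two byte strings that would be signed; it performs no cryptography. SignedPayloads and dualPayloads are hypothetical names, and plain concatenation is assumed for the || operator.

```haskell
import qualified Data.ByteString.Char8 as B

-- | The two inputs to signing described above (hypothetical record;
-- the actual signing and wire encoding are not modelled here).
data SignedPayloads = SignedPayloads
  { entityKeyPayload :: B.ByteString  -- signed by the queue's entity key
  , serviceKeyPayload :: B.ByteString -- signed by the per-session service key
  }
  deriving (Eq, Show)

-- | Assumes "||" means plain byte concatenation.
dualPayloads :: B.ByteString -> B.ByteString -> SignedPayloads
dualPayloads serviceCertHash transmission =
  SignedPayloads
    { entityKeyPayload = serviceCertHash <> transmission
    , serviceKeyPayload = transmission
    }
```

Changing serviceCertHash changes entityKeyPayload but leaves serviceKeyPayload untouched, which is why a substituted certificate hash would invalidate the entity key signature.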
Version-gated role filtering
Messaging services (SRMessaging) are suppressed below v19 - mkClientService returns Nothing for messaging role when the router version is below rcvServiceSMPVersion. Notifier services (SRNotifier) are sent at v16+. This allows gradual rollout - routers can support notification service certificates before full messaging service support.
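A minimal sketch of this filtering, assuming the router version is available as a plain Int. The constructor names SRMessaging and SRNotifier come from the text; advertisedRole and minVersion are hypothetical and do not reproduce the signature of mkClientService.

```haskell
data ServiceRole = SRMessaging | SRNotifier
  deriving (Eq, Show)

-- | Returns the role to present to the router, or Nothing when the router
-- version is too low for that role (hypothetical stand-in for the filtering
-- that mkClientService is described as doing).
advertisedRole :: Int -> ServiceRole -> Maybe ServiceRole
advertisedRole routerVersion role
  | routerVersion >= minVersion role = Just role
  | otherwise = Nothing
  where
    minVersion :: ServiceRole -> Int
    minVersion SRNotifier = 16  -- serviceCertsSMPVersion
    minVersion SRMessaging = 19 -- rcvServiceSMPVersion
```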
Queue-service association
Source: Server.hs, Server/QueueStore.hs
Queues are associated with services through per-queue SUB commands (with service signature) or at creation time via NEW. The router stores rcvServiceId :: Maybe ServiceId on each QueueRec.
sharedSubscribeQueue - four cases
sharedSubscribeQueue handles the intersection of client type and existing association:
Case 1: Service client, queue already associated with this service - Duplicate association (retry after lost response). If no service subscription exists yet, increments the client's service queue count.
Case 2: Service client, queue not yet associated (or different service) - Calls setQueueService to persist the association in QueueRec, increments client's serviceSubsCount by (1, queueIdHash rId).
Case 3: Non-service client, queue has service association - Calls setQueueService with Nothing to remove the association. This is the migration path when a user disables services.
Case 4: Non-service client, no service association - Standard per-queue subscription, no service involvement.
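The four cases reduce to a decision on two optional values: the ServiceId of the subscribing client and the ServiceId stored on the queue. The sketch below models only that decision; Action and subscribeCase are hypothetical names, and ServiceId is simplified to a String.

```haskell
type ServiceId = String -- simplified stand-in for the real identifier type

data Action
  = DuplicateAssociation     -- case 1: already associated with this service
  | AssociateQueue ServiceId -- case 2: setQueueService with the client's service
  | RemoveAssociation        -- case 3: setQueueService with Nothing
  | PlainSubscription        -- case 4: no service involvement
  deriving (Eq, Show)

-- | First argument: the subscribing client's service, if any.
-- Second argument: the service currently stored on the queue, if any.
subscribeCase :: Maybe ServiceId -> Maybe ServiceId -> Action
subscribeCase clientService queueService =
  case (clientService, queueService) of
    (Just sId, Just qId) | sId == qId -> DuplicateAssociation
    (Just sId, _) -> AssociateQueue sId -- unassociated or a different service
    (Nothing, Just _) -> RemoveAssociation
    (Nothing, Nothing) -> PlainSubscription
```

For example, subscribeCase (Just "svc-a") (Just "svc-b") yields AssociateQueue "svc-a", matching the "different service" branch of case 2.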
Association persistence
The setQueueService function in QueueStore updates rcvServiceId on the queue record and maintains the service's aggregate queue set (STMService.serviceRcvQueues). The set and its XOR hash are updated atomically. Associations persist across client disconnect - only live subscription state is cleaned up, not the stored rcvServiceId.
IdsHash - XOR-based drift detection
IdsHash is a 16-byte value computed as the XOR of the MD5 hashes of the individual queue IDs. XOR is self-inverse, so both addServiceSubs and subtractServiceSubs use the same <> (XOR) operator for the hash component. The count field adds a second check: two different queue sets whose hashes happen to XOR to the same value are still told apart when their counts differ.
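The self-inverse property can be demonstrated with a toy model. In this sketch the hash is a 64-bit word produced by FNV-1a, which is a stand-in chosen so the example needs only base; the real IdsHash is 16 bytes of XOR-ed MD5 digests. The names toyQueueIdHash, addSubs, and subtractSubs are hypothetical and only mirror the roles of queueIdHash, addServiceSubs, and subtractServiceSubs.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word64)

-- | Toy hash: 64 bits instead of the 16-byte MD5-based value.
newtype IdsHash = IdsHash Word64
  deriving (Eq, Show)

instance Semigroup IdsHash where
  IdsHash a <> IdsHash b = IdsHash (a `xor` b)

instance Monoid IdsHash where
  mempty = IdsHash 0

-- | Stand-in digest (FNV-1a, 64-bit). NOT MD5; used only for illustration.
toyQueueIdHash :: String -> IdsHash
toyQueueIdHash = IdsHash . foldl' step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

type ServiceSubs = (Int, IdsHash) -- (queue count, combined hash)

-- Adding and subtracting differ only in the count: the hash component
-- uses the same XOR operator in both directions.
addSubs, subtractSubs :: ServiceSubs -> ServiceSubs -> ServiceSubs
addSubs (n, h) (n', h') = (n + n', h <> h')
subtractSubs (n, h) (n', h') = (n - n', h <> h')
```

Because XOR is commutative, the combined hash does not depend on the order in which queues were added, so router and client can compare values without agreeing on an ordering.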
Service subscription flow
Source: Server.hs, Client.hs, Agent/Client.hs
SUBS command processing
- subscribeServiceMessages receives SUBS count idsHash from the client.
- sharedSubscribeService queries getServiceQueueCountHash for the router's actual count and hash. In one STM transaction, sets clientServiceSubscribed = True and swaps the client's service subs counter to the server's actual values (computing a delta). In a separate STM transaction, enqueues a CSService event (carrying the delta) to subQ.
- serverThread processes this asynchronously: adds the client to subClients, subtracts the delta from totalServiceSubs (preventing double-counting of per-queue accumulated counts), and upserts into serviceSubscribers (displacing any previous subscriber).
- Returns SOKS count' idsHash' immediately - the client can compare expected vs actual to detect drift.
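On the client side, the comparison of expected against actual values might look like the sketch below. ServiceSub here is a simplified hypothetical record (the hash is a String for readability), and Drift and detectDrift are illustrative names rather than the agent's actual types.

```haskell
-- | Simplified stand-in for the (count, idsHash) pair carried by SUBS/SOKS.
data ServiceSub = ServiceSub
  { subCount :: Int
  , subIdsHash :: String
  }
  deriving (Eq, Show)

data Drift
  = CountMismatch Int Int -- expected count, actual count
  | HashMismatch
  deriving (Eq, Show)

-- | Compares what the client sent in SUBS with what the router returned
-- in SOKS. An empty list means the two sides agree.
detectDrift :: ServiceSub -> ServiceSub -> [Drift]
detectDrift expected actual =
  [CountMismatch (subCount expected) (subCount actual) | subCount expected /= subCount actual]
    <> [HashMismatch | subIdsHash expected /= subIdsHash actual]
```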
deliverServiceMessages and ALLS
If this is a new subscription (not duplicate), the router forks deliverServiceMessages:
- foldRcvServiceMessages iterates all queues associated with the service.
- For each queue with a pending message: getSubscription creates a Sub in the client's subscriptions TMap if not already present (returning Nothing for duplicates). If a new Sub is created, setDelivered records the message and the MSG event is written to msgQ immediately.
- Queue errors are accumulated in a list whose initial value is [(NoCorrId, NoEntity, ALLS)]. Errors are prepended, so ALLS ends up as the last event.
- After the fold completes, the accumulated events (errors plus ALLS) are written to msgQ in one batch.
MSG events are delivered individually during the fold (not accumulated), while ALLS is deferred to the end - this ensures ALLS arrives only after all pending messages have been sent.
If the subscription is a duplicate (hasSub is True), deliverServiceMessages is NOT forked - only SOKS is returned.
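The ordering guarantee follows from how the accumulator is seeded. The pure model below separates events written during the fold from the final batch; Event, QueueResult, and foldServiceQueues are hypothetical names, and correlation IDs and entity IDs are omitted.

```haskell
data Event = MSG String | ERR String | ALLS
  deriving (Eq, Show)

-- | Outcome of visiting one service queue during the fold (hypothetical).
data QueueResult
  = Pending String    -- queue has a pending message
  | NoMessage         -- nothing to deliver
  | QueueError String -- reading the queue failed

-- | Returns (events written immediately during the fold, final batch).
-- The batch starts as [ALLS] and errors are prepended, so ALLS stays last.
foldServiceQueues :: [QueueResult] -> ([Event], [Event])
foldServiceQueues = go [] [ALLS]
  where
    go sent batch [] = (reverse sent, batch)
    go sent batch (r : rs) = case r of
      Pending m -> go (MSG m : sent) batch rs
      NoMessage -> go sent batch rs
      QueueError e -> go sent (ERR e : batch) rs
```

Since the batch is written only after the fold returns, every MSG in the first component precedes ALLS on the wire.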
On-demand Sub creation for new messages
When a new message arrives for a service-associated queue via tryDeliverMessage, the router looks up the subscriber in serviceSubscribers (by ServiceId) rather than queueSubscribers (by QueueId). If no Sub exists in the client's subscriptions TMap (the fold hasn't reached this queue yet, or the queue was associated after SUBS), newServiceDeliverySub creates one on the fly. The fold's getSubscription performs the same check. STM serialization ensures at most one path creates the Sub for a given queue.
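The "at most one path creates the Sub" property is what a check-and-insert inside a single STM transaction provides. The sketch below shows that pattern with the stm and containers packages; createSubOnce, the Sub placeholder, and the use of a TVar-wrapped Map (instead of the router's TMap) are assumptions for illustration.

```haskell
import Control.Concurrent.STM (STM, TVar, atomically, newTVarIO, readTVar, writeTVar)
import qualified Data.Map.Strict as M

type QueueId = String

data Sub = Sub -- placeholder for the real subscription record
  deriving (Eq, Show)

-- | Creates the Sub only if it is absent. The read and the write happen in
-- one transaction, so two competing callers cannot both see "absent".
createSubOnce :: TVar (M.Map QueueId Sub) -> QueueId -> STM Bool
createSubOnce subs qId = do
  m <- readTVar subs
  if M.member qId m
    then pure False
    else do
      writeTVar subs (M.insert qId Sub m)
      pure True

-- | Two callers (standing in for the fold and the delivery path) race
-- for the same queue; only the first one creates the Sub.
demo :: IO (Bool, Bool)
demo = do
  subs <- newTVarIO M.empty
  viaFold <- atomically (createSubOnce subs "queue-1")
  viaDelivery <- atomically (createSubOnce subs "queue-1")
  pure (viaFold, viaDelivery)
```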
Service displacement
When a new service client subscribes to the same ServiceId and the previous subscriber is a different, still-connected client, cancelServiceSubs atomically zeros out the old client's service subs counter and prepares an ENDS count idsHash event. endPreviousSubscriptions first inserts ENDS into pendingEvents (for deferred delivery via sendPendingEvtsThread), then subtracts the changed subs from totalServiceSubs, swaps out the old client's individual subscription map to empty, and cancels per-queue Subs. The old client's fold thread (if still running from deliverServiceMessages) continues writing to the old client's msgQ until ALLS, then exits.
Service tracking in TSessionSubs
Source: Agent/TSessionSubs.hs, Agent/Client.hs
Aggregate tracking - service queues are not in activeSubs
When a queue has both a matching serviceId and serviceAssoc = True, it is tracked only via the count and hash in activeServiceSub, not in the activeSubs TMap. Callers pre-separate queues into two lists before calling batchAddActiveSubs: non-service queues go to activeSubs, service-associated queues are counted via updateActiveService. A queue on a service-capable session but with serviceAssoc = False still lands in activeSubs normally. Consequence: hasActiveSub(rId) returns False for service-associated queues - callers must check the service subscription separately.
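The pre-separation step amounts to a partition on two conditions. The sketch below uses a simplified hypothetical RcvQueue record; splitForTracking is an illustrative name and is not the signature of batchAddActiveSubs or its callers.

```haskell
import Data.List (partition)

-- | Simplified queue record with only the fields relevant here (hypothetical).
data RcvQueue = RcvQueue
  { rcvId :: String
  , queueServiceId :: Maybe String
  , serviceAssoc :: Bool
  }
  deriving (Eq, Show)

-- | Splits queues into (tracked via the service count and hash, tracked in
-- activeSubs). A queue goes to the first list only when the session has a
-- service, the queue's serviceId matches it, and serviceAssoc is True.
splitForTracking :: Maybe String -> [RcvQueue] -> ([RcvQueue], [RcvQueue])
splitForTracking sessionServiceId = partition isServiceTracked
  where
    isServiceTracked q = case sessionServiceId of
      Just sId -> serviceAssoc q && queueServiceId q == Just sId
      Nothing -> False
```

A queue with a matching serviceId but serviceAssoc = False lands in the second list, which corresponds to the "still lands in activeSubs" case above.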
Session ID gating
setActiveServiceSub only promotes the service subscription from pending to active if the session ID matches the current TLS session. If a reconnection occurred between sending SUBS and receiving SOKS, the stale response is kept as pending rather than promoted. This prevents a response from an old session from corrupting the new session's state.
State transitions
- setPendingServiceSub: stores expected ServiceSub before SUBS is sent
- setActiveServiceSub: promotes to active after SOKS, with session ID validation
- updateActiveService: incrementally builds the active service sub as individual queues return SOK(Just serviceId) - used when per-queue SUBs succeed with service association
- setServiceSubPending_: demotes active to pending on disconnect (called by setSubsPending)
- deleteServiceSub: clears both active and pending on ENDS
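The transitions above, including the session ID check, can be modelled as pure functions over an active/pending pair. This is a sketch under simplifying assumptions: the record and function names are hypothetical, session IDs and hashes are Strings, and updateActiveService is left out.

```haskell
type SessionId = String

data ServiceSub = ServiceSub {subCount :: Int, subIdsHash :: String}
  deriving (Eq, Show)

data ServiceSubs = ServiceSubs
  { activeServiceSub :: Maybe ServiceSub
  , pendingServiceSub :: Maybe ServiceSub
  }
  deriving (Eq, Show)

-- | Before SUBS is sent: remember what we expect.
setPending :: ServiceSub -> ServiceSubs -> ServiceSubs
setPending sub st = st {pendingServiceSub = Just sub}

-- | After SOKS: promote only if the response belongs to the current session;
-- a stale response stays pending.
setActive :: SessionId -> SessionId -> ServiceSub -> ServiceSubs -> ServiceSubs
setActive currentSession responseSession sub st
  | responseSession == currentSession = ServiceSubs (Just sub) Nothing
  | otherwise = st {pendingServiceSub = Just sub}

-- | On disconnect: an active subscription becomes pending again.
demoteToPending :: ServiceSubs -> ServiceSubs
demoteToPending st = case activeServiceSub st of
  Just sub -> ServiceSubs Nothing (Just sub)
  Nothing -> st

-- | On ENDS: forget both.
deleteSub :: ServiceSubs -> ServiceSubs
deleteSub _ = ServiceSubs Nothing Nothing
```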
Service events
| Event | When |
|---|---|
| SERVICE_UP srv result | SUBS succeeded; ServiceSubResult carries any drift errors (count/hash/serviceId mismatch) |
| SERVICE_DOWN srv sub | Client disconnected while service was subscribed |
| SERVICE_ALL srv | ALLS received - all buffered messages delivered |
| SERVICE_END srv sub | ENDS received - another service client took over |
All are entity-less (AENone) events.
Reconnection and graceful degradation
Source: Agent/Client.hs
updateClientService - credential synchronization
After each SMP connection, updateClientService reconciles the agent's stored ServiceId with the router's:
- ServiceId matches: normal path, no action needed
- ServiceId changed (router data was reset): calls removeRcvServiceAssocs to clear all queue-service associations for this server, forcing re-association via individual SUBs
- Router lost service support (version downgrade): calls deleteClientService to remove the local service record entirely
- Router returned ServiceId without credentials: logs error (should not happen)
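The reconciliation above can be summarized as a decision over what the agent has stored and what the router returned. In this sketch the stored side is Maybe (Maybe ServiceId): the outer Maybe says whether credentials exist, the inner one whether a ServiceId has been recorded yet. Reconcile and reconcile are hypothetical names, and the StoreServiceId branch (first handshake, ServiceId still NULL) is an assumption inferred from the credential generation section rather than a case listed above.

```haskell
type ServiceId = String

data Reconcile
  = NoAction
  | StoreServiceId ServiceId -- assumed: first successful handshake
  | RemoveRcvServiceAssocs   -- ServiceId changed on the router
  | DeleteClientService      -- router no longer offers a service
  | LogUnexpected            -- ServiceId returned without credentials
  deriving (Eq, Show)

reconcile :: Maybe (Maybe ServiceId) -> Maybe ServiceId -> Reconcile
reconcile stored fromRouter = case (stored, fromRouter) of
  (Just (Just old), Just new)
    | old == new -> NoAction
    | otherwise -> RemoveRcvServiceAssocs
  (Just Nothing, Just new) -> StoreServiceId new
  (Just _, Nothing) -> DeleteClientService
  (Nothing, Just _) -> LogUnexpected
  (Nothing, Nothing) -> NoAction
```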
Resubscription ordering
On reconnect, the resubscription worker processes the pending service subscription before individual queues. This ensures the service context is established before queue-level SUB commands that depend on it (the router uses clntServiceId from the TLS session for queue-service association).
Fallback to individual subscriptions
resubscribeClientService handles two error classes by falling back to unassocSubscribeQueues:
- SSErrorServiceId - the router returned a different ServiceId than expected
- clientServiceError - matches NO_SERVICE, SERVICE, and PROXY (BROKER NO_SERVICE) errors
unassocSubscribeQueues deletes the client_services row, sets rcv_service_assoc = 0 on all queues, and resubscribes them individually. This is the nuclear recovery path - service state is fully reset, and the next connection will generate fresh credentials.
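The fallback decision is a classification of the failure. The sketch below flattens the two error classes into one hypothetical SubFailure type for readability; in the real code SSErrorServiceId is part of the service subscription result while the others are protocol errors, and the constructor names here are illustrative stand-ins.

```haskell
-- | Flattened, hypothetical view of the failures relevant to fallback.
data SubFailure
  = ServiceIdMismatch    -- stands for SSErrorServiceId
  | NoService            -- stands for NO_SERVICE
  | ServiceError         -- stands for SERVICE
  | ProxyBrokerNoService -- stands for PROXY (BROKER NO_SERVICE)
  | OtherFailure String  -- anything else, e.g. a network error
  deriving (Eq, Show)

-- | True when the agent should give up on the service subscription and
-- resubscribe queues individually (the unassocSubscribeQueues path).
shouldFallBack :: SubFailure -> Bool
shouldFallBack failure = case failure of
  ServiceIdMismatch -> True
  NoService -> True
  ServiceError -> True
  ProxyBrokerNoService -> True
  OtherFailure _ -> False
```

How failures outside these two classes are handled is not described above, so OtherFailure returning False only means "no fallback is triggered by this rule".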
Agent store triggers
The agent's client_services table tracks service_queue_count and service_queue_ids_hash. SQLite triggers on rcv_queues automatically maintain these counters when rcv_service_assoc changes. The triggers use simplex_xor_md5_combine - the SQLite equivalent of Haskell's queueIdHash <>. On credential update (new cert), service_id is set to NULL via ON CONFLICT DO UPDATE, forcing a fresh handshake.
Notification server usage
Source: Notifications/Server.hs, Notifications/Server/Env.hs
The notification server is the primary consumer of service certificates for the SRNotifier role. It manages thousands to millions of SMP queue subscriptions per SMP router.
Credential management
NtfServerConfig.useServiceCreds controls whether the NTF server uses service certificates. On first use per SMP router, mkDbService generates a self-signed TLS certificate (stored in the smp_servers table) and reuses it across connections.
Startup subscription
If a stored service subscription exists, subscribeSrvSubs sends NSUBS first (one command for all associated queues), then subscribes all queues individually in batches via subscribeQueuesNtfs (including service-associated queues, which were previously associated via NSUB).
Recovery path
On CAServiceUnavailable (irrecoverable service error, e.g., ServiceId mismatch after cert rotation), removeServiceAndAssociations performs nuclear DB cleanup: clears all service credentials, resets counters, and removes all ntf_service_assoc flags. The caller then resubscribes all queues individually via subscribeSrvSubs. The Postgres schema uses xor_combine triggers (equivalent to the agent's SQLite triggers) to maintain per-SMP-server notifier count and hash.
NSUBS vs SUBS
NSUBS uses the same sharedSubscribeService for registration in serviceSubscribers but does not fork deliverServiceMessages. Notification delivery is handled by the separate deliverNtfsThread which uses serviceSubscribers to look up the subscribed service client for each notification queue. Consequently, there is no ALLS signal for NSUBS subscriptions.