mirror of
https://github.com/simplex-chat/simplexmq.git
synced 2026-07-02 13:32:25 +00:00
smp-server: fix subscriptions memory leak (#1820)
* docs: plan for service subscriptions memory leak * fix: remove leaked service delivery subscriptions The CSAEndServiceSub handler decremented subscription counters but did not remove the per-queue delivery Sub from the service client's subscriptions map. Over queue churn a long-lived service connection accumulated orphaned Sub entries until disconnect, leaking memory. Mirror CSAEndSub via endServiceQueueSub (reusing endSub) so the entry is removed and its delivery thread cancelled. Add a regression test with white-box access to the server Env via runSMPServerBlocking_; verified failing without the fix. * test: remove service subs leak regression test Remove testServiceSubsRemovedOnQueueDelete and the test-only Env exposure (runSMPServerBlocking_, withSmpServerConfigEnvOn, serviceSubsMapSize), leaving only the server fix. * refactor: simplify unsubPrev with applicative Express the cancel-if-present logic as sequence_ (unsub_ <*> s_).
This commit is contained in:
@@ -0,0 +1,57 @@
|
||||
## Root cause: orphaned `Sub` entries in the service client's `subscriptions` map
|
||||
|
||||
**The leak is service-specific and was introduced by PR #1667 "messaging services" (`f0b7a4be`).** A long-lived messaging-service connection accumulates per-queue `Sub` records in its `Client.subscriptions` map that are **never removed** when the associated queues are deleted or unassociated — only the counter is decremented. Over normal queue churn the map grows monotonically for the entire lifetime of the service connection.
|
||||
|
||||
### The proof — an asymmetry between two handlers in `serverThread`
|
||||
|
||||
Both individual queue subscriptions and service subscriptions store a `Sub` per queue in `Client.subscriptions` (= `clientSubs` for the SMP subscriber thread, wired at `Server.hs:189`). When a queue ends/is deleted, the two paths diverge:
|
||||
|
||||
**Individual subscriber — entry IS removed** (`Server.hs:332`, `346`):
|
||||
```haskell
|
||||
CSAEndSub qId -> atomically (endSub c qId) >>= a unsub_ -- :332
|
||||
...
|
||||
endSub c qId = TM.lookupDelete qId (clientSubs c) >>= (removeWhenNoSubs c $>) -- :346
|
||||
```
|
||||
|
||||
**Service subscriber — entry is NOT removed** (`Server.hs:336-340`):
|
||||
```haskell
|
||||
CSAEndServiceSub qId -> atomically $ do
|
||||
modifyTVar' (clientServiceSubs c) decrease -- decrements serviceSubsCount
|
||||
modifyTVar' totalServiceSubs decrease -- decrements global count
|
||||
where decrease = subtractServiceSubs (1, queueIdHash qId)
|
||||
-- never touches (clientSubs c) — the Sub for qId stays forever
|
||||
```
|
||||
|
||||
### Where the orphaned entries are added (both new in this PR)
|
||||
- `Server.hs:1860-1862` — on service subscribe (`SSUB`), one `Sub` inserted per queue that has a pending message.
|
||||
- `Server.hs:2039-2043` (`newServiceDeliverySub`) — on **every** `SEND` to a service-associated queue with no existing sub, a `Sub` is inserted into the service client's `subscriptions`. After delivery the thread state resets to `NoSub` (`:2069`) but the map entry remains as a "already delivering" marker (`:1856-1859`).
|
||||
|
||||
### Why they leak
|
||||
The only places the service client's `subscriptions` map is cleared are:
|
||||
- `clientDisconnected` — `swapTVar subscriptions M.empty` (`Server.hs:1097`) — only on disconnect.
|
||||
- `CSADecreaseSubs` — `swapTVar (clientSubs c) M.empty` (`Server.hs:343`) — only on full service takeover by another connection.
|
||||
- `delQueueAndMsgs` — `TM.lookupDelete entId $ subscriptions clnt` (`Server.hs:2164`) — but `clnt` here is **the recipient deleting its own queue, not the service client**. The service's entry for that queue is reached only via the `CSDeleted → endServiceSub → CSAEndServiceSub` path (`Server.hs:306, 313, 336`), which decrements the counter but leaves the map entry.
|
||||
|
||||
**Concrete scenario (fully traced):** Service `S` subscribes (`SSUB`) and stays connected for days. Recipient `R` owns service-associated queue `Q`. A `SEND` to `Q` inserts a `Sub` into `S.subscriptions[Q]` (`:2042`). `R` later deletes `Q` → `delQueueAndMsgs` runs on `R`'s connection, removes `Q` from `R.subscriptions`, decrements counters, enqueues `CSDeleted Q (Just S)` (`:2167`) → `serverThread` runs `CSAEndServiceSub Q` for `S` (`:336`), decrementing `S.serviceSubsCount` but **leaving `S.subscriptions[Q]` in place**. Net: one orphaned `Sub` (record + 2 TVars) per service-associated queue ever deleted/unassociated, never reclaimed until `S` disconnects. The logical counter `serviceSubsCount` correctly drops, so the map size diverges from the counter — making the leak invisible to the existing service-sub metric.
|
||||
|
||||
### Verdict
|
||||
This is a deterministic, static-provable memory leak — no production logging needed to confirm the existence; the asymmetry between `CSAEndSub` (removes) and `CSAEndServiceSub` (doesn't) is the smoking gun. It is specific to messaging-service certificate clients, which is exactly the population added by the services/certificate PR.
|
||||
|
||||
### Secondary findings (lower impact, same PR area, not the primary cause)
|
||||
- **`forkClient` register-after-fork race** (`Server.hs:1356-1359`): if the forked action's `finally` delete (`:1358`) runs before the parent's `IM.insert` (`:1359`), a `Weak ThreadId` of a dead thread is left in `endThreads` until disconnect. Pre-existing, tiny per-entry, but exercised far more by the PR's higher END/DELD volume.
|
||||
- **Wrong-client counter decrement** (`Server.hs:2166`): `delQueueAndMsgs` decrements `serviceSubsCount` of the *deleting* client, not the service; harmless for non-service deleters (floored at 0) but corrupts accounting if a service deletes its own queue.
|
||||
|
||||
---
|
||||
|
||||
### Recommended fix (mirror `endSub` in the service path)
|
||||
Make `CSAEndServiceSub` also delete the per-queue `Sub` and cancel its delivery thread, exactly as `CSAEndSub`/`endSub` do for individual subscribers. Roughly:
|
||||
|
||||
```haskell
|
||||
CSAEndServiceSub qId -> do
|
||||
s_ <- atomically $ do
|
||||
modifyTVar' (clientServiceSubs c) decrease
|
||||
modifyTVar' totalServiceSubs decrease
|
||||
TM.lookupDelete qId (clientSubs c) <* removeWhenNoSubs c
|
||||
forM_ unsub_ $ \unsub -> mapM_ unsub s_
|
||||
where decrease = subtractServiceSubs (1, queueIdHash qId)
|
||||
```
|
||||
@@ -329,21 +329,25 @@ smpServer started cfg@ServerConfig {transports, transportConfig = tCfg, startOpt
|
||||
endPreviousSubscriptions = mapM_ $ \(c, subAction, evt) -> do
|
||||
atomically $ modifyTVar' pendingEvents $ IM.alter (Just . maybe [evt] (evt <|)) (clientId c)
|
||||
case subAction of
|
||||
CSAEndSub qId -> atomically (endSub c qId) >>= a unsub_
|
||||
where
|
||||
a (Just unsub) (Just s) = unsub s
|
||||
a _ _ = pure ()
|
||||
CSAEndServiceSub qId -> atomically $ do
|
||||
modifyTVar' (clientServiceSubs c) decrease
|
||||
modifyTVar' totalServiceSubs decrease
|
||||
where
|
||||
decrease = subtractServiceSubs (1, queueIdHash qId)
|
||||
CSAEndSub qId -> atomically (endSub c qId) >>= unsubPrev
|
||||
-- like endSub, also removes the delivery subscription from the service client's subscriptions map,
|
||||
-- otherwise the map retains entries for queues unassociated/deleted while the service stays connected.
|
||||
CSAEndServiceSub qId -> atomically (endServiceQueueSub c qId) >>= unsubPrev
|
||||
CSADecreaseSubs changedSubs -> do
|
||||
atomically $ modifyTVar' totalServiceSubs $ subtractServiceSubs changedSubs
|
||||
forM_ unsub_ $ \unsub -> atomically (swapTVar (clientSubs c) M.empty) >>= mapM_ unsub
|
||||
where
|
||||
unsubPrev :: Maybe sub -> IO ()
|
||||
unsubPrev s_ = sequence_ (unsub_ <*> s_)
|
||||
endSub :: Client s -> QueueId -> STM (Maybe sub)
|
||||
endSub c qId = TM.lookupDelete qId (clientSubs c) >>= (removeWhenNoSubs c $>)
|
||||
endServiceQueueSub :: Client s -> QueueId -> STM (Maybe sub)
|
||||
endServiceQueueSub c qId = do
|
||||
modifyTVar' (clientServiceSubs c) decrease
|
||||
modifyTVar' totalServiceSubs decrease
|
||||
endSub c qId
|
||||
where
|
||||
decrease = subtractServiceSubs (1, queueIdHash qId)
|
||||
-- remove client from server's subscribed cients
|
||||
removeWhenNoSubs c = do
|
||||
noClientSubs <- null <$> readTVar (clientSubs c)
|
||||
|
||||
Reference in New Issue
Block a user