support-bot: plans adversarial review

Narasimha-sc
2026-04-21 09:13:36 +00:00
parent c0bdf53ef9
commit 40b814c423
2 changed files with 272 additions and 29 deletions
@@ -38,8 +38,9 @@ SimpleX Chat support bot — standalone Node.js app using `simplex-chat-nodejs`
```
apps/simplex-support-bot/
├── package.json # deps: simplex-chat, @simplex-chat/types, async-mutex; devDeps: vitest, @types/node
├── tsconfig.json # ES2022, strict, Node16 module resolution
├── vitest.config.ts # test runner config, path aliases for mocks
├── src/
│ ├── index.ts # Entry: parse config, init instance, run
│ ├── config.ts # CLI arg parsing, ID:name validation, Config type
@@ -48,6 +49,11 @@ apps/simplex-support-bot/
│ ├── grok.ts # GrokApiClient: xAI API wrapper, system prompt, history
│ ├── messages.ts # All user-facing message templates
│ └── util.ts # isWeekend, profileMutex, logging helpers
├── bot.test.ts # Vitest suite (129 tests, 28 describes)
├── test/
│ └── __mocks__/
│ ├── simplex-chat.js # MockChatApi + utility re-exports
│ └── simplex-chat-types.js # enum re-exports for tests
└── data/ # SQLite databases (created at runtime)
```
@@ -63,12 +69,41 @@ The Grok system-prompt / context file is supplied at runtime via `--context-file
| `--team-group` | Yes | — | `name` | Team group display name (auto-created if absent, resolved by persisted ID on restarts) |
| `--auto-add-team-members` / `-a` | No | `""` | `ID:name,...` | Comma-separated team member contacts. Validated at startup — exits on mismatch. |
| `--context-file` | Required when `GROK_API_KEY` set | — | path | Grok system-prompt file (SimpleX documentation context). `parseConfig` throws if `GROK_API_KEY` is set without this flag. |
| `--timezone` | No | `"UTC"` | IANA tz | For weekend detection (24h vs 48h). Weekend = Sat 00:00 – Sun 23:59 in this tz. `parseConfig` validates the value by constructing a probe `Intl.DateTimeFormat` and throws with a clear error on `RangeError` (invalid IANA zone) — bot exits before init. |
| `--complete-hours` | No | `3` | integer ≥ 0 | Hours of customer inactivity after last team/Grok reply before auto-completing a conversation (✅). `parseConfig` rejects non-numeric, negative, or `NaN` values with a fail-fast error. `0` is allowed and disables auto-complete. |
| `--card-flush-seconds` | No | `300` | integer ≥ 0 | Seconds between card dashboard update flushes. `parseConfig` rejects non-numeric, negative, or `NaN` values with a fail-fast error. `0` is allowed and disables periodic flush (card updates still occur on explicit `scheduleUpdate` callers but never auto-drain). |
**Env vars:** `GROK_API_KEY` (optional) — xAI API key. If unset or empty, the bot starts with Grok support fully disabled: it logs `"No GROK_API_KEY provided, disabling Grok support"`, skips Grok profile/contact setup and event handler registration, omits `/grok` from the bot command list, drops the `/grok` clause from customer-facing messages, and treats any `/grok` the customer still types as an unknown command.
**Numeric argument validation:** `parseConfig` MUST validate every numeric flag (`--complete-hours`, `--card-flush-seconds`) using a helper that throws on non-finite or negative results, rather than raw `parseInt`:
```typescript
function parseNonNegativeInt(raw: string, flag: string): number {
const n = parseInt(raw, 10)
if (!Number.isFinite(n) || n < 0) {
throw new Error(`${flag} must be a non-negative integer, got "${raw}"`)
}
return n
}
const completeHours = parseNonNegativeInt(optionalArg(args, "--complete-hours", "3"), "--complete-hours")
const cardFlushSeconds = parseNonNegativeInt(optionalArg(args, "--card-flush-seconds", "300"), "--card-flush-seconds")
```
Rationale: `parseInt("foo", 10)` returns `NaN`, and `NaN * 3600_000 === NaN`. Every subsequent comparison (`now - lastTeamGrokTime >= completeMs`) is `false`, so the feature silently becomes a no-op — auto-complete never fires, cards never auto-refresh — and the operator has no signal that they typo'd a flag. Failing fast at startup surfaces the typo before customers interact. `0` is explicitly allowed as a valid "disable" setting.
**Timezone validation:** `parseConfig` MUST validate `--timezone` by constructing a probe `Intl.DateTimeFormat`:
```typescript
try {
new Intl.DateTimeFormat("en-US", {timeZone: timezone, weekday: "short"})
} catch (err) {
throw new Error(`--timezone "${timezone}" is not a valid IANA time zone: ${(err as Error).message}`)
}
```
Rationale: `isWeekend` is called from `queueMessage` and `teamAddedMessage` — both run on the hot customer message path. `new Intl.DateTimeFormat(..., {timeZone: <invalid>, ...})` throws `RangeError: Invalid time zone specified` at every call. Without startup validation, a typo in `--timezone` turns every `/grok`, `/team`, or first-customer-message dispatch into an unhandled error that crashes the per-item handler (though the outer try/catch in `onNewChatItems` contains it, customers receive no reply at all). Validating once at startup surfaces the typo in the operator's console before any customer interaction.
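For reference, a minimal `isWeekend` consistent with the validated flag (a sketch; the actual implementation in `util.ts` may differ):
```typescript
// Minimal sketch consistent with the weekend rule above; util.ts may differ in detail.
function isWeekend(date: Date, timeZone: string): boolean {
  // Reuses the same Intl machinery validated at startup, so it cannot throw here.
  const weekday = new Intl.DateTimeFormat("en-US", {timeZone, weekday: "short"}).format(date)
  return weekday === "Sat" || weekday === "Sun"
}
```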
```typescript
interface Config {
  dbPrefix: string
  // … remaining Config fields elided by this diff hunk
}
```
@@ -92,10 +127,58 @@ Only two keys. All other state is persisted in the group's `customData` (per-con
**Grok contact resolution** (state-file lookup always runs; contact establishment only when enabled):
1. Read `grokContactId` from state file → validate via `apiListContacts` → set `config.grokContactId` (this always runs, even when `grokApiKey === null`, so the one-way gate can identify and remove Grok members from groups)
2. If not found and `grokEnabled`: main profile creates one-time invite link, Grok profile connects, wait for a `contactConnected` event filtered by profile identity (60s — see "Grok contact identification" below), persist the resulting `contactId` atomically before proceeding.
3. If unavailable (with Grok otherwise enabled), bot runs but `/grok` returns "temporarily unavailable"
4. If `grokApiKey === null`: the Grok profile is not resolved or created, no invite link is issued — but `config.grokContactId` is still set from the state file if the contact exists.
### Grok contact identification
`grokContactId` is written once and used forever — it is the single identifier for every subsequent Grok check (one-way gate, `onMemberConnected` skip, `isGrok` in card rendering). Identification MUST be narrowly scoped so that the `contactId` stored is unambiguously Grok's and no other contact completing a handshake in the 60s establishment window can be latched by mistake.
Use the predicate form of `ChatApi.wait`. The signature (defined in `node_modules/simplex-chat/src/api.ts:217`) is:
```typescript
wait<K extends CEvt.Tag>(
event: K,
predicate: ((event: ChatEvent & {type: K}) => boolean) | undefined,
timeout: number,
): Promise<ChatEvent & {type: K} | undefined>
```
The implementation (api.ts:234) keeps the subscriber attached when the predicate returns `false`, so non-matching events are silently discarded and the wait continues until a matching event arrives or the timeout fires.
Identification accepts only a `contactConnected` event observed by the MAIN profile (the profile whose `apiCreateLink` issued the invite, and whose `contactId` we persist and later pass to `apiAddMember`) whose connecting contact's profile `displayName` equals the Grok profile's displayName:
```typescript
const grokProfileName = grokUser.profile.displayName // "Grok" or "Grok AI"
const evt = await chat.wait(
"contactConnected",
(e) =>
e.user.userId === mainUser.userId &&
e.contact.profile.displayName === grokProfileName,
60_000,
)
if (!evt) {
console.error(`Timeout waiting for Grok contact (60s, displayName="${grokProfileName}"). ` +
`Check SMP relay availability or re-run after clearing state. Exiting.`)
process.exit(1)
}
config.grokContactId = evt.contact.contactId
state.grokContactId = config.grokContactId
writeState(stateFilePath, state) // atomic: tmp-file + rename (see §13 state persistence)
log(`Grok contact established: ID=${config.grokContactId} (displayName="${grokProfileName}")`)
```
Filter rationale:
- `e.user.userId === mainUser.userId` selects the main profile's view of the handshake. Both profiles observe the handshake (the Grok-side event describes the main profile as the `contact`); only the main-side event carries the `contactId` we need for subsequent `apiAddMember` calls.
- `e.contact.profile.displayName === grokProfileName` accepts only the contact whose profile matches the Grok profile just created/updated. This rejects stray inbound contacts (late business-request acceptance, operator test DM, a reconnect of an existing contact) that may complete in the same 60s window. The displayName is read from `evt.contact.profile`, which is `LocalProfile` (see `@simplex-chat/types/src/types.ts:2867`).
`grokProfileName` is captured from `grokUser.profile.displayName` immediately before the wait, so whichever name the Grok profile was created/updated with earlier in startup is the exact string matched here.
Single-tenant deployment caveat: if a human contact happens to set its SimpleX displayName to the literal `"Grok"` (or `"Grok AI"`) and completes a handshake with the main profile in the 60s window, the displayName filter alone cannot distinguish them. MVP is single-tenant and Grok's profile is created by the bot itself, so this is not expected in practice; deployments that need stronger guarantees can add a second filter (e.g. `e.contact.profile.image === grokImage` — the bot knows the exact image bytes it assigned to the Grok profile).
Persistence: `writeState` is atomic (tmp-file + `fs.renameSync`, see §13 "State persistence") so a crash between identification and persistence cannot corrupt the state file. `state.grokContactId` is flushed to disk BEFORE proceeding to bot event wiring — if the process dies after wiring but before persistence, the next startup would issue a second invite link and leave the first Grok contact orphaned in the database.
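The tmp-file + rename pattern referenced here, for illustration (the real `writeState` lives in §13; `BotState` is a stand-in type for the two-key state file):
```typescript
import * as fs from "fs"

interface BotState { grokContactId?: number; teamGroupId?: number } // the two persisted keys

function writeState(path: string, state: BotState): void {
  const tmp = `${path}.tmp`
  fs.writeFileSync(tmp, JSON.stringify(state, null, 2))
  fs.renameSync(tmp, path) // atomic on POSIX: readers see the old or new file, never a torn write
}
```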
**Team group resolution** (auto-create):
1. Read `teamGroupId` from state file → validate via group list
2. If not found: create with `apiNewGroup`, persist new group ID
@@ -115,6 +198,8 @@ Per-conversation state is stored in the group's `customData` and written at the
**WELCOME detection:** customData has no `state` field until the bot handles the first transition. `deriveState` returns `WELCOME` precisely when `customData.state` is absent.
**Type vs. persisted state.** The `ConversationState` union in `cards.ts` enumerates all five conceptual states (`WELCOME | QUEUE | GROK | TEAM-PENDING | TEAM`) so event handlers and composition can reason about them uniformly. However, `WELCOME` is NEVER written to `customData.state` — the runtime invariant is "persisted state ∈ {QUEUE, GROK, TEAM-PENDING, TEAM}; absence of the `state` field derives as WELCOME". The `isConversationState` guard in `cards.ts` rejects `WELCOME` on read to preserve this invariant (any stale `state: "WELCOME"` from a crashed transition is treated as absent). Do NOT introduce a separate `PersistedState` type in MVP — the invariant is small enough to enforce at two choke points: `getRawCustomData` on read and the dispatch handlers on write.
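A sketch of the two choke points (deliberately no separate `PersistedState` type, per the rule above):
```typescript
type ConversationState = "WELCOME" | "QUEUE" | "GROK" | "TEAM-PENDING" | "TEAM"

// Read-side guard: WELCOME is rejected on purpose, so a stale persisted
// "WELCOME" from a crashed transition is treated as absent.
function isConversationState(v: unknown): v is ConversationState {
  return v === "QUEUE" || v === "GROK" || v === "TEAM-PENDING" || v === "TEAM"
}

function deriveState(customData: {state?: unknown} | undefined): ConversationState {
  const raw = customData?.state
  return isConversationState(raw) ? raw : "WELCOME" // absent or invalid derives WELCOME
}
```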
**State-write matrix:**
| Bot-observed event | `customData.state` written |
|---|---|
@@ -188,7 +273,7 @@ Card is two messages. **Message 1 (card text):**
**Message count:** All messages in chat history except the bot's own (`groupSnd` from main profile).
**Message preview:** Last several messages, most recent last, separated by ` / `. Newlines in message text are replaced with spaces to prevent card layout bloat from spam. The customer's display name is also sanitized (newlines → spaces) for the card header, but the `/join` command uses the raw name so it matches the actual group profile. Newest messages are prioritized — when the total exceeds ~500 chars (`maxTotal = 500` in `composeCard`), the oldest messages are truncated (with `[truncated]` prepended) while the newest are always shown. When truncation occurs, the first visible message is guaranteed to have a sender prefix even if it was a continuation in the original sequence. Each message is prefixed with the sender's name (`Name: message`) on the first message in a consecutive run from that sender — subsequent messages from the same sender omit the prefix until a different sender's message appears. Sender identification: Grok contact is detected by `grokContactId` and labeled "Grok"; the customer is identified by matching `memberId` to the group's `customerId` and labeled with their display name; all other members use their `memberProfile.displayName`. Bot's own messages (`groupSnd`) are excluded. Each message truncated to ~200 chars. Media-only messages show type labels: `[image]`, `[file]`, `[voice]`, `[video]`.
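A compact sketch of the prefix-run logic (`renderText` and `senderLabel` are illustrative helper names; the `maxTotal` truncation bookkeeping is elided):
```typescript
// Illustrative sketch of prefix runs and per-message caps; truncation to
// maxTotal = 500 with "[truncated]" and first-visible re-prefixing is elided.
const parts: string[] = []
let prevSender: string | undefined
for (const m of lastMessages) { // oldest → newest
  const text = renderText(m).replace(/\n/g, " ").slice(0, 200) // sanitize + ~200-char cap
  const sender = senderLabel(m) // "Grok" | customer displayName | memberProfile.displayName
  parts.push(sender === prevSender ? text : `${sender}: ${text}`)
  prevSender = sender
}
const preview = parts.join(" / ")
```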
**Join command:** `/join groupId:name` — `groupId` is the customer group's ID, `name` is the customer's display name. Names with spaces single-quoted: `/join 42:'First Last'`.
@@ -203,17 +288,17 @@ Card is two messages. **Message 1 (card text):**
**Update** (delete + repost) — on every subsequent event (new customer msg, team/Grok reply, state change, agent join):
1. Read `{cardItemId, joinItemId}` from `customData`
2. Delete old card + join command via `apiDeleteChatItems([Group, teamGroupId], [cardItemId, joinItemId], "broadcast")` — ignore errors
3. Post new card text + `/join` command as two messages → get new IDs
4. Overwrite `customData` with new `{cardItemId, joinItemId}`
2. Delete old card + join command via `apiDeleteChatItems([Group, teamGroupId], [cardItemId, joinItemId], "broadcast")`. Per `simplex-chat/src/api.ts:436-445` the call either returns `T.ChatItemDeletion[]` (possibly empty if the items no longer exist) or throws `ChatCommandError`. Both outcomes are acceptable: the surrounding `try { ... } catch { /* log and continue */ }` allows execution to proceed whether the items were still present, already gone, or the server returned a transient error.
3. Post new card text + `/join` command as two messages via `apiSendMessages` → get new IDs. **On failure** the partial-failure policy below applies: log, re-queue this groupId into `pendingUpdates`, return without writing `customData`.
4. Write `{cardItemId, joinItemId, complete?}` to `customData` via `mergeCustomData`. **On failure** the tracking-write policy below applies.
**Debouncing:** Card updates debounced globally — pending changes flushed every `cardFlushSeconds` seconds (default 300, configurable via `--card-flush-seconds`). Within a batch, each group's card reposted at most once with latest state.
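A sketch of the debounce plumbing (`pendingUpdates` and `scheduleUpdate` are the names used throughout this plan; the timer wiring itself is an assumption):
```typescript
// In CardManager — sketch; exact wiring may differ.
private pendingUpdates = new Set<number>()
private flushTimer?: NodeJS.Timeout

scheduleUpdate(groupId: number): void {
  this.pendingUpdates.add(groupId) // coalesces: one repost per group per flush, latest state wins
}

startFlushLoop(cardFlushSeconds: number): void {
  if (cardFlushSeconds === 0) return // 0 disables the periodic flush (see config table)
  this.flushTimer = setInterval(() => void this.flush(), cardFlushSeconds * 1000)
}

destroy(): void {
  if (this.flushTimer) clearInterval(this.flushTimer) // wired into SIGINT/SIGTERM (step 11)
}
```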
**Wait time rules:** Time since the customer's last unanswered message. For ✅ (auto-completed) conversations, the wait field shows the literal string "done". If customer sends a follow-up, wait time resets to count from that message.
**Auto-complete:** A conversation is marked ✅ when `completeHours` (default 3h, configurable via `--complete-hours`) have passed since the last team/Grok message **without any customer reply**. The card debounce flush (every 300 seconds / 5 min, configurable via `--card-flush-seconds`) checks elapsed time and transitions to ✅ when the threshold is met. Customer follow-up at any point — including after ✅ — reverts to the derived active icon (👋/💬/⏰ for team states, 🟡/🔴 for queue), and wait time resets from that message.
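The elapsed-time check, sketched (the timestamp sources are assumptions; `completeHours === 0` disables the feature per the config table):
```typescript
// Sketch of the check run during flush; lastTeamGrokTime/lastCustomerTime are assumed fields.
const completeMs = config.completeHours * 3_600_000
const answered = lastTeamGrokTime > lastCustomerTime // no customer reply since last team/Grok msg
if (completeMs > 0 && answered && Date.now() - lastTeamGrokTime >= completeMs) {
  await this.mergeCustomData(groupId, {complete: true}) // card renders ✅, wait field shows "done"
}
```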
**Card icon derivation (TEAM states) — computed at each card render by comparing the timestamps of the most recent customer and team/Grok messages in the group; nothing about the icon is stored:**
```
Team added, no reply yet → 👋
Team replied → 💬
… (remaining transitions elided by the diff)
```
@@ -226,6 +311,67 @@ Customer sends after ✅ → back to 💬 or ⏰ (derived from wait ti
**Restart recovery:** On startup, `CardManager.refreshAllCards()` lists all groups, finds those with `customData.cardItemId` set and `customData.complete` not set, sorts by `cardItemId` ascending (higher ID = more recently updated), and re-posts them oldest-first so the most recently active cards end up at the bottom of the team group. Completed cards (`complete: true`) and old/pre-bot groups (no `customData`) are skipped. Old card messages are deleted before reposting; deletion failures (e.g., >24h old) are silently ignored. Individual card failures are caught and logged without aborting the batch.
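The selection and ordering, sketched (`listAllGroups` is a stand-in for the group-listing call; `flushOne` is defined under the partial-failure policy below):
```typescript
// Sketch of refreshAllCards selection and ordering; listAllGroups is a stand-in name.
const candidates: {groupId: number; cardItemId: number}[] = []
for (const g of await this.listAllGroups()) {
  const cd = await this.getRawCustomData(g.groupId)
  if (typeof cd?.cardItemId !== "number" || cd.complete) continue // skip pre-bot and ✅ groups
  candidates.push({groupId: g.groupId, cardItemId: cd.cardItemId})
}
candidates.sort((a, b) => a.cardItemId - b.cardItemId) // ascending = oldest card reposted first
for (const {groupId} of candidates) {
  try {
    await this.flushOne(groupId) // deletes the stale card (failures ignored) and reposts
  } catch (err) {
    logError(`refreshAllCards: group ${groupId}`, err) // one failure never aborts the batch
  }
}
```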
### Partial-failure and retry policy
`createCard` and `updateCard` perform a multi-step sequence (delete + send + customData write). To design the correct policy we MUST be explicit about which failures the SimpleX core already handles for us vs. which surface to the bot:
**SimpleX core semantics** (per `simplex-chat/src/api.ts` JSDoc):
- `apiSendMessages` — "Network usage: background". The call returns `newChatItems` once the chat item is CREATED LOCALLY (written to SQLite) and the SMP broadcast is QUEUED. The core's background machinery retries relay delivery transparently — **the bot never observes a transient relay failure from `apiSendMessages`**. A thrown `ChatCommandError` means the local create step itself failed: permission denied, chat does not exist, invalid content, DB locked/corrupted.
- `apiDeleteChatItems` — "Network usage: background". Same pattern: local delete + queued broadcast + core-managed delivery retry. A thrown error means the local delete step failed (item not found, permission, DB error).
- `apiSetGroupCustomData` — "Network usage: **no**". Pure local SQLite write, no SMP involvement at all. A thrown error means a local DB error.
Consequence: failures surfaced to the bot are **terminal local errors** (bad state, DB problem, permission change), not transient network blips. Retrying the same operation against the same DB/relay state will usually hit the same error. Retry value comes from the narrow slice of genuinely transient local conditions — a brief SQLite lock held by a concurrent write, a race with group-state mutation elsewhere in the same process — where the next attempt sees a different state.
This reshapes the policy: the bot does not need aggressive retry for "network" reasons (core handles that), and compensating actions for customData-write failure are rarely useful (if the pure-local customData write fails, the retry's customData write will almost certainly fail for the same reason). The bot needs a light safety net: re-queue on any step failure, let the flush loop try again at most once per `cardFlushSeconds`, and on persistent failure accept that operator intervention is needed.
Policy (applies to both `createCard` and `updateCard`):
**Any step fails** — whether step 2 (delete), step 3 (send), or step 4 (customData write):
- Log via `logError` with `{groupId, step, err}` so the operator can diagnose the underlying cause (permission change, DB corruption, bot removed from team group, etc).
- Re-add `groupId` to `pendingUpdates` via `this.scheduleUpdate(groupId)`.
- Return. Do NOT attempt compensating actions (no compensating delete for tracking-write failure — the scenario where send succeeds locally but customData write fails requires the SQLite DB to be healthy-then-unhealthy between two synchronous calls in the same transaction window, which is not a realistic transient state; the retry path handles any resulting duplicate by reading the stale `cardItemId` and deleting it on the next update attempt).
**Flush dispatch** — the current `flush` loop calls `updateCard` unconditionally and `updateCard` returns early when `customData.cardItemId` is unset. This silently drops the retry path for a failed `createCard` — the group is in `pendingUpdates` but nothing will ever create a card for it. Replace with a single `flushOne(groupId)` that reads `customData` once and dispatches to create or update:
```typescript
private async flushOne(groupId: number): Promise<void> {
const groupInfo = await this.getGroupInfo(groupId)
if (!groupInfo) return // group deleted
const customData = this.deriveCustomData(groupInfo)
if (customData.complete) return // ✅ conversations don't auto-repost
if (typeof customData.cardItemId === "number") {
await this.updateCard(groupId, groupInfo)
} else {
await this.createCard(groupId, groupInfo)
}
}
async flush(): Promise<void> {
const groups = [...this.pendingUpdates]
this.pendingUpdates.clear()
for (const groupId of groups) {
try { await this.flushOne(groupId) }
catch (err) {
logError(`flush failed for group ${groupId}`, err)
this.scheduleUpdate(groupId) // re-queue on any thrown error
}
}
}
```
Retry behavior for each failure point under this design:
| Failure point | `customData` after failure | Retry's `flushOne` path | Retry outcome if condition cleared |
|---|---|---|---|
| `createCard` send fails | `cardItemId` absent | create-path | fresh card posted, `customData` written |
| `updateCard` delete fails | old `cardItemId` still set | update-path | delete retried (idempotent — see below) + send + write |
| `updateCard` send fails (delete succeeded) | old (now-deleted) `cardItemId` still set | update-path | delete retried against stale ID — tolerated (see below) — then send + write |
| `updateCard` write fails (send succeeded, duplicate may exist) | old `cardItemId` still set, new card orphaned in team group | update-path | delete retried against stale old ID — tolerated — new card posted, tracking written; **leaked** new-card from the failed attempt persists until operator removes it |
**Delete idempotency on retry** — `apiDeleteChatItems` against already-deleted IDs returns either an empty `ChatItemDeletion[]` or throws `ChatCommandError`. The step-2 `try { ... } catch { logError(...) }` swallows both; execution proceeds to step 3. Do NOT escalate a step-2 error to the partial-failure policy — that would create a retry loop for a permanent condition (items past the 24h deletion window will throw on every retry forever).
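Step 2 in code, as a sketch (the chat-ref argument mirrors this plan's `[Group, teamGroupId]` notation rather than a verified API shape):
```typescript
// Step-2 tolerance sketch: empty deletion result and thrown ChatCommandError are both acceptable.
try {
  await this.chat.apiDeleteChatItems([Group, teamGroupId], [cardItemId, joinItemId], "broadcast")
} catch (err) {
  logError(`card delete for group ${groupId} (non-fatal)`, err) // items gone, or past 24h window
}
// execution proceeds to step 3 (repost) in every case
```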
**Persistent failures** — if the underlying condition is not transient (bot removed from team group, DB corruption, permission revoked), every retry hits the same error and the group stays in `pendingUpdates` indefinitely, logging at each flush. MVP accepts this — the operator-visible log stream makes the problem diagnosable. A bounded-retry-with-backoff-and-giveup strategy can be added later without changing the failure-point table above.
### Card implementation
```typescript
// … card implementation code elided by the diff
```
@@ -390,6 +536,51 @@ async function withProfile<T>(userId: number, fn: () => Promise<T>): Promise<T>
Grok HTTP API calls are made **outside** the mutex to avoid blocking.
**Per-group customData mutex** — `mergeCustomData` and `clearCustomData` must be serialized per customer group. `mergeCustomData` has two awaits (read via `getRawCustomData` → `apiListGroups`, then write via `apiSetGroupCustomData`); between them the event loop runs, so two concurrent async chains operating on the same `groupId` can both read the same snapshot, both produce a merged object, and the second write clobbers the first's patch.
Concrete call sites that can overlap on one `groupId`:
- `processMainChatItem` writing `state` transitions (WELCOME→QUEUE, WELCOME→GROK, QUEUE→GROK, one-way gate →TEAM)
- `activateGrok`'s `revertStateOnFail` (fire-and-forget) racing with subsequent customer messages
- `activateTeam` writing `TEAM-PENDING` racing with `/grok` or another `/team` on the same group
- `CardManager.flush → updateCard` writing `{cardItemId, joinItemId, complete}` racing with dispatch writing `state`
- `createCard` writing `{cardItemId, joinItemId}` immediately after dispatch writes `state`
The CAS-on-state inside `revertStateOnFail` guards only the `state` key — other keys (`cardItemId`, `joinItemId`, `complete`) can still be lost when spread from a stale snapshot.
Implementation:
```typescript
// In CardManager
private customDataMutexes = new Map<number, Mutex>()
private getCustomDataMutex(groupId: number): Mutex {
let m = this.customDataMutexes.get(groupId)
if (!m) { m = new Mutex(); this.customDataMutexes.set(groupId, m) }
return m
}
async mergeCustomData(groupId: number, patch: Partial<CardData>): Promise<void> {
return this.getCustomDataMutex(groupId).runExclusive(async () => {
const current = (await this.getRawCustomData(groupId)) ?? {}
const merged = {...current, ...patch}
for (const key of Object.keys(merged) as (keyof CardData)[]) {
if (merged[key] === undefined) delete merged[key]
}
await this.withMainProfile(() => this.chat.apiSetGroupCustomData(groupId, merged))
})
}
async clearCustomData(groupId: number): Promise<void> {
return this.getCustomDataMutex(groupId).runExclusive(() =>
this.withMainProfile(() => this.chat.apiSetGroupCustomData(groupId))
)
}
```
Nesting rule: the per-group customData mutex is the **outer** lock; `profileMutex` (via `withMainProfile`) is the **inner** lock. Never acquire them in the opposite order, and never hold the customData mutex while calling an external (non-SimpleX) async function — this prevents cross-group deadlock and keeps the critical section short.
Cleanup: entries in `customDataMutexes` are bounded by the number of customer groups. Removing the entry on `onLeftMember(customer)` is sufficient (the group's `customData` is also cleared at that point). Skipping this refinement in MVP is acceptable — a long-running bot with many customers accumulates only a few bytes per group.
**Profile images:** Both profiles have base64-encoded JPEG profile pictures (128x128, quality 85, under the 12,500-char data URI limit enforced by iOS/Android clients) set via the `image` field in `T.Profile`. The images are defined as `data:image/jpg;base64,...` string constants in `index.ts`. The main profile image is passed to `bot.run()` which handles update-on-change automatically. The Grok profile image is passed to `apiCreateActiveUser()` on first run; on subsequent runs, the bot compares the current profile against the desired one using `util.fromLocalProfile()` and calls `apiUpdateProfile()` if any field differs — this sends the update to all Grok contacts.
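The restart-time reconciliation, sketched (the `apiUpdateProfile` call shape and `GROK_IMAGE` constant name are assumptions; `util.fromLocalProfile` is the helper named above):
```typescript
// Sketch: compare current vs desired Grok profile, push an update only on drift.
const desired: T.Profile = {displayName: grokUser.profile.displayName, fullName: "", image: GROK_IMAGE}
const current = util.fromLocalProfile(grokUser.profile) // strips local-only fields for comparison
if (JSON.stringify(current) !== JSON.stringify(desired)) {
  await chat.apiUpdateProfile(grokUser.userId, desired) // sends the update to all Grok contacts
}
```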
**Startup sequence:**
@@ -405,7 +596,7 @@ Grok HTTP API calls are made **outside** the mutex to avoid blocking.
9. Validate `--auto-add-team-members` (`-a`) if provided
10. Register Grok event handlers on `chat` (filtered by `event.user === grokUserId`)
10b. Refresh stale cards: `CardManager.refreshAllCards()` — lists all groups, skips those with `customData.complete` or no `customData.cardItemId`, sorts remaining by `cardItemId` ascending, re-posts oldest-first so newest cards land at the bottom of team group
11. On SIGINT/SIGTERM → `clearTimeout(inviteLinkTimer)` (noop if already deleted), `cards.destroy()` (stops the card-flush interval), `deleteInviteLink()` (profileMutex-gated `apiDeleteGroupLink`), `process.exit(0)`. Signal handler is reentrant-safe: an `inviteLinkDeleted` flag prevents double-deletion; `clearTimeout`/`clearInterval` are no-op on undefined.
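The reentrant-safe handler from step 11, sketched from the description above (all names — `inviteLinkTimer`, `cards.destroy`, `deleteInviteLink`, `inviteLinkDeleted` — come from this plan):
```typescript
let inviteLinkDeleted = false

async function shutdown(): Promise<void> {
  clearTimeout(inviteLinkTimer) // no-op if the timer already fired or was never set
  cards.destroy()               // stops the card-flush interval
  if (!inviteLinkDeleted) {
    inviteLinkDeleted = true    // guards against a second signal arriving mid-shutdown
    await deleteInviteLink()    // profileMutex-gated apiDeleteGroupLink
  }
  process.exit(0)
}

process.on("SIGINT", () => void shutdown())
process.on("SIGTERM", () => void shutdown())
```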
**Grok event registration** (same ChatApi, filtered by profile):
@@ -459,8 +650,7 @@ chat.on("connectedToGroupMember", (evt) => {
// 4. Skip groupSnd (own messages)
// 5. Identify sender via businessChat.customerId
// 6. Team member message → check if first team text (trigger one-way gate: remove Grok, disable /grok), schedule card update
// 7. Customer message → derive state, dispatch:
// - WELCOME: create card, send queue msg (or handle /grok first msg → WELCOME→GROK, skip queue)
// - QUEUE: /grok → invite Grok; /team → add ALL configured team members; else schedule card update
// - GROK: /team → add ALL configured team members (Grok stays); else schedule card update
@@ -516,10 +706,10 @@ Type signatures affected:
**Main profile side (invite + failure detection):**
0. Send `grokInvitingMessage` ("Inviting Grok, please wait...")
1. **Set `grokInitialResponsePending.add(groupId)` FIRST** — the gate must be raised before any operation that could make Grok recognizable to `onGrokNewChatItems`. Specifically: before `apiAddMember`, before `pendingGrokJoins` is set, and before `bufferedGrokInvitations` is drained (which populates `reverseGrokMap`). Without this ordering, the sequence `apiAddMember → pendingGrokJoins.set → drain → reverseGrokMap.set → gate.add` contains a window where `reverseGrokMap` identifies the group as a Grok-active group but the gate is still DOWN. A customer message arriving in that window triggers a per-message response concurrent with the initial combined response — producing duplicate Grok replies. Every error path below MUST clear the gate.
2. `apiAddMember(groupId, grokContactId, Member)` → get `member.memberId`. On `groupDuplicateMember` (customer sent `/grok` again before join completed), **clear the gate** and silent return — the in-flight activation handles the outcome. On any other error, clear the gate, revert state, send `grokUnavailableMessage`.
3. Store `pendingGrokJoins.set(memberId, mainGroupId)`
4. Drain `bufferedGrokInvitations` — if the `receivedGroupInvitation` event arrived during step 2's await (race condition), process it now. (The gate is already up from step 1, so `onGrokNewChatItems` suppresses any per-message responses during drain and the subsequent join.)
5. `waitForGrokJoin(120s)` — awaits resolver from Grok profile's `connectedToGroupMember` (step 8 below)
6. Timeout → notify customer (`grokUnavailableMessage`), send queue message if was WELCOME→GROK, fall back to QUEUE (CAS-guarded: only if `customData.state` is still `GROK` — a concurrent `/team` that switched to `TEAM-PENDING` is respected), clear `grokInitialResponsePending`
@@ -532,7 +722,7 @@ Type signatures affected:
10. If no customer messages found (visible history disabled or API failed), send generic greeting asking customer to repeat their question
11. Call Grok HTTP API (outside mutex)
12. Send response via `apiSendTextMessage` (through mutex with Grok profile)
13. Clear `grokInitialResponsePending` (via `finally` block — runs on success, failure, or early return). After this, per-message responses from `onGrokNewChatItems` resume normally for subsequent customer messages.
13. Clear `grokInitialResponsePending` (via `finally` block — runs on success, failure, or early return). After this, per-message responses from `onGrokNewChatItems` resume normally for subsequent customer messages. Note: because the gate is raised at step 1 (before any other work), the `finally` block MUST be wired to cover every code path from step 1 onward — including the `groupDuplicateMember` silent-return and all revert/timeout branches — otherwise per-message responses stay suppressed indefinitely for the affected group.
```typescript
const pendingGrokJoins = new Map<string, number>() // memberId → mainGroupId
// … (related tracking structures elided by the diff)
```
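How the gate-first ordering and the step-13 `finally` fit together, sketched (error branches condensed; `drainBufferedGrokInvitations` and `isGroupDuplicateMember` are stand-in names, `revertStateOnFail` is named elsewhere in this plan):
```typescript
// Sketch of the gate lifecycle: raised first, released in finally on every path.
grokInitialResponsePending.add(groupId) // step 1: raise the gate before anything else
try {
  const member = await chat.apiAddMember(groupId, grokContactId, Member) // step 2
  pendingGrokJoins.set(member.memberId, groupId)                         // step 3
  drainBufferedGrokInvitations()                                         // step 4, gate already up
  await waitForGrokJoin(member.memberId, 120_000)                        // step 5
  // steps 9-12: read history, call Grok HTTP API outside the mutex, send combined response
} catch (err) {
  if (isGroupDuplicateMember(err)) return // silent: the in-flight activation owns the outcome
  await revertStateOnFail(groupId)        // CAS-guarded QUEUE fallback + grokUnavailableMessage
} finally {
  grokInitialResponsePending.delete(groupId) // step 13: runs on every path above
}
```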
@@ -551,12 +741,12 @@ Grok profile's `onGrokNewChatItems` handler:
3. Only trigger for `groupRcv` **text** messages from customer (identified via `businessChat.customerId`)
4. Ignore: non-text messages (images, files, voice — card update handled by main profile), bot messages, own messages (`groupSnd`), team member messages
5. Read last 100 messages from own view (customer → `user`, own → `assistant`)
6. Call Grok HTTP API — different groups' calls run concurrently (see "Cross-group Grok parallelism" below). Per-group serialization of overlapping in-flight calls is NOT implemented in MVP (see §20.6).
7. Send response into group
**Per-message error:** Send error message in group ("Sorry, I couldn't process that. Please try again or send /team for a human team member."), stay GROK. Customer can retry.
**Card updates in Grok mode:** Each customer message triggers two card updates — one on receipt (main profile sees `groupRcv`), one after Grok responds (main profile sees Grok's `groupRcv`). Both go through the 300-second debounce (default `--card-flush-seconds`).
### Grok removal
@@ -597,6 +787,55 @@ If `GROK_API_KEY` is set but `--context-file` is missing, `parseConfig` throws a
Customer messages always in `user` role, never `system`.
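The role assignment in request-building form (a sketch; the `fromCustomer`/`text` history shape is an assumption):
```typescript
// Sketch of request assembly for the xAI call; the history shape is an assumption.
type HistMsg = {fromCustomer: boolean; text: string}

function buildMessages(systemPrompt: string, history: HistMsg[]) {
  return [
    {role: "system" as const, content: systemPrompt}, // --context-file text, never customer input
    ...history.map((m) => ({
      role: m.fromCustomer ? ("user" as const) : ("assistant" as const), // per the rule above
      content: m.text,
    })),
  ]
}
```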
### Grok HTTP request timeout
Every `fetch` to `api.x.ai/v1/chat/completions` MUST pass an `AbortSignal.timeout(60_000)` (60-second default). Without a timeout, a stuck TCP connection or an unresponsive server blocks the awaiting call indefinitely; because `processGrokChatItem` runs under the Grok profile's sequential event dispatch, a single hung call stalls per-message responses for ALL customer groups using Grok — and the same hang in `activateGrok`'s initial-response path leaves `grokInitialResponsePending` stuck (gate never released) until the process is killed.
Implementation in `GrokApiClient.chatRaw`:
```typescript
const response = await fetch("https://api.x.ai/v1/chat/completions", {
method: "POST",
headers: { ... },
body: JSON.stringify({ ... }),
signal: AbortSignal.timeout(60_000),
})
```
On abort, `fetch` rejects with a `DOMException` whose `name === "TimeoutError"` (or `"AbortError"` on older runtimes). Callers treat this identically to other `chat()` failures:
- `processGrokChatItem` → sends `grokErrorMessage` to the customer group, conversation stays GROK.
- `activateGrok` initial-response path → logs, sends `grokUnavailableMessage`, lets the `finally` block clear `grokInitialResponsePending`.
Rationale for 60s: typical xAI responses return in 1–10s; a 60s ceiling accommodates cold-start / heavy-load latencies while still bounding worst-case per-customer wait. Not exposed as a CLI flag in MVP — a later iteration can add `--grok-timeout-seconds` if operator tuning is needed.
### Cross-group Grok parallelism
`onGrokNewChatItems` MUST dispatch per-group work concurrently. A naïve `for (const ci of lastPerGroup.values()) { await this.processGrokChatItem(ci) }` serializes calls across unrelated customer groups — if xAI takes 3s per call and five customers message in one event batch, customer #5 waits ~15s instead of ~3s. This is pure latency amplification with no ordering benefit (the groups are independent; within-group order is already preserved by batch deduplication picking the last message).
Implementation:
```typescript
async onGrokNewChatItems(evt: CEvt.NewChatItems): Promise<void> {
const lastPerGroup = new Map<number, T.AChatItem>()
for (const ci of evt.chatItems) {
// filter: groupRcv, customer text, not bot/team
// keep last per groupId
}
await Promise.allSettled(
[...lastPerGroup.values()].map((ci) => this.processGrokChatItem(ci)),
)
}
```
Why `Promise.allSettled` (not `Promise.all`): one group's Grok API failure MUST NOT cancel or reject pending work for other groups. Each `processGrokChatItem` already handles its own errors (sends `grokErrorMessage`, logs); the outer handler only needs to wait until all per-group tasks finish before returning control to the event dispatcher.
Concurrency bound: the number of distinct customer groups that have new Grok-eligible messages in a single event batch — typically ≤ the SimpleX batch-delivery size, practically small. No global semaphore needed in MVP. If xAI rate limits become a concern, add a shared semaphore later; orthogonal to this fix.
Ordering guarantees preserved:
- Within a group, batch deduplication still picks only the latest message and earlier messages appear in the history context via `apiGetChat`.
- Across groups, there is no ordering requirement — each customer group is an independent conversation.
- The per-group gate (`grokInitialResponsePending`) still serializes against `activateGrok`'s initial response; this is a group-local check unaffected by cross-group parallelism.
## 11. Team Group Commands
| Command | Effect |
|---|---|
@@ -835,7 +1074,7 @@ npx ts-node src/index.ts \
7. Team member sends text → verify Grok removed, `/grok` rejected, card → 💬
8. `/grok` in TEAM → verify "team mode" rejection
9. `/team` when already invited → verify "already invited" message
10. Card debouncing: multiple rapid events → verify single card update per 300s flush (default)
11. `/join` from team group → verify team member added to customer group, promoted to Owner
12. `/join` with non-business group → verify error
13. Weekend → verify "48 hours"
@@ -983,7 +1222,7 @@ async function reachTeam(groupId?) // reachTeamPending → add team membe
Called as: `const p = simulateGrokJoinSuccess(); await bot.onNewChatItems(...); await p;`
### 20.4 Test Catalog (129 tests, 28 suites)
#### 1. Welcome & First Message (4 tests)
- first message → queue reply + card created with /join command
@@ -1022,6 +1261,10 @@ Called as: `const p = simulateGrokJoinSuccess(); await bot.onNewChatItems(...);
- customer text in TEAM → no bot reply, card update scheduled
- /grok in TEAM-PENDING → invite Grok if not present
#### 5b. One-Way Gate with Grok Disabled (2 tests)
- team text removes Grok even when grokApi is null
- Grok does not respond when disabled even if grokContactId is set
#### 6. Team Member Lifecycle (6 tests)
- team member connected → promoted to Owner
- customer connected → NOT promoted
@@ -1178,11 +1421,11 @@ Called as: `const p = simulateGrokJoinSuccess(); await bot.onNewChatItems(...);
### 20.6 Test Coverage Notes
**Covered vs plan catalog:**
- §20.4 suites 1-13, 15, 17-27 plus 5b fully covered (129 tests across 28 suites)
- Weekend detection (`util.isWeekend`) — not unit-tested; depends on `Intl.DateTimeFormat(new Date())`, would need clock mocking. Not present in the §20.4 catalog.
- Profile Mutex serialization — not a standalone suite in §20.4; verified implicitly through all other tests (MockChatApi tracks activeUserId).
- Startup & state persistence (`index.ts` path) — not unit-tested; requires native ChatApi. Integration-test only. Includes `deleteInviteLink` (profileMutex + `apiSetActiveUser` before `apiDeleteGroupLink`), the conditional `apiUpdateGroupProfile` (compare `fullGroupPreferences` before calling), and the best-effort `apiCreateGroupLink` (catch + log on SMP relay failure). Not in §20.4 catalog.
**Known plan items NOT implemented (conscious gaps, not test gaps):**
- Per-group Grok API call serialization (plan §10) — not implemented or tested
- Team member replacement on leave after sending — out of MVP scope. No plan section currently asserts it as a requirement; if added later, specify in SPEC §4.2 "Team replies" and implementation plan §15 "Error Handling" together.
@@ -462,7 +462,7 @@ In Grok mode, each customer message triggers two card updates — one on receipt
If the Grok HTTP API call fails or times out for a per-message request, the Grok profile sends an error message into the group: "Sorry, I couldn't process that. Please try again or send /team for a human team member." Grok remains in the group and the state stays GROK — the customer can retry by sending another message.
Grok API calls are NOT serialized per customer group in the MVP. If a new customer message arrives while a Grok API call is in flight, a second call runs concurrently — `apiGetChat` is re-read at the start of each call so history converges eventually, but two rapid messages in the same group can produce interleaved context. Cross-group calls run concurrently by design (see implementation plan §10 "Cross-group Grok parallelism"). Per-group serialization is a planned future improvement.
#### Grok removal