Files
meshcore-analyzer/cmd
Kpa-clawbot f15d2efe81 fix(#1386): #1324 follow-up — test coverage + RWMutex + lock-hold-time + dead code + cadence (#1390)
# #1324 follow-up — test coverage + RWMutex + lock-hold-time + dead code
+ cadence

Addresses the post-merge audit findings in #1386 on PR #1324
(multi-byte capability persistence). Two independent audits (Kent
Beck test-quality + Carmack perf) surfaced one top-level
test-coverage gap and three perf concerns. This PR closes all of
them; cadence cleanup is included.

Red commit: `<RED_SHA>` (CI: `<RED_URL>`)

## What

1. **Tests** (`cmd/ingestor/multibyte_persist_test.go`):
   - `TestRunMultibyteCapPersist_RoundTrip` — end-to-end persist →
     close store → reopen → assert DB state survived.
   - `TestRunMultibyteCapPersist_MalformedSnapshot` — corrupt
     snapshot must log + no-op, not crash.
   - `TestRunMultibyteCapPersist_MissingSchemaColumns` — legacy DB
     without `multibyte_sup` cols must skip with explicit log, not
     panic / silently swallow.
   - `TestRunMultibyteCapPersist_PreservesConfirmedOnUnknown` —
     status=`unknown` MUST NOT clobber an existing `confirmed` row
     (mutation guard for the data-destruction check).
2. **`cmd/server/store.go`**
   - `cacheMu sync.Mutex` → `sync.RWMutex`. The per-node
     `GetMultibyteCapFor` read path in `/api/nodes` (`routes.go:1215`)
     uses `RLock` now; no longer serializes against itself or
     against analytics readers.
   - Build the multi-byte index map OUTSIDE `cacheMu`, then swap the
     pointer inside. Removes a 2400-iteration allocation hold from
     the analytics-cycle critical section.
   - Drop the dead `GetMultiByteCapMap` (zero callers confirmed by
     `rg`) and the stale `multibyteStatusToInt` tombstone comment.
3. **`cmd/ingestor/multibyte_persist.go`**
- Replace the per-entry pair of `UPDATE nodes` + `UPDATE inactive_nodes`
     (50% guaranteed-miss) with a single dispatch-by-table-membership
     `UPDATE` per entry. ~50% fewer prepared-stmt round-trips.
   - Explicit `MalformedSnapshot` log line distinct from cold-start.
   - Defensive schema-presence check via `PRAGMA table_info` once at
     start; logs `[multibyte-persist] schema missing` and returns
     clean stats on legacy DBs.
4. **`cmd/server/analytics_recomputer.go` / `config.example.json`** —
   bump default snapshot cadence from 15s to 1m (the snapshot is a
   derived cache the ingestor only reads every 5 min; 4× less disk
   churn, no observable freshness loss).

## Why

Direct quotes from the audit (#1386):

> *"No end-to-end persist→restart→load round-trip — the documented
> value prop of the PR ('survives restart') has no single test
> exercising the full path."* (Kent Beck)

> *"`cacheMu` is `sync.Mutex` not `sync.RWMutex` + per-node read in
> `handleNodes` — 2400 serialized lock acquisitions per `/api/nodes`
> call, contended against every analytics-cache reader/writer.
> The O(1) win is consumed by lock contention."* (Carmack #1)

> *"Map construction held under shared `cacheMu` — every 15s
> analytics cycle blocks every API cache read for the duration of a
> 2400-entry map build. Build outside the lock, swap pointer
> inside."* (Carmack #2)

> *"`UPDATE nodes` + `UPDATE inactive_nodes` per entry … 4800
> prepared-stmt round-trips, 2400 guaranteed-empty."* (Carmack #3)

> *"Server writes 20 snapshots for every one the ingestor reads.
> Cadence mismatch — server could publish every 1 min and lose
> nothing."* (Carmack §2)

## TDD

Red commit adds the four tests above. Two of the four
(`MalformedSnapshot`, `MissingSchemaColumns`) fail on assertions
against the pre-fix `multibyte_persist.go`; the other two
(`RoundTrip`, `PreservesConfirmedOnUnknown`) are regression coverage
of behaviour the original implementation already honoured but never
exercised — they exist to guard future mutation (the audit's
mutation-suggestion lens). Green commit lands the implementation.

## Bench

`go test -bench BenchmarkGetMultibyteCapFor -benchmem -count=10`
(local, idle laptop, n=2400-entry index, 8 reader goroutines vs. one
analytics writer):

| variant            | ns/op | allocs/op |
|--------------------|------:|----------:|
| `sync.Mutex` (pre) | n/a — see note | — |
| `sync.RWMutex`     | n/a — see note | — |

Note: did not produce a concurrent benchmark in this PR (would
require non-trivial test scaffolding around the cache lifecycle).
The win is structural — `RLock` allows the ~2400 per-`/api/nodes`
reads to proceed in parallel rather than serializing on the same
mutex held by every analytics writer. Documenting honestly per
AGENTS.md "perf claims require proof": full microbench deferred to
a follow-up.

## Manual verification (staging)

- New tests: `go test ./... -count=1 -timeout 300s` in `cmd/ingestor`
  and `cmd/server` — green.
- All multibyte-area tests (`#1366`, `#1368`, `#1372` regression
  suites in `multibyte_capability_test.go`, `multibyte_enrich_test.go`,
  `multibyte_region_filter_test.go`): green.
- Preflight: `bash ~/.openclaw/skills/pr-preflight/scripts/run-all.sh
  origin/master` — exit 0.

Fixes #1386

---------

Co-authored-by: claw <claw@openclaw.local>
2026-05-25 23:29:35 -07:00
..