Mirror of https://github.com/Kpa-clawbot/meshcore-analyzer.git (synced 2026-05-22 05:55:16 +00:00), commit `2754251a53`.
## Summary

Fixes #1239: `/api/analytics/distance` took 15s on a cold cache on staging under heavy ingest. This PR makes two independent fixes.

The first commit on this branch (`a539882`) is the RED test for Fix B. It demonstrates reader/writer contention on the main store lock. For CI, see the Actions tab for the run on that test-only commit: the test fails when the average writer cycle exceeds 150µs, and before the fix it measures 82367µs. The GREEN commit (`d3938f1`) brings it down to 1µs.

## Fix A: TTL bump 15s → 60s (`5eae1e0`)

- The `rfCacheTTL` default in `cmd/server/store.go` changed from `15 * time.Second` to `60 * time.Second`. This TTL is shared by the RF, topology, distance, hash-sizes, subpath, and channel analytics caches.
- Per the operator's clarification in the issue thread, distance analytics IS viewed live during analysis sessions, not just glanced at in the background. A 60s TTL smooths cold-miss churn during heavy ingest without freezing the data.
- `config.example.json`: documented `cacheTTL.analyticsRF` with the new default and a caveat.
- Existing assertions (`TestCacheTTLDefaults`, `TestHashCollisionsCacheTTL`) were updated to the new default.

## Fix B: Drop the main RLock around compute (`a539882` red, `d3938f1` green)

`computeAnalyticsDistance` previously held `s.mu.RLock()` for the entire computation: region match-set construction, hop/path filtering, sort, dedup, histogram, category stats, and time series. Writers (ingest, `buildDistanceIndex`) had to wait for each of these long read sections to finish before they could take `s.mu.Lock`.

Refactor: hold the RLock only long enough to snapshot the `distHops`/`distPaths` slice headers AND build the region match-set (which reads `tx.Observations`, mutated under `s.mu.Lock`). For `region=""` (the most common cache-miss path), the lock is held only for the header snapshot, which takes microseconds. Everything else runs on the locally captured slices outside the lock.

Safety: `distHops`/`distPaths` are append-only via re-slice in `buildDistanceIndex` / `updateDistanceIndexForTxs` (both under `s.mu.Lock`).
If `append` reallocates the backing array after the snapshot, the snapshot still references the prior array (kept alive by the GC) at the consistent length captured under the lock. If `append` instead fills spare capacity, it writes only at indices past the snapshot's length, which the reader never touches. Records are value types and existing elements are never rewritten, so there are no torn writes.

## Test results

`cmd/server/distance_lock_contention_test.go` (8 reader goroutines × 20k synthetic distHops × 200 writer Lock/Unlock cycles):

- pre-fix avg writer cycle: **82367µs** (16.5s total for 200 cycles)
- post-fix avg writer cycle: **1µs** (279µs total for 200 cycles, ≈1.4µs unrounded)
- roughly 59,000× less total writer wait time (16.5s → 279µs); the reader result shape is unchanged

The full `go test ./cmd/server/...` suite is green with `-race`.

## Out of scope (per issue)

- The same lock pattern in the topology / RF / hash / subpath analytics (to be filed separately if needed).
- Per-region cache key sharding.
- WebSocket-driven cache invalidation.

---------

Co-authored-by: openclaw-bot <bot@openclaw.local>
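The snapshot-then-compute pattern from Fix B can be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not the code in `cmd/server/store.go`: `store`, `distHop`, `appendHops`, `snapshotHops`, and `longestHops` are hypothetical names, and the record fields are invented for the example. It shows the two properties the Safety argument relies on: the writer only appends while holding `Lock`, and the reader copies the slice header under `RLock`, then computes on that fixed-length view with no lock held.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// distHop is a hypothetical stand-in for a distance-index record. It is a
// plain value type, matching the PR's "records are value types" note.
type distHop struct {
	FromID, ToID string
	Km           float64
}

// store is a hypothetical, heavily reduced stand-in for the server store.
type store struct {
	mu       sync.RWMutex
	distHops []distHop // append-only; mutated only while holding mu.Lock
}

// appendHops is the writer side: it only ever appends, under the write lock.
func (s *store) appendHops(hops ...distHop) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.distHops = append(s.distHops, hops...)
}

// snapshotHops copies just the slice header under the read lock. The result
// has a fixed length, so later appends (past that length, or into a new
// backing array) are never visible through it.
func (s *store) snapshotHops() []distHop {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.distHops
}

// longestHops is the compute phase and runs with no lock held. It sorts a
// private copy so the shared backing array is never written by a reader.
func longestHops(hops []distHop, n int) []distHop {
	sorted := make([]distHop, len(hops))
	copy(sorted, hops)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Km > sorted[j].Km })
	if len(sorted) > n {
		sorted = sorted[:n]
	}
	return sorted
}

func main() {
	s := &store{}
	s.appendHops(distHop{"a", "b", 12.5}, distHop{"b", "c", 40})

	snap := s.snapshotHops() // length fixed at 2

	// A writer keeps appending while the reader computes on its snapshot.
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			s.appendHops(distHop{"x", "y", float64(i)})
		}
	}()

	top := longestHops(snap, 1)
	wg.Wait()

	fmt.Println(len(snap), top[0].FromID+"->"+top[0].ToID) // 2 b->c
	fmt.Println(len(s.snapshotHops()))                     // 1002
}
```

Running it prints `2 b->c` and then `1002`: the reader's view stays at the two records present when it took the snapshot, while the writer's 1000 appends land in the live slice. Any reordering step (the PR lists sort and dedup) has to work on a private copy or a freshly filtered slice, since sorting the snapshot in place would write to the shared backing array and race with other readers. `go run -race` is a quick way to check a variant of this sketch.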