mirror of
https://github.com/Kpa-clawbot/meshcore-analyzer.git
synced 2026-05-25 18:14:03 +00:00
1efe93d7f6
RED commit `a2879e12` — perf regression test; CI run: see Actions tab. Fixes #1257. ## Root cause `handleNodes` looped over the response page and called `store.GetRepeaterRelayInfo(pk, win)` + `store.GetRepeaterUsefulnessScore(pk)` for every repeater/room. Each call: - grabbed its own `s.mu.RLock`, - walked `byPathHop[pk]` (+ the matching 1-byte raw-prefix bucket, which on busy networks fans out to nearly the entire non-advert tx set), - and re-parsed every `tx.FirstSeen` with `parseRelayTS`. Default page is the 50 most-recently-seen nodes — almost all hot repeaters — so the request did O(50) lock acquisitions and hundreds of thousands of timestamp parses on the same set of txs. That's the classic load-then-paginate / per-row N+1 shape called out in the issue (same family as #1226). The `?limit=2000` variant looks faster relatively only because per-node enrichment dwarfs serialization; on staging both still bottleneck on the same loop. ## Fix Two new bulk methods on `PacketStore`: - `GetRepeaterRelayInfoMap(windowHours)` → `pubkey → RepeaterRelayInfo` - `GetRepeaterUsefulnessScoreMap()` → `pubkey → 0..1` Both snapshot `byPathHop` under a single `RLock`, pre-parse each `FirstSeen` exactly once (a tx that appears in N hop buckets used to be parsed N times), and emit one entry per hop key. Cached 15s — same TTL as `GetNodeHashSizeInfo` / `GetMultiByteCapMap`, same status-column freshness budget. `handleNodes` is one map-lookup per node; behavior, output schema, and `RelayActive` / `RelayCount{1h,24h}` / `LastRelayed` / `usefulness_score` semantics are preserved. ## Why no `limit` default change The issue mentioned a default-limit knob. Investigated: `queryInt(r, "limit", 50)` already defaults to 50 — frontends calling `/api/nodes` (no limit) get a 50-row page today. Capping further would change behavior (live.js already passes `?limit=2000` when it wants more); the cost was per-repeater enrichment, not page size. Fixing the N+1 is the correct lever and preserves backward compat. ## Perf Regression test `TestHandleNodesPerfLargeFleet` (600 nodes, 150k non-advert tx, repeaters indexed under `byPathHop`): | | elapsed | vs 2s budget | |---|---|---| | before (master) | 4.72s | ✗ | | after | ~4ms | ✓ (~1000×) | ## TDD - RED: `a2879e12` — test fails at 4.72s on master. - GREEN: `c529d29a` — fix; full `cmd/server` + `cmd/ingestor` suites green. --------- Co-authored-by: corescope-bot <bot@corescope>