meshcore-analyzer/cmd/server
efiten 38eb7103b3 perf(nodes): batch relay stats to fix O(N×M) /api/nodes regression (#1164)
## Problem

`handleNodes` enriches each repeater/room node by calling
`GetRepeaterRelayInfo` and `GetRepeaterUsefulnessScore` **per node**
inside a loop. `GetRepeaterUsefulnessScore` acquires `s.mu.RLock()` and
then iterates **all** `byPayloadType` entries to compute the non-advert
denominator — once per node.

On a deployment with ~1500 repeater/room nodes and ~145K transmissions
in memory, this is **~220M iterations per `/api/nodes` request**
(1500 × 145K ≈ 217.5M), plus ~3000 separate lock acquisitions (two per
node). Response times of 18–44 seconds have been observed in production,
especially during startup backfill, when write-lock contention compounds
the issue.

## Fix

Add `GetRepeaterNodeStatsBatch(pubkeys []string, windowHours float64)
map[string]RepeaterNodeStats` to `repeater_usefulness.go`:

- Takes **one** `s.mu.RLock()` for the entire node list
- Computes the non-advert denominator **once** (shared across all nodes)
- Snapshots `byPathHop` slice headers for all requested pubkeys under
that single lock
- Processes timestamps and counts **outside** the lock

Update `handleNodes` to collect repeater/room pubkeys first, call the
batch method once, and apply results.
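
A minimal sketch of that collect-then-apply pattern, under stated assumptions: the `Node` type, the `"repeater"` / `"room"` role strings, and the `enrichNodes`, `isRelayRole`, and `batchFunc` names are hypothetical, and the batch method is passed in as a function so the example stands alone. The real `handleNodes` likely differs in shape.

```go
package main

// RepeaterNodeStats: field names are guesses based on the JSON fields
// listed in the test plan.
type RepeaterNodeStats struct {
	RelayActive     bool
	RelayCount1h    int
	RelayCount24h   int
	UsefulnessScore float64
}

// Node is a hypothetical stand-in for the node record returned by /api/nodes.
type Node struct {
	PublicKey string
	Role      string
	Stats     *RepeaterNodeStats // stays nil for non-relay roles
}

// batchFunc mirrors the GetRepeaterNodeStatsBatch signature.
type batchFunc func(pubkeys []string, windowHours float64) map[string]RepeaterNodeStats

// isRelayRole uses assumed role names.
func isRelayRole(role string) bool {
	return role == "repeater" || role == "room"
}

// enrichNodes collects relay pubkeys first, calls the batch function once,
// then applies the results to the matching nodes.
func enrichNodes(nodes []*Node, windowHours float64, batch batchFunc) {
	pubkeys := make([]string, 0, len(nodes))
	for _, n := range nodes {
		if isRelayRole(n.Role) {
			pubkeys = append(pubkeys, n.PublicKey)
		}
	}
	if len(pubkeys) == 0 {
		return // nothing to enrich, so skip the call entirely
	}

	stats := batch(pubkeys, windowHours) // the single call per request

	for _, n := range nodes {
		if !isRelayRole(n.Role) {
			continue
		}
		if st, ok := stats[n.PublicKey]; ok {
			n.Stats = &st
		}
	}
}
```

In a handler this might be called as `enrichNodes(nodes, 24, store.GetRepeaterNodeStatsBatch)`, where `store` and the 24-hour window are placeholders.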

**Complexity: O(M + N) instead of O(N × M)** per request (M = total
transmissions, N = repeater nodes).

`GetRepeaterRelayInfo` and `GetRepeaterUsefulnessScore` are unchanged —
they are still correct for single-node calls (e.g. `handleNodeDetail`).

## Test plan

- [ ] `go build ./cmd/server` passes
- [ ] `/api/nodes` response is correct (relay_active,
relay_count_1h/24h, usefulness_score fields present for repeaters)
- [ ] No change in output for `/api/nodes/{pubkey}` (uses existing
single-node methods)
- [ ] CI passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: openclaw-bot <bot@openclaw.local>
2026-05-20 20:57:02 -07:00