mirror of
https://github.com/Kpa-clawbot/meshcore-analyzer.git
synced 2026-06-04 09:21:18 +00:00
perf(server): fix repeaterEnrichTTL mismatch causing 18s /api/nodes latency (#1425)
## Root cause `repeaterEnrichTTL` was **15 seconds**, but the background recomputer (`StartRepeaterEnrichmentRecomputer`) runs every **5 minutes**. After each recomputer tick, the relay/usefulness caches were valid for 15 seconds. For the remaining 4m45s, every `/api/nodes` request hit a stale TTL gate in `GetRepeaterRelayInfoMap` / `GetRepeaterUsefulnessScoreMap` and fell through to `computeRepeaterRelayInfoMap` **on the request goroutine**. On production (16k+ transmissions, 240k hop records) that rebuild takes ~18 seconds, making `/api/nodes?limit=5000` freeze on virtually every page load. The pattern was: ``` recomputer runs at T=0 → cache valid T=15s → TTL expires T=15s … T=5min → every request rebuilds on-thread (18s each) T=5min → recomputer runs again → 15s valid window repeat ``` ## Fix One line in `repeater_enrich_bulk.go`: ```go // Before const repeaterEnrichTTL = 15 * time.Second // After const repeaterEnrichTTL = 10 * time.Minute ``` The TTL now exceeds the recomputer interval so the cache is always warm between background ticks. The TTL remains as a safety net for cases where the recomputer isn't running (tests, early startup edge cases) — it just no longer expires between ticks. ## Production results (analyzer.on8ar.eu) Tested with binary injection on the live server before opening this PR. | Metric | Before | After | |--------|--------|-------| | TTFB (`/api/nodes?limit=5000`) | 18.6 s | 0.47–0.54 s | | Total response time | 18.9 s | 1.55–1.73 s | | Improvement | — | **34–39×** | Confirmed still fast at t+60s (well past the old 15s window). ## Test results ``` TestHandleNodesPerfLargeFleet elapsed=1.9ms budget=2s PASS TestHandleNodesLimit2000ColdMiss elapsed=5.3ms budget=2s PASS ``` Both existing perf regression tests pass unchanged — the TTL change doesn't affect their behavior (they test the cold-prewarm path, not TTL expiry). ## Why this wasn't caught by tests `TestHandleNodesLimit2000ColdMiss` only tests the cold-startup path (cache nil → on-thread build → cache hit). It doesn't test the TTL-expiry path (cache exists but stale → on-thread rebuild). A test covering the latter would need to fast-forward time past the TTL, which the existing fixture doesn't do. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -5,12 +5,13 @@ import (
|
||||
"time"
|
||||
)
|
||||
|
||||
// repeaterEnrichTTL bounds how stale the per-page bulk enrichment caches
|
||||
// for handleNodes may be. Same 15s budget as GetNodeHashSizeInfo — the
|
||||
// numbers feed an at-a-glance status column, not an alerting path, so
|
||||
// up-to-15s freshness is fine and keeps the request path O(page) instead
|
||||
// of O(page × byPathHop[pk] × parsed timestamps).
|
||||
const repeaterEnrichTTL = 15 * time.Second
|
||||
// repeaterEnrichTTL is the safety-net TTL for the bulk enrichment caches.
|
||||
// Derived as 2× the recomputer's default tick so the cache is always valid
|
||||
// between background refreshes. Without a recomputer (tests, edge cases)
|
||||
// the cache rebuilds on-thread at most once per TTL window.
|
||||
// Note: if analytics.defaultIntervalSeconds is configured above 600 the
|
||||
// TTL will expire before the recomputer runs; keep that value < TTL/2.
|
||||
const repeaterEnrichTTL = 2 * repeaterEnrichmentRecomputerDefaultInterval
|
||||
|
||||
// GetRepeaterRelayInfoMap returns a cached pubkey → RepeaterRelayInfo
|
||||
// map covering EVERY pubkey that currently appears as a path hop in any
|
||||
@@ -29,9 +30,9 @@ const repeaterEnrichTTL = 15 * time.Second
|
||||
// The cached map is keyed by lowercase pubkey/hop key (same shape as
|
||||
// byPathHop). Lookups should use strings.ToLower(pk).
|
||||
//
|
||||
// The cache is invalidated by TTL only — never by ingest. With a 15s
|
||||
// budget that's acceptable for a status column; if a fresher signal is
|
||||
// ever needed for a non-status caller, expose a non-cached path.
|
||||
// The cache is invalidated by TTL only — never by ingest. Up-to-10min
|
||||
// freshness is fine for an at-a-glance status column; if a fresher
|
||||
// signal is ever needed for a non-status caller, expose a non-cached path.
|
||||
func (s *PacketStore) GetRepeaterRelayInfoMap(windowHours float64) map[string]RepeaterRelayInfo {
|
||||
s.repeaterEnrichMu.Lock()
|
||||
if s.repeaterRelayCache != nil &&
|
||||
|
||||
@@ -7,9 +7,9 @@ import (
|
||||
|
||||
// repeaterEnrichmentRecomputerInterval is the default tick interval
|
||||
// for the steady-state recompute of the repeater enrichment bulk
|
||||
// caches. The on-request 15s-TTL fallback in repeater_enrich_bulk.go
|
||||
// is kept as a safety net — the recomputer just makes sure the cache
|
||||
// is populated before any request arrives.
|
||||
// caches. The on-request TTL fallback in repeater_enrich_bulk.go is
|
||||
// kept as a safety net — the recomputer just makes sure the cache is
|
||||
// populated before any request arrives.
|
||||
//
|
||||
// 5min mirrors the analytics_recomputer default from #1240 and is
|
||||
// plenty fresh for an at-a-glance status column.
|
||||
|
||||
Reference in New Issue
Block a user