## Summary
- Removes the TTL-based inline rebuild from `GetRepeaterRelayInfoMap`
and `GetRepeaterUsefulnessScoreMap`
- When the cache is non-nil it is returned immediately, regardless of
age — no more 700ms on-request recompute
- Inline compute is retained only as a nil-cache guard (edge case: tests
without a running recomputer)
- Fixes the stale `// 15s-TTL gate` comment in
`recomputeRepeaterEnrichmentSafe`
**Root cause:** `computeRepeaterRelayInfoMap` runs inline when the TTL
expires, taking ~700ms on a busy instance.
`StartRepeaterEnrichmentRecomputer` (introduced in #1262) already keeps
the cache warm via a synchronous prewarm at startup plus 5-minute ticks,
so the inline path is redundant: it fires only when the TTL is shorter
than the recomputer interval (e.g. a custom
`analytics.defaultIntervalSeconds > 600`).
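As a rough illustration of the resulting read path, here is a minimal sketch. The names `enrichCache`, `relayInfo`, `compute`, and `computes` are hypothetical stand-ins, not the identifiers in the actual store; the point is only the shape of the logic: serve any non-nil snapshot regardless of age, and build inline only when nothing has been published yet.

```go
package main

import "sync"

// relayInfo is a hypothetical stand-in for the real per-repeater value type.
type relayInfo struct{ RelayCount int }

// enrichCache is a hypothetical holder; the real store has its own fields.
type enrichCache struct {
	mu       sync.RWMutex
	relay    map[string]relayInfo
	computes int                         // number of inline builds, for illustration
	compute  func() map[string]relayInfo // assumed to return a non-nil map
}

// GetRelayInfoMap returns the cached snapshot whenever one exists, however
// old it is. The inline build runs only while no snapshot has been stored.
func (c *enrichCache) GetRelayInfoMap() map[string]relayInfo {
	c.mu.RLock()
	cached := c.relay
	c.mu.RUnlock()
	if cached != nil {
		return cached // possibly stale; the background recomputer replaces it
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	if c.relay == nil { // re-check: another goroutine may have built it
		c.relay = c.compute()
		c.computes++
	}
	return c.relay
}
```

With this shape, a request can pay the build cost at most once per process, and only when the recomputer has not run.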
## Test plan
- [ ] `TestGetRepeaterRelayInfoMap_ServesStaleOnTTLExpiry` — regression
guard: stale sentinel is returned without recompute
- [ ] `TestGetRepeaterUsefulnessScoreMap_ServesStaleOnTTLExpiry` — same
for usefulness score map
- [ ] `TestGetRepeaterRelayInfoMap_BuildsWhenNil` — nil-cache fallback
still works
- [ ] Full `-short` suite passes (`go test -short ./...`)
Closes #1272

🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Root cause
`repeaterEnrichTTL` was **15 seconds**, but the background recomputer
(`StartRepeaterEnrichmentRecomputer`) runs every **5 minutes**.
After each recomputer tick, the relay/usefulness caches were valid for
15 seconds. For the remaining 4m45s, every `/api/nodes` request hit a
stale TTL gate in `GetRepeaterRelayInfoMap` /
`GetRepeaterUsefulnessScoreMap` and fell through to
`computeRepeaterRelayInfoMap` **on the request goroutine**. On
production (16k+ transmissions, 240k hop records) that rebuild takes ~18
seconds, making `/api/nodes?limit=5000` freeze on virtually every page
load.
The pattern was:
```
recomputer runs at T=0 → cache valid
T=15s → TTL expires
T=15s … T=5min → every request rebuilds on-thread (18s each)
T=5min → recomputer runs again → 15s valid window
repeat
```
## Fix
One line in `repeater_enrich_bulk.go`:
```go
// Before
const repeaterEnrichTTL = 15 * time.Second
// After
const repeaterEnrichTTL = 10 * time.Minute
```
The TTL now exceeds the recomputer interval so the cache is always warm
between background ticks. The TTL remains as a safety net for cases
where the recomputer isn't running (tests, early startup edge cases) —
it just no longer expires between ticks.
## Production results (analyzer.on8ar.eu)
Tested with binary injection on the live server before opening this PR.
| Metric | Before | After |
|--------|--------|-------|
| TTFB (`/api/nodes?limit=5000`) | 18.6 s | 0.47–0.54 s |
| Total response time | 18.9 s | 1.55–1.73 s |
| TTFB improvement | — | **34–39×** |
Confirmed still fast at t+60s (well past the old 15s window).
## Test results
```
TestHandleNodesPerfLargeFleet elapsed=1.9ms budget=2s PASS
TestHandleNodesLimit2000ColdMiss elapsed=5.3ms budget=2s PASS
```
Both existing perf regression tests pass unchanged — the TTL change
doesn't affect their behavior (they test the cold-prewarm path, not TTL
expiry).
## Why this wasn't caught by tests
`TestHandleNodesLimit2000ColdMiss` only tests the cold-startup path
(cache nil → on-thread build → cache hit). It doesn't test the
TTL-expiry path (cache exists but stale → on-thread rebuild). A test
covering the latter would need to fast-forward time past the TTL, which
the existing fixture doesn't do.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
RED commit: `22ce5736066142583017cad7303fa48d9e00ccf0` — CI on red:
https://github.com/Kpa-clawbot/CoreScope/actions?query=branch%3Afix%2Fissue-1262
## Problem
After #1260 added a 15s-TTL bulk cache for repeater enrichment in
`handleNodes`, `/api/nodes` (default limit) dropped to ~500ms. But
`/api/nodes?limit=2000`, called by `public/live.js` at SPA startup for
hop resolution, still took **15.7s cold** on staging (75k tx, 600
nodes). Warm hits were ~40ms.
Root cause: the bulk cache was lazily populated on the first request
after TTL expiry. The rebuild ran on the request-serving goroutine. Every
cold SPA load triggered the rebuild and ate 15s.
## Fix
Add `StartRepeaterEnrichmentRecomputer` — a steady-state background
recomputer that mirrors the `analytics_recomputer.go` pattern from
#1240:
- **Prewarm**: initial synchronous compute on Start so the first request
hits a populated cache.
- **Steady-state**: ticker refreshes the snapshot every 5min
(configurable via the existing analytics recompute interval knob).
- **Panic-safe** + idempotent Start.
Wired into `main.go` right after `StartAnalyticsRecomputers`, using
`cfg.GetHealthThresholds().RelayActiveHours` as the window.
## Test
`TestHandleNodesLimit2000ColdMiss` — seeds 600 nodes + 150k non-advert
tx with repeaters indexed under a shared 1-byte hop prefix (matches
production hop-prefix collisions), starts the recomputer, then issues
`/api/nodes?limit=2000` with **no HTTP warmup**.
| State | Latency |
|---|---|
| Before (master, on-thread rebuild) | 3.37s |
| After (prewarm + steady-state) | 56ms |
| Budget | 2s |
Staging end-to-end: 15.7s → expected sub-100ms on the same call path.
Red commit (`22ce5736066142583017cad7303fa48d9e00ccf0`) compiles with a
no-op stub of the new method so the test fails on the latency
**assertion**, not a missing symbol.
Fixes #1262
---------
Co-authored-by: corescope-bot <bot@corescope.local>