mirror of
https://github.com/Kpa-clawbot/meshcore-analyzer.git
synced 2026-06-03 17:51:18 +00:00
13bdee57d4
## What

Three of the four P0s from #1481's scale-test findings. Each cuts a distinct hot path; together they target `/api/observers`, `/api/analytics/neighbor-graph`, and `/api/observers/{id}/analytics`, the top three live offenders.

### P0-1: 5-min atomic-pointer cache for default neighbor-graph response

- Live p95 of 10.8s on the most-trafficked organic endpoint.
- Background recomputer (5-min cadence per operator directive) builds the default-filter (`minCount=5 minScore=0.1`, no region, no role) `NeighborGraphResponse` and stores it via `atomic.Pointer`.
- `handleNeighborGraph` short-circuits on the default shape; non-default filters take the extracted `computeNeighborGraphResponse` path (identical semantics to the previous inline build).

### P0-2: cache parsed `StoreObs.Timestamp` + drop RLock window

- `handleObserverAnalytics` re-parsed the RFC3339 timestamp three times per observation, for 60k+ observations per active observer, under `s.store.mu.RLock`, blocking writers for the full scan.
- `StoreObs.ParsedTime()` parses once via `sync.Once` (mirrors `StoreTx.ParsedDecoded`).
- Handler snapshots the `byObserver[id]` pointer slice, releases the RLock immediately, then iterates locally.

### P0-3: 30s cache for `/api/observers` + sargable `IN` + covering index

- Three SQL queries on every request → ~1.7s p50 at 50-concurrent.
- Atomic-pointer 30s cache for the default (no-filter) query.
- `GetNodeLocationsByKeys` drops `LOWER(public_key) IN (...)` (non-sargable); callers pre-lowercase in Go and the plain `IN` matches the existing `public_key` index.
- New ingestor migration `obs_observer_ts_idx_v1` adds composite index `idx_observations_observer_idx_timestamp(observer_idx, timestamp)` so `GetObserverPacketCounts` can resolve its GROUP-BY + range filter from the index without scanning the 1.9M-row observations table.

### P0-4: deferred

`perfMiddleware`'s global mutex was claimed to serialize every API request.
A direct test (`50 concurrent requests through the middleware, handler sleeps 20ms each`) shows total elapsed ≈ 25ms, not 1s: the lock is held only for the post-handler bookkeeping (a few µs). Real impact is below measurement noise. Skipping to avoid invasive churn on PerfStats consumers without a demonstrable win.

## Test plan

Red → green per P0:

- `observers_cache_test.go`: handler reads `s.observersCache` before SQL, TTL boundary, atomic.Pointer (no mutex contention).
- `storeobs_parsedtime_test.go`: parses three timestamp shapes, caches result, no race under concurrent readers.
- `neighbor_graph_cache_test.go`: handler serves from atomic pointer when set, bypasses cache when `?region=` (or any non-default filter) is passed.

Full server + ingestor suites pass: `go test -count=1 ./...`.

## Perf proof

Before/after p50/p95/p99 (50 requests × 50 concurrent) against prod (before) and staging once CI deploys (after) will be posted as a PR comment per the operator's "no merge without proof of improvement" gate.

Closes #1481

## TDD exemption: P0-1 and P0-2 (net-new surfaces, AGENTS.md)

Per CoreScope `AGENTS.md` § "Exemptions": **net-new code surfaces with no prior tests to break** may land tests in the same PR without a strict test-first → impl commit split.

- **P0-1 (neighbor-graph atomic-pointer cache)**: `neighborGraphCache`, `recomputeNeighborGraphCache`, `loadNeighborGraphCacheBytes`, `startNeighborGraphRecomputer` and the default-shape short-circuit in `handleNeighborGraph` were brand-new code with no pre-existing assertions covering them. There was no green test to first turn red.
- **P0-2 (cached `StoreObs.Timestamp` + RLock window drop)**: `StoreObs.ParsedTime()` and the snapshot+release pattern in `handleObserverAnalytics` were new surfaces; the prior code did the parse inline per call with no behavioural test to break.

P0-3 was authored properly red-then-green (commit `6e63ec6a` red, then `83ae129b` green) and does NOT use this exemption.
## Default-filter detection vs frontend reality (#1483 follow-up)

The Neighbor Graph analytics tab in `public/analytics.js` fetches `/analytics/neighbor-graph?min_count=1&min_score=0` because the client-side sliders need the full edge set to filter from. That shape did NOT match the `(5, 0.1)` cached default, so the UI tab still paid the cold compute cost despite #1481 P0-1.

The #1483 follow-up commit caches BOTH shapes in the same recomputer pass:

- `(minCount=5, minScore=0.1, no region, no role)`: `live.js` affinity-scoring consumer.
- `(minCount=1, minScore=0, no region, no role)`: analytics tab.

Both are served from `atomic.Pointer` with an `X-Cache-Age-Seconds` header. The per-shape cost in the background goroutine is roughly linear in edge count; total recompute time stays well under the 5-minute cadence on prod-scale graphs.

---------

Co-authored-by: openclaw-bot <bot@openclaw.dev>
Co-authored-by: mc-bot <mc-bot@users.noreply.github.com>
75 lines
2.4 KiB
Go
package main

// observers cache for /api/observers default (no-filter) response.
// Issue #1481 P0-3 + #1483 follow-up.
//
// Design:
//   - Atomic pointer holds the immutable cached response.
//   - Entry age is checked against time.Time's monotonic reading
//     instead of wall-clock time (#1483: an NTP step backward must
//     not extend the cache).
//   - singleflight collapses the TTL-boundary thundering herd into one
//     SQL fill, regardless of incoming concurrency.

import (
	"sync/atomic"
	"time"

	"golang.org/x/sync/singleflight"
)

// observersCacheTTL is the default freshness window for the cached
// default (no-filter) /api/observers response when no per-server
// override is configured. Configurable via ObserversCache.TTLSeconds
// (#1483).
const observersCacheTTL = 30 * time.Second

// effectiveObserversCacheTTL returns the cfg-overridden TTL or the
// default. Falls back to the default on nil cfg / non-positive value.
func (s *Server) effectiveObserversCacheTTL() time.Duration {
	if s.cfg != nil && s.cfg.ObserversCache != nil && s.cfg.ObserversCache.TTLSeconds > 0 {
		return time.Duration(s.cfg.ObserversCache.TTLSeconds) * time.Second
	}
	return observersCacheTTL
}

// singleflight key for the default-shape cache fill.
const observersCacheFlightKey = "observers:default"

// observersCacheEntry pairs the response with the monotonic timestamp
// of when it was built. atomic.Pointer guarantees the read is a single
// load; the entry is immutable once stored.
type observersCacheEntry struct {
	resp ObserverListResponse
	at   time.Time
}

// observersCacheField bundles the atomic pointer with the singleflight
// group that gates concurrent refills.
type observersCacheField struct {
	ptr atomic.Pointer[observersCacheEntry]
	sf  singleflight.Group

	// fillCount increments once per actual SQL fill (i.e., per
	// singleflight winner). Tests use this to assert the herd was
	// collapsed; production code never reads it.
	fillCount atomic.Int64
}

// observersCacheExpired reports whether an entry built at `t` is absent
// (zero time) or at least as old as the effective TTL (see
// effectiveObserversCacheTTL).
func (s *Server) observersCacheExpired(t time.Time) bool {
	if t.IsZero() {
		return true
	}
	return time.Since(t) >= s.effectiveObserversCacheTTL()
}

// loadObserversCache returns the cached entry and true, or (nil, false)
// when nothing has been stored yet. It does not check expiry; callers
// pair it with observersCacheExpired.
func (s *Server) loadObserversCache() (*observersCacheEntry, bool) {
	e := s.observersCacheV2.ptr.Load()
	if e == nil {
		return nil, false
	}
	return e, true
}