mirror of
https://github.com/Kpa-clawbot/meshcore-analyzer.git
synced 2026-06-29 16:01:55 +00:00
048143f54f
## #1690: cold-load uses wrong time axis (RED → GREEN)

The on-disk DB has thousands of long-lived hashes with recent traffic. Prod's cold-load filter (`transmissions.first_seen >= cutoff`) is bound to a column that is set once at insert time and never updated, so re-observation of an old hash does not move it into the hot window. Result: prod cold-loaded ~0.3% of the on-disk rows and flipped `backgroundLoadComplete=true` without ever walking the retention window (the `retentionHours - hotStartupHours <= 0` short-circuit at line 1353 of `cmd/server/store.go`).

### Three sub-fixes

**A) Denormalize `transmissions.last_seen`** so cold-load can window on effective recency.

- `internal/dbschema/dbschema.go::ensureTransmissionsLastSeenColumn` adds the column + `idx_tx_last_seen` (single-column INTEGER ALTER + index; both PREFLIGHT-annotated as cheap metadata-only ops).
- `cmd/ingestor/db.go::OpenStoreWithInterval` schedules `tx_last_seen_backfill_v1` via `Store.RunAsyncMigration` (`UPDATE transmissions SET last_seen = MAX(observations.timestamp) WHERE last_seen = 0`), non-blocking on boot (1.9M+ obs row scan in prod).
- Writer-side: `InsertTransmission` seeds `last_seen` on initial insert, and every observation insert bumps `last_seen = ?` via prepared statement `stmtBumpTxLastSeen` (conditional `last_seen < ?` so out-of-order ingest never goes backwards).
- Reader-side: `cmd/server/store.go::Load`, `loadChunk`, and `cmd/server/chunked_load.go::LoadChunked` switch the WHERE/ORDER-BY clauses to `t.last_seen` when the column is present (PRAGMA-detected via `DB.hasLastSeen`). Test/legacy DBs without the column fall back to `first_seen` so existing fixtures stay green.

**B) Honest `backgroundLoadComplete` gating.**

- Drop the `retentionHours - hotStartupHours <= 0` short-circuit. Prod runs with both at 12h, which flipped Done=true immediately.
- After the chunk loop, query `SELECT COUNT(*) FROM transmissions WHERE last_seen >= retentionFloor` and compute `loadCoverageRatio = inMem / inDB`. Done=true only when `ratio >= 0.90` AND no chunk errors. `backgroundLoadFailed=true` + `backgroundLoadError` populated otherwise (e.g. `"loaded 20.0% of 5000 rows (1000 in memory)"`).
- `bgErrMu`-guarded `loadCoverageRatio` + `backgroundLoadErr` so the perf endpoint can read them without blocking the writer.

**C) Perf exposure.** `PerfPacketStoreStats` gains `RetentionHours`, `OldestLoaded`, `LoadCoverageRatio`, `BackgroundLoadError`. These surface what fraction of the on-disk DB the in-memory store currently reflects, so operators can see the 0.3% case in `/api/perf` without reading the logs.

### TDD trail

- **RED**: `05f0c6dd2bea6dc37324c548a49564d739aca920`: failing tests + 21-line store.go scaffolding. CI on this commit failed on assertions (intended).
- **GREEN**: this PR's HEAD commit (8 files, +271/-24). Targeted suite: `Test1690_ColdLoad_TimeAxis`, `Test1690_BackgroundLoadHonesty`, `Test1690_PerfStats_NewFields`, `TestHotStartup_*`, `TestIssue1690_LastSeenUpdatedOnObservation`. All pass.

Anti-tautology: locally reverted the `if !s.backgroundLoadFailed.Load()` guard around `backgroundLoadDone.Store(true)`; `Test1690_BackgroundLoadHonesty` then fails on the assertion `"backgroundLoadDone=true with only 1000/5000 packets loaded; must be false until coverage ≥ 90%"`. Restored.

### Async-migration preflight

- `ensureTransmissionsLastSeenColumn`: ALTER + CREATE INDEX both `// PREFLIGHT: async=true reason="..."` annotated.
- `tx_last_seen_backfill_v1`: wrapped in `Store.RunAsyncMigration`.
- `stmtBumpTxLastSeen` prepared statement: annotated; it is a row-level UPDATE BY PRIMARY KEY, not a migration.
### Preflight overrides

PREFLIGHT-MIGRATION-SCALE: <30s N=5K

- check-async-migration: justified for `cmd/server/issue1690_cold_load_test.go` CREATE TABLE/INDEX statements. These build an in-memory test fixture DB (≤5000 rows, runs in <1s in CI), not a prod migration.

Fixes #1690.

---------

Co-authored-by: meshcore-bot <bot@meshcore.local>
Co-authored-by: bot <bot@example.com>