Kpa-clawbot 048143f54f fix(#1690): cold-load uses last_seen (effective recency) instead of first_seen (#1691)
## #1690 — cold-load uses wrong time axis (RED → GREEN)

The on-disk DB has thousands of long-lived hashes with recent traffic.
Prod's cold-load filter (`transmissions.first_seen >= cutoff`) is bound
to a column that is set once at insert time and never updated, so
re-observation of an old hash does not move it into the hot window.
Result: prod cold-loaded ~0.3% of the on-disk rows and flipped
`backgroundLoadComplete=true` without ever walking the retention window
(the `retentionHours - hotStartupHours <= 0` short-circuit at line 1353
of `cmd/server/store.go`).
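The difference between the two time axes can be illustrated with a minimal in-memory sketch. The `tx` type and `countInWindow` helper are hypothetical stand-ins for a `transmissions` row and the cold-load filter; they are not code from this repository.

```go
package main

import "fmt"

// tx is a simplified, hypothetical stand-in for a transmissions row.
type tx struct {
	Hash      string
	FirstSeen int64 // set once at insert time, never updated
	LastSeen  int64 // moves forward on every re-observation
}

// countInWindow counts rows whose timestamp on the chosen axis is >= cutoff.
func countInWindow(rows []tx, cutoff int64, axis func(tx) int64) int {
	n := 0
	for _, r := range rows {
		if axis(r) >= cutoff {
			n++
		}
	}
	return n
}

func main() {
	now := int64(1_750_000_000)
	day := int64(24 * 3600)
	cutoff := now - 12*3600 // 12h hot window

	rows := []tx{
		{"old-but-active", now - 30*day, now - 60},
		{"old-and-idle", now - 30*day, now - 29*day},
		{"new", now - 600, now - 600},
	}

	byFirst := countInWindow(rows, cutoff, func(r tx) int64 { return r.FirstSeen })
	byLast := countInWindow(rows, cutoff, func(r tx) int64 { return r.LastSeen })

	// first_seen misses the long-lived hash that is still receiving traffic.
	fmt.Println(byFirst, byLast) // prints "1 2"
}
```

Windowing on `FirstSeen` only picks up the newly inserted hash, while windowing on `LastSeen` also picks up the old hash that was re-observed a minute ago.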

### Three sub-fixes

**A) Denormalize `transmissions.last_seen`** so cold-load can window on
effective recency.

- `internal/dbschema/dbschema.go::ensureTransmissionsLastSeenColumn`
  adds the column + `idx_tx_last_seen` (single-column INTEGER ALTER +
  index; both PREFLIGHT-annotated as cheap metadata-only ops).
- `cmd/ingestor/db.go::OpenStoreWithInterval` schedules
  `tx_last_seen_backfill_v1` via `Store.RunAsyncMigration`:
  `UPDATE transmissions SET last_seen = MAX(observations.timestamp) WHERE last_seen = 0`.
  Non-blocking on boot (1.9M+ obs row scan in prod).
- Writer-side: `InsertTransmission` seeds `last_seen` on initial insert,
  and every observation insert bumps `last_seen = ?` via prepared
  statement `stmtBumpTxLastSeen` (conditional `last_seen < ?` so
  out-of-order ingest never goes backwards).
- Reader-side: `cmd/server/store.go::Load`, `loadChunk`, and
  `cmd/server/chunked_load.go::LoadChunked` switch the WHERE/ORDER-BY
  clauses to `t.last_seen` when the column is present (PRAGMA-detected
  via `DB.hasLastSeen`). Test/legacy DBs without the column fall back to
  `first_seen` so existing fixtures stay green.
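The writer-side and reader-side rules above can be modelled without a database. The sketch below is an assumption-laden illustration: `bumpLastSeen` and `recencyColumn` are hypothetical helper names that only mirror the described behaviour (a conditional `last_seen < ?` bump, and a fallback to `first_seen` when the column is absent).

```go
package main

import "fmt"

// bumpLastSeen mirrors a conditional "UPDATE ... WHERE last_seen < ?":
// the stored value only ever moves forward.
func bumpLastSeen(current, observed int64) int64 {
	if current < observed {
		return observed
	}
	return current
}

// recencyColumn picks the column used in WHERE/ORDER BY clauses.
// hasLastSeen stands in for the PRAGMA-based column detection.
func recencyColumn(hasLastSeen bool) string {
	if hasLastSeen {
		return "t.last_seen"
	}
	return "t.first_seen" // legacy/test DBs without the new column
}

func main() {
	lastSeen := int64(100) // seeded on initial insert

	// Observations arrive out of order: 300 before 200.
	for _, ts := range []int64{300, 200} {
		lastSeen = bumpLastSeen(lastSeen, ts)
	}

	fmt.Println(lastSeen)             // prints "300"
	fmt.Println(recencyColumn(true))  // prints "t.last_seen"
	fmt.Println(recencyColumn(false)) // prints "t.first_seen"
}
```

Keeping the bump conditional means a late-arriving observation with an older timestamp cannot push a transmission back out of the hot window.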

**B) Honest `backgroundLoadComplete` gating.**

- Drop the `retentionHours - hotStartupHours <= 0` short-circuit. Prod
  runs with both at 12h, which flipped Done=true immediately.
- After the chunk loop, query
  `SELECT COUNT(*) FROM transmissions WHERE last_seen >= retentionFloor`
  and compute `loadCoverageRatio = inMem / inDB`. Done=true only when
  `ratio >= 0.90` AND no chunk errors. `backgroundLoadFailed=true` +
  `backgroundLoadError` populated otherwise (e.g.
  `"loaded 20.0% of 5000 rows (1000 in memory)"`).
- `bgErrMu`-guarded `loadCoverageRatio` + `backgroundLoadErr` so the
  perf endpoint can read them without blocking the writer.

**C) Perf exposure.**

`PerfPacketStoreStats` gains `RetentionHours`, `OldestLoaded`,
`LoadCoverageRatio`, `BackgroundLoadError`. These surface what fraction
of the on-disk DB the in-memory store currently reflects, so operators
can see the 0.3% case in `/api/perf` without reading the logs.

### TDD trail

- **RED**: `05f0c6dd2bea6dc37324c548a49564d739aca920` (failing tests +
  21-line store.go scaffolding). CI on this commit failed on assertions
  (intended).
- **GREEN**: this PR's HEAD commit (8 files, +271/-24). Targeted suite:
  `Test1690_ColdLoad_TimeAxis`, `Test1690_BackgroundLoadHonesty`,
  `Test1690_PerfStats_NewFields`, `TestHotStartup_*`,
  `TestIssue1690_LastSeenUpdatedOnObservation`. All pass.

Anti-tautology: locally reverted the `if !s.backgroundLoadFailed.Load()`
guard around `backgroundLoadDone.Store(true)`;
`Test1690_BackgroundLoadHonesty` then fails on the assertion
`"backgroundLoadDone=true with only 1000/5000 packets loaded; must be false until coverage ≥ 90%"`.
Restored.

### Async-migration preflight

- `ensureTransmissionsLastSeenColumn` — ALTER + CREATE INDEX both
  `// PREFLIGHT: async=true reason="..."` annotated.
- `tx_last_seen_backfill_v1` — wrapped in `Store.RunAsyncMigration`.
- `stmtBumpTxLastSeen` prepared statement — annotated; it is a row-level
  UPDATE BY PRIMARY KEY, not a migration.

### Preflight overrides

PREFLIGHT-MIGRATION-SCALE: <30s N=5K
- check-async-migration: justified for
  `cmd/server/issue1690_cold_load_test.go` CREATE TABLE/INDEX
  statements. These build an in-memory test fixture DB (≤5000 rows, runs
  in <1s in CI), not a prod migration.

Fixes #1690.

---------

Co-authored-by: meshcore-bot <bot@meshcore.local>
Co-authored-by: bot <bot@example.com>
2026-06-12 12:47:53 -07:00