Files
meshcore-analyzer/cmd/server/issue1690_cold_load_test.go
Kpa-clawbot fa15ab0a30 fix(#1809): gate background loader on LoadChunked completion (#1811)
Partial fix for #1809.

Red commit: c9c782b5 (CI runs on PR open; standalone red-branch CI not
configured — local repro proves the gate, see below).

## Problem
Issue #1809: at startup the background fill loader logged `background
load FAILED` within seconds and `backgroundLoadFailed=true` was set,
leaving the coverage gate tripped even though LoadChunked itself
completed normally.

## Root cause
`main.go:225-245` spawned `go store.loadBackgroundChunks()` as soon as
`FirstChunkReady` fired (chunk #1 = 10000 tx). But `s.oldestLoaded` is
only assigned at the end of `LoadChunked` (`chunked_load.go:329-333`),
~tens of seconds later. The bg loader read `oldestLoaded==""` at
`store.go:1462-1466`, broke out immediately, walked zero chunks, and the
coverage gate at `store.go:1543-1554` flipped
`backgroundLoadFailed=true`.

## Fix (initial)
Introduce `PacketStore.RunStartupLoad(chunkSize)` (`chunked_load.go`).
It runs `LoadChunked` first; only on success and only when
`hotStartupHours > 0` does it call `loadBackgroundChunks`. `main.go`
invokes `RunStartupLoad` in the same goroutine pattern as before, so
`FirstChunkReady` still unblocks the HTTP listener bind at chunk #1 —
only the bg loader is gated.

## Round-1 followups (this push)
Reviewer-driven hardening on top of the initial fix:

### Production behavior (commit db5592f6)
- **Steady-state semantics tightened.** `RunStartupLoad` now picks a
terminal state on every branch:
- LoadChunked error → `backgroundLoadFailed=true` with captured error
(was: `done=false, failed=false` indefinite).
- `hotStartupHours == 0` → `backgroundLoadDone=true` immediately,
`progress=100` (was: `done=false` forever → healthz stuck on
`backgroundLoadComplete=false`).
- Successful hot-window path → terminal state is whatever
`loadBackgroundChunks` sets (#1690 semantics, unchanged).
- **Runtime invariant assertion (A7).** `loadBackgroundChunks` panics
when `oldestLoaded==""` and packets exist — a future refactor that
re-introduces the parallel-spawn race fails loudly instead of silently
shipping the same coverage regression.
- **`RunStartupLoad` cleanup.** Inlined the superfluous goroutine +
channel that wrapped `LoadChunked` (direct call is equivalent).
- **Logging.** Added an INFO line between `LoadChunked` completion and
bg-loader start (the #1809 post-mortem needed exactly this signal).
Fixed the lying `"background load will start"` log that fired even on
the `hotStartupHours==0` branch.
- **Immutability documented.** `hotStartupHours` is now explicitly
documented as immutable post-construction, so the lock-free reads in
`LoadChunked` / `RunStartupLoad` / `loadBackgroundChunks` are sound.

### Test coverage (commits db5592f6 + e9e12acf)
- **Tautology fix (B1, commit e9e12acf).** The original
`Test1809_StartupLoad_BgLoaderSeesOldestLoaded` fixture seeded all 100
rows inside the 1h hot window, so `LoadChunked` alone produced
coverage=1.0 — the test passed even if `loadBackgroundChunks` was a
no-op. Rewrote the fixture to spread 100 rows over 14 days with
`hotStartupHours=24`, so only ~7 rows are hot and the remaining ~93 MUST
be loaded by the bg loader for the assertions to hold. Original
red-commit assertions kept intact; added `len(packets) > hot-only cap`
and `oldestLoaded < hot-cutoff - 12h` assertions on top.
- **New tests (commit db5592f6, `runstartup_load_test.go`)** codify the
new contracts:
  - `TestRunStartupLoad_HotStartupHoursZero_SetsDoneImmediately`
  - `TestRunStartupLoad_LoadChunkedError_SetsFailedTerminal`
  - `TestRunStartupLoad_EmptyDB_SetsDoneTerminal`
  - `TestRunStartupLoad_BgLoaderRunsAfterLoadChunkedSets_OldestLoaded`
  - `TestLoadBackgroundChunks_PanicsOnOldestLoadedEmpty_Invariant`

### Docs (commit 70fa16f7)
- Package-level doc in `chunked_load.go` now documents `RunStartupLoad`
as the orchestrator entry point alongside `LoadChunked` /
`loadStatusMiddleware` / `OnChunkLoaded`.

### Preflight (commit eec1b48c)
- Test-fixture DDL annotated with `// PREFLIGHT: async=true
reason="unit-test fixture"` so the async-migration gate distinguishes
ephemeral test schema from prod migration paths.

## What's still NOT fixed (left intentionally open)
This PR addresses the startup race specifically. Issue #1809 will be
closed by the operator after observing healthy startup logs (`background
load complete: ... coverage=100.0%`) in prod for a full restart cycle.
Do not auto-close — leave open until the operator verifies.

## Test
Local repro (red branch state, before green commit): `go test
./cmd/server/ -run Test1809_StartupLoad_BgLoaderSeesOldestLoaded` → FAIL
on assertion `backgroundLoadFailed=true ...
oldest="2026-06-30T18:56:09Z"`. After green commit + round-1 followups:
PASS. Full `cmd/server/...` suite: ok in ~70s.

## Risk
Low — startup path only. New behavior gates surfaced by the new tests;
coverage-gate semantics unchanged. Runtime panic is a new failure mode
but only fires on a state (`oldestLoaded=="" && len(packets)>0`) that is
unreachable on the current code path — it exists solely as a refactor
tripwire.

---------

Co-authored-by: mc-bot <bot@corescope.local>
Co-authored-by: openclaw-bot <bot@openclaw.local>
Co-authored-by: meshcore-bot <bot@meshcore>
2026-06-30 17:19:56 -07:00

171 lines
5.8 KiB
Go

package main
// Tests for issue #1690 — cold-load uses wrong time axis (first_seen instead
// of effective recency). Three tests live in this file:
//
// Test1690_ColdLoad_TimeAxis — long-lived transmissions (first_seen 30d
// ago) with recent observations must load
// under a 1h hotStartupHours window.
// Test1690_BackgroundLoadHonesty — backgroundLoadComplete must NOT flip to
// true when coverage is below threshold.
// Test1690_PerfStats_NewFields — typed perf response must expose
// retentionHours, oldestLoaded,
// loadCoverageRatio.
import (
"encoding/json"
"path/filepath"
"strings"
"testing"
"time"
_ "modernc.org/sqlite"
)
// createTestDBWithLastSeen seeds a DB with the post-fix schema (last_seen
// column on transmissions). nowSec is the unix-second reference; fixture
// rows are placed relative to it.
//
// numTx transmissions, each with first_seen = nowSec - firstSeenAgo, and
// last_seen = nowSec - lastSeenAgo. Each tx has obsPerTx observations whose
// timestamps are within the last 20 minutes.
//
// Thin wrapper over seedTestDBRows (PR #1811 adv #2 — DRY). The schema
// DDL and the block-level PREFLIGHT async-migration annotation live
// there; this helper only supplies the row-time policy used by the
// #1690 fixture (fixed first_seen, fixed last_seen).
func createTestDBWithLastSeen(t *testing.T, dbPath string, numTx, obsPerTx int, nowSec int64, firstSeenAgo, lastSeenAgo time.Duration) {
t.Helper()
firstSeenTime := time.Unix(nowSec, 0).UTC().Add(-firstSeenAgo).Format(time.RFC3339)
lastSeenUnix := nowSec - int64(lastSeenAgo.Seconds())
seedTestDBRows(t, dbPath, numTx, obsPerTx, func(i int) (string, int64) {
return firstSeenTime, lastSeenUnix
})
}
// Test1690_ColdLoad_TimeAxis seeds 1000 transmissions whose hash *first
// appeared* 30 days ago but whose last observation was 30 minutes ago.
// With a 1h hotStartupHours, the pre-fix code (filtering on first_seen)
// loads zero rows; the post-fix code (filtering on last_seen) must load
// all 1000.
func Test1690_ColdLoad_TimeAxis(t *testing.T) {
dir := t.TempDir()
dbPath := filepath.Join(dir, "test.db")
nowSec := time.Now().UTC().Unix()
createTestDBWithLastSeen(t, dbPath, 1000, 1, nowSec,
30*24*time.Hour, // first_seen = 30d ago
30*time.Minute) // last_seen = 30min ago
db, err := OpenDB(dbPath)
if err != nil {
t.Fatalf("OpenDB: %v", err)
}
defer db.conn.Close()
store := NewPacketStore(db, &PacketStoreConfig{
RetentionHours: 168,
HotStartupHours: 1,
})
if err := store.LoadChunked(0); err != nil {
t.Fatalf("LoadChunked: %v", err)
}
loaded := len(store.packets)
if loaded < 1000 {
t.Fatalf("Test1690_ColdLoad_TimeAxis: expected ≥1000 transmissions loaded "+
"(all 1000 fixture rows have last_seen within 1h), got %d. "+
"Pre-fix behavior: chunked_load.go filters t.first_seen >= now-1h "+
"which excludes all 30d-old rows.", loaded)
}
}
// Test1690_BackgroundLoadHonesty seeds 1000 transmissions but caps the
// store's memory budget so it can only fit a fraction. After
// loadBackgroundChunks runs, backgroundLoadDone must be FALSE and
// backgroundLoadFailed must be TRUE because actual coverage is < 90%.
func Test1690_BackgroundLoadHonesty(t *testing.T) {
dir := t.TempDir()
dbPath := filepath.Join(dir, "test.db")
nowSec := time.Now().UTC().Unix()
// 5000 rows; chunkSize=500 + maxMemoryMB=1 (→ maxPackets ≈ 1000) so
// the load breaks at the end of the chunk that crosses the cap and
// totalLoaded ≪ 5000.
createTestDBWithLastSeen(t, dbPath, 5000, 1, nowSec,
30*time.Minute, 30*time.Minute)
db, err := OpenDB(dbPath)
if err != nil {
t.Fatalf("OpenDB: %v", err)
}
defer db.conn.Close()
store := NewPacketStore(db, &PacketStoreConfig{
RetentionHours: 168,
HotStartupHours: 1,
MaxMemoryMB: 1, // forces bounded load ≪ 5000 rows
})
if err := store.LoadChunked(500); err != nil {
t.Fatalf("LoadChunked: %v", err)
}
store.loadBackgroundChunks()
if store.backgroundLoadDone.Load() {
t.Errorf("backgroundLoadDone=true with only %d/5000 packets loaded; "+
"must be false until coverage ≥ 90%%", len(store.packets))
}
if !store.backgroundLoadFailed.Load() {
t.Errorf("backgroundLoadFailed=false despite under-coverage "+
"(%d/5000 packets loaded); must be true with a reason", len(store.packets))
}
// The error message must mention a percentage so operators can see
// the actual ratio surface in the perf endpoint.
errMsg := store.BackgroundLoadError()
if !strings.Contains(errMsg, "%") {
t.Errorf("backgroundLoadError=%q; expected human-readable ratio "+
"(e.g. 'loaded X%% of Y rows')", errMsg)
}
}
// Test1690_PerfStats_NewFields asserts the typed perf payload exposes the
// retention/coverage fields needed for prod observability.
func Test1690_PerfStats_NewFields(t *testing.T) {
dir := t.TempDir()
dbPath := filepath.Join(dir, "test.db")
nowSec := time.Now().UTC().Unix()
createTestDBWithLastSeen(t, dbPath, 10, 1, nowSec,
30*time.Minute, 30*time.Minute)
db, err := OpenDB(dbPath)
if err != nil {
t.Fatalf("OpenDB: %v", err)
}
defer db.conn.Close()
store := NewPacketStore(db, &PacketStoreConfig{
RetentionHours: 168,
HotStartupHours: 1,
})
if err := store.LoadChunked(0); err != nil {
t.Fatalf("LoadChunked: %v", err)
}
ps := store.GetPerfStoreStatsTyped()
buf, err := json.Marshal(ps)
if err != nil {
t.Fatalf("marshal: %v", err)
}
var asMap map[string]interface{}
if err := json.Unmarshal(buf, &asMap); err != nil {
t.Fatalf("unmarshal: %v", err)
}
for _, key := range []string{"retentionHours", "oldestLoaded", "loadCoverageRatio"} {
if _, ok := asMap[key]; !ok {
t.Errorf("PerfPacketStoreStats missing %q field; payload=%s", key, string(buf))
}
}
}