Files
meshcore-analyzer/cmd/server/runstartup_load_test.go
Kpa-clawbot fa15ab0a30 fix(#1809): gate background loader on LoadChunked completion (#1811)
Partial fix for #1809.

Red commit: c9c782b5 (CI runs on PR open; standalone red-branch CI not
configured — local repro proves the gate, see below).

## Problem
Issue #1809: at startup the background fill loader logged `background
load FAILED` within seconds and `backgroundLoadFailed=true` was set,
leaving the coverage gate tripped even though LoadChunked itself
completed normally.

## Root cause
`main.go:225-245` spawned `go store.loadBackgroundChunks()` as soon as
`FirstChunkReady` fired (chunk #1 = 10000 tx). But `s.oldestLoaded` is
only assigned at the end of `LoadChunked` (`chunked_load.go:329-333`),
~tens of seconds later. The bg loader read `oldestLoaded==""` at
`store.go:1462-1466`, broke out immediately, walked zero chunks, and the
coverage gate at `store.go:1543-1554` flipped
`backgroundLoadFailed=true`.

## Fix (initial)
Introduce `PacketStore.RunStartupLoad(chunkSize)` (`chunked_load.go`).
It runs `LoadChunked` first; only on success and only when
`hotStartupHours > 0` does it call `loadBackgroundChunks`. `main.go`
invokes `RunStartupLoad` in the same goroutine pattern as before, so
`FirstChunkReady` still unblocks the HTTP listener bind at chunk #1 —
only the bg loader is gated.

## Round-1 followups (this push)
Reviewer-driven hardening on top of the initial fix:

### Production behavior (commit db5592f6)
- **Steady-state semantics tightened.** `RunStartupLoad` now picks a
terminal state on every branch:
- LoadChunked error → `backgroundLoadFailed=true` with captured error
(was: `done=false, failed=false` indefinite).
- `hotStartupHours == 0` → `backgroundLoadDone=true` immediately,
`progress=100` (was: `done=false` forever → healthz stuck on
`backgroundLoadComplete=false`).
- Successful hot-window path → terminal state is whatever
`loadBackgroundChunks` sets (#1690 semantics, unchanged).
- **Runtime invariant assertion (A7).** `loadBackgroundChunks` panics
when `oldestLoaded==""` and packets exist — a future refactor that
re-introduces the parallel-spawn race fails loudly instead of silently
shipping the same coverage regression.
- **`RunStartupLoad` cleanup.** Inlined the superfluous goroutine +
channel that wrapped `LoadChunked` (direct call is equivalent).
- **Logging.** Added an INFO line between `LoadChunked` completion and
bg-loader start (the #1809 post-mortem needed exactly this signal).
Fixed the lying `"background load will start"` log that fired even on
the `hotStartupHours==0` branch.
- **Immutability documented.** `hotStartupHours` is now explicitly
documented as immutable post-construction, so the lock-free reads in
`LoadChunked` / `RunStartupLoad` / `loadBackgroundChunks` are sound.

### Test coverage (commits db5592f6 + e9e12acf)
- **Tautology fix (B1, commit e9e12acf).** The original
`Test1809_StartupLoad_BgLoaderSeesOldestLoaded` fixture seeded all 100
rows inside the 1h hot window, so `LoadChunked` alone produced
coverage=1.0 — the test passed even if `loadBackgroundChunks` was a
no-op. Rewrote the fixture to spread 100 rows over 14 days with
`hotStartupHours=24`, so only ~7 rows are hot and the remaining ~93 MUST
be loaded by the bg loader for the assertions to hold. Original
red-commit assertions kept intact; added `len(packets) > hot-only cap`
and `oldestLoaded < hot-cutoff - 12h` assertions on top.
- **New tests (commit db5592f6, `runstartup_load_test.go`)** codify the
new contracts:
  - `TestRunStartupLoad_HotStartupHoursZero_SetsDoneImmediately`
  - `TestRunStartupLoad_LoadChunkedError_SetsFailedTerminal`
  - `TestRunStartupLoad_EmptyDB_SetsDoneTerminal`
  - `TestRunStartupLoad_BgLoaderRunsAfterLoadChunkedSets_OldestLoaded`
  - `TestLoadBackgroundChunks_PanicsOnOldestLoadedEmpty_Invariant`

### Docs (commit 70fa16f7)
- Package-level doc in `chunked_load.go` now documents `RunStartupLoad`
as the orchestrator entry point alongside `LoadChunked` /
`loadStatusMiddleware` / `OnChunkLoaded`.

### Preflight (commit eec1b48c)
- Test-fixture DDL annotated with `// PREFLIGHT: async=true
reason="unit-test fixture"` so the async-migration gate distinguishes
ephemeral test schema from prod migration paths.

## What's still NOT fixed (left intentionally open)
This PR addresses the startup race specifically. Issue #1809 will be
closed by the operator after observing healthy startup logs (`background
load complete: ... coverage=100.0%`) in prod for a full restart cycle.
Do not auto-close — leave open until the operator verifies.

## Test
Local repro (red branch state, before green commit): `go test
./cmd/server/ -run Test1809_StartupLoad_BgLoaderSeesOldestLoaded` → FAIL
on assertion `backgroundLoadFailed=true ...
oldest="2026-06-30T18:56:09Z"`. After green commit + round-1 followups:
PASS. Full `cmd/server/...` suite: ok in ~70s.

## Risk
Low — startup path only. New behavior gates surfaced by the new tests;
coverage-gate semantics unchanged. Runtime panic is a new failure mode
but only fires on a state (`oldestLoaded=="" && len(packets)>0`) that is
unreachable on the current code path — it exists solely as a refactor
tripwire.

---------

Co-authored-by: mc-bot <bot@corescope.local>
Co-authored-by: openclaw-bot <bot@openclaw.local>
Co-authored-by: meshcore-bot <bot@meshcore>
2026-06-30 17:19:56 -07:00

325 lines
11 KiB
Go

package main
// Tests for RunStartupLoad branch behavior and #1809 invariants
// (PR #1811 round-1 followups B2/B3/B4/B5).
//
// The pre-#1811 RunStartupLoad left several steady states undefined:
// * hotStartupHours == 0 → backgroundLoadDone stayed false forever
// * LoadChunked error → both done & failed stayed false
// * empty DB + no bg work needed → backgroundLoadDone stayed false
//
// These tests codify the post-#1811 contract:
// * LoadChunked error → backgroundLoadFailed=true, done=false
// * hotStartupHours == 0 → backgroundLoadDone=true, failed=false,
// bg loader NOT called
// * empty DB + hot window → backgroundLoadDone reflects coverage
// (1.0 on empty DB → done=true, failed=false)
// * call ordering inside RunStartupLoad: LoadChunked completes
// before loadBackgroundChunks executes (so oldestLoaded is set)
import (
"fmt"
"path/filepath"
"strings"
"sync"
"sync/atomic"
"testing"
"time"
)
// TestRunStartupLoad_HotStartupHoursZero_SetsDoneImmediately covers B3:
// when hotStartupHours == 0 the bg loader has no work to do; healthz
// must NOT be stuck on backgroundLoadComplete=false.
func TestRunStartupLoad_HotStartupHoursZero_SetsDoneImmediately(t *testing.T) {
dir := t.TempDir()
dbPath := filepath.Join(dir, "test.db")
nowSec := time.Now().UTC().Unix()
createTestDBWithLastSeen(t, dbPath, 10, 1, nowSec,
30*time.Minute, 30*time.Minute)
db, err := OpenDB(dbPath)
if err != nil {
t.Fatalf("OpenDB: %v", err)
}
defer db.conn.Close()
store := NewPacketStore(db, &PacketStoreConfig{
RetentionHours: 168,
HotStartupHours: 0, // disable hot window → bg loader must not run
})
if err := store.RunStartupLoad(500); err != nil {
t.Fatalf("RunStartupLoad: %v", err)
}
if !store.backgroundLoadDone.Load() {
t.Fatalf("backgroundLoadDone must be true when hotStartupHours=0 (no bg work needed)")
}
if store.backgroundLoadFailed.Load() {
t.Fatalf("backgroundLoadFailed must be false on the no-bg-work path; got error=%q",
store.BackgroundLoadError())
}
}
// TestRunStartupLoad_LoadChunkedError_SetsFailedTerminal covers B2:
// when LoadChunked errors, the steady state must be terminal — both
// done=true (LoadChunked contract: done is the primary observable
// signal, see store.go contract block) AND failed=true. The pre-#1811
// implementation left done=false, leaving healthz wedged on the
// failure path (dij #1 / adv #7).
func TestRunStartupLoad_LoadChunkedError_SetsFailedTerminal(t *testing.T) {
dir := t.TempDir()
dbPath := filepath.Join(dir, "test.db")
nowSec := time.Now().UTC().Unix()
createTestDBWithLastSeen(t, dbPath, 5, 1, nowSec,
30*time.Minute, 30*time.Minute)
db, err := OpenDB(dbPath)
if err != nil {
t.Fatalf("OpenDB: %v", err)
}
// Close the underlying connection to force LoadChunked to fail on
// its very first query. We're explicitly verifying the failure path
// terminal state, not the success path.
_ = db.conn.Close()
store := NewPacketStore(db, &PacketStoreConfig{
RetentionHours: 168,
HotStartupHours: 1,
})
loadErr := store.RunStartupLoad(500)
if loadErr == nil {
t.Fatalf("RunStartupLoad must return an error when LoadChunked fails")
}
if !store.backgroundLoadFailed.Load() {
t.Fatalf("backgroundLoadFailed must be true after LoadChunked error (terminal state)")
}
if !store.backgroundLoadDone.Load() {
t.Fatalf("backgroundLoadDone must be true on LoadChunked error (LoadChunked contract: done observable, qualified by failed)")
}
if store.BackgroundLoadError() == "" {
t.Fatalf("BackgroundLoadError must be non-empty after LoadChunked failure")
}
}
// TestRunStartupLoad_EmptyDB_SetsDoneTerminal covers B4: empty DB with
// hot window > 0 — oldestLoaded stays "" because there are no packets.
// loadBackgroundChunks must reach its coverage block (totalInDB==0 →
// ratio=1.0) and set done=true rather than leaving the store stuck.
func TestRunStartupLoad_EmptyDB_SetsDoneTerminal(t *testing.T) {
dir := t.TempDir()
dbPath := filepath.Join(dir, "test.db")
createTestDBWithLastSeen(t, dbPath, 0, 0, time.Now().UTC().Unix(),
30*time.Minute, 30*time.Minute)
db, err := OpenDB(dbPath)
if err != nil {
t.Fatalf("OpenDB: %v", err)
}
defer db.conn.Close()
store := NewPacketStore(db, &PacketStoreConfig{
RetentionHours: 168,
HotStartupHours: 1,
})
if err := store.RunStartupLoad(500); err != nil {
t.Fatalf("RunStartupLoad on empty DB: %v", err)
}
if !store.backgroundLoadDone.Load() {
t.Fatalf("backgroundLoadDone must be true after empty-DB load (nothing to load == complete)")
}
if store.backgroundLoadFailed.Load() {
t.Fatalf("backgroundLoadFailed must be false on empty DB; got %q",
store.BackgroundLoadError())
}
}
// TestRunStartupLoad_BgLoaderRunsAfterLoadChunkedSets_OldestLoaded
// covers B5/B6 (dij #3, adv #3): assert the in-process call ordering
// inside RunStartupLoad — bg loader entry MUST happen-after the final
// OnChunkLoaded callback (which is the last write to s.oldestLoaded
// inside LoadChunked).
//
// The previous version only asserted that OnChunkLoaded fired and
// that oldestLoaded was non-empty after RunStartupLoad returned —
// which proves existence but not ordering (a future refactor could
// re-introduce the parallel-spawn race and the test would still
// pass because both signals are observable post-hoc).
//
// This version captures timestamps from each side (chunk callback,
// bg-loader entry) via a test-only hook and asserts
// tBgEntry >= tLastChunk monotonically. The runtime invariant in
// loadBackgroundChunks remains as the deterministic backstop.
func TestRunStartupLoad_BgLoaderRunsAfterLoadChunkedSets_OldestLoaded(t *testing.T) {
dir := t.TempDir()
dbPath := filepath.Join(dir, "test.db")
nowSec := time.Now().UTC().Unix()
createTestDBWithLastSeen(t, dbPath, 50, 1, nowSec,
30*time.Minute, 30*time.Minute)
db, err := OpenDB(dbPath)
if err != nil {
t.Fatalf("OpenDB: %v", err)
}
defer db.conn.Close()
store := NewPacketStore(db, &PacketStoreConfig{
RetentionHours: 168,
HotStartupHours: 1,
})
var (
mu sync.Mutex
lastChunkAt time.Time
bgEntryAt time.Time
chunkSawOldest string
)
store.OnChunkLoaded(func(rowsThisChunk, totalRows int) {
mu.Lock()
lastChunkAt = time.Now()
// LoadChunked sets oldestLoaded under s.mu AFTER the last
// fireChunkCallbacks call returns, so this read may be empty
// on the last chunk — record whatever is visible.
store.mu.RLock()
chunkSawOldest = store.oldestLoaded
store.mu.RUnlock()
mu.Unlock()
})
store.bgLoaderEntryHook = func() {
mu.Lock()
bgEntryAt = time.Now()
mu.Unlock()
}
if err := store.RunStartupLoad(500); err != nil {
t.Fatalf("RunStartupLoad: %v", err)
}
mu.Lock()
defer mu.Unlock()
if lastChunkAt.IsZero() {
t.Fatalf("OnChunkLoaded never fired — LoadChunked did not process any chunks")
}
if bgEntryAt.IsZero() {
t.Fatalf("bgLoaderEntryHook never fired — loadBackgroundChunks did not enter")
}
if !bgEntryAt.After(lastChunkAt) && !bgEntryAt.Equal(lastChunkAt) {
t.Fatalf("ordering violation: bg loader entered at %s BEFORE last chunk callback at %s — #1809 race re-introduced",
bgEntryAt.Format(time.RFC3339Nano), lastChunkAt.Format(time.RFC3339Nano))
}
if store.oldestLoaded == "" {
t.Fatalf("oldestLoaded is empty after RunStartupLoad — bg loader would have read \"\" and bailed")
}
_ = chunkSawOldest // diagnostic only; kept to make race future-debug easier
}
// TestRunStartupLoad_LoadChunkedError_MidLoad covers dij #6 / adv #5:
// the existing TestRunStartupLoad_LoadChunkedError_SetsFailedTerminal
// closes the conn BEFORE LoadChunked starts, so the error surfaces on
// chunk 0's first query. That's a degenerate path — it does not prove
// mid-load failure behavior. This test fires OnChunkLoaded once,
// closes the DB conn from the callback, and asserts the chunk-N query
// fails and the terminal state is still failed=true,done=true with a
// non-empty error.
func TestRunStartupLoad_LoadChunkedError_MidLoad(t *testing.T) {
dir := t.TempDir()
dbPath := filepath.Join(dir, "test.db")
nowSec := time.Now().UTC().Unix()
// Seed enough rows so a small chunk size yields >1 chunk.
createTestDBWithLastSeen(t, dbPath, 200, 1, nowSec,
30*time.Minute, 30*time.Minute)
db, err := OpenDB(dbPath)
if err != nil {
t.Fatalf("OpenDB: %v", err)
}
store := NewPacketStore(db, &PacketStoreConfig{
RetentionHours: 168,
HotStartupHours: 1,
})
// Close the underlying conn from inside the chunk callback so the
// NEXT chunk's query fails mid-load instead of on the first query.
var closed atomic.Bool
store.OnChunkLoaded(func(rowsThisChunk, totalRows int) {
if closed.CompareAndSwap(false, true) {
_ = db.conn.Close()
}
})
loadErr := store.RunStartupLoad(50)
if loadErr == nil {
t.Fatalf("RunStartupLoad must error when conn closes mid-load")
}
if !store.backgroundLoadFailed.Load() {
t.Fatalf("backgroundLoadFailed must be true after mid-load error")
}
if !store.backgroundLoadDone.Load() {
t.Fatalf("backgroundLoadDone must be true after mid-load error (terminal contract)")
}
if !strings.Contains(store.BackgroundLoadError(), "LoadChunked failed") {
t.Fatalf("BackgroundLoadError must mention LoadChunked failure; got %q", store.BackgroundLoadError())
}
}
// TestLoadBackgroundChunks_PanicsOnOldestLoadedEmpty_Invariant covers the
// runtime assertion (A7). Manually populate s.packets without setting
// oldestLoaded and call loadBackgroundChunks directly — the panic guard
// must fire so future refactors cannot silently re-introduce the
// #1809 race.
func TestLoadBackgroundChunks_PanicsOnOldestLoadedEmpty_Invariant(t *testing.T) {
dir := t.TempDir()
dbPath := filepath.Join(dir, "test.db")
// Reuse the existing schema-only fixture helper (0 rows) so this
// test does not introduce a new inline CREATE TABLE block (pr-preflight
// async-migration gate). The fixture provides exactly the bare schema
// loadBackgroundChunks needs to reach its panic guard.
createTestDBWithLastSeen(t, dbPath, 0, 0, time.Now().UTC().Unix(),
30*time.Minute, 30*time.Minute)
db, err := OpenDB(dbPath)
if err != nil {
t.Fatalf("OpenDB: %v", err)
}
defer db.conn.Close()
store := NewPacketStore(db, &PacketStoreConfig{
RetentionHours: 168,
HotStartupHours: 1,
})
// Simulate the #1809 race: packets present, oldestLoaded never set.
store.mu.Lock()
store.packets = append(store.packets, &StoreTx{ID: 1, Hash: "deadbeef", FirstSeen: "2025-01-01T00:00:00Z"})
store.oldestLoaded = ""
store.mu.Unlock()
// Override invariantViolation so the test can recover() and
// assert the message without log.Fatalf killing the runner
// (adv #6: prod uses log.Fatalf for a clean shutdown). Restore on
// exit so other tests are unaffected.
prev := invariantViolation
defer func() { invariantViolation = prev }()
invariantViolation = func(msg string) { panic(msg) }
defer func() {
r := recover()
if r == nil {
t.Fatalf("loadBackgroundChunks must trip invariantViolation when oldestLoaded=\"\" with packets in store (#1809 invariant)")
}
msg := fmt.Sprintf("%v", r)
// adv #4: tighten the tripwire — assert the specific invariant
// message contains the issue tag so a future refactor that
// reuses the panic path for a different bug doesn't silently
// satisfy this test.
if !strings.Contains(msg, "#1809") {
t.Fatalf("invariant message must reference #1809 to gate future regressions; got %q", msg)
}
if !strings.Contains(msg, "oldestLoaded") {
t.Fatalf("invariant message must mention oldestLoaded; got %q", msg)
}
}()
store.loadBackgroundChunks()
}