mirror of
https://github.com/Kpa-clawbot/meshcore-analyzer.git
synced 2026-07-03 13:51:41 +00:00
fa15ab0a30
Partial fix for #1809. Red commit:c9c782b5(CI runs on PR open; standalone red-branch CI not configured — local repro proves the gate, see below). ## Problem Issue #1809: at startup the background fill loader logged `background load FAILED` within seconds and `backgroundLoadFailed=true` was set, leaving the coverage gate tripped even though LoadChunked itself completed normally. ## Root cause `main.go:225-245` spawned `go store.loadBackgroundChunks()` as soon as `FirstChunkReady` fired (chunk #1 = 10000 tx). But `s.oldestLoaded` is only assigned at the end of `LoadChunked` (`chunked_load.go:329-333`), ~tens of seconds later. The bg loader read `oldestLoaded==""` at `store.go:1462-1466`, broke out immediately, walked zero chunks, and the coverage gate at `store.go:1543-1554` flipped `backgroundLoadFailed=true`. ## Fix (initial) Introduce `PacketStore.RunStartupLoad(chunkSize)` (`chunked_load.go`). It runs `LoadChunked` first; only on success and only when `hotStartupHours > 0` does it call `loadBackgroundChunks`. `main.go` invokes `RunStartupLoad` in the same goroutine pattern as before, so `FirstChunkReady` still unblocks the HTTP listener bind at chunk #1 — only the bg loader is gated. ## Round-1 followups (this push) Reviewer-driven hardening on top of the initial fix: ### Production behavior (commitdb5592f6) - **Steady-state semantics tightened.** `RunStartupLoad` now picks a terminal state on every branch: - LoadChunked error → `backgroundLoadFailed=true` with captured error (was: `done=false, failed=false` indefinite). - `hotStartupHours == 0` → `backgroundLoadDone=true` immediately, `progress=100` (was: `done=false` forever → healthz stuck on `backgroundLoadComplete=false`). - Successful hot-window path → terminal state is whatever `loadBackgroundChunks` sets (#1690 semantics, unchanged). - **Runtime invariant assertion (A7).** `loadBackgroundChunks` panics when `oldestLoaded==""` and packets exist — a future refactor that re-introduces the parallel-spawn race fails loudly instead of silently shipping the same coverage regression. - **`RunStartupLoad` cleanup.** Inlined the superfluous goroutine + channel that wrapped `LoadChunked` (direct call is equivalent). - **Logging.** Added an INFO line between `LoadChunked` completion and bg-loader start (the #1809 post-mortem needed exactly this signal). Fixed the lying `"background load will start"` log that fired even on the `hotStartupHours==0` branch. - **Immutability documented.** `hotStartupHours` is now explicitly documented as immutable post-construction, so the lock-free reads in `LoadChunked` / `RunStartupLoad` / `loadBackgroundChunks` are sound. ### Test coverage (commitsdb5592f6+e9e12acf) - **Tautology fix (B1, commite9e12acf).** The original `Test1809_StartupLoad_BgLoaderSeesOldestLoaded` fixture seeded all 100 rows inside the 1h hot window, so `LoadChunked` alone produced coverage=1.0 — the test passed even if `loadBackgroundChunks` was a no-op. Rewrote the fixture to spread 100 rows over 14 days with `hotStartupHours=24`, so only ~7 rows are hot and the remaining ~93 MUST be loaded by the bg loader for the assertions to hold. Original red-commit assertions kept intact; added `len(packets) > hot-only cap` and `oldestLoaded < hot-cutoff - 12h` assertions on top. - **New tests (commitdb5592f6, `runstartup_load_test.go`)** codify the new contracts: - `TestRunStartupLoad_HotStartupHoursZero_SetsDoneImmediately` - `TestRunStartupLoad_LoadChunkedError_SetsFailedTerminal` - `TestRunStartupLoad_EmptyDB_SetsDoneTerminal` - `TestRunStartupLoad_BgLoaderRunsAfterLoadChunkedSets_OldestLoaded` - `TestLoadBackgroundChunks_PanicsOnOldestLoadedEmpty_Invariant` ### Docs (commit70fa16f7) - Package-level doc in `chunked_load.go` now documents `RunStartupLoad` as the orchestrator entry point alongside `LoadChunked` / `loadStatusMiddleware` / `OnChunkLoaded`. ### Preflight (commiteec1b48c) - Test-fixture DDL annotated with `// PREFLIGHT: async=true reason="unit-test fixture"` so the async-migration gate distinguishes ephemeral test schema from prod migration paths. ## What's still NOT fixed (left intentionally open) This PR addresses the startup race specifically. Issue #1809 will be closed by the operator after observing healthy startup logs (`background load complete: ... coverage=100.0%`) in prod for a full restart cycle. Do not auto-close — leave open until the operator verifies. ## Test Local repro (red branch state, before green commit): `go test ./cmd/server/ -run Test1809_StartupLoad_BgLoaderSeesOldestLoaded` → FAIL on assertion `backgroundLoadFailed=true ... oldest="2026-06-30T18:56:09Z"`. After green commit + round-1 followups: PASS. Full `cmd/server/...` suite: ok in ~70s. ## Risk Low — startup path only. New behavior gates surfaced by the new tests; coverage-gate semantics unchanged. Runtime panic is a new failure mode but only fires on a state (`oldestLoaded=="" && len(packets)>0`) that is unreachable on the current code path — it exists solely as a refactor tripwire. --------- Co-authored-by: mc-bot <bot@corescope.local> Co-authored-by: openclaw-bot <bot@openclaw.local> Co-authored-by: meshcore-bot <bot@meshcore>
325 lines
11 KiB
Go
325 lines
11 KiB
Go
package main
|
|
|
|
// Tests for RunStartupLoad branch behavior and #1809 invariants
|
|
// (PR #1811 round-1 followups B2/B3/B4/B5).
|
|
//
|
|
// The pre-#1811 RunStartupLoad left several steady states undefined:
|
|
// * hotStartupHours == 0 → backgroundLoadDone stayed false forever
|
|
// * LoadChunked error → both done & failed stayed false
|
|
// * empty DB + no bg work needed → backgroundLoadDone stayed false
|
|
//
|
|
// These tests codify the post-#1811 contract:
|
|
// * LoadChunked error → backgroundLoadFailed=true, done=false
|
|
// * hotStartupHours == 0 → backgroundLoadDone=true, failed=false,
|
|
// bg loader NOT called
|
|
// * empty DB + hot window → backgroundLoadDone reflects coverage
|
|
// (1.0 on empty DB → done=true, failed=false)
|
|
// * call ordering inside RunStartupLoad: LoadChunked completes
|
|
// before loadBackgroundChunks executes (so oldestLoaded is set)
|
|
|
|
import (
|
|
"fmt"
|
|
"path/filepath"
|
|
"strings"
|
|
"sync"
|
|
"sync/atomic"
|
|
"testing"
|
|
"time"
|
|
)
|
|
|
|
// TestRunStartupLoad_HotStartupHoursZero_SetsDoneImmediately covers B3:
|
|
// when hotStartupHours == 0 the bg loader has no work to do; healthz
|
|
// must NOT be stuck on backgroundLoadComplete=false.
|
|
func TestRunStartupLoad_HotStartupHoursZero_SetsDoneImmediately(t *testing.T) {
|
|
dir := t.TempDir()
|
|
dbPath := filepath.Join(dir, "test.db")
|
|
nowSec := time.Now().UTC().Unix()
|
|
createTestDBWithLastSeen(t, dbPath, 10, 1, nowSec,
|
|
30*time.Minute, 30*time.Minute)
|
|
|
|
db, err := OpenDB(dbPath)
|
|
if err != nil {
|
|
t.Fatalf("OpenDB: %v", err)
|
|
}
|
|
defer db.conn.Close()
|
|
|
|
store := NewPacketStore(db, &PacketStoreConfig{
|
|
RetentionHours: 168,
|
|
HotStartupHours: 0, // disable hot window → bg loader must not run
|
|
})
|
|
|
|
if err := store.RunStartupLoad(500); err != nil {
|
|
t.Fatalf("RunStartupLoad: %v", err)
|
|
}
|
|
if !store.backgroundLoadDone.Load() {
|
|
t.Fatalf("backgroundLoadDone must be true when hotStartupHours=0 (no bg work needed)")
|
|
}
|
|
if store.backgroundLoadFailed.Load() {
|
|
t.Fatalf("backgroundLoadFailed must be false on the no-bg-work path; got error=%q",
|
|
store.BackgroundLoadError())
|
|
}
|
|
}
|
|
|
|
// TestRunStartupLoad_LoadChunkedError_SetsFailedTerminal covers B2:
|
|
// when LoadChunked errors, the steady state must be terminal — both
|
|
// done=true (LoadChunked contract: done is the primary observable
|
|
// signal, see store.go contract block) AND failed=true. The pre-#1811
|
|
// implementation left done=false, leaving healthz wedged on the
|
|
// failure path (dij #1 / adv #7).
|
|
func TestRunStartupLoad_LoadChunkedError_SetsFailedTerminal(t *testing.T) {
|
|
dir := t.TempDir()
|
|
dbPath := filepath.Join(dir, "test.db")
|
|
nowSec := time.Now().UTC().Unix()
|
|
createTestDBWithLastSeen(t, dbPath, 5, 1, nowSec,
|
|
30*time.Minute, 30*time.Minute)
|
|
|
|
db, err := OpenDB(dbPath)
|
|
if err != nil {
|
|
t.Fatalf("OpenDB: %v", err)
|
|
}
|
|
// Close the underlying connection to force LoadChunked to fail on
|
|
// its very first query. We're explicitly verifying the failure path
|
|
// terminal state, not the success path.
|
|
_ = db.conn.Close()
|
|
|
|
store := NewPacketStore(db, &PacketStoreConfig{
|
|
RetentionHours: 168,
|
|
HotStartupHours: 1,
|
|
})
|
|
|
|
loadErr := store.RunStartupLoad(500)
|
|
if loadErr == nil {
|
|
t.Fatalf("RunStartupLoad must return an error when LoadChunked fails")
|
|
}
|
|
if !store.backgroundLoadFailed.Load() {
|
|
t.Fatalf("backgroundLoadFailed must be true after LoadChunked error (terminal state)")
|
|
}
|
|
if !store.backgroundLoadDone.Load() {
|
|
t.Fatalf("backgroundLoadDone must be true on LoadChunked error (LoadChunked contract: done observable, qualified by failed)")
|
|
}
|
|
if store.BackgroundLoadError() == "" {
|
|
t.Fatalf("BackgroundLoadError must be non-empty after LoadChunked failure")
|
|
}
|
|
}
|
|
|
|
// TestRunStartupLoad_EmptyDB_SetsDoneTerminal covers B4: empty DB with
|
|
// hot window > 0 — oldestLoaded stays "" because there are no packets.
|
|
// loadBackgroundChunks must reach its coverage block (totalInDB==0 →
|
|
// ratio=1.0) and set done=true rather than leaving the store stuck.
|
|
func TestRunStartupLoad_EmptyDB_SetsDoneTerminal(t *testing.T) {
|
|
dir := t.TempDir()
|
|
dbPath := filepath.Join(dir, "test.db")
|
|
createTestDBWithLastSeen(t, dbPath, 0, 0, time.Now().UTC().Unix(),
|
|
30*time.Minute, 30*time.Minute)
|
|
|
|
db, err := OpenDB(dbPath)
|
|
if err != nil {
|
|
t.Fatalf("OpenDB: %v", err)
|
|
}
|
|
defer db.conn.Close()
|
|
|
|
store := NewPacketStore(db, &PacketStoreConfig{
|
|
RetentionHours: 168,
|
|
HotStartupHours: 1,
|
|
})
|
|
|
|
if err := store.RunStartupLoad(500); err != nil {
|
|
t.Fatalf("RunStartupLoad on empty DB: %v", err)
|
|
}
|
|
if !store.backgroundLoadDone.Load() {
|
|
t.Fatalf("backgroundLoadDone must be true after empty-DB load (nothing to load == complete)")
|
|
}
|
|
if store.backgroundLoadFailed.Load() {
|
|
t.Fatalf("backgroundLoadFailed must be false on empty DB; got %q",
|
|
store.BackgroundLoadError())
|
|
}
|
|
}
|
|
|
|
// TestRunStartupLoad_BgLoaderRunsAfterLoadChunkedSets_OldestLoaded
|
|
// covers B5/B6 (dij #3, adv #3): assert the in-process call ordering
|
|
// inside RunStartupLoad — bg loader entry MUST happen-after the final
|
|
// OnChunkLoaded callback (which is the last write to s.oldestLoaded
|
|
// inside LoadChunked).
|
|
//
|
|
// The previous version only asserted that OnChunkLoaded fired and
|
|
// that oldestLoaded was non-empty after RunStartupLoad returned —
|
|
// which proves existence but not ordering (a future refactor could
|
|
// re-introduce the parallel-spawn race and the test would still
|
|
// pass because both signals are observable post-hoc).
|
|
//
|
|
// This version captures timestamps from each side (chunk callback,
|
|
// bg-loader entry) via a test-only hook and asserts
|
|
// tBgEntry >= tLastChunk monotonically. The runtime invariant in
|
|
// loadBackgroundChunks remains as the deterministic backstop.
|
|
func TestRunStartupLoad_BgLoaderRunsAfterLoadChunkedSets_OldestLoaded(t *testing.T) {
|
|
dir := t.TempDir()
|
|
dbPath := filepath.Join(dir, "test.db")
|
|
nowSec := time.Now().UTC().Unix()
|
|
createTestDBWithLastSeen(t, dbPath, 50, 1, nowSec,
|
|
30*time.Minute, 30*time.Minute)
|
|
|
|
db, err := OpenDB(dbPath)
|
|
if err != nil {
|
|
t.Fatalf("OpenDB: %v", err)
|
|
}
|
|
defer db.conn.Close()
|
|
|
|
store := NewPacketStore(db, &PacketStoreConfig{
|
|
RetentionHours: 168,
|
|
HotStartupHours: 1,
|
|
})
|
|
|
|
var (
|
|
mu sync.Mutex
|
|
lastChunkAt time.Time
|
|
bgEntryAt time.Time
|
|
chunkSawOldest string
|
|
)
|
|
store.OnChunkLoaded(func(rowsThisChunk, totalRows int) {
|
|
mu.Lock()
|
|
lastChunkAt = time.Now()
|
|
// LoadChunked sets oldestLoaded under s.mu AFTER the last
|
|
// fireChunkCallbacks call returns, so this read may be empty
|
|
// on the last chunk — record whatever is visible.
|
|
store.mu.RLock()
|
|
chunkSawOldest = store.oldestLoaded
|
|
store.mu.RUnlock()
|
|
mu.Unlock()
|
|
})
|
|
store.bgLoaderEntryHook = func() {
|
|
mu.Lock()
|
|
bgEntryAt = time.Now()
|
|
mu.Unlock()
|
|
}
|
|
|
|
if err := store.RunStartupLoad(500); err != nil {
|
|
t.Fatalf("RunStartupLoad: %v", err)
|
|
}
|
|
|
|
mu.Lock()
|
|
defer mu.Unlock()
|
|
if lastChunkAt.IsZero() {
|
|
t.Fatalf("OnChunkLoaded never fired — LoadChunked did not process any chunks")
|
|
}
|
|
if bgEntryAt.IsZero() {
|
|
t.Fatalf("bgLoaderEntryHook never fired — loadBackgroundChunks did not enter")
|
|
}
|
|
if !bgEntryAt.After(lastChunkAt) && !bgEntryAt.Equal(lastChunkAt) {
|
|
t.Fatalf("ordering violation: bg loader entered at %s BEFORE last chunk callback at %s — #1809 race re-introduced",
|
|
bgEntryAt.Format(time.RFC3339Nano), lastChunkAt.Format(time.RFC3339Nano))
|
|
}
|
|
if store.oldestLoaded == "" {
|
|
t.Fatalf("oldestLoaded is empty after RunStartupLoad — bg loader would have read \"\" and bailed")
|
|
}
|
|
_ = chunkSawOldest // diagnostic only; kept to make race future-debug easier
|
|
}
|
|
|
|
// TestRunStartupLoad_LoadChunkedError_MidLoad covers dij #6 / adv #5:
|
|
// the existing TestRunStartupLoad_LoadChunkedError_SetsFailedTerminal
|
|
// closes the conn BEFORE LoadChunked starts, so the error surfaces on
|
|
// chunk 0's first query. That's a degenerate path — it does not prove
|
|
// mid-load failure behavior. This test fires OnChunkLoaded once,
|
|
// closes the DB conn from the callback, and asserts the chunk-N query
|
|
// fails and the terminal state is still failed=true,done=true with a
|
|
// non-empty error.
|
|
func TestRunStartupLoad_LoadChunkedError_MidLoad(t *testing.T) {
|
|
dir := t.TempDir()
|
|
dbPath := filepath.Join(dir, "test.db")
|
|
nowSec := time.Now().UTC().Unix()
|
|
// Seed enough rows so a small chunk size yields >1 chunk.
|
|
createTestDBWithLastSeen(t, dbPath, 200, 1, nowSec,
|
|
30*time.Minute, 30*time.Minute)
|
|
|
|
db, err := OpenDB(dbPath)
|
|
if err != nil {
|
|
t.Fatalf("OpenDB: %v", err)
|
|
}
|
|
|
|
store := NewPacketStore(db, &PacketStoreConfig{
|
|
RetentionHours: 168,
|
|
HotStartupHours: 1,
|
|
})
|
|
|
|
// Close the underlying conn from inside the chunk callback so the
|
|
// NEXT chunk's query fails mid-load instead of on the first query.
|
|
var closed atomic.Bool
|
|
store.OnChunkLoaded(func(rowsThisChunk, totalRows int) {
|
|
if closed.CompareAndSwap(false, true) {
|
|
_ = db.conn.Close()
|
|
}
|
|
})
|
|
|
|
loadErr := store.RunStartupLoad(50)
|
|
if loadErr == nil {
|
|
t.Fatalf("RunStartupLoad must error when conn closes mid-load")
|
|
}
|
|
if !store.backgroundLoadFailed.Load() {
|
|
t.Fatalf("backgroundLoadFailed must be true after mid-load error")
|
|
}
|
|
if !store.backgroundLoadDone.Load() {
|
|
t.Fatalf("backgroundLoadDone must be true after mid-load error (terminal contract)")
|
|
}
|
|
if !strings.Contains(store.BackgroundLoadError(), "LoadChunked failed") {
|
|
t.Fatalf("BackgroundLoadError must mention LoadChunked failure; got %q", store.BackgroundLoadError())
|
|
}
|
|
}
|
|
|
|
// TestLoadBackgroundChunks_PanicsOnOldestLoadedEmpty_Invariant covers the
|
|
// runtime assertion (A7). Manually populate s.packets without setting
|
|
// oldestLoaded and call loadBackgroundChunks directly — the panic guard
|
|
// must fire so future refactors cannot silently re-introduce the
|
|
// #1809 race.
|
|
func TestLoadBackgroundChunks_PanicsOnOldestLoadedEmpty_Invariant(t *testing.T) {
|
|
dir := t.TempDir()
|
|
dbPath := filepath.Join(dir, "test.db")
|
|
// Reuse the existing schema-only fixture helper (0 rows) so this
|
|
// test does not introduce a new inline CREATE TABLE block (pr-preflight
|
|
// async-migration gate). The fixture provides exactly the bare schema
|
|
// loadBackgroundChunks needs to reach its panic guard.
|
|
createTestDBWithLastSeen(t, dbPath, 0, 0, time.Now().UTC().Unix(),
|
|
30*time.Minute, 30*time.Minute)
|
|
|
|
db, err := OpenDB(dbPath)
|
|
if err != nil {
|
|
t.Fatalf("OpenDB: %v", err)
|
|
}
|
|
defer db.conn.Close()
|
|
|
|
store := NewPacketStore(db, &PacketStoreConfig{
|
|
RetentionHours: 168,
|
|
HotStartupHours: 1,
|
|
})
|
|
// Simulate the #1809 race: packets present, oldestLoaded never set.
|
|
store.mu.Lock()
|
|
store.packets = append(store.packets, &StoreTx{ID: 1, Hash: "deadbeef", FirstSeen: "2025-01-01T00:00:00Z"})
|
|
store.oldestLoaded = ""
|
|
store.mu.Unlock()
|
|
|
|
// Override invariantViolation so the test can recover() and
|
|
// assert the message without log.Fatalf killing the runner
|
|
// (adv #6: prod uses log.Fatalf for a clean shutdown). Restore on
|
|
// exit so other tests are unaffected.
|
|
prev := invariantViolation
|
|
defer func() { invariantViolation = prev }()
|
|
invariantViolation = func(msg string) { panic(msg) }
|
|
|
|
defer func() {
|
|
r := recover()
|
|
if r == nil {
|
|
t.Fatalf("loadBackgroundChunks must trip invariantViolation when oldestLoaded=\"\" with packets in store (#1809 invariant)")
|
|
}
|
|
msg := fmt.Sprintf("%v", r)
|
|
// adv #4: tighten the tripwire — assert the specific invariant
|
|
// message contains the issue tag so a future refactor that
|
|
// reuses the panic path for a different bug doesn't silently
|
|
// satisfy this test.
|
|
if !strings.Contains(msg, "#1809") {
|
|
t.Fatalf("invariant message must reference #1809 to gate future regressions; got %q", msg)
|
|
}
|
|
if !strings.Contains(msg, "oldestLoaded") {
|
|
t.Fatalf("invariant message must mention oldestLoaded; got %q", msg)
|
|
}
|
|
}()
|
|
store.loadBackgroundChunks()
|
|
}
|