mirror of
https://github.com/Kpa-clawbot/meshcore-analyzer.git
synced 2026-07-03 17:51:40 +00:00
fa15ab0a30
Partial fix for #1809. Red commit:c9c782b5(CI runs on PR open; standalone red-branch CI not configured — local repro proves the gate, see below). ## Problem Issue #1809: at startup the background fill loader logged `background load FAILED` within seconds and `backgroundLoadFailed=true` was set, leaving the coverage gate tripped even though LoadChunked itself completed normally. ## Root cause `main.go:225-245` spawned `go store.loadBackgroundChunks()` as soon as `FirstChunkReady` fired (chunk #1 = 10000 tx). But `s.oldestLoaded` is only assigned at the end of `LoadChunked` (`chunked_load.go:329-333`), ~tens of seconds later. The bg loader read `oldestLoaded==""` at `store.go:1462-1466`, broke out immediately, walked zero chunks, and the coverage gate at `store.go:1543-1554` flipped `backgroundLoadFailed=true`. ## Fix (initial) Introduce `PacketStore.RunStartupLoad(chunkSize)` (`chunked_load.go`). It runs `LoadChunked` first; only on success and only when `hotStartupHours > 0` does it call `loadBackgroundChunks`. `main.go` invokes `RunStartupLoad` in the same goroutine pattern as before, so `FirstChunkReady` still unblocks the HTTP listener bind at chunk #1 — only the bg loader is gated. ## Round-1 followups (this push) Reviewer-driven hardening on top of the initial fix: ### Production behavior (commitdb5592f6) - **Steady-state semantics tightened.** `RunStartupLoad` now picks a terminal state on every branch: - LoadChunked error → `backgroundLoadFailed=true` with captured error (was: `done=false, failed=false` indefinite). - `hotStartupHours == 0` → `backgroundLoadDone=true` immediately, `progress=100` (was: `done=false` forever → healthz stuck on `backgroundLoadComplete=false`). - Successful hot-window path → terminal state is whatever `loadBackgroundChunks` sets (#1690 semantics, unchanged). - **Runtime invariant assertion (A7).** `loadBackgroundChunks` panics when `oldestLoaded==""` and packets exist — a future refactor that re-introduces the parallel-spawn race fails loudly instead of silently shipping the same coverage regression. - **`RunStartupLoad` cleanup.** Inlined the superfluous goroutine + channel that wrapped `LoadChunked` (direct call is equivalent). - **Logging.** Added an INFO line between `LoadChunked` completion and bg-loader start (the #1809 post-mortem needed exactly this signal). Fixed the lying `"background load will start"` log that fired even on the `hotStartupHours==0` branch. - **Immutability documented.** `hotStartupHours` is now explicitly documented as immutable post-construction, so the lock-free reads in `LoadChunked` / `RunStartupLoad` / `loadBackgroundChunks` are sound. ### Test coverage (commitsdb5592f6+e9e12acf) - **Tautology fix (B1, commite9e12acf).** The original `Test1809_StartupLoad_BgLoaderSeesOldestLoaded` fixture seeded all 100 rows inside the 1h hot window, so `LoadChunked` alone produced coverage=1.0 — the test passed even if `loadBackgroundChunks` was a no-op. Rewrote the fixture to spread 100 rows over 14 days with `hotStartupHours=24`, so only ~7 rows are hot and the remaining ~93 MUST be loaded by the bg loader for the assertions to hold. Original red-commit assertions kept intact; added `len(packets) > hot-only cap` and `oldestLoaded < hot-cutoff - 12h` assertions on top. - **New tests (commitdb5592f6, `runstartup_load_test.go`)** codify the new contracts: - `TestRunStartupLoad_HotStartupHoursZero_SetsDoneImmediately` - `TestRunStartupLoad_LoadChunkedError_SetsFailedTerminal` - `TestRunStartupLoad_EmptyDB_SetsDoneTerminal` - `TestRunStartupLoad_BgLoaderRunsAfterLoadChunkedSets_OldestLoaded` - `TestLoadBackgroundChunks_PanicsOnOldestLoadedEmpty_Invariant` ### Docs (commit70fa16f7) - Package-level doc in `chunked_load.go` now documents `RunStartupLoad` as the orchestrator entry point alongside `LoadChunked` / `loadStatusMiddleware` / `OnChunkLoaded`. ### Preflight (commiteec1b48c) - Test-fixture DDL annotated with `// PREFLIGHT: async=true reason="unit-test fixture"` so the async-migration gate distinguishes ephemeral test schema from prod migration paths. ## What's still NOT fixed (left intentionally open) This PR addresses the startup race specifically. Issue #1809 will be closed by the operator after observing healthy startup logs (`background load complete: ... coverage=100.0%`) in prod for a full restart cycle. Do not auto-close — leave open until the operator verifies. ## Test Local repro (red branch state, before green commit): `go test ./cmd/server/ -run Test1809_StartupLoad_BgLoaderSeesOldestLoaded` → FAIL on assertion `backgroundLoadFailed=true ... oldest="2026-06-30T18:56:09Z"`. After green commit + round-1 followups: PASS. Full `cmd/server/...` suite: ok in ~70s. ## Risk Low — startup path only. New behavior gates surfaced by the new tests; coverage-gate semantics unchanged. Runtime panic is a new failure mode but only fires on a state (`oldestLoaded=="" && len(packets)>0`) that is unreachable on the current code path — it exists solely as a refactor tripwire. --------- Co-authored-by: mc-bot <bot@corescope.local> Co-authored-by: openclaw-bot <bot@openclaw.local> Co-authored-by: meshcore-bot <bot@meshcore>
171 lines
5.8 KiB
Go
171 lines
5.8 KiB
Go
package main
|
|
|
|
// Tests for issue #1690 — cold-load uses wrong time axis (first_seen instead
|
|
// of effective recency). Three tests live in this file:
|
|
//
|
|
// Test1690_ColdLoad_TimeAxis — long-lived transmissions (first_seen 30d
|
|
// ago) with recent observations must load
|
|
// under a 1h hotStartupHours window.
|
|
// Test1690_BackgroundLoadHonesty — backgroundLoadComplete must NOT flip to
|
|
// true when coverage is below threshold.
|
|
// Test1690_PerfStats_NewFields — typed perf response must expose
|
|
// retentionHours, oldestLoaded,
|
|
// loadCoverageRatio.
|
|
|
|
import (
|
|
"encoding/json"
|
|
"path/filepath"
|
|
"strings"
|
|
"testing"
|
|
"time"
|
|
|
|
_ "modernc.org/sqlite"
|
|
)
|
|
|
|
// createTestDBWithLastSeen seeds a DB with the post-fix schema (last_seen
|
|
// column on transmissions). nowSec is the unix-second reference; fixture
|
|
// rows are placed relative to it.
|
|
//
|
|
// numTx transmissions, each with first_seen = nowSec - firstSeenAgo, and
|
|
// last_seen = nowSec - lastSeenAgo. Each tx has obsPerTx observations whose
|
|
// timestamps are within the last 20 minutes.
|
|
//
|
|
// Thin wrapper over seedTestDBRows (PR #1811 adv #2 — DRY). The schema
|
|
// DDL and the block-level PREFLIGHT async-migration annotation live
|
|
// there; this helper only supplies the row-time policy used by the
|
|
// #1690 fixture (fixed first_seen, fixed last_seen).
|
|
func createTestDBWithLastSeen(t *testing.T, dbPath string, numTx, obsPerTx int, nowSec int64, firstSeenAgo, lastSeenAgo time.Duration) {
|
|
t.Helper()
|
|
firstSeenTime := time.Unix(nowSec, 0).UTC().Add(-firstSeenAgo).Format(time.RFC3339)
|
|
lastSeenUnix := nowSec - int64(lastSeenAgo.Seconds())
|
|
seedTestDBRows(t, dbPath, numTx, obsPerTx, func(i int) (string, int64) {
|
|
return firstSeenTime, lastSeenUnix
|
|
})
|
|
}
|
|
|
|
// Test1690_ColdLoad_TimeAxis seeds 1000 transmissions whose hash *first
|
|
// appeared* 30 days ago but whose last observation was 30 minutes ago.
|
|
// With a 1h hotStartupHours, the pre-fix code (filtering on first_seen)
|
|
// loads zero rows; the post-fix code (filtering on last_seen) must load
|
|
// all 1000.
|
|
func Test1690_ColdLoad_TimeAxis(t *testing.T) {
|
|
dir := t.TempDir()
|
|
dbPath := filepath.Join(dir, "test.db")
|
|
|
|
nowSec := time.Now().UTC().Unix()
|
|
createTestDBWithLastSeen(t, dbPath, 1000, 1, nowSec,
|
|
30*24*time.Hour, // first_seen = 30d ago
|
|
30*time.Minute) // last_seen = 30min ago
|
|
|
|
db, err := OpenDB(dbPath)
|
|
if err != nil {
|
|
t.Fatalf("OpenDB: %v", err)
|
|
}
|
|
defer db.conn.Close()
|
|
|
|
store := NewPacketStore(db, &PacketStoreConfig{
|
|
RetentionHours: 168,
|
|
HotStartupHours: 1,
|
|
})
|
|
|
|
if err := store.LoadChunked(0); err != nil {
|
|
t.Fatalf("LoadChunked: %v", err)
|
|
}
|
|
|
|
loaded := len(store.packets)
|
|
if loaded < 1000 {
|
|
t.Fatalf("Test1690_ColdLoad_TimeAxis: expected ≥1000 transmissions loaded "+
|
|
"(all 1000 fixture rows have last_seen within 1h), got %d. "+
|
|
"Pre-fix behavior: chunked_load.go filters t.first_seen >= now-1h "+
|
|
"which excludes all 30d-old rows.", loaded)
|
|
}
|
|
}
|
|
|
|
// Test1690_BackgroundLoadHonesty seeds 1000 transmissions but caps the
|
|
// store's memory budget so it can only fit a fraction. After
|
|
// loadBackgroundChunks runs, backgroundLoadDone must be FALSE and
|
|
// backgroundLoadFailed must be TRUE because actual coverage is < 90%.
|
|
func Test1690_BackgroundLoadHonesty(t *testing.T) {
|
|
dir := t.TempDir()
|
|
dbPath := filepath.Join(dir, "test.db")
|
|
|
|
nowSec := time.Now().UTC().Unix()
|
|
// 5000 rows; chunkSize=500 + maxMemoryMB=1 (→ maxPackets ≈ 1000) so
|
|
// the load breaks at the end of the chunk that crosses the cap and
|
|
// totalLoaded ≪ 5000.
|
|
createTestDBWithLastSeen(t, dbPath, 5000, 1, nowSec,
|
|
30*time.Minute, 30*time.Minute)
|
|
|
|
db, err := OpenDB(dbPath)
|
|
if err != nil {
|
|
t.Fatalf("OpenDB: %v", err)
|
|
}
|
|
defer db.conn.Close()
|
|
|
|
store := NewPacketStore(db, &PacketStoreConfig{
|
|
RetentionHours: 168,
|
|
HotStartupHours: 1,
|
|
MaxMemoryMB: 1, // forces bounded load ≪ 5000 rows
|
|
})
|
|
if err := store.LoadChunked(500); err != nil {
|
|
t.Fatalf("LoadChunked: %v", err)
|
|
}
|
|
store.loadBackgroundChunks()
|
|
|
|
if store.backgroundLoadDone.Load() {
|
|
t.Errorf("backgroundLoadDone=true with only %d/5000 packets loaded; "+
|
|
"must be false until coverage ≥ 90%%", len(store.packets))
|
|
}
|
|
if !store.backgroundLoadFailed.Load() {
|
|
t.Errorf("backgroundLoadFailed=false despite under-coverage "+
|
|
"(%d/5000 packets loaded); must be true with a reason", len(store.packets))
|
|
}
|
|
// The error message must mention a percentage so operators can see
|
|
// the actual ratio surface in the perf endpoint.
|
|
errMsg := store.BackgroundLoadError()
|
|
if !strings.Contains(errMsg, "%") {
|
|
t.Errorf("backgroundLoadError=%q; expected human-readable ratio "+
|
|
"(e.g. 'loaded X%% of Y rows')", errMsg)
|
|
}
|
|
}
|
|
|
|
// Test1690_PerfStats_NewFields asserts the typed perf payload exposes the
|
|
// retention/coverage fields needed for prod observability.
|
|
func Test1690_PerfStats_NewFields(t *testing.T) {
|
|
dir := t.TempDir()
|
|
dbPath := filepath.Join(dir, "test.db")
|
|
|
|
nowSec := time.Now().UTC().Unix()
|
|
createTestDBWithLastSeen(t, dbPath, 10, 1, nowSec,
|
|
30*time.Minute, 30*time.Minute)
|
|
|
|
db, err := OpenDB(dbPath)
|
|
if err != nil {
|
|
t.Fatalf("OpenDB: %v", err)
|
|
}
|
|
defer db.conn.Close()
|
|
|
|
store := NewPacketStore(db, &PacketStoreConfig{
|
|
RetentionHours: 168,
|
|
HotStartupHours: 1,
|
|
})
|
|
if err := store.LoadChunked(0); err != nil {
|
|
t.Fatalf("LoadChunked: %v", err)
|
|
}
|
|
|
|
ps := store.GetPerfStoreStatsTyped()
|
|
buf, err := json.Marshal(ps)
|
|
if err != nil {
|
|
t.Fatalf("marshal: %v", err)
|
|
}
|
|
var asMap map[string]interface{}
|
|
if err := json.Unmarshal(buf, &asMap); err != nil {
|
|
t.Fatalf("unmarshal: %v", err)
|
|
}
|
|
for _, key := range []string{"retentionHours", "oldestLoaded", "loadCoverageRatio"} {
|
|
if _, ok := asMap[key]; !ok {
|
|
t.Errorf("PerfPacketStoreStats missing %q field; payload=%s", key, string(buf))
|
|
}
|
|
}
|
|
}
|