mirror of
https://github.com/Kpa-clawbot/meshcore-analyzer.git
synced 2026-06-28 01:01:52 +00:00
9465949e79
## Summary Closes #1558. The background-backfill path (`loadChunk`) silently dropped the resolved-path indexing branch that `Load` performs per observation. Same SQL rows, two different post-conditions — a contract violation between the hot-startup load and the background chunk load. ## Root cause (the differential matters) The reporter's hypothesis — `indexByNode` not invoked on background-loaded transmissions — was 90% right but pointed at the wrong line. - `cmd/server/store.go:1116` already calls `s.indexByNode(tx)` inside the loadChunk per-batch merge lock for every backfilled tx. Decoded `pubKey` / `destPubKey` / `srcPubKey` ARE indexed. - `indexByNode` (store.go:1313 pre-patch) only reads three fields from `decoded_json`. It does NOT and cannot touch `resolved_path`. - `Load` (store.go:783-799) per-observation unmarshals `o.resolved_path`, extracts every relay-hop pubkey, and feeds them through `addToByNode` + `addResolvedPubkeysToPathHopIndex` + `addToResolvedPubkeyIndex`. - `loadChunk` (store.go:937-1023 pre-patch) selects `o.resolved_path` into `resolvedPathStr`… then never touches it. Result: after a container restart, every transmission older than `hotStartupHours` ends up present in `s.packets` / `s.byHash` / `s.byTxID` but missing from `s.byNode[relayPK]` for every relay pubkey. Home-page per-node `packetsToday` / `totalTransmissions` / `observers` / `avgHops` / `avgSnr` collapse for relay-heavy nodes (753 → 8 in the reporter's trace). Stats only self-heal as live ingest re-populates `byNode` through the ingest path (which DID call the full sequence inline). ## Fix shape 1. **Extract a shared `(s *PacketStore) indexResolvedPathHops(tx, pks, hopsSeen)` helper.** Owns the `addToByNode` + `addResolvedPubkeysToPathHopIndex` + `addToResolvedPubkeyIndex` sequence. Single point of truth so the "feed decode-window consumers for resolved-path pubkeys" invariant is structural, not duplicated. 2. **Re-point `Load` and both ingest sites at the helper.** Load's semantic behaviour is byte-identical with the prior inline block. 3. **Add the missing call in `loadChunk`.** Per AGENTS.md performance rule #0 ("no expensive work under locks"), unmarshal `resolved_path` and dedupe relay pubkeys per txID **outside** the merge critical section (`localResolvedPKsByTx`), then feed the pre-built slice through `indexResolvedPathHops` inside the existing per-batch lock alongside `indexByNode`. Mirrors `loadChunk`'s "build local, merge under lock" shape. ## TDD: red → green commits ```892424e6test(#1558): RED — loadChunk drops resolved_path relay-pubkey indexingc6768dcafix(#1558): mirror Load's resolved_path indexing into loadChunk via shared helper ``` The RED commit adds `TestLoadChunk_IndexesResolvedPathPubkeys_Issue1558` to `cmd/server/loadchunk_resolved_path_1558_test.go`. It loads a fixture DB containing 3 transmissions each with an observation whose `resolved_path` lists two distinct relay pubkeys, calls `Load()` with `HotStartupHours: 1` to confirm the rows are NOT picked up by the hot path, then calls `loadChunk` directly over the 48h-old window and asserts `s.byNode[relayPK]` contains 3 transmissions. ``` === RUN TestLoadChunk_IndexesResolvedPathPubkeys_Issue1558 (RED, pre-fix) loadchunk_resolved_path_1558_test.go:154: byNode[1111…]: got 0 transmissions, want 3 — loadChunk dropped the resolved_path indexing branch (issue #1558) loadchunk_resolved_path_1558_test.go:154: byNode[2222…]: got 0 transmissions, want 3 — loadChunk dropped the resolved_path indexing branch (issue #1558) --- FAIL: TestLoadChunk_IndexesResolvedPathPubkeys_Issue1558 (0.01s) === RUN TestLoadChunk_IndexesResolvedPathPubkeys_Issue1558 (GREEN, post-fix) --- PASS: TestLoadChunk_IndexesResolvedPathPubkeys_Issue1558 (0.01s) ``` Full `go test ./...` from `cmd/server`: PASS (45.3s). ## Files changed - `cmd/server/store.go` — helper + loadChunk fix + 3 call-site refactors - `cmd/server/loadchunk_resolved_path_1558_test.go` — regression test + fixture ## Performance / lock-scope The merge critical section now also calls `indexResolvedPathHops`, which is three map-append loops over the pre-deduplicated pubkey slice for this tx. JSON unmarshal happens once per observation **outside** any lock, in the same row loop as the existing scan work. No new allocations under lock beyond what `addToByNode` etc already do per relay pubkey. Matches the shape of the existing `indexByNode(tx)` call already in this critical section. ## Out of scope `/api/stats backfilling=true` sticky flag (mentioned in the reporter's writeup) is tracked separately at #1546. ## Preflight overrides - check-async-migrations: justified — flagged lines are SQLite DDL in the in-memory test fixture `createTestDBWithResolvedPath` (test-only DB created via `sql.Open(":memory:"-like temp path)`, not a production migration). Mirrors the identical pattern in `cmd/server/bounded_load_test.go:163-167` which the gate also flags as a false positive. No production schema is touched in this PR. --------- Co-authored-by: corescope-bot <bot@corescope.local>