Files
meshcore-analyzer/cmd
Kpa-clawbot eeddf46bc9 fix(ingestor): neighbor-builder delta scan + watermark — recovers 97% packet loss from #1289 (fixes #1339) (#1341)
## Summary
PR #1289 moved neighbor-graph construction into the ingestor with a 60s
ticker. `buildAndPersistNeighborEdges` then issued an **unbounded**
`SELECT … FROM observations o JOIN transmissions t …` every tick. On
staging (3.7M observations) one tick took ~2 minutes; with
`max_open_conns=1`, the SQLite single-writer was held continuously and
MQTT ingest collapsed (~6,500 tx/day → ~180 tx/day, 97% loss).

## Fix
Watermark-bounded delta scan. Each call derives the watermark from
`MAX(neighbor_edges.last_seen)` and restricts the SELECT to `WHERE
o.timestamp > ? ORDER BY o.timestamp LIMIT 50000`. `neighbor_edges`
itself is the persistence — no new metadata table, no in-memory state,
restarts resume cleanly from whatever the table reflects.

- Empty edges table → watermark 0 → full warm-up scan (preserves #1289's
synchronous warm-up intent).
- Warm-up loops the builder until a call returns fewer than the batch
cap, so the first server snapshot load sees a fully-populated table even
on fresh DBs.
- 50k batch cap stops any single tick from monopolising the writer; a
backlog drains over successive ticks.
- Per-tick wallclock is logged (`tick: N edges in DUR`); a tick >5s is
logged loudly as a possible regression of #1339. Broader instrumentation
is tracked in #1340.
- Output schema unchanged — server's `neighbor_recomputer.go` is
unaffected.

## Trade-off
An anomalously-old observation that arrives after its timestamp has been
crossed by the watermark will be skipped. Acceptable for an approximate
neighbor graph; a periodic full-rebuild can land later if needed.

## TDD
- **RED** (`d88e2522`): `TestNeighborEdgesBuilderDeltaScan` seeds 100k
observations, asserts an empty-delta tick is a no-op (<1s), and a
100-row delta is upserted in <500ms with no rescan of baseline rows.
Baseline builder fails the empty-delta assertion (sees all 200k baseline
edges).
- **GREEN** (`cf6fbb4e`): watermark + LIMIT — all assertions pass.
- **Mutation**: revert the `WHERE o.timestamp > ?` clause → the test
hangs to lock-contention timeout, confirming the WHERE actually gates
the behavior.

## Benchmark (synthetic, 100k observations, local sqlite)
| | Scan duration |
|---|---|
| Baseline builder, full scan every tick | ~40s |
| Patched builder, empty-delta tick | <50ms |
| Patched builder, 100-row delta | <50ms |

Staging projection: 2–3 min ticks → <1s ticks; SQLite writer freed for
MQTT ingest.

Fixes #1339

---------

Co-authored-by: openclaw-bot <bot@openclaw.local>
2026-05-23 20:54:16 -07:00
..