mirror of
https://github.com/Kpa-clawbot/meshcore-analyzer.git
synced 2026-05-25 04:35:17 +00:00
eeddf46bc9
## Summary PR #1289 moved neighbor-graph construction into the ingestor with a 60s ticker. `buildAndPersistNeighborEdges` then issued an **unbounded** `SELECT … FROM observations o JOIN transmissions t …` every tick. On staging (3.7M observations) one tick took ~2 minutes; with `max_open_conns=1`, the SQLite single-writer was held continuously and MQTT ingest collapsed (~6,500 tx/day → ~180 tx/day, 97% loss). ## Fix Watermark-bounded delta scan. Each call derives the watermark from `MAX(neighbor_edges.last_seen)` and restricts the SELECT to `WHERE o.timestamp > ? ORDER BY o.timestamp LIMIT 50000`. `neighbor_edges` itself is the persistence — no new metadata table, no in-memory state, restarts resume cleanly from whatever the table reflects. - Empty edges table → watermark 0 → full warm-up scan (preserves #1289's synchronous warm-up intent). - Warm-up loops the builder until a call returns fewer than the batch cap, so the first server snapshot load sees a fully-populated table even on fresh DBs. - 50k batch cap stops any single tick from monopolising the writer; a backlog drains over successive ticks. - Per-tick wallclock is logged (`tick: N edges in DUR`); a tick >5s is logged loudly as a possible regression of #1339. Broader instrumentation is tracked in #1340. - Output schema unchanged — server's `neighbor_recomputer.go` is unaffected. ## Trade-off An anomalously-old observation that arrives after its timestamp has been crossed by the watermark will be skipped. Acceptable for an approximate neighbor graph; a periodic full-rebuild can land later if needed. ## TDD - **RED** (`d88e2522`): `TestNeighborEdgesBuilderDeltaScan` seeds 100k observations, asserts an empty-delta tick is a no-op (<1s), and a 100-row delta is upserted in <500ms with no rescan of baseline rows. Baseline builder fails the empty-delta assertion (sees all 200k baseline edges). - **GREEN** (`cf6fbb4e`): watermark + LIMIT — all assertions pass. - **Mutation**: revert the `WHERE o.timestamp > ?` clause → the test hangs to lock-contention timeout, confirming the WHERE actually gates the behavior. ## Benchmark (synthetic, 100k observations, local sqlite) | | Scan duration | |---|---| | Baseline builder, full scan every tick | ~40s | | Patched builder, empty-delta tick | <50ms | | Patched builder, 100-row delta | <50ms | Staging projection: 2–3 min ticks → <1s ticks; SQLite writer freed for MQTT ingest. Fixes #1339 --------- Co-authored-by: openclaw-bot <bot@openclaw.local>