meshcore-analyzer

mirror of https://github.com/Kpa-clawbot/meshcore-analyzer.git synced 2026-06-04 19:32:15 +00:00

Author	SHA1	Message	Date
efiten	317b59ab10	feat: area-based visual node filter — attribute packets by transmitter GPS (#804 ) (#839 ) ## Summary - Adds configurable GPS polygon areas to `config.json`; nodes are attributed to an area if their last-known position falls inside the polygon - New `Area: …` dropdown filter (matching the existing region filter style) appears on all analytics, nodes, packets, map, and live screens when areas are configured - Backend resolves area membership with a 30s TTL cache; area filter bypasses the 500-node cap on `/api/bulk-health` so all area nodes are always returned - Includes a polygon builder tool (`/area-map.html`) for drawing and exporting area boundaries ## Changes Backend - `AreaEntry` type + `Areas` config field - `GetNodePubkeysInArea` DB query + `resolveAreaNodes` (30s TTL, `areaNodeMu` RWMutex) - `PacketQuery.Area` + `filterPackets` polygon check - `?area=` param propagated through all analytics, topology, clock-health, and bulk-health routes - `/api/config/areas` endpoint Frontend - `area-filter.js`: single-select dropdown, persists to localStorage, cleans up stale keys on load - Wired into analytics, nodes, packets, channels, map, and live pages - Live map clears node markers on area change Docs & tools - `docs/user-guide/area-filter.md` — configuration and usage guide - `docs/api-spec.md` — updated with new endpoint and `?area=` param table - `tools/area-map.html` — polygon builder for defining area boundaries - Demo areas added to `config.example.json` ## Test plan - [x] No areas configured → filter dropdown does not appear on any page - [x] Areas configured → dropdown appears, "All" selected by default - [x] Selecting an area filters nodes/packets/topology/map correctly - [x] Selecting "All" restores unfiltered view - [x] Selection persists across page reloads (localStorage) - [x] Stale localStorage key (area removed from config) is cleared on load - [x] `/api/bulk-health?area=X` returns all nodes in area (no 500-node cap) - [x] 
`/api/config/areas` returns correct list 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Kpa-clawbot <kpaclawbot@outlook.com> Co-authored-by: openclaw-bot <bot@openclaw.local>	2026-05-21 14:00:15 -07:00
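Note on #839: the rule "a node belongs to an area if its last-known position falls inside the polygon" can be illustrated with a standard even-odd ray-casting test. This is a minimal sketch, not the code from the PR. `Point`, `pointInPolygon`, and the sample coordinates are hypothetical, and lat/lon are treated as planar coordinates, which is only reasonable for small areas.

```go
package main

import "fmt"

// Point is a hypothetical lat/lon pair; the PR's AreaEntry layout is not shown in this log.
type Point struct{ Lat, Lon float64 }

// pointInPolygon reports whether p lies inside poly using the even-odd
// ray-casting rule. Points exactly on an edge may land on either side.
func pointInPolygon(p Point, poly []Point) bool {
	inside := false
	n := len(poly)
	for i, j := 0, n-1; i < n; j, i = i, i+1 {
		a, b := poly[i], poly[j]
		// Does the edge a-b straddle the horizontal line through p?
		if (a.Lat > p.Lat) != (b.Lat > p.Lat) {
			// Longitude at which the edge crosses that line.
			cross := a.Lon + (p.Lat-a.Lat)*(b.Lon-a.Lon)/(b.Lat-a.Lat)
			if p.Lon < cross {
				inside = !inside
			}
		}
	}
	return inside
}

func main() {
	square := []Point{{50, 4}, {50, 5}, {51, 5}, {51, 4}}
	fmt.Println(pointInPolygon(Point{50.5, 4.5}, square)) // node inside the area
	fmt.Println(pointInPolygon(Point{52.0, 4.5}, square)) // node north of the area
}
```

A cached area-to-pubkeys lookup, such as the 30s TTL cache the PR describes, would call a check like this once per node on refresh rather than once per request.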
efiten	caf3851ff8	feat(server): add opt-in HTTP gzip and WebSocket permessage-deflate compression (#934 ) ## Summary - Adds `"compression": {"gzip": true, "websocket": true}` config option (both `false` by default — no behavior change) - HTTP gzip middleware wraps the entire router; skips WebSocket upgrade requests and clients without `Accept-Encoding: gzip` - WebSocket permessage-deflate enabled via `hub.upgrader.EnableCompression` when `websocket: true` - `CompressionConfig` struct and `GZipEnabled()` / `WSCompressionEnabled()` helpers on `Config` - `Hub.upgrader` moved from package-level var to struct field so tests using `NewHub()` don't need changes ## Why opt-in / off by default Operators behind a reverse proxy that already compresses (nginx, Caddy with `encode gzip`) should leave this off to avoid double-compression. Only enable when the proxy does not compress. ## Test plan - [x] `TestCompressionConfigDefaults` — both helpers return false when `Compression` is nil - [x] `TestCompressionConfigExplicitFalse` — both helpers return false when set to false - [x] `TestCompressionConfigEnabled` — both helpers return true when set to true - [x] `TestGZipMiddlewareCompresses` — response body is valid gzip, headers set correctly - [x] `TestGZipMiddlewareSkipsNoAcceptEncoding` — passthrough when client doesn't send Accept-Encoding: gzip - [x] `TestGZipMiddlewareSkipsWebSocket` — WebSocket upgrades are never gzip-wrapped All 6 tests pass (`go test ./...` in `cmd/server`). 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: OpenClaw Bot <bot@openclaw.local> Co-authored-by: efiten-bot <bot@efiten.dev>	2026-05-21 11:39:49 -07:00
efiten	51f823bf7e	feat: one-click prune nodes outside geofilter (#669 M4) (#738 ) ## Summary - Adds `POST /api/admin/prune-geo-filter` endpoint — dry-run by default, `?confirm=true` to permanently delete nodes outside the current geofilter polygon + buffer. Requires `X-API-Key` header. - Adds Prune nodes section inside the GeoFilter customizer tab (write-access only, same `writeEnabled` gate as PUT). Preview lists affected nodes; Confirm delete removes them. - Adds `GetNodesForGeoPrune` and `DeleteNodesByPubkeys` DB helpers. - Updates `docs/user-guide/geofilter.md` — documents the UI button as primary workflow, CLI script as alternative. > Depends on M3 (`feat/geofilter-m3-customizer`, PR #736). Merge M3 first. ## Test plan - [x] `cd cmd/server && go test ./...` — all pass - [x] Customizer GeoFilter tab without `apiKey` — Prune section not visible - [x] With `apiKey` + polygon active — Prune section visible - [x] Preview returns list of nodes outside polygon (no deletions) - [x] Confirm delete removes nodes, list clears - [x] `POST /api/admin/prune-geo-filter` without `X-API-Key` → 401 - [x] `POST /api/admin/prune-geo-filter` with no polygon configured → 400 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 03:19:31 +00:00
Kpa-clawbot	9383201c07	refactor(db): finish #1283 — Option 4: ingestor owns neighbor-graph + schema migrations; server is read-only (fixes #1287 ) (#1289 ) Red commit: https://github.com/Kpa-clawbot/CoreScope/commit/eae179b99b5fd34924547632aa8f8025c405aa53 (CI: pending — opens with this PR) Finishes #1283. RED test `TestServerSourceHasNoCachedRWCalls` goes from failing (13 writer call-sites) to GREEN (zero). Per #1287 Option 4 (https://github.com/Kpa-clawbot/CoreScope/issues/1287#issuecomment-4485099992): ingestor owns the neighbor graph build + persist; server reads the snapshot. Category A — Schema migrations → new `internal/dbschema` package. `dbschema.Apply(rw)` runs in `cmd/ingestor` startup (in `OpenStore`). `dbschema.AssertReady(ro)` runs in `cmd/server/main.go` and FATAL-LOG-EXITS if any expected column/index/table is missing — the operator must restart the ingestor first. Covers indexes, `neighbor_edges`, `observations.resolved_path`, `observers.{inactive,last_packet_at,iata}`, `(inactive_)nodes.foreign_advert`, `transmissions.from_pubkey`. Category B — Backfill → ingestor. `BackfillFromPubkey` and observer-blacklist soft-delete moved to `cmd/ingestor/maintenance.go`. Server keeps an inert `fromPubkeyBackfillSnapshot` stub for `/api/healthz` API compatibility. Category C — Neighbor-graph persistence (Option 4) → ingestor writes, server reads. - Ingestor (`cmd/ingestor/neighbor_builder.go`): every 60s scans `observations + transmissions`, extracts edges (originator↔first-hop for ADVERTs; observer↔last-hop for all), resolves hop prefixes via a node-table prefix index, upserts into `neighbor_edges`. - Server (`cmd/server/neighbor_recomputer.go`): every 60s re-reads `neighbor_edges` and atomic-swaps the resulting `NeighborGraph` into `s.graph`. Initial load is synchronous on startup. All server-side incremental edge writers (the two `asyncPersistResolvedPathsAndEdges` paths in `cmd/server/store.go`) are gone. 
- Neighbor-edge daily prune (`PruneNeighborEdges`) moved to ingestor. Why Option 4: clean read/write separation, no startup CPU spike (server loads existing snapshot instead of rebuilding from history), no IPC/delta-protocol churn. Staleness budget ~60s — same model as the analytics recomputers in #1240 / #1248 / #672 axis 2. Recomputer interval default for neighbor graph: 60s (`NeighborGraphRecomputerDefaultInterval`, `NeighborEdgesBuilderInterval`). Invariants added: - `TestServerSourceHasNoCachedRWCalls` (RED commit `eae179b9`): grep enforces zero `cachedRW(`, `mode=rw`, or `sql.Open(_journal_mode=WAL…)` in non-test `cmd/server/` sources. - `TestServerStartupRequiresMigratedSchema`: server refuses to start against an unmigrated DB. - `TestNeighborGraphRecomputerLoadsSnapshot`: post-write snapshot is picked up on the next refresh. - `TestNeighborEdgesBuilderUpsertsFromObservations`: end-to-end pipeline writes the expected edge. `grep cachedRW cmd/server/*.go \| grep -v _test.go` → 0 matches. Fixes #1287. --------- Co-authored-by: MeshCore Bot <bot@meshcore.local> Co-authored-by: Kpa-clawbot <Kpa-clawbot@users.noreply.github.com> Co-authored-by: corescope-bot <bot@corescope.local>	2026-05-19 23:53:41 -07:00
Kpa-clawbot	1da2034341	refactor(db): move all writes from server to ingestor; server truly read-only (fixes #1283 ) (#1286 ) Red commit: `f6290b63` — CI run will appear at https://github.com/Kpa-clawbot/CoreScope/actions Fixes #1283. ## What Moves all four DB write operations out of `cmd/server/` into `cmd/ingestor/`, making the server truly read-only and eliminating the SQLITE_BUSY VACUUM bug at its root: the server can no longer race the ingestor for the write lock because the server has no write path. ## The four operations \| # \| Was in \| Now in \| \|---\|--------\|--------\| \| 1 \| `cmd/server/vacuum.go` (`checkAutoVacuum`, full VACUUM + `auto_vacuum=INCREMENTAL` migration) \| `cmd/ingestor/db.go` `Store.CheckAutoVacuum` (already existed; ingestor runs it at startup before the MQTT subscriber starts → no contention) \| \| 2 \| `cmd/server/db.go` `PruneOldPackets` (`DELETE FROM transmissions`) \| `cmd/ingestor/maintenance.go` `Store.PruneOldPackets` (new) + 24h ticker in `cmd/ingestor/main.go` \| \| 3 \| `cmd/server/db.go` `PruneOldMetrics` (`DELETE FROM observer_metrics`) \| `cmd/ingestor/db.go` `Store.PruneOldMetrics` (already existed) \| \| 4 \| `cmd/server/db.go` `RemoveStaleObservers` (`UPDATE observers SET inactive=1`) \| `cmd/ingestor/db.go` `Store.RemoveStaleObservers` (already existed) \| ## HTTP surface - Removed: `POST /api/admin/prune` (`handleAdminPrune`, route, openapi entry). Operators trigger an ad-hoc prune by restarting the ingestor. - Kept: `GET /api/backup` — uses `VACUUM INTO` which writes to a separate file, not the live DB; read-only-safe. ## Tests - `cmd/server/readonly_invariant_test.go` (RED gate) — reflect-asserts `PruneOldPackets`/`PruneOldMetrics`/`RemoveStaleObservers` are NOT methods on the server's `DB`. Fails on master, passes after this PR. - `cmd/ingestor/issue1283_test.go` — exercises `Store.PruneOldPackets` and the auto_vacuum=NONE → INCREMENTAL migration through `Store.CheckAutoVacuum` with `vacuumOnStartup=true`. 
## Why the bug is gone The SQLITE_BUSY VACUUM failure happened because supervisord launched both ingestor + server in one container; the ingestor took the write lock for INSERTs and the server's `checkAutoVacuum` then failed to acquire it within `busy_timeout=5000`. After this PR, only the ingestor ever opens a writable connection, and it runs `CheckAutoVacuum` before spawning the MQTT subscriber → no contention possible. ## Scope notes - `cachedRW()` still has three pre-existing callers in `cmd/server/` (`neighbor_persist.go`, `ensure_indexes.go`, `from_pubkey_migration.go`). These pre-date #1283 and are not in the issue's four-operation list. Leaving them for follow-up keeps this PR honest about scope; AGENTS.md documents the invariant so new write paths can't sneak in. - PII preflight reports false positives on the Go method name `requireAPIKey` in `routes.go` diff context — no real PII. - Server-side neighbor-edge prune (`PruneNeighborEdges`) intentionally left in place — out of scope of #1283. --------- Co-authored-by: MeshCore Bot <bot@meshcore.local>	2026-05-18 23:52:27 -07:00
Kpa-clawbot	8bf7709970	feat(repeater): usefulness score — bridge axis (#672 axis 2 of 4) (#1275 ) RED test commit: `fd661569` — CI will fail on this (stub returns empty map; assertions fail by design). GREEN: `bf4b8592`. ## What Implements axis 2 of 4 for the repeater usefulness score per #672 ([status comment](https://github.com/Kpa-clawbot/CoreScope/issues/672#issuecomment-4484635378)). The Bridge axis measures structural importance: how many shortest paths between other nodes route through this one. A high-traffic redundant node and a low-traffic critical bridge will no longer look identical. ## Algorithm Brandes' weighted betweenness centrality with Dijkstra for shortest paths (`cmd/server/bridge_score.go`). - Nodes: pubkeys in the `neighbor_edges` graph - Edge weight: `Score(now) * Confidence()` — per the convention from #1235 (count + recency decay scaled by observer-diversity confidence). Geo-rejected edges already excluded at graph build time (#1230) so we don't re-filter here. - Dijkstra distance: `1 / max(epsilon, weight)` — high affinity = cheap cost. - Normalize: divide by max observed centrality so output is in `[0, 1]`. Cost: `O(V · (E + V log V))`. Staging-scale (~600 nodes / ~2 000 edges) ≈ ~4.8M ops, completes in milliseconds. ## Where it lives - `cmd/server/bridge_score.go` — pure algorithm, no locks - `cmd/server/bridge_recomputer.go` — background recomputer (mirrors #1240/#1262 pattern), 5-min default interval, initial sync prewarm, snapshot stored in `s.bridgeScoreMap atomic.Pointer[map[string]float64]` - `cmd/server/routes.go` — `handleNodes` adds `node["bridge_score"]` on repeater/room rows; node-detail handler adds it on the single-node path - `public/nodes.js` — separate Bridge row in the node detail panel, alongside the existing Usefulness (Traffic) row. Distinct colour-coded bar. 
## What's NOT in this PR (still pending for #672) - Coverage axis (axis 3) — unique observer-pair connectivity - Redundancy axis (axis 4) — simulated node-removal impact - Composite — once all 4 axes ship, swap the `usefulness_score` formula from "traffic-only" to the weighted composite `Refs #672` (not `Fixes` — issue stays open until all 4 axes + composite ship). ## Tests - `TestComputeBridgeScores_LineGraph` — 4-node line: middles non-zero, leaves zero, max normalized to 1.0 - `TestComputeBridgeScores_TriangleNoBridge` — clique has zero bridges - `TestComputeBridgeScores_Empty` — defensive nil-safety - `TestComputeBridgeScores_WeightSensitive` — mutation guard: revert the `1/w` inversion and this test fails - `TestBridgeScore_HandleNodesSurface` — integration: `/api/nodes` returns `bridge_score` on repeater rows; middle nodes > 0, ends == 0 --------- Co-authored-by: clawbot <bot@meshcore.local>	2026-05-18 22:51:23 -07:00
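Note on #1275: the Bridge axis algorithm (Brandes' betweenness with Dijkstra, hop cost `1 / max(epsilon, weight)`, normalized by the maximum) can be sketched as follows. This is an illustration under stated assumptions, not the contents of `cmd/server/bridge_score.go`. `Edge`, `hop`, `bridgeScores`, and the epsilon value are hypothetical, `Weight` stands in for `Score(now) * Confidence()`, and equal-length paths are detected with exact float equality.

```go
package main

import (
	"container/heap"
	"fmt"
	"math"
)

// Edge is a hypothetical undirected edge; Weight stands in for Score(now) * Confidence().
type Edge struct {
	A, B   string
	Weight float64
}

// hop is a (node, cost) pair, used for both adjacency lists and heap entries.
type hop struct {
	node string
	cost float64
}

type minHeap []hop // min-heap on cost for Dijkstra

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].cost < h[j].cost }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(hop)) }
func (h *minHeap) Pop() any {
	old := *h
	last := old[len(old)-1]
	*h = old[:len(old)-1]
	return last
}

// bridgeScores returns betweenness centrality per node, scaled so the maximum is 1.
func bridgeScores(edges []Edge) map[string]float64 {
	const epsilon = 1e-9
	adj := map[string][]hop{}
	for _, e := range edges {
		cost := 1 / math.Max(epsilon, e.Weight) // high affinity = cheap hop
		adj[e.A] = append(adj[e.A], hop{e.B, cost})
		adj[e.B] = append(adj[e.B], hop{e.A, cost})
	}
	raw := map[string]float64{}
	for s := range adj {
		dist := map[string]float64{s: 0}
		sigma := map[string]float64{s: 1} // shortest-path counts from s
		preds := map[string][]string{}
		var order []string // nodes in non-decreasing distance from s
		pq := &minHeap{{s, 0}}
		for pq.Len() > 0 {
			cur := heap.Pop(pq).(hop)
			if cur.cost > dist[cur.node] {
				continue // stale heap entry
			}
			order = append(order, cur.node)
			for _, a := range adj[cur.node] {
				nd := cur.cost + a.cost
				old, seen := dist[a.node]
				switch {
				case !seen || nd < old: // strictly shorter path
					dist[a.node], sigma[a.node] = nd, sigma[cur.node]
					preds[a.node] = []string{cur.node}
					heap.Push(pq, hop{a.node, nd})
				case nd == old: // one more equally short path
					sigma[a.node] += sigma[cur.node]
					preds[a.node] = append(preds[a.node], cur.node)
				}
			}
		}
		delta := map[string]float64{} // dependency of s on each node
		for i := len(order) - 1; i >= 0; i-- {
			w := order[i]
			for _, v := range preds[w] {
				delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
			}
			if w != s {
				raw[w] += delta[w]
			}
		}
	}
	top := 0.0
	for _, v := range raw {
		top = math.Max(top, v)
	}
	if top == 0 {
		top = 1 // no node lies between any pair
	}
	out := make(map[string]float64, len(adj))
	for n := range adj {
		out[n] = raw[n] / top
	}
	return out
}

func main() {
	line := []Edge{{"A", "B", 1}, {"B", "C", 1}, {"C", "D", 1}}
	fmt.Println(bridgeScores(line))
}
```

On the 4-node line the two middle nodes score 1 and the leaves 0, matching the shape that `TestComputeBridgeScores_LineGraph` describes. A triangle yields zero everywhere.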
Kpa-clawbot	ae17a2be12	perf(#1262 ): /api/nodes?limit=2000 cold-miss 15.7s → <100ms — prewarm repeater enrichment cache (#1263 ) RED commit: `22ce5736066142583017cad7303fa48d9e00ccf0` — CI on red: https://github.com/Kpa-clawbot/CoreScope/actions?query=branch%3Afix%2Fissue-1262 ## Problem After #1260 added a 15s-TTL bulk cache for repeater enrichment in `handleNodes`, `/api/nodes` (default limit) dropped to ~500ms. But `/api/nodes?limit=2000` — called by `public/live.js` at SPA startup for hop resolution — still took 15.7s cold on staging (75k tx, 600 nodes). Warm hits were ~40ms. Root cause: the bulk cache was lazily populated on the first request after TTL expiry. The rebuild ran on the request-serving goroutine. Every cold SPA load triggered the rebuild and ate 15s. ## Fix Add `StartRepeaterEnrichmentRecomputer` — a steady-state background recomputer that mirrors the `analytics_recomputer.go` pattern from #1240: - Prewarm: initial synchronous compute on Start so the first request hits a populated cache. - Steady-state: ticker refreshes the snapshot every 5min (configurable via the existing analytics recompute interval knob). - Panic-safe + idempotent Start. Wired into `main.go` right after `StartAnalyticsRecomputers`, using `cfg.GetHealthThresholds().RelayActiveHours` as the window. ## Test `TestHandleNodesLimit2000ColdMiss` — seeds 600 nodes + 150k non-advert tx with repeaters indexed under a shared 1-byte hop prefix (matches production hop-prefix collisions), starts the recomputer, then issues `/api/nodes?limit=2000` with no HTTP warmup. \| State \| Latency \| \|---\|---\| \| Before (master, on-thread rebuild) \| 3.37s \| \| After (prewarm + steady-state) \| 56ms \| \| Budget \| 2s \| Staging end-to-end: 15.7s → expected sub-100ms on the same call path. Red commit (`22ce5736066142583017cad7303fa48d9e00ccf0`) compiles with a no-op stub of the new method so the test fails on the latency assertion, not a missing symbol. 
Fixes #1262 --------- Co-authored-by: corescope-bot <bot@corescope.local>	2026-05-18 09:22:27 -07:00
Kpa-clawbot	356f001027	perf(#1240 ): steady-state background recompute for analytics endpoints (#1248 ) RED commit: `27630f6a` — adds latency test that fails on master (p99=225ms > 50ms budget) and a stub `StartAnalyticsRecomputers` that returns a no-op so the assertion (not a build error) gates the change. GREEN commit: `20fbbceb` — wires real background recompute infrastructure. Test passes at p99=~1µs. ## What changed Replaces the on-request "compute-then-cache" pattern for the default-shape analytics queries with a steady-state background recompute loop. Reads always hit an `atomic.Value` snapshot in <1µs regardless of compute cost or writer contention. Operator principle: serving slightly stale data quickly beats real-time data slowly. ## Endpoints converted (default 5min interval each) \| Endpoint \| Cold compute \| Recomputer interval \| \|---\|---\|---\| \| `/api/analytics/topology` \| ~5s \| 5 min \| \| `/api/analytics/rf` \| ~4s \| 5 min \| \| `/api/analytics/distance` \| ~3s \| 5 min \| \| `/api/analytics/channels` \| ~0.5s \| 5 min \| \| `/api/analytics/hash-collisions` \| ~0.5s \| 5 min \| \| `/api/analytics/hash-sizes` \| ~22ms \| 5 min \| All intervals configurable per-endpoint via `analytics.recomputeIntervalSeconds.<name>` in `config.json`; documented in `config.example.json`. Default override via `analytics.defaultIntervalSeconds`. ## Scope: default query only Only the canonical shape `(region="", window=zero)` is precomputed. Region- or window-filtered requests fall back to the legacy TTL cache + on-request compute — keeps recomputer count bounded (6, not 6×N×M). ## Latency Test `TestAnalyticsRecomputerSteadyStateLatency`: 100 concurrent readers + 4 writers churning `s.mu.Lock` on 20k distHops. 
- Before: p50=188ms p99=225ms (assertion failed) - After: p50=240ns p99=1.1µs (atomic load + map return) ## Shutdown integration `StartAnalyticsRecomputers` returns a stop closure invoked from `main.go`'s SIGTERM handler BEFORE `dbClose()` so any in-flight SQLite compute drains cleanly. `TestAnalyticsRecomputerShutdownNoLeak` confirms all 6 goroutines are reaped (Δ=6 within 2s). ## Safety details - Initial compute is synchronous in `Start()` — first read after startup never sees nil. - `recover()` inside `runOnce` keeps a compute panic from killing the goroutine; previous snapshot remains valid. - `analyticsRecomputerMu` is a sync.RWMutex; recomputer pointers are read-locked in the hot path. The atomic.Value swap inside `runOnce` is lock-free. Fixes #1240. --------- Co-authored-by: OpenClaw Bot <bot@openclaw.local>	2026-05-17 17:33:30 +00:00
Kpa-clawbot	b881a09f02	feat(#1188 ): show observer IATA on packets + filter grammar (#1189 ) Red commit: `4ed272761b` (CI run: https://github.com/Kpa-clawbot/CoreScope/actions/runs/25651898290) Fixes #1188 — observer IATA on packets in three UI surfaces + filter grammar. cross-stack: justified — feature spans API shape (Go), store, filter grammar (JS), three packets UI surfaces. ## Scope shipped - Packets table row: `.badge-iata` pill inline next to observer name - Expanded observation rows: per-observation IATA badge - Detail pane: Observer dd + per-observation list both render the badge - Filter grammar: `observer_iata` field + `iata` alias; `==`/`!=`/`contains`, plus a new `in (a, b, c)` list operator. Both names appear in autocomplete with descriptions. ## TDD red→green pairs 1. `271d72f` filter-grammar tests → `2c182eb` evaluator + suggest entries 2. `4ed2727` backend `observer_iata` API tests → `7856914` SQL join + struct/store wiring 3. `0e09371` display E2E → `7a3f45d` packets.js + style.css badge (E2E swapped for string-contract unit test in `ee414b4` — fixture `observations.observer_idx` stores text pubkeys, blocking the join the badge depends on) ## Backend - `cmd/server/db.go`: SELECT `obs.iata AS observer_iata` in `transmissionBaseSQL`, grouped query, observations-by-transmissions - `cmd/server/store.go`: `ObserverIATA` on `StoreTx`/`StoreObs`, load via all three ingest paths, surface in `txToMap`/`enrichObs`/`groupedTxsToPage` - `cmd/server/types.go`: field added to `TransmissionResp`/`ObservationResp`/`GroupedPacketResp` - Test fixture schemas declare `iata` on observers ## Perf Per #383, `obsIataBadge(packet)` reads `packet.observer_iata` directly (server-joined). Falls back to `observerMap.get(id).iata` only if absent — hot row-render loop avoids per-row Map lookup on fresh data. ## Display rules Missing IATA: nothing inline (Region column still shows `—`). No new hex — `.badge-iata` uses `var(--nav-bg)` / `var(--nav-text)`. 
E2E assertion added: test-observer-iata-1188.js:51 --------- Co-authored-by: OpenClaw Bot <bot@openclaw.dev> Co-authored-by: openclaw-bot <bot@openclaw.local>	2026-05-17 16:13:11 +00:00
Kpa-clawbot	eba9e89a72	fix(#1203 ): path-inspector — singleflight + stale-while-revalidate (#1208 ) Red commit: `c84a8f575a` (CI run: pending push) Fixes #1203 — path-inspector 503 storm. Three sub-fixes, each shipped as red→green per AGENTS TDD: A. Singleflight on rebuild (`ensureNeighborGraph`) Hand-rolled `sync.Mutex + chan` singleflight — no new deps (x/sync was not in cmd/server's go.mod). Concurrent callers attach to one in-flight rebuild instead of N parallel `BuildFromStore` goroutines. - Red: `7340f23b` — test asserts ≤1 build under 10 concurrent callers (saw 10 on master) - Green: `abac6b3c` B. Stale-while-revalidate (`handlePathInspect`) Stale non-nil graph is served immediately with `"stale": true` while a background rebuild runs (deduped by A). The 2s synchronous gate is gone. Stale responses are not cached, so the next request after rebuild lands fresh. - Red: `c84a8f57` — test asserts 200+`stale:true`+rebuild-kickoff (master returned 503) - Green: `5eb86975` C. Cold-start 503 still kicks rebuild True cold start (`graph == nil`) is the only path that still returns 503 `{"retry": true}`, but it now spawns an async `ensureNeighborGraph` so the very next request warms up. - Green test: `f5ac7059` (passed on top of A+B) Singleflight verified: `TestEnsureNeighborGraph_Singleflight` Stale-while-revalidate verified: `TestHandlePathInspect_StaleWhileRevalidate` Cold-start verified: `TestHandlePathInspect_ColdStartKicksRebuild` Acceptance criteria (issue #1203): - [x] Concurrent requests share ONE rebuild - [x] Stale non-nil graph served with `stale:true` async - [x] 503 only on true cold-start - [x] Cold-start 503 kicks rebuild → follow-up warm - [ ] p99 < 500ms under load (not unit-testable; design satisfies it) - [x] No regression in existing tests Out of scope (per issue): 5-min TTL constant, `BuildFromStore` perf, `/api/analytics/topology`, persist-lock contention. No new deps. 
--------- Co-authored-by: corescope-bot <bot@corescope.local> Co-authored-by: corescope-bot <bot@corescope.dev>	2026-05-15 22:46:28 -07:00
efiten	11d2026bb1	feat(startup): hot startup — load hotStartupHours synchronously, fill retentionHours in background (#1187) Closes #1183 ## Summary - Adds `packetStore.hotStartupHours` config key (float64, default 0 = disabled). When set, `Load()` loads only that many hours of data synchronously, reducing startup time on large DBs. - A background goroutine (`loadBackgroundChunks`) fills the remaining `retentionHours` window in daily chunks after startup completes. Analytics indexes are rebuilt once at the end. - `QueryPackets` and `QueryGroupedPackets` check `oldestLoaded` and fall back to `db.QueryPackets()` for any query whose `Since`/`Until` predates the in-memory window — covering days 8–30 permanently (beyond `retentionHours`) and the background-fill gap during startup. - `/api/perf` gains `hotStartupHours`, `backgroundLoadComplete`, and `backgroundLoadProgress` fields inside `packetStore` so operators can monitor the fill. ### Drive-by fixes - E2E: added `gotoPackets` navigation helper used across packet-related tests - E2E: rewrote stripe assertion to check per-row stripe parity rather than a fragile computed-style comparison - E2E: theme test updated to use `#/home` as the initial route (was `#/`) - `db.go`: removed the RFC3339→unix-timestamp subquery path in `buildTransmissionWhere`; `t.first_seen` is now always compared directly as a string for both RFC3339 and non-RFC3339 inputs ## Configuration ```json "packetStore": { "retentionHours": 168, "hotStartupHours": 24 } ``` `hotStartupHours: 0` (default) preserves existing behavior exactly: the full `retentionHours` window is loaded at startup. A non-zero value is recommended for large DBs to reduce startup time. 
## Test plan - [x] `TestHotStartupConfig_Clamp` — clamping when `hotStartupHours > retentionHours` - [x] `TestHotStartupConfig_ZeroIsDisabled` — zero leaves feature disabled - [x] `TestHotStartup_LoadsOnlyHotWindow` — only hot-window packets in memory after `Load()` - [x] `TestHotStartup_DisabledWhenZero` — all retention packets loaded when disabled - [x] `TestHotStartup_loadChunk_AddsOlderData` — chunk merges correctly, ASC order maintained - [x] `TestHotStartup_BackgroundFillsToRetention` — background goroutine fills to `retentionHours` - [x] `TestHotStartup_ChunkErrorRecovery` — chunk SQL failure logged and skipped, loop terminates - [x] `TestHotStartup_SQLFallback_TriggeredForOldDate` — query before `oldestLoaded` routes to SQL - [x] `TestHotStartup_SQLFallback_NotTriggeredForRecentDate` — recent query stays in-memory - [x] `TestHotStartup_PerfStats` — new fields present in `GetPerfStoreStats()` (backs the perf endpoint) - [x] `TestHotStartup_PerfStoreHTTP` — HTTP-level: GET /api/perf returns `hotStartupHours`, `backgroundLoadComplete`, `backgroundLoadProgress` in `packetStore` 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: openclaw-bot <bot@openclaw.local> Co-authored-by: CoreScope Bot <bot@corescope.local>	2026-05-15 22:46:25 -07:00
Kpa-clawbot	fb744d895f	fix(#1143 ): structural pubkey attribution via from_pubkey column (#1152 ) Fixes #1143. ## Summary Replaces the structurally unsound `decoded_json LIKE '%pubkey%'` (and `OR LIKE '%name%'`) attribution path with an exact-match lookup on a dedicated, indexed `transmissions.from_pubkey` column. This closes both holes documented in #1143: - Hole 1 — same-name false positives via `OR LIKE '%name%'` - Hole 2a — adversarial spoofing: a malicious node names itself with another node's pubkey and gets attributed to the victim - Hole 2b — accidental false positive when any free-text field (path elements, channel names, message bodies) contains a 64-char hex substring matching a real pubkey - Perf — query now uses an index instead of a full-table scan against `LIKE '%substring%'` ## TDD Two-commit history shows red-then-green: \| Commit \| Status \| Purpose \| \|---\|---\|---\| \| `7f0f08e` \| RED — tests assertion-fail on master behaviour \| Adversarial fixtures + spec \| \| `59327db` \| GREEN — schema + ingestor + server + migration \| Implementation \| The red commit's test schema includes the new column so the file compiles, but the production code still uses LIKE — the assertions fail because the malicious / same-name / free-text rows are returned. The green commit changes the query plus adds the migration/ingest path. ## Changes ### Schema - new column `transmissions.from_pubkey TEXT` - new index `idx_transmissions_from_pubkey` ### Ingestor (`cmd/ingestor/`) - `PacketData.FromPubkey` populated from decoded ADVERT `pubKey` at write time. Cheap — already parsing `decoded_json`. Non-ADVERTs stay NULL. - `stmtInsertTransmission` writes the column. - Migration `from_pubkey_v1` ALTERs legacy DBs to add the column + index. - Bonus: rewrote the recipe in the gated one-shot `advert_count_unique_v1` migration to use `from_pubkey` (already marked done on existing DBs; kept correct for fresh installs). 
### Server (`cmd/server/`) - `ensureFromPubkeyColumn` mirrors the ingestor migration so the server can boot against a DB the ingestor has never touched (e2e fixture, fresh installs). - `backfillFromPubkeyAsync` runs after HTTP starts. Scans `WHERE from_pubkey IS NULL AND payload_type = 4` in 5000-row chunks with a 100ms yield between chunks. Cannot block boot even on prod-sized DBs (100K+ transmissions). Queries handle NULL gracefully (return empty for that pubkey, same as today's unknown-pubkey path). - All in-scope LIKE call sites switched to exact match: \| Site \| Before \| After \| \|---\|---\|---\| \| `buildPacketWhere` (was db.go:582) \| `decoded_json LIKE '%pubkey%'` \| `from_pubkey = ?` \| \| `buildTransmissionWhere` (was db.go:626) \| `t.decoded_json LIKE '%pubkey%'` \| `t.from_pubkey = ?` \| \| `GetRecentTransmissionsForNode` (was db.go:910) \| `LIKE '%pubkey%' OR LIKE '%name%'` \| `t.from_pubkey = ?` \| \| `QueryMultiNodePackets` (was db.go:1785) \| `decoded_json LIKE '%pubkey%' OR ...` \| `t.from_pubkey IN (?, ?, ...)` \| \| `advert_count_unique_v1` (was ingestor/db.go:257) \| `decoded_json LIKE '%' \\|\\| nodes.public_key \\|\\| '%'` \| `t.from_pubkey = nodes.public_key` \| `GetRecentTransmissionsForNode` signature simplifies: the `name` parameter is gone (it was only ever used for the legacy `OR LIKE '%name%'` fallback). Sole caller in `routes.go:1243` updated. ### Tests - `cmd/server/from_pubkey_attribution_test.go` — adversarial fixtures + Hole 1/2a/2b/QueryMultiNodePackets exact-match assertions, EXPLAIN QUERY PLAN index check, migration backfill correctness. - `cmd/ingestor/from_pubkey_test.go` — write-time correctness (BuildPacketData populates FromPubkey for ADVERT only; InsertTransmission persists it; non-ADVERTs stay NULL). - Existing test schemas (server v2, server v3, coverage) get the new column plus a SQLite trigger that auto-populates `from_pubkey` from `decoded_json` on ADVERT inserts. 
This means existing fixtures (which only seed `decoded_json`) keep attributing correctly without per-test edits. - `seedTestData`'s ADVERTs explicitly set `from_pubkey`. ## Performance — index is used ``` $ EXPLAIN QUERY PLAN SELECT id FROM transmissions WHERE from_pubkey = ? SEARCH transmissions USING INDEX idx_transmissions_from_pubkey (from_pubkey=?) ``` Asserted in `TestFromPubkeyIndexUsed`. ## Migration approach - Sync at boot: `ALTER TABLE transmissions ADD COLUMN from_pubkey TEXT` is a metadata-only operation in SQLite — microseconds regardless of table size. `CREATE INDEX IF NOT EXISTS idx_transmissions_from_pubkey` is not metadata-only: it scans the table once. Empirically a few hundred ms on a 100K-row table; expect a few seconds on a 10M-row table (one-time cost, blocking boot during that window). Subsequent boots no-op via `IF NOT EXISTS`. If this boot delay becomes an operational concern at prod scale we can defer the `CREATE INDEX` to a goroutine — for now a few-second one-time delay is acceptable. - Async: row-level backfill of legacy NULL ADVERTs (chunked 5000 / 100ms yield). On a 100K-ADVERT prod DB, this completes in seconds in the background; HTTP is fully available throughout. - Safety: queries handle NULL gracefully — a node whose ADVERTs haven't backfilled yet returns empty, identical to today's behaviour for unknown pubkeys. No half-state regression. ## Out of scope (intentionally) The free-text `LIKE` paths the issue explicitly leaves alone (e.g. user-typed packet search) are untouched. Only the pubkey-attribution sites get the column treatment. 
## Cycle-3 review fixes \| Finding \| Status \| Commit \| \|---\|---\|---\| \| M1c — async-contract test was tautological (test's own `go`, not production's) \| Fixed \| `23ace71` (red) → `a05b50c` (green) \| \| m1c — package-global atomic resets unsafe under `t.Parallel()` \| Fixed (`// DO NOT t.Parallel` comment + `Reset()` helper) \| rolled into `23ace71` / `241ec69` \| \| m2c — `/api/healthz` read 3 atomics non-atomically (torn snapshot) \| Fixed (single RWMutex-guarded snapshot + race test) \| `241ec69` \| \| n3c.m1 — vestigial OR-scaffolding in `QueryMultiNodePackets` \| Fixed (cleanup) \| `5a53ceb` \| \| n3c.m2 — verify PR body language about `ALTER` vs `CREATE INDEX` \| Verified accurate (already corrected in cycle 2) \| (no change) \| \| n3c.m3 — `json.Unmarshal` per row in backfill → could use SQL `json_extract` \| Deferred as known followup — pure perf optimization (current per-row Unmarshal is correct, just slower); SQL rewrite would unwind the chunked-yield architecture and is non-trivial. Acceptable for one-time backfill at boot on legacy DBs. \| ### M1c implementation detail `startFromPubkeyBackfill(dbPath, chunkSize, yieldDuration)` is now the single production entry point used by `main.go`. It internally does `go backfillFromPubkeyAsync(...)`. The test calls `startFromPubkeyBackfill` (no `go` prefix) and asserts the dispatch returns within 50ms — so if anyone removes the `go` keyword inside the wrapper, the test fails. Manually verified: removing the `go` keyword causes `TestBackfillFromPubkey_DoesNotBlockBoot` to fail with "backfill dispatch took ~1s (>50ms): not async — would block boot." ### m2c implementation detail `fromPubkeyBackfillTotal/Processed/Done` are now plain `int64`/`bool` package globals guarded by a single `sync.RWMutex`. `fromPubkeyBackfillSnapshot()` returns all three under one RLock. 
`TestHealthzFromPubkeyBackfillConsistentSnapshot` races a writer (lock-step total/processed updates with periodic done flips) against 8 readers hammering `/api/healthz`, asserting `processed<=total` and `(done => processed==total)` on every response. Verified the test catches torn reads (manually injected a 3-RLock implementation; test failed within milliseconds with "processed>total" and "done=true but processed!=total" errors). --------- Co-authored-by: openclaw-bot <bot@openclaw.local> Co-authored-by: openclaw-bot <bot@openclaw.dev>	2026-05-06 23:50:44 -07:00
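The m2c fix described above (one `sync.RWMutex` guarding all three backfill counters, read together in one snapshot) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the names `backfillProgress`, `recordProgress`, and `snapshotProgress` are hypothetical stand-ins for `fromPubkeyBackfillTotal/Processed/Done` and `fromPubkeyBackfillSnapshot()`.

```go
package main

import (
	"fmt"
	"sync"
)

// backfillProgress groups the three values so they are always copied together.
// Hypothetical type; the PR keeps them as separate package globals.
type backfillProgress struct {
	Total     int64
	Processed int64
	Done      bool
}

var (
	progressMu sync.RWMutex
	progress   backfillProgress
)

// recordProgress replaces all three fields under a single write lock.
func recordProgress(total, processed int64, done bool) {
	progressMu.Lock()
	defer progressMu.Unlock()
	progress = backfillProgress{Total: total, Processed: processed, Done: done}
}

// snapshotProgress copies all three fields under a single read lock, so a
// reader can never see Processed from one update and Total from another.
func snapshotProgress() backfillProgress {
	progressMu.RLock()
	defer progressMu.RUnlock()
	return progress
}

func main() {
	recordProgress(100, 40, false)
	s := snapshotProgress()
	fmt.Printf("processed=%d total=%d done=%v\n", s.Processed, s.Total, s.Done)
}
```

Taking the read lock three times (once per field) would reintroduce the torn snapshot, which is the failure mode the race test in the PR is described as catching.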
Kpa-clawbot	136e1d23c8	feat(#730 ): foreign-advert detection — flag instead of silent drop (#1084 ) ## Summary Partial fix for #730 (M1 only — M2 frontend and M3 alerting deferred). Today the ingestor silently drops ADVERTs whose GPS lies outside the configured `geo_filter` polygon. That's the wrong default for an analytics tool — operators get zero visibility into bridged or leaked meshes. This PR makes the new default flag, don't drop: foreign adverts are stored, the node row is tagged `foreign_advert=1`, and the API surfaces `"foreign": true` so dashboards / map overlays can be built on top. ## Behavior \| Mode \| What happens to an ADVERT outside `geo_filter` \| \|---\|---\| \| (default) flag \| Stored, marked `foreign_advert=1`, exposed via API \| \| drop (legacy) \| Silently dropped (preserves old behavior for ops who want it) \| ## What's done (M1 — Backend) - ingestor stores foreign adverts instead of dropping - `nodes.foreign_advert` column added (migration) - `/api/nodes` and `/api/nodes/{pk}` expose `foreign: true` field - Config: `geofilter.action: "flag"\|"drop"` (default `flag`) - Tests + config docs ## What's NOT done (deferred to M2 + M3) - M2 — Frontend: Map overlay showing foreign adverts as distinct markers, foreign-advert filter on packets/nodes pages, dedicated foreign-advert dashboard - M3 — Alerting: Time-series detection of bridging events, alert when foreign advert rate spikes, identify bridge entry-point nodes Issue #730 remains open for M2 and M3. --------- Co-authored-by: corescope-bot <bot@corescope>	2026-05-05 01:58:52 -07:00
Kpa-clawbot	d05e468598	feat(memlimit): GOMEMLIMIT support, derive from packetStore.maxMemoryMB (#836 ) (#1077 ) ## Summary Implements part 1 of #836 — `GOMEMLIMIT` support so the Go runtime self-throttles GC under cgroup memory pressure instead of getting SIGKILLed. (Parts 2 & 3 — bounded cold-load batching + README ops docs — land in follow-up PRs.) ## Behavior On startup `cmd/server/main.go` now calls `applyMemoryLimit(maxMemoryMB, envSet)`: \| Condition \| Action \| Log \| \|---\|---\|---\| \| `GOMEMLIMIT` env set \| Honor the runtime's parse, do nothing \| `[memlimit] using GOMEMLIMIT from environment (...)` \| \| env unset, `packetStore.maxMemoryMB > 0` \| `debug.SetMemoryLimit(maxMB * 1.5 MiB)` \| `[memlimit] derived from packetStore.maxMemoryMB=512 → 768 MiB (1.5x headroom)` \| \| env unset, `maxMemoryMB == 0` \| No-op \| `[memlimit] no soft memory limit set ... recommend setting one to avoid container OOM-kill` \| The 1.5x headroom covers Go's NextGC trigger at ~2× live heap (per #836 heap profile: 680 MB live → 1.38 GB NextGC). ## Tests (TDD red→green visible in commit history) - `TestApplyMemoryLimit_FromEnv` — env wins, function does not override - `TestApplyMemoryLimit_DerivedFromMaxMemoryMB` — verifies bytes computation + `debug.SetMemoryLimit` actually applied at runtime - `TestApplyMemoryLimit_None` — no env, no config → reports `"none"`, no side effect Red commit: `7de3c62` (assertion failures, builds clean) Green commit: `454516d` ## Config docs `config.example.json` `packetStore._comment_gomemlimit` documents env/derived/override behavior. ## Out of scope - Cold-load transient bounding (item 2 in #836) - README container-size table (item 3) - QA §1.1 rewrite Closes part 1 of #836. --------- Co-authored-by: corescope-bot <bot@corescope>	2026-05-05 01:33:23 -07:00
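The decision table in the GOMEMLIMIT commit above can be approximated with the sketch below. It assumes the 1.5x multiplier is applied to `maxMemoryMB` expressed in MiB; `deriveLimitBytes` and `applyLimitSketch` are hypothetical names, and the real `applyMemoryLimit` may differ in signature and return value.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
)

// deriveLimitBytes returns maxMemoryMB * 1.5, in bytes, or 0 when no cap is configured.
func deriveLimitBytes(maxMemoryMB int) int64 {
	if maxMemoryMB <= 0 {
		return 0
	}
	return int64(maxMemoryMB) * 1024 * 1024 * 3 / 2
}

// applyLimitSketch mirrors the three rows of the table: env wins, then the
// derived value, then no limit. It reports which branch was taken.
func applyLimitSketch(maxMemoryMB int, envSet bool) string {
	if envSet {
		return "env" // the Go runtime has already parsed GOMEMLIMIT itself
	}
	limit := deriveLimitBytes(maxMemoryMB)
	if limit == 0 {
		return "none"
	}
	debug.SetMemoryLimit(limit)
	return "derived"
}

func main() {
	_, envSet := os.LookupEnv("GOMEMLIMIT")
	source := applyLimitSketch(512, envSet)
	// A negative argument reads the current limit without changing it.
	fmt.Printf("source=%s limit=%d bytes\n", source, debug.SetMemoryLimit(-1))
}
```

With `maxMemoryMB=512` this yields 768 MiB, matching the log line quoted in the commit message.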
Kpa-clawbot	dd2f044f2b	fix: cache RW SQLite connection + dedup DBConfig (closes #921 ) (#982 ) Closes #921 ## Summary Follow-up to #920 (incremental auto-vacuum). Addresses both items from the adversarial review: ### 1. RW connection caching Previously, every call to `openRW(dbPath)` opened a new SQLite RW connection and closed it after use. This happened in: - `runIncrementalVacuum` (~4x/hour) - `PruneOldPackets`, `PruneOldMetrics`, `RemoveStaleObservers` - `buildAndPersistEdges`, `PruneNeighborEdges` - All neighbor persist operations Now a single `*sql.DB` handle (with `MaxOpenConns(1)`) is cached process-wide via `cachedRW(dbPath)`. The underlying connection pool manages serialization. The original `openRW()` function is retained for one-shot test usage. ### 2. DBConfig dedup `DBConfig` was defined identically in both `cmd/server/config.go` and `cmd/ingestor/config.go`. Extracted to `internal/dbconfig/` as a shared package; both binaries now use a type alias (`type DBConfig = dbconfig.DBConfig`). ## Tests added \| Test \| File \| \|------\|------\| \| `TestCachedRW_ReturnsSameHandle` \| `cmd/server/rw_cache_test.go` \| \| `TestCachedRW_100Calls_SingleConnection` \| `cmd/server/rw_cache_test.go` \| \| `TestGetIncrementalVacuumPages_Default` \| `internal/dbconfig/dbconfig_test.go` \| \| `TestGetIncrementalVacuumPages_Configured` \| `internal/dbconfig/dbconfig_test.go` \| ## Verification ``` ok github.com/corescope/server 20.069s ok github.com/corescope/ingestor 47.117s ok github.com/meshcore-analyzer/dbconfig 0.003s ``` Both binaries build cleanly. 100 sequential `cachedRW()` calls return the same handle with exactly 1 entry in the cache map. --------- Co-authored-by: you <you@example.com>	2026-05-02 20:15:30 -07:00
Kpa-clawbot	3364eed303	feat: separate "Last Status Update" from "Last Packet Observation" for observers (v3 rebase) (#969 ) Rebased version of #968 (which was itself a rebase of #905) — resolves merge conflict with #906 (clock-skew UI) that landed on master. ## Conflict resolution `public/observers.js` — master (#906) added "Clock Offset" column to observer table; #968 split "Last Seen" into "Last Status" + "Last Packet" columns. Combined both: the table now has Status \| Name \| Region \| Last Status \| Last Packet \| Packets \| Packets/Hour \| Clock Offset \| Uptime. ## What this PR adds (unchanged from #968/#905) - `last_packet_at` column in observers DB table - Separate "Last Status Update" and "Last Packet Observation" display in observers list and detail page - Server-side migration to add the column automatically - Backfill heuristic for existing data - Tests for ingestor and server ## Verification - All Go tests pass (`cmd/server`, `cmd/ingestor`) - Frontend tests pass (`test-packets.js`, `test-hash-color.js`) - Built server, hit `/api/observers` — `last_packet_at` field present in JSON - Observer table header has all 9 columns including both Last Packet and Clock Offset ## Prior PRs - #905 — original (conflicts with master) - #968 — first rebase (conflicts after #906 landed) - This PR — second rebase, resolves #906 conflict Supersedes #968. Closes #905. --------- Co-authored-by: you <you@example.com>	2026-05-02 12:03:42 -07:00
Kpa-clawbot	b3a9677c52	feat(ingestor + server): observerBlacklist config (#962 ) (#963 ) ## Summary Implements `observerBlacklist` config — mirrors the existing `nodeBlacklist` pattern for observers. Drop observers by pubkey at ingest, with defense-in-depth filtering on the server side. Closes #962 ## Changes ### Ingestor (`cmd/ingestor/`) - `config.go`: Added `ObserverBlacklist []string` field + `IsObserverBlacklisted()` method (case-insensitive, whitespace-trimmed) - `main.go`: Early return in `handleMessage` when `parts[2]` (observer ID from MQTT topic) matches blacklist — before status handling, before IATA filter. No UpsertObserver, no observations, no metrics insert. Log line: `observer <pubkey-short> blacklisted, dropping` ### Server (`cmd/server/`) - `config.go`: Same `ObserverBlacklist` field + `IsObserverBlacklisted()` with `sync.Once` cached set (same pattern as `nodeBlacklist`) - `routes.go`: Defense-in-depth filtering in `handleObservers` (skip blacklisted in list) and `handleObserverDetail` (404 for blacklisted ID) - `main.go`: Startup `softDeleteBlacklistedObservers()` marks matching rows `inactive=1` so historical data is hidden - `neighbor_persist.go`: `softDeleteBlacklistedObservers()` implementation ### Tests - `cmd/ingestor/observer_blacklist_test.go`: config method tests (case-insensitive, empty, nil) - `cmd/server/observer_blacklist_test.go`: config tests + HTTP handler tests (list excludes blacklisted, detail returns 404, no-blacklist passes all, concurrent safety) ## Config ```json { "observerBlacklist": [ "EE550DE547D7B94848A952C98F585881FCF946A128E72905E95517475F83CFB1" ] } ``` ## Verification (Rule 18 — actual server output) Before blacklist (no config): ``` Total: 31 DUBLIN in list: True ``` After blacklist (DUBLIN Observer pubkey in `observerBlacklist`): ``` [observer-blacklist] soft-deleted 1 blacklisted observer(s) Total: 30 DUBLIN in list: False ``` Detail endpoint for blacklisted observer returns 404. 
All existing tests pass (`go test ./...` for both server and ingestor). --------- Co-authored-by: you <you@example.com>	2026-05-01 23:11:27 -07:00
Kpa-clawbot	e1a1be1735	fix(server): add observers.inactive column at startup if missing (root cause of CI flake) (#961 ) ## The actual root cause PR #954 added `WHERE inactive IS NULL OR inactive = 0` to the server's observer queries, but the `inactive` column is only added by the ingestor migration (`cmd/ingestor/db.go:344-354`). When the server runs against a DB the ingestor never touched (e.g. the e2e fixture), the column doesn't exist: ``` $ sqlite3 test-fixtures/e2e-fixture.db "SELECT COUNT(*) FROM observers WHERE inactive IS NULL OR inactive = 0;" Error: no such column: inactive ``` The server's `db.QueryRow().Scan()` swallows that error → `totalObservers` stays 0 → `/api/observers` returns empty → map test fails with "No map markers/overlays found". This explains all the failing CI runs since #954 merged. PR #957 (freshen fixture) helped with the `nodes` time-rot but couldn't fix the missing-column problem. PR #960 (freshen observers) added the right timestamps but the column was still missing. PR #959 (data-loaded in finally) fixed a different real bug. None of those touched the actual mechanism. ## Fix Mirror the existing `ensureResolvedPathColumn` pattern: add `ensureObserverInactiveColumn` that runs at server startup, checks if the column exists via `PRAGMA table_info`, adds it with `ALTER TABLE observers ADD COLUMN inactive INTEGER DEFAULT 0` if missing. Wired into `cmd/server/main.go` immediately after `ensureResolvedPathColumn`. ## Verification End-to-end on a freshened fixture: ``` $ sqlite3 /tmp/e2e-verify.db "PRAGMA table_info(observers);" \| grep inactive (no output — column absent) $ ./cs-fixed -port 13702 -db /tmp/e2e-verify.db -public public & [store] Added inactive column to observers $ curl 'http://localhost:13702/api/observers' returned=31 # was 0 before fix ``` `go test ./...` passes (19.8s). ## Lessons I should have run `sqlite3 fixture "SELECT ... 
WHERE inactive ..."` directly the first time the map test failed after #954 instead of writing four "fix" PRs that didn't address the actual mechanism. Apologies for the wild goose chase. Co-authored-by: Kpa-clawbot <bot@example.invalid>	2026-05-01 19:04:23 -07:00
Kpa-clawbot	57e272494d	feat(server): /api/healthz readiness endpoint gated on store load (#955 ) (#956 ) ## Summary Fixes RCA #2 from #955: the HTTP listener and `/api/stats` go live before background goroutines (pickBestObservation, neighbor graph build) finish, causing CI readiness checks to pass prematurely. ## Changes 1. `cmd/server/healthz.go` — New `GET /api/healthz` endpoint: - Returns `503 {"ready":false,"reason":"loading"}` while background init is running - Returns `200 {"ready":true,"loadedTx":N,"loadedObs":N}` once ready 2. `cmd/server/main.go` — Added `sync.WaitGroup` tracking pickBestObservation and neighbor graph build goroutines. A coordinator goroutine sets `readiness.Store(1)` when all complete. `backfillResolvedPathsAsync` is NOT gated (async by design, can take 20+ min). 3. `cmd/server/routes.go` — Wired `/api/healthz` before system endpoints. 4. `.github/workflows/deploy.yml` — CI wait-for-ready loop now polls `/api/healthz` instead of `/api/stats`. 5. `cmd/server/healthz_test.go` — Tests for 503-before-ready, 200-after-ready, JSON shape, and anti-tautology gate. ## Rule 18 Verification Built and ran against `test-fixtures/e2e-fixture.db` (499 tx): - With the small fixture DB, init completes in <300ms so both immediate and delayed curls return 200 - Unit tests confirm 503 behavior when `readiness=0` (simulating slow init) - On production DBs with 100K+ txs, the 503 window would be 5-15s (pickBestObservation processes in 5000-tx chunks with 10ms yields) ## Test Results ``` === RUN TestHealthzNotReady --- PASS === RUN TestHealthzReady --- PASS === RUN TestHealthzAntiTautology --- PASS ok github.com/corescope/server 19.662s (full suite) ``` Co-authored-by: you <you@example.com>	2026-05-01 07:55:57 -07:00
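The readiness gate from the `/api/healthz` commit above can be illustrated with a small handler driven by an atomic flag. This is a sketch under assumptions: the handler and helper names are hypothetical, and the `loadedTx`/`loadedObs` fields from the real response are left out for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// ready is 0 while background init runs and 1 once it has finished.
var ready atomic.Int32

// healthzHandler returns 503 with a reason while loading, 200 once ready.
func healthzHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	if ready.Load() == 0 {
		w.WriteHeader(http.StatusServiceUnavailable)
		json.NewEncoder(w).Encode(map[string]any{"ready": false, "reason": "loading"})
		return
	}
	json.NewEncoder(w).Encode(map[string]any{"ready": true})
}

// probe calls the handler in-process and returns the HTTP status code.
func probe() int {
	rec := httptest.NewRecorder()
	healthzHandler(rec, httptest.NewRequest(http.MethodGet, "/api/healthz", nil))
	return rec.Code
}

func main() {
	fmt.Println("before init:", probe())
	ready.Store(1) // in the PR, a coordinator goroutine does this after the WaitGroup completes
	fmt.Println("after init:", probe())
}
```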
Kpa-clawbot	aeae7813bc	fix: enable SQLite incremental auto-vacuum so DB shrinks after retention (#919 ) (#920 ) Closes #919 ## Summary Enables SQLite incremental auto-vacuum so the database file actually shrinks after retention reaper deletes old data. Previously, `DELETE` operations freed pages internally but never returned disk space to the OS. ## Changes ### 1. Auto-vacuum on new databases - `PRAGMA auto_vacuum = INCREMENTAL` set via DSN pragma before `journal_mode(WAL)` in the ingestor's `OpenStoreWithInterval` - Must be set before any tables are created; DSN ordering ensures this ### 2. Post-reaper incremental vacuum - `PRAGMA incremental_vacuum(N)` runs after every retention reaper cycle (packets, metrics, observers, neighbor edges) - N defaults to 1024 pages, configurable via `db.incrementalVacuumPages` - Noop on `auto_vacuum=NONE` databases (safe before migration) - Added to both server and ingestor ### 3. Opt-in full VACUUM for existing databases - Startup check logs a clear warning if `auto_vacuum != INCREMENTAL` - `db.vacuumOnStartup: true` config triggers one-time `PRAGMA auto_vacuum = INCREMENTAL; VACUUM` - Logs start/end time for operator visibility ### 4. Documentation - `docs/user-guide/configuration.md`: retention section notes that lowering retention doesn't immediately shrink the DB - `docs/user-guide/database.md`: new guide covering WAL, auto-vacuum, migration, manual VACUUM ### 5. Tests - `TestNewDBHasIncrementalAutoVacuum` — fresh DB gets `auto_vacuum=2` - `TestExistingDBHasAutoVacuumNone` — old DB stays at `auto_vacuum=0` - `TestVacuumOnStartupMigratesDB` — full VACUUM sets `auto_vacuum=2` - `TestIncrementalVacuumReducesFreelist` — DELETE + vacuum shrinks freelist - `TestCheckAutoVacuumLogs` — handles both modes without panic - `TestConfigIncrementalVacuumPages` — config defaults and overrides ## Migration path for existing databases 1. On startup, CoreScope logs: `[db] auto_vacuum=NONE — DB needs one-time VACUUM...` 2. 
Set `db.vacuumOnStartup: true` in config.json 3. Restart — VACUUM runs (blocks startup, minutes on large DBs) 4. Remove `vacuumOnStartup` after migration ## Test results ``` ok github.com/corescope/server 19.448s ok github.com/corescope/ingestor 30.682s ``` --------- Co-authored-by: you <you@example.com>	2026-04-30 23:45:00 -07:00
Kpa-clawbot	a8e1cea683	fix: use payload type bits only in content hash (not full header byte) (#787 ) ## Problem The firmware computes packet content hash as: ``` SHA256(payload_type_byte + [path_len for TRACE] + payload) ``` Where `payload_type_byte = (header >> 2) & 0x0F` — just the payload type bits (2-5). CoreScope was using the full header byte in its hash computation, which includes route type bits (0-1) and version bits (6-7). This meant the same logical packet produced different content hashes depending on route type — breaking dedup and packet lookup. Firmware reference: `Packet.cpp::calculatePacketHash()` uses `getPayloadType()` which returns `(header >> PH_TYPE_SHIFT) & PH_TYPE_MASK`. ## Fix - Extract only payload type bits: `payloadType := (headerByte >> 2) & 0x0F` - Include `path_len` byte in hash for TRACE packets (matching firmware behavior) - Applied to both `cmd/server/decoder.go` and `cmd/ingestor/decoder.go` ## Tests Added - Route type independence: Same payload with FLOOD vs DIRECT route types produces identical hash - TRACE path_len inclusion: TRACE packets with different `path_len` produce different hashes - Firmware compatibility: Hash output matches manual computation of firmware algorithm ## Migration Impact Existing packets in the DB have content hashes computed with the old (incorrect) formula. Options: 1. Recompute hashes via migration (recommended for clean state) 2. Dual lookup — check both old and new hash on queries (backward compat) 3. Accept the break — old hashes become stale, new packets get correct hashes Recommend option 1 (migration) as a follow-up. The volume of affected packets depends on how many distinct route types were seen for the same logical packet. Fixes #786 --------- Co-authored-by: you <you@example.com>	2026-04-18 11:52:22 -07:00
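The corrected hash input from the content-hash commit above can be sketched as follows. Two things here are assumptions rather than facts from the commit message: the numeric value used for the TRACE payload type (`0x09`), and the use of the full SHA-256 digest as the result. `contentHash` is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// payloadTypeTrace is an assumed value for the TRACE payload type.
const payloadTypeTrace = 0x09

// contentHash hashes the payload type bits (header bits 2-5), the path_len
// byte for TRACE packets only, and then the payload.
func contentHash(header, pathLen byte, payload []byte) [32]byte {
	payloadType := (header >> 2) & 0x0F // drops route bits 0-1 and version bits 6-7
	h := sha256.New()
	h.Write([]byte{payloadType})
	if payloadType == payloadTypeTrace {
		h.Write([]byte{pathLen})
	}
	h.Write(payload)
	var sum [32]byte
	copy(sum[:], h.Sum(nil))
	return sum
}

func main() {
	payload := []byte("example")
	// 0x11 and 0x12 share payload type 4 but differ in the two route-type bits.
	a := contentHash(0x11, 0, payload)
	b := contentHash(0x12, 0, payload)
	fmt.Println("route-independent:", a == b)
}
```

Hashing the full header byte instead of `payloadType` would make `a` and `b` differ, which is the dedup breakage the commit describes.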
Joel Claw	b9ba447046	feat: add nodeBlacklist config to hide abusive/troll nodes (#742 ) ## Problem Some mesh participants set offensive names, report deliberately false GPS positions, or otherwise troll the network. Instance operators currently have no way to hide these nodes from public-facing APIs without deleting the underlying data. ## Solution Add a `nodeBlacklist` array to `config.json` containing public keys of nodes to exclude from all API responses. ### Blacklisted nodes are filtered from: - `GET /api/nodes` — list endpoint - `GET /api/nodes/search` — search results - `GET /api/nodes/{pubkey}` — detail (returns 404) - `GET /api/nodes/{pubkey}/health` — returns 404 - `GET /api/nodes/{pubkey}/paths` — returns 404 - `GET /api/nodes/{pubkey}/analytics` — returns 404 - `GET /api/nodes/{pubkey}/neighbors` — returns 404 - `GET /api/nodes/bulk-health` — filtered from results ### Config example ```json { "nodeBlacklist": [ "aabbccdd...", "11223344..." ] } ``` ### Design decisions - Case-insensitive — public keys normalized to lowercase - Whitespace trimming — leading/trailing whitespace handled - Empty entries ignored — `""` or `" "` do not cause false positives - Nil-safe — `IsBlacklisted()` on nil Config returns false - Backward-compatible — empty/missing `nodeBlacklist` has zero effect - Lazy-cached set — blacklist converted to `map[string]bool` on first lookup ### What this does NOT do (intentionally) - Does not delete or modify database data — only filters API responses - Does not block packet ingestion — data still flows for analytics - Does not filter `/api/packets` — only node-facing endpoints are affected ## Testing - Unit tests for `Config.IsBlacklisted()` (case sensitivity, whitespace, empty entries, nil config) - Integration tests for `/api/nodes`, `/api/nodes/{pubkey}`, `/api/nodes/search` - Full test suite passes with no regressions	2026-04-17 23:43:05 +00:00
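The design decisions listed in the `nodeBlacklist` commit above (case-insensitive, whitespace-trimmed, empty entries ignored, nil-safe, lazily cached set) fit in a short sketch. The struct below is a hypothetical stand-in for the real `Config`, and the use of `sync.Once` for the lazy build is taken from the later `observerBlacklist` commit's description of the same pattern.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Config is a reduced, hypothetical version of the server config.
type Config struct {
	NodeBlacklist []string

	once sync.Once
	set  map[string]bool
}

// IsBlacklisted reports whether pubkey is in the blacklist. It is safe to
// call on a nil *Config and builds the lookup set on first use.
func (c *Config) IsBlacklisted(pubkey string) bool {
	if c == nil {
		return false
	}
	c.once.Do(func() {
		c.set = make(map[string]bool, len(c.NodeBlacklist))
		for _, k := range c.NodeBlacklist {
			k = strings.ToLower(strings.TrimSpace(k))
			if k != "" { // skip "" and whitespace-only entries
				c.set[k] = true
			}
		}
	})
	return c.set[strings.ToLower(strings.TrimSpace(pubkey))]
}

func main() {
	cfg := &Config{NodeBlacklist: []string{" AABBCCDD ", "", "  "}}
	fmt.Println(cfg.IsBlacklisted("aabbccdd"), cfg.IsBlacklisted(""))

	var none *Config
	fmt.Println(none.IsBlacklisted("aabbccdd"))
}
```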
Joel Claw	fa3f623bd6	feat: add observer retention — remove stale observers after configurable days (#764 ) ## Summary Observers that stop actively sending data now get removed after a configurable retention period (default 14 days). Previously, observers remained in the `observers` table forever. This meant nodes that were once observers for an instance but are no longer connected (even if still active in the mesh elsewhere) would continue appearing in the observer list indefinitely. ## Key Design Decisions - Active data requirement: `last_seen` is only updated when the observer itself sends packets (via `stmtUpdateObserverLastSeen`). Being seen by another node does NOT update this field. So an observer must actively send data to stay listed. - Default: 14 days — observers not seen in 14 days are removed - `-1` = keep forever — for users who want observers to never be removed - `0` = use default (14 days) — same as not setting the field - Runs on startup + daily ticker — staggered 3 minutes after metrics prune to avoid DB contention ## Changes \| File \| Change \| \|------\|--------\| \| `cmd/ingestor/config.go` \| Add `ObserverDays` to `RetentionConfig`, add `ObserverDaysOrDefault()` \| \| `cmd/ingestor/db.go` \| Add `RemoveStaleObservers()` — deletes observers with `last_seen` before cutoff \| \| `cmd/ingestor/main.go` \| Wire up startup + daily ticker for observer retention \| \| `cmd/server/config.go` \| Add `ObserverDays` to `RetentionConfig`, add `ObserverDaysOrDefault()` \| \| `cmd/server/db.go` \| Add `RemoveStaleObservers()` (server-side, uses read-write connection) \| \| `cmd/server/main.go` \| Wire up startup + daily ticker, shutdown cleanup \| \| `cmd/server/routes.go` \| Admin prune API now also removes stale observers \| \| `config.example.json` \| Add `observerDays: 14` with documentation \| \| `cmd/ingestor/coverage_boost_test.go` \| 4 tests: basic removal, empty store, keep forever (-1), default (0→14) \| \| `cmd/server/config_test.go` \| 4 tests: 
`ObserverDaysOrDefault` edge cases \| ## Config Example ```json { "retention": { "nodeDays": 7, "observerDays": 14, "packetDays": 30, "_comment": "observerDays: -1 = keep forever, 0 = use default (14)" } } ``` ## Admin API The `/api/admin/prune` endpoint now also removes stale observers (using `observerDays` from config) and reports `observers_removed` in the response alongside `packets_deleted`. ## Test Plan - [x] `TestRemoveStaleObservers` — old observer removed, recent observer kept - [x] `TestRemoveStaleObserversNone` — empty store, no errors - [x] `TestRemoveStaleObserversKeepForever` — `-1` keeps even year-old observers - [x] `TestRemoveStaleObserversDefault` — `0` defaults to 14 days - [x] `TestObserverDaysOrDefault` (ingestor) — nil/zero/positive/keep-forever - [x] `TestObserverDaysOrDefault` (server) — nil/zero/positive/keep-forever - [x] Both binaries compile cleanly (`go build`) - [ ] Manual: verify observer count decreases after retention period on a live instance	2026-04-17 09:24:40 -07:00
Kpa-clawbot	dc5b5ce9a0	fix: reject weak/default API keys + startup warning (#532 ) (#628 ) ## Summary Hardens API key security for write endpoints (fixes #532): 1. Constant-time comparison — uses `crypto/subtle.ConstantTimeCompare` to prevent timing attacks on API key validation 2. Weak key blocklist — rejects known default/example keys (`test`, `password`, `change-me`, `your-secret-api-key-here`, etc.) 3. Minimum length enforcement — keys shorter than 16 characters are rejected 4. Startup warning — logs a clear warning if the configured key is weak or a known default 5. Generic error messages — HTTP 403 response uses opaque "forbidden" message to prevent information leakage about why a key was rejected ### Security Model - Empty key → all write endpoints disabled (403) - Weak/default key → all write endpoints disabled (403), startup warning logged - Wrong key → 401 unauthorized - Strong correct key → request proceeds ### Files Changed - `cmd/server/config.go` — `IsWeakAPIKey()` function + blocklist - `cmd/server/routes.go` — constant-time comparison via `constantTimeEqual()`, weak key rejection - `cmd/server/main.go` — startup warning for weak keys - `cmd/server/apikey_security_test.go` — comprehensive test coverage - `cmd/server/routes_test.go` — existing tests updated to use strong keys ### Reviews - ✅ Self-review: all security properties verified - ✅ djb Final Review: timing fix correct, blocklist pragmatic, error messages opaque, tests comprehensive. Verdict: Ship it. ### Test Results All existing + new tests pass. Coverage includes: weak key detection (blocklist + length + case-insensitive), empty key handling, strong key acceptance, wrong key rejection, and constant-time comparison. --------- Co-authored-by: you <you@example.com>	2026-04-05 14:50:40 -07:00
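The security model from the API-key commit above can be sketched as a small decision function. The blocklist contains only the four keys the commit message names (the real list is longer), and `authorize` plus the status mapping are a hypothetical condensation of the route middleware.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"strings"
)

// weakKeys holds the example entries named in the commit message only.
var weakKeys = map[string]bool{
	"test":                     true,
	"password":                 true,
	"change-me":                true,
	"your-secret-api-key-here": true,
}

const minKeyLen = 16

// isWeakAPIKey rejects short keys and known defaults, ignoring case.
func isWeakAPIKey(key string) bool {
	return len(key) < minKeyLen || weakKeys[strings.ToLower(key)]
}

// constantTimeEqual compares without an early exit on the first differing
// byte. ConstantTimeCompare still returns immediately on a length mismatch.
func constantTimeEqual(a, b string) bool {
	return subtle.ConstantTimeCompare([]byte(a), []byte(b)) == 1
}

// authorize maps the four cases of the security model to a status code.
func authorize(configured, presented string) int {
	if configured == "" || isWeakAPIKey(configured) {
		return 403 // write endpoints disabled
	}
	if !constantTimeEqual(configured, presented) {
		return 401
	}
	return 200
}

func main() {
	strong := "k3y-with-enough-entropy-0123"
	fmt.Println(authorize("", "x"), authorize("Change-Me", "Change-Me"),
		authorize(strong, "wrong"), authorize(strong, strong))
}
```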
Kpa-clawbot	05fbcb09dd	fix: wire cacheTTL.analyticsHashSizes config to collision cache (#420 ) (#622 ) ## Summary Fixes #420 — wires `cacheTTL` config values to server-side cache durations that were previously hardcoded. ## Problem `collisionCacheTTL` was hardcoded at 60s in `store.go`. The config has `cacheTTL.analyticsHashSizes: 3600` (1 hour) but it was never read — the `/api/config/cache` endpoint just passed the raw map to the client without applying values server-side. ## Changes - `store.go`: Add `cacheTTLSec()` helper to safely extract duration values from the `cacheTTL` config map. `NewPacketStore` now accepts an optional `cacheTTL` map (variadic, backward-compatible) and wires: - `cacheTTL.analyticsHashSizes` → `collisionCacheTTL` - `cacheTTL.analyticsRF` → `rfCacheTTL` - Default changed: `collisionCacheTTL` default raised from 60s → 3600s (1 hour). Hash collision computation is expensive and data changes rarely — 60s was causing unnecessary recomputation. - `main.go`: Pass `cfg.CacheTTL` to `NewPacketStore`. - Tests: Added `TestCacheTTLFromConfig` and `TestCacheTTLDefaults` in eviction_test.go. Updated existing `TestHashCollisionsCacheTTL` for the new default. ## Audit of other cacheTTL values The remaining `cacheTTL` keys (`stats`, `nodeDetail`, `nodeHealth`, `nodeList`, `bulkHealth`, `networkStatus`, `observers`, `channels`, `channelMessages`, `analyticsTopology`, `analyticsChannels`, `analyticsSubpaths`, `analyticsSubpathDetail`, `nodeAnalytics`, `nodeSearch`, `invalidationDebounce`) are client-side only — served via `/api/config/cache` and consumed by the frontend. They don't have corresponding server-side caches to wire to. The only server-side caches (`rfCache`, `topoCache`, `hashCache`, `chanCache`, `distCache`, `subpathCache`, `collisionCache`) all use either `rfCacheTTL` or `collisionCacheTTL`, both now configurable. ## Complexity O(1) config lookup at store init time. No hot-path impact. 
Co-authored-by: you <you@example.com>	2026-04-05 12:49:46 -07:00
Kpa-clawbot	767c8a5a3e	perf: async chunked backfill — HTTP serves within 2 minutes (#612 ) (#614 ) ## Summary Adds two config knobs for controlling backfill scope and neighbor graph data retention, plus removes the dead synchronous backfill function. ## Changes ### Config knobs #### `resolvedPath.backfillHours` (default: 24) Controls how far back (in hours) the async backfill scans for observations with NULL `resolved_path`. Transmissions with `first_seen` older than this window are skipped, reducing startup time for instances with large historical datasets. #### `neighborGraph.maxAgeDays` (default: 30) Controls the maximum age of `neighbor_edges` entries. Edges with `last_seen` older than this are pruned from both SQLite and the in-memory graph. Pruning runs on startup (after a 4-minute stagger) and every 24 hours thereafter. ### Dead code removal - Removed the synchronous `backfillResolvedPaths` function that was replaced by the async version. ### Implementation details - `backfillResolvedPathsAsync` now accepts a `backfillHours` parameter and filters by `tx.FirstSeen` - `NeighborGraph.PruneOlderThan(cutoff)` removes stale edges from the in-memory graph - `PruneNeighborEdges(conn, graph, maxAgeDays)` prunes both DB and in-memory graph - Periodic pruning ticker follows the same pattern as metrics pruning (24h interval, staggered start) - Graceful shutdown stops the edge prune ticker ### Config example Both knobs added to `config.example.json` with `_comment` fields. ## Tests - Config default/override tests for both knobs - `TestGraphPruneOlderThan` — in-memory edge pruning - `TestPruneNeighborEdgesDB` — SQLite + in-memory pruning together - `TestBackfillRespectsHourWindow` — verifies old transmissions are excluded by backfill window --------- Co-authored-by: you <you@example.com>	2026-04-05 09:49:39 -07:00
Kpa-clawbot	6f35d4d417	feat: RF Health Dashboard M1 — observer metrics + small multiples grid (#604 ) ## RF Health Dashboard — M1: Observer Metrics Storage, API & Small Multiples Grid Implements M1 of #600. ### What this does Adds a complete RF health monitoring pipeline: MQTT stats ingestion → SQLite storage → REST API → interactive dashboard with small multiples grid. ### Backend Changes Ingestor (`cmd/ingestor/`) - New `observer_metrics` table via migration system (`_migrations` pattern) - Parse `tx_air_secs`, `rx_air_secs`, `recv_errors` from MQTT status messages (same pattern as existing `noise_floor` and `battery_mv`) - `INSERT OR REPLACE` with timestamps rounded to nearest 5-min interval boundary (using ingestor wall clock, not observer timestamps) - Missing fields stored as NULLs — partial data is always better than no data - Configurable retention pruning: `retention.metricsDays` (default 30), runs on startup + every 24h Server (`cmd/server/`) - `GET /api/observers/{id}/metrics?since=...&until=...` — per-observer time-series data - `GET /api/observers/metrics/summary?window=24h` — fleet summary with current NF, avg/max NF, sample count - `parseWindowDuration()` supports `1h`, `24h`, `3d`, `7d`, `30d` etc. 
- Server-side metrics retention pruning (same config, staggered 2min after packet prune) ### Frontend Changes RF Health tab (`public/analytics.js`, `public/style.css`) - Small multiples grid showing all observers simultaneously — anomalies pop out visually - Per-observer cell: name, current NF value, battery voltage, sparkline, avg/max stats - NF status coloring: warning (amber) at ≥-100 dBm, critical (red) at ≥-85 dBm — text color only, no background fills - Click any cell → expanded detail view with full noise floor line chart - Reference lines with direct text labels (`-100 warning`, `-85 critical`) — not color bands - Min/max points labeled directly on the chart - Time range selector: preset buttons (1h/3h/6h/12h/24h/3d/7d/30d) + custom from/to datetime picker - Deep linking: `#/analytics?tab=rf-health&observer=...&range=...` - All charts use SVG, matching existing analytics.js patterns - Responsive: 3-4 columns on desktop, 1 on mobile ### Design Decisions (from spec) - Labels directly on data, not in legends - Reference lines with text labels, not color bands - Small multiples grid, not card+accordion (Tufte: instant visual fleet comparison) - Ingestor wall clock for all timestamps (observer clocks may drift) ### Tests Added Ingestor tests: - `TestRoundToInterval` — 5 cases for rounding to 5-min boundaries - `TestInsertMetrics` — basic insertion with all fields - `TestInsertMetricsIdempotent` — INSERT OR REPLACE deduplication - `TestInsertMetricsNullFields` — partial data with NULLs - `TestPruneOldMetrics` — retention pruning - `TestExtractObserverMetaNewFields` — parsing tx_air_secs, rx_air_secs, recv_errors Server tests: - `TestGetObserverMetrics` — time-series query with since/until filters, NULL handling - `TestGetMetricsSummary` — fleet summary aggregation - `TestObserverMetricsAPIEndpoints` — DB query verification - `TestMetricsAPIEndpoints` — HTTP endpoint response shape - `TestParseWindowDuration` — duration parsing for h/d formats ### Test Results ``` 
cd cmd/ingestor && go test ./... → PASS (26s) cd cmd/server && go test ./... → PASS (5s) ``` ### What's NOT in this PR (deferred to M2+) - Server-side delta computation for cumulative counters - Airtime charts (TX/RX percentage lines) - Channel quality chart (recv_error_rate) - Battery voltage chart - Reboot detection and chart annotations - Resolution downsampling (1h, 1d aggregates) - Pattern detection / automated diagnosis --------- Co-authored-by: you <you@example.com>	2026-04-04 22:21:35 -07:00
Kpa-clawbot	b0862f7a41	fix: replace time.Tick with NewTicker in prune goroutine for graceful shutdown (#593 ) ## Summary Replace `time.Tick()` with `time.NewTicker()` in the auto-prune goroutine so it stops cleanly during graceful shutdown. ## Problem `time.Tick` creates a ticker that can never be garbage collected or stopped. While the prune goroutine runs for the process lifetime, it won't stop during graceful shutdown — the goroutine leaks past the shutdown sequence. ## Fix - Create a `time.NewTicker` and a done channel - Use `select` to listen on both the ticker and done channel - Stop the ticker and close the done channel in the shutdown path (after `poller.Stop()`) - Pattern matches the existing `StartEvictionTicker()` approach ## Testing - `go build ./...` — compiles cleanly - `go test ./...` — all tests pass Fixes #377 Co-authored-by: you <you@example.com>	2026-04-04 10:38:37 -07:00
you	0c340e1eb6	fix: set hasResolvedPath flag after ensuring column exists detectSchema() runs at DB open time before ensureResolvedPathColumn() adds the column during Load(). On first run (or any run where the column was just added), hasResolvedPath stayed false, causing Load() to skip reading resolved_path from SQLite. This forced a full backfill of all observations on every restart, burning CPU for minutes on large DBs. Fix: set hasResolvedPath = true after ensureResolvedPathColumn succeeds.	2026-04-04 07:46:25 +00:00
Kpa-clawbot	ae38cdefb4	feat: server-side hop resolution at ingest — resolved_path (#556 ) ## Summary Implements server-side hop prefix resolution at ingest time with a persisted neighbor graph. Hop prefixes in `path_json` are now resolved to full 64-char pubkeys at ingest and stored as `resolved_path` on each observation, eliminating the need for client-side resolution via `HopResolver`. Fixes #555 ## What changed ### New file: `cmd/server/neighbor_persist.go` SQLite persistence layer for the neighbor graph and resolved paths: - `neighbor_edges` table creation and management - Load/build/persist neighbor edges from/to SQLite - `resolved_path` column migration on observations - `resolvePathForObs()` — resolves hop prefixes using `resolveWithContext` with 4-tier priority (affinity → geo → GPS → first match) - Cold startup backfill for observations missing `resolved_path` - Async persistence of edges and resolved paths during ingest (non-blocking) ### Modified: `cmd/server/store.go` - `StoreObs` gains `ResolvedPath []string` field - `StoreTx` gains `ResolvedPath []string` (cached from best observation) - `Load()` dynamically includes `resolved_path` in SQL query when column exists - `IngestNewFromDB()` resolves paths at ingest time and persists asynchronously - `pickBestObservation()` propagates `ResolvedPath` to transmission - `txToMap()` and `enrichObs()` include `resolved_path` in API responses - All 7 `pm.resolve()` call sites migrated to `pm.resolveWithContext()` with the persisted graph - Broadcast maps include `resolved_path` per observation ### Modified: `cmd/server/db.go` - `DB` struct gains `hasResolvedPath bool` flag - `detectSchema()` checks for `resolved_path` column existence - Graceful degradation when column is absent (test DBs, old schemas) ### Modified: `cmd/server/main.go` - Startup sequence: ensure tables → load/build graph → backfill resolved paths → re-pick best observations ### Modified: `cmd/server/routes.go` - `mapSliceToTransmissions()` and 
`mapSliceToObservations()` propagate `resolved_path` - Node paths handler uses `resolveWithContext` with graph ### Modified: `cmd/server/types.go` - `TransmissionResp` and `ObservationResp` gain `ResolvedPath []string` with `omitempty` ### New file: `cmd/server/neighbor_persist_test.go` 16 tests covering: - Path resolution (unambiguous, empty, unresolvable prefixes) - Marshal/unmarshal of resolved_path JSON - SQLite table creation and column migration (idempotent) - Edge persistence and loading - Schema detection - Full Load() with resolved_path - API response serialization (present when set, omitted when nil) ## Design decisions 1. Async persistence — resolved paths and neighbor edges are written to SQLite in a goroutine to avoid blocking the ingest loop. The in-memory state is authoritative. 2. Schema compatibility — `DB.hasResolvedPath` flag allows the server to work with databases that don't yet have the `resolved_path` column. SQL queries dynamically include/exclude the column. 3. `pm.resolve()` retained — Not removed as dead code because existing tests use it directly. All production call sites now use `resolveWithContext` with the persisted graph. 4. Edge persistence is conservative — Only unambiguous edges (single candidate) are persisted to `neighbor_edges`. Ambiguous prefixes are handled by the in-memory `NeighborGraph` via Jaccard disambiguation. 5. `null` = unresolved — Ambiguous prefixes store `null` in the resolved_path array. Frontend falls back to prefix display. ## Performance - `resolveWithContext` per hop: ~1-5μs (map lookups, no DB queries) - Typical packet has 0-5 hops → <25μs total resolution overhead per packet - Edge/path persistence is async → zero impact on ingest latency - Backfill is one-time on first startup with the new column ## Test results ``` cd cmd/server && go test ./... -count=1 → ok (4.4s) cd cmd/ingestor && go test ./... -count=1 → ok (25.5s) ``` --------- Co-authored-by: you <you@example.com>	2026-04-04 00:20:59 -07:00
Kpa-clawbot	bf2e721dd7	feat: auto-inject cache busters at server startup — eliminates merge conflicts (#481 ) ## Problem Every PR that touches `public/` files requires manually bumping cache buster timestamps in `index.html` (e.g. `?v=1775111407`). Since all PRs change the same lines in the same file, this causes constant merge conflicts — it's been the #1 source of unnecessary PR friction. ## Solution Replace all hardcoded `?v=TIMESTAMP` values in `index.html` with a `?v=__BUST__` placeholder. The Go server replaces `__BUST__` with the current Unix timestamp once at startup when it reads `index.html`, then serves the pre-processed HTML from memory. Every server restart automatically picks up fresh cache busters — no manual intervention needed. ## What changed \| File \| Change \| \|------\|--------\| \| `public/index.html` \| All `v=1775111407` → `v=__BUST__` (28 occurrences) \| \| `cmd/server/main.go` \| `spaHandler` reads index.html at init, replaces `__BUST__` with Unix timestamp, serves from memory for `/`, `/index.html`, and SPA fallback \| \| `cmd/server/helpers_test.go` \| New `TestSpaHandlerCacheBust` — verifies placeholder replacement works for root, SPA fallback, and direct `/index.html` requests. Also added tests for root `/` and `/index.html` routes \| \| `AGENTS.md` \| Rule 3 updated: cache busters are now automatic, agents should not manually edit them \| ## Testing - `go build ./...` — compiles cleanly - `go test ./...` — all tests pass (including new cache-bust tests) - `node test-frontend-helpers.js && node test-packet-filter.js && node test-aging.js` — all frontend tests pass - No hardcoded timestamps remain in `index.html` --------- Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: you <you@example.com>	2026-04-01 23:59:59 -07:00
Kpa-clawbot	f87eb3601c	fix: graceful container shutdown for reliable deployments (#453 ) ## Summary Fixes #450 — staging deployment flaky due to container not shutting down cleanly. ## Root Causes 1. Server never closed DB on shutdown — SQLite WAL lock held indefinitely, blocking new container startup 2. `httpServer.Close()` instead of `Shutdown()` — abruptly kills connections instead of draining them 3. No `stop_grace_period` in compose configs — Docker sends SIGTERM then immediately SIGKILL (default 10s is often not enough for WAL checkpoint) 4. Supervisor didn't forward SIGTERM — missing `stopsignal`/`stopwaitsecs` meant Go processes got SIGKILL instead of graceful shutdown 5. Deploy scripts used default `docker stop` timeout — only 10s grace period ## Changes ### Go Server (`cmd/server/`) - Graceful HTTP shutdown: `httpServer.Shutdown(ctx)` with 15s context timeout — drains in-flight requests before closing - WebSocket cleanup: New `Hub.Close()` method sends `CloseGoingAway` frames to all connected clients - DB close on shutdown: Explicitly closes DB after HTTP server stops (was never closed before) - WAL checkpoint: `PRAGMA wal_checkpoint(TRUNCATE)` before DB close — flushes WAL to main DB file and removes WAL/SHM lock files ### Go Ingestor (`cmd/ingestor/`) - WAL checkpoint on shutdown: New `Store.Checkpoint()` method, called before `Close()` - Longer MQTT disconnect timeout: 5s (was 1s) to allow in-flight messages to drain ### Docker Compose (all 4 variants) - Added `stop_grace_period: 30s` and `stop_signal: SIGTERM` ### Supervisor Configs (both variants) - Added `stopsignal=TERM` and `stopwaitsecs=20` to server and ingestor programs ### Deploy Scripts - `deploy-staging.sh`: `docker stop -t 30` with explicit grace period - `deploy-live.sh`: `docker stop -t 30` with explicit grace period ## Shutdown Sequence (after fix) 1. Docker sends SIGTERM to supervisord (PID 1) 2. Supervisord forwards SIGTERM to server + ingestor (waits up to 20s each) 3. 
Server: stops poller → drains HTTP (15s) → closes WS clients → checkpoints WAL → closes DB 4. Ingestor: stops tickers → disconnects MQTT (5s) → checkpoints WAL → closes DB 5. Docker waits up to 30s total before SIGKILL ## Tests All existing tests pass: - `cd cmd/server && go test ./...` ✅ - `cd cmd/ingestor && go test ./...` ✅ --------- Co-authored-by: you <you@example.com> Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com>	2026-04-01 12:19:20 -07:00
efiten	fe314be3a8	feat: geo_filter enforcement, DB pruning, geofilter-builder tool, HB column (#215 ) ## Summary Several features and fixes from a live deployment of the Go v3.0.0 backend. ### geo_filter — full enforcement - Go backend config (`cmd/server/config.go`, `cmd/ingestor/config.go`): added `GeoFilterConfig` struct so `geo_filter.polygon` and `bufferKm` from `config.json` are parsed by both the server and ingestor - Ingestor (`cmd/ingestor/geo_filter.go`, `cmd/ingestor/main.go`): ADVERT packets from nodes outside the configured polygon + buffer are dropped before any DB write — no transmission, node, or observation data is stored - Server API (`cmd/server/geo_filter.go`, `cmd/server/routes.go`): `GET /api/config/geo-filter` endpoint returns the polygon + bufferKm to the frontend; `/api/nodes` responses filter out any out-of-area nodes already in the DB - Frontend (`public/map.js`, `public/live.js`): blue polygon overlay (solid inner + dashed buffer zone) on Map and Live pages, toggled via "Mesh live area" checkbox, state shared via localStorage ### Automatic DB pruning - Add `retention.packetDays` to `config.json` to delete transmissions + observations older than N days on a daily schedule (1 min after startup, then every 24h). Nodes and observers are never pruned. - `POST /api/admin/prune?days=N` for manual runs (requires `X-API-Key` header if `apiKey` is set) ```json "retention": { "nodeDays": 7, "packetDays": 30 } ``` ### tools/geofilter-builder.html Standalone HTML tool (no server needed) — open in browser, click to place polygon points on a Leaflet map, set `bufferKm`, copy the generated `geo_filter` JSON block into `config.json`. ### scripts/prune-nodes-outside-geo-filter.py Utility script to clean existing out-of-area nodes from the database (dry-run + confirm). Useful after first enabling geo_filter on a populated DB. ### HB column in packets table Shows the hop hash size in bytes (1–4) decoded from the path byte of each packet's raw hex. 
Displayed as HB between Size and Type columns, hidden on small screens. ## Test plan - [x] ADVERT from node outside polygon is not stored (no new row in nodes or transmissions) - [x] `GET /api/config/geo-filter` returns polygon + bufferKm when configured, `{polygon: null, bufferKm: 0}` when not - [x] `/api/nodes` excludes nodes outside polygon even if present in DB - [x] Map and Live pages show blue polygon overlay when configured; checkbox toggles it - [x] `retention.packetDays: 30` deletes old transmissions/observations on startup and daily - [x] `POST /api/admin/prune?days=30` returns `{deleted: N, days: 30}` - [x] `tools/geofilter-builder.html` opens standalone, draws polygon, copies valid JSON - [x] HB column shows 1–4 for all packets in grouped and flat view 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-31 01:10:56 -07:00
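For readers unfamiliar with polygon filtering, the core check behind a feature like `geo_filter` can be sketched with the standard ray-casting test. This is an assumption about the general technique, not the code in `cmd/ingestor/geo_filter.go`: the `point` type and `inPolygon` function are hypothetical, and the `bufferKm` margin is deliberately left out.

```go
package main

import "fmt"

// point is a hypothetical lat/lon pair in degrees.
type point struct{ Lat, Lon float64 }

// inPolygon reports whether p lies inside poly using ray casting: count how
// many polygon edges a ray from p crosses; an odd count means inside.
// It treats coordinates as planar, which is a reasonable approximation for
// small areas but ignores the buffer zone and antimeridian wrap-around.
func inPolygon(p point, poly []point) bool {
	inside := false
	for i, j := 0, len(poly)-1; i < len(poly); j, i = i, i+1 {
		a, b := poly[i], poly[j]
		// The edge a-b straddles p's latitude, and the crossing lies east of p.
		if (a.Lat > p.Lat) != (b.Lat > p.Lat) &&
			p.Lon < (b.Lon-a.Lon)*(p.Lat-a.Lat)/(b.Lat-a.Lat)+a.Lon {
			inside = !inside
		}
	}
	return inside
}

func main() {
	square := []point{{0, 0}, {0, 10}, {10, 10}, {10, 0}}
	fmt.Println(inPolygon(point{Lat: 5, Lon: 5}, square))  // true
	fmt.Println(inPolygon(point{Lat: 5, Lon: 15}, square)) // false
}
```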
Kpa-clawbot	5aa4fbb600	chore: normalize all files to LF line endings	2026-03-30 22:52:46 -07:00
Kpa-clawbot	93f85dee6e	Add API key auth to Go write endpoints (#283) ## Summary - added API key middleware for write routes in cmd/server/routes.go - protected all current non-GET API routes (POST /api/packets, POST /api/perf/reset, POST /api/decode) - middleware enforces X-API-Key against cfg.APIKey and returns 401 JSON error on missing/wrong key - preserves backward compatibility: if apiKey is empty, requests pass through - added startup warning log in cmd/server/main.go when no API key is configured: - [security] WARNING: no apiKey configured — write endpoints are unprotected - added route tests for missing/wrong/correct key and empty-apiKey compatibility ## Validation - cd cmd/server && go test ./... ✅ ## Notes - config.example.json already contains apiKey, so no changes were required. --------- Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-30 11:53:35 -07:00
Kpa-clawbot	77d8f35a04	feat: implement packet store eviction/aging to prevent OOM (#273 ) ## Summary The in-memory `PacketStore` had no eviction or aging — it grew unbounded until OOM killed the process. At ~3K packets/hour and ~5KB per packet (not the 450 bytes previously estimated), an 8GB VM would OOM in a few days. ## Changes ### Time-based eviction - Configurable via `config.json`: `"packetStore": { "retentionHours": 24 }` - Packets older than the retention window are evicted from the head of the sorted slice ### Memory-based cap - Configurable via `"packetStore": { "maxMemoryMB": 1024 }` - Hard ceiling — evicts oldest packets when estimated memory exceeds the cap ### Index cleanup When a `StoreTx` is evicted, ALL associated data is removed from: - `byHash`, `byTxID`, `byObsID`, `byObserver`, `byNode`, `byPayloadType` - `nodeHashes`, `distHops`, `distPaths`, `spIndex` ### Periodic execution - Background ticker runs eviction every 60 seconds - Analytics caches and hash size cache are invalidated after eviction ### Stats fixes - `estimatedMB` now uses ~5KB/packet + ~500B/observation (was 430B + 200B) - `evicted` counter reflects actual evictions (was hardcoded to 0) - Removed fake `maxPackets: 2386092` and `maxMB: 1024` from stats ### Config example ```json { "packetStore": { "retentionHours": 24, "maxMemoryMB": 1024 } } ``` Both values default to 0 (unlimited) for backward compatibility. ## Tests - 7 new tests in `eviction_test.go` covering time-based, memory-based, index cleanup, thread safety, config parsing, and no-op when disabled - All existing tests pass unchanged Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com>	2026-03-30 03:42:11 +00:00
Kpa-clawbot	cdcaa476f2	rename: MeshCore Analyzer → CoreScope (Phase 1 — backend + infra) Rename product branding, binary names, Docker images, container names, Go modules, proto go_package, CI, manage.sh, and documentation. Preserved (backward compat): - meshcore.db database filename - meshcore-data / meshcore-staging-data directory paths - MQTT topics (meshcore/#, meshcore/+/+/packets, etc.) - proto package namespace (meshcore.v1) - localStorage keys Changes by category: - Go modules: github.com/corescope/{server,ingestor} - Binaries: corescope-server, corescope-ingestor - Docker images: corescope:latest, corescope-go:latest - Containers: corescope-prod, corescope-staging, corescope-staging-go - Supervisord programs: corescope, corescope-server, corescope-ingestor - Branding: siteName, heroTitle, startup logs, fallback HTML - Proto go_package: github.com/corescope/proto/v1 - CI: container refs, deploy path - Docs: 8 markdown files updated Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-28 14:08:15 -07:00
Kpa-clawbot	b326e3f1a6	fix: pprof port conflict crashed Go server — non-fatal bind + separate ports Server defaults to 6060, ingestor to 6061. Removed shared PPROF_PORT env var. Bind failure logs warning instead of log.Fatal killing the process. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-28 13:01:41 -07:00
Kpa-clawbot	6d31cb2ad6	feat: add pprof profiling controlled by ENABLE_PPROF env var Add net/http/pprof support to both Go server (default port 6060) and ingestor (default port 6061). Profiling is off by default — only starts the pprof HTTP listener when ENABLE_PPROF=true. PPROF_PORT env var overrides the default port for each binary. Enable on staging-go in docker-compose with exposed ports 6060/6061. Not enabled on prod. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-28 11:18:33 -07:00
Kpa-clawbot	e300228874	perf: add TTL cache for subpaths API + build timestamp in stats/health - Add 15s TTL cache to GetAnalyticsSubpaths with composite key (region\|minLen\|maxLen\|limit), matching the existing cache pattern used by RF, topology, hash, channel, and distance analytics. Cache hits return instantly vs 900ms+ computation. fixes #168 - Add BuildTime to /api/stats and /api/health responses, injected via ldflags at build time. Dockerfile.go now accepts BUILD_TIME build arg. fixes #165 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 15:44:06 -07:00
Kpa-clawbot	0d9b535451	feat: add version and git commit to /api/stats and /api/health Node.js: reads version from package.json, commit from .git-commit file or git rev-parse --short HEAD at runtime, with unknown fallback. Go: uses -ldflags build-time variables (Version, Commit) with fallback to .git-commit file and git command at runtime. Dockerfile: copies .git-commit if present (CI bakes it before build). Dockerfile.go: passes APP_VERSION and GIT_COMMIT as build args to ldflags. deploy.yml: writes GITHUB_SHA to .git-commit before docker build steps. docker-compose.yml: passes build args to Go staging build. Tests updated to verify version and commit fields in both endpoints. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 09:39:49 -07:00
Kpa-clawbot	afe16db960	feat(go-server): in-memory packet store — port of packet-store.js Streams transmissions + observations from SQLite at startup into 5 indexed in-memory structures. QueryPackets and QueryGroupedPackets now serve from RAM (<10ms) instead of hitting SQLite (2.3s). - store.go: PacketStore with byHash, byTxID, byObsID, byObserver, byNode indexes - main.go: create + load store at startup - routes.go: dispatch to store for packet/stats endpoints - websocket.go: poller ingests new transmissions into store Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 08:52:07 -07:00
Kpa-clawbot	b2e6c8105b	fix: handle WebSocket upgrade at root path (client connects to ws://host/) Node.js upgrades WS at /, Go was only at /ws. Now the static file handler checks for Upgrade header first and routes to WebSocket. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 01:25:35 -07:00
Kpa-clawbot	742ed86596	feat: add Go web server (cmd/server/) — full API + WebSocket + static files 35+ REST endpoints matching Node.js server, WebSocket broadcast, static file serving with SPA fallback, config.json support. Uses modernc.org/sqlite (pure Go, no CGO required). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 01:16:59 -07:00

44 Commits