meshcore-analyzer

mirror of https://github.com/Kpa-clawbot/meshcore-analyzer.git synced 2026-06-03 21:51:47 +00:00

Author	SHA1	Message	Date
efiten	0b35c7eef3	feat(server): persist multi-byte capability across restart + O(1) per-key lookup (#903 ) (#1324 ) ## Summary Follows the reconciliation recommendation in #916 — extracts only the NET-NEW persistence layer from that PR (which is now superseded by #1002 for the overlay UI) into a focused 6-file change against current master. What this adds: - `multibyte_sup_v1` migration: `multibyte_sup INTEGER NOT NULL DEFAULT 0` + `multibyte_evidence TEXT` on `nodes`/`inactive_nodes` so capability survives restart - `hasMultibyteSupCols` schema detection gates the persist/load paths - `loadMultibyteCapFromDB()`: pre-populates `mbCapSnapshot`/`mbCapIndex` at startup — cold starts serve last-known capability without waiting for the first ~15s analytics cycle - `maybePersistMultibyteCapability()` + `persistMultibyteCapability()`: after each analytics cycle; TryLock-gated (concurrent cycles coalesce); skips `sup==0` entries (data-destruction guard) - `GetMultibyteCapFor(pk)`: O(1) map lookup; both `handleNodes` and node-detail call sites updated from the O(N)-alloc `GetMultiByteCapMap()` What this explicitly does NOT change: - API field names (`multi_byte_status`, `multi_byte_evidence`, `multi_byte_max_hash_size`) - `EnrichNodeWithMultiByte` — unchanged - `GetMultiByteCapMap` — still present for any external callers - `public/map.js`, `public/live.css`, `Dockerfile`, `docs/` — zero frontend churn ## Test plan - [x] `TestMultibyteCapPersistRoundTrip` — confirmed values survive persist → fresh-store load - [x] `TestMultibyteCapPersistSkipsUnknown` — data-destruction guard: `sup==0` entry does not overwrite DB-confirmed value - [x] `TestMultibyteCapMaybePersistCoalesces` — TryLock coalesces 10 concurrent callers without deadlock - [x] `TestMultibyteCapGetMultibyteCapForO1` — O(1) index returns correct entry / false for unknown pubkey - [x] `TestMultibyteCapLoadFromDB` — only `sup>0` rows loaded; `sup==0` row excluded - [x] `TestSchemaMultibyteSupColumns` — migration adds 
columns to both tables; idempotent on second `OpenStore` - [x] All existing `TestMultiByteCapability_*` tests pass unchanged - [x] Full ingestor test suite: `ok` in 27s - [x] `go build ./cmd/server/ && go build ./cmd/ingestor/` clean 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: openclaw-bot <bot@openclaw>	2026-05-25 22:35:35 -07:00
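The TryLock gating and the `sup==0` guard described in this commit can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's code: `capEntry`, `persister`, and the in-memory `saved` map are hypothetical stand-ins for the real snapshot and the `nodes` table.

```go
package main

import (
	"fmt"
	"sync"
)

// capEntry is a hypothetical stand-in for one node's capability record.
type capEntry struct {
	Pubkey string
	Sup    int // 0 = unknown, >0 = confirmed capability
}

// persister coalesces concurrent persist attempts with TryLock.
type persister struct {
	mu    sync.Mutex
	saved map[string]int // stands in for the persisted column
}

// maybePersist writes confirmed entries and reports whether it ran.
// If another persist is already in flight it returns false immediately
// instead of queueing behind the lock.
func (p *persister) maybePersist(entries []capEntry) bool {
	if !p.mu.TryLock() {
		return false
	}
	defer p.mu.Unlock()
	for _, e := range entries {
		if e.Sup == 0 {
			continue // never overwrite a stored value with "unknown"
		}
		p.saved[e.Pubkey] = e.Sup
	}
	return true
}

func main() {
	p := &persister{saved: map[string]int{"aa": 2}}
	ran := p.maybePersist([]capEntry{{Pubkey: "aa", Sup: 0}, {Pubkey: "bb", Sup: 1}})
	fmt.Println(ran, p.saved["aa"], p.saved["bb"])
}
```

`sync.Mutex.TryLock` needs Go 1.18 or newer. Skipping `Sup == 0` entries is what keeps a cold cycle from erasing a previously confirmed value.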
Kpa-clawbot	9d3dd8df0a	fix(packets): order by ingest id, not rxTime — fresh activity visible on packets page (#1345 ) (#1349 ) ## Summary Fixes #1345 — the packets page shows "no recent activity" while MQTT ingest is healthy because the default `/api/packets` query was `ORDER BY first_seen DESC`, and PR #1233 redefined `first_seen` as the observer's radio receive time (rxTime). When an observer buffers offline and uploads hours later, its packets land with hours-old `first_seen` values; older-ingested packets with fresher rxTime then crowd the top of the list and the visually freshest activity disappears. ## Fix Switch the default ordering to `t.id DESC` (ingest order) on `/api/packets` and the closely-related endpoints. `id` is monotonic with ingest time and immune to buffered uploads. Endpoints changed (all use the same fix for the same reason): \| Path \| Function \| File \| \|------\|----------\|------\| \| `GET /api/packets` (default) \| `DB.QueryPackets`, `Store.QueryPackets` \| `cmd/server/db.go`, `cmd/server/store.go` \| \| `GET /api/packets?nodes=…` \| `DB.QueryMultiNodePackets`, `Store.QueryMultiNodePackets` \| same \| \| Node detail "recent transmissions" \| `DB.GetRecentTransmissionsForNode` \| `cmd/server/db.go` \| ## `since=` semantic — preserved `since=` still filters by `first_seen` (RFC3339 path uses the observations.timestamp subquery), i.e. "packets the network received since X." Buffered uploads of older packets are still excluded from a `since=15m` view even if they were ingested in the last 15 minutes. Only the display order changes; filtering by receive time is unchanged. ## Audit — NOT changed - `Store.QueryGroupedPackets` already sorts by `LatestSeen` (max observation timestamp), which is correct for the grouped view and immune to the buffered-upload regression. 
- `GetChannelMessages` and channel `sample_json` subqueries keep `first_seen DESC` — channel message chronology is meaningful for message UX; if buffered uploads become a problem here too it's a separate UX call (out of scope for #1345). - `s.packets` insertion ordering (Load + ingest) — untouched. The fix sorts at query time so we don't perturb `oldestLoaded` invariants. ## Tests — TDD red → green - Red: `508f4371` adds `cmd/server/packets_order_test.go` with two cases — order assertion (failed on master with `[fresh, buffered]`) and since-filter semantic (RFC3339 path uses observation timestamps). - Green: `0fd685e7` switches the SQL + in-memory ordering. Tests pass; full `cmd/server` suite green locally (44s). ## Out of scope - Re-thinking #1233's first_seen semantics - Adding a UI sort toggle (issue's option 2) - Channel-message page ordering ## Preflight Clean (`bash ~/.openclaw/skills/pr-preflight/scripts/run-all.sh origin/master`). --------- Co-authored-by: openclaw-bot <bot@openclaw.local>	2026-05-25 22:32:00 -07:00
Kpa-clawbot	c7ab5f3eb9	fix(#1366 ): channels view shows latest message time — backend emits LatestSeen, not FirstSeen (#1368 ) Red commit: `702d82eb5e` (CI: see Actions tab for fix/issue-1366) ## What Channel view emits the max observation timestamp (`tx.LatestSeen`) instead of the analyzer's first-observation time (`tx.FirstSeen`) as the rendered `timestamp` field. A new `first_seen` field is exposed alongside for debug surfaces. `sender_timestamp` continues to be returned in the JSON response but is intentionally NOT used as the rendered time (client clocks are unreliable). ## Root cause Two parallel call sites both emitted the wrong field: - `cmd/server/store.go` — `GetChannelMessages` (~line 4807): set `entry.Data["timestamp"] = strOrNil(tx.FirstSeen)` for every new dedup entry. `tx.FirstSeen` is the analyzer's first-ever observation time of a `transmissions.hash` row; for heartbeat-style packets (e.g. `BlorkoBot 🤖` posting the same status line periodically), the hash is stable, so FirstSeen stays pinned at the very first observation while the message keeps retransmitting hours later. Operator sees "old" message timestamps for live messages. - `cmd/server/db.go` — `GetChannelMessages` (~line 1757): same problem against the SQLite-backed query path. Used `nullStr(fs)` (where `fs` is `t.first_seen`) for the `timestamp` field. ### Repro from staging Same packet, same hash `aba4f0493249de57`, sender `BlorkoBot 🤖`: - `/api/channels/%23test/messages` → `timestamp: "2026-05-25T15:53:20Z"` (FirstSeen, 7h+ in the past) - `/api/packets?hash=aba4f0493249de57` → `first_seen: "2026-05-25T22:53:19Z"` (latest obs), `observation_count: 84` The packets view used max-obs correctly; the channels view did not. 7h gap matches operator screenshot. 
## TDD red → green Red: `cmd/server/channels_message_order_1366_test.go` — three tests: - `TestChannelMessages_TimestampUsesLatestSeen`: seeds a CHAN tx with observations 7h apart, asserts returned `timestamp` ≈ latest observation epoch (±1s). Fails under FirstSeen with Δ=−25200s. - `TestChannelMessages_TimestampNotSenderTimestamp`: seeds a CHAN tx whose decoded `sender_timestamp` is year-2000 (bad RTC). Asserts the rendered `timestamp` parses to current year — guards against the tempting "just use sender_timestamp" alt-fix that would let bad client clocks corrupt the view. - `TestChannelMessages_TimestampIsUTCZ`: asserts the emitted string is unambiguously UTC (suffix `Z` or `+00:00`) so browsers don't apply a local-zone shift. Green commit changes: - `store.go`: emit `tx.LatestSeen` (with FirstSeen fallback if no obs); add `first_seen` field. - `db.go`: join `o.timestamp` per-observation, track max epoch per tx, emit RFC3339 UTC at the end; add `first_seen` field. `sender_timestamp` remains in the response — unchanged shape, frontend never read it for the rendered time (verified: only `msg.timestamp` is consumed in `public/channels.js:1902`). ## Manual verification (post-merge) 1. Deploy to staging. 2. Curl `/api/channels/%23test/messages?limit=5` and `/api/packets?hash=<recent>`. The channel `timestamp` field MUST equal the packets `first_seen` (max obs) for the same hash, NOT lag it. 3. Send a fresh GRP_TXT via a MeshCore client into a watched channel. Within 15s, refresh the Channels view at `/channels`. The new message MUST render at the bottom with the correct (current) time. ## Why not `sender_timestamp`? It's a per-client field, decoded from the payload. Many MeshCore firmware builds run without RTC/NTP/GPS and report bogus values. Trusting it for display would propagate bad client clocks into the analyzer UI — the analyzer is the source of truth for UTC, not the client. 
Fixes #1366 --------- Co-authored-by: CoreScope Bot <bot@corescope> Co-authored-by: bot <bot@kpa-clawbot.dev> Co-authored-by: openclaw-bot <bot@openclaw.local>	2026-05-25 17:45:32 -07:00
efiten	317b59ab10	feat: area-based visual node filter — attribute packets by transmitter GPS (#804 ) (#839 ) ## Summary - Adds configurable GPS polygon areas to `config.json`; nodes are attributed to an area if their last-known position falls inside the polygon - New `Area: …` dropdown filter (matching the existing region filter style) appears on all analytics, nodes, packets, map, and live screens when areas are configured - Backend resolves area membership with a 30s TTL cache; area filter bypasses the 500-node cap on `/api/bulk-health` so all area nodes are always returned - Includes a polygon builder tool (`/area-map.html`) for drawing and exporting area boundaries ## Changes Backend - `AreaEntry` type + `Areas` config field - `GetNodePubkeysInArea` DB query + `resolveAreaNodes` (30s TTL, `areaNodeMu` RWMutex) - `PacketQuery.Area` + `filterPackets` polygon check - `?area=` param propagated through all analytics, topology, clock-health, and bulk-health routes - `/api/config/areas` endpoint Frontend - `area-filter.js`: single-select dropdown, persists to localStorage, cleans up stale keys on load - Wired into analytics, nodes, packets, channels, map, and live pages - Live map clears node markers on area change Docs & tools - `docs/user-guide/area-filter.md` — configuration and usage guide - `docs/api-spec.md` — updated with new endpoint and `?area=` param table - `tools/area-map.html` — polygon builder for defining area boundaries - Demo areas added to `config.example.json` ## Test plan - [x] No areas configured → filter dropdown does not appear on any page - [x] Areas configured → dropdown appears, "All" selected by default - [x] Selecting an area filters nodes/packets/topology/map correctly - [x] Selecting "All" restores unfiltered view - [x] Selection persists across page reloads (localStorage) - [x] Stale localStorage key (area removed from config) is cleared on load - [x] `/api/bulk-health?area=X` returns all nodes in area (no 500-node cap) - [x] 
`/api/config/areas` returns correct list 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Kpa-clawbot <kpaclawbot@outlook.com> Co-authored-by: openclaw-bot <bot@openclaw.local>	2026-05-21 14:00:15 -07:00
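Attributing a node to an area comes down to a point-in-polygon test on its last-known position. The commit does not say which algorithm is used; the sketch below assumes the common ray-casting (even-odd) method, and `point` and `inPolygon` are hypothetical names.

```go
package main

import "fmt"

// point is a hypothetical GPS coordinate pair.
type point struct{ Lat, Lon float64 }

// inPolygon reports whether p lies inside poly using ray casting:
// count how many polygon edges a ray from p crosses; odd means inside.
func inPolygon(p point, poly []point) bool {
	inside := false
	for i, j := 0, len(poly)-1; i < len(poly); j, i = i, i+1 {
		a, b := poly[i], poly[j]
		// Only edges that straddle p's latitude can be crossed.
		if (a.Lat > p.Lat) != (b.Lat > p.Lat) &&
			p.Lon < (b.Lon-a.Lon)*(p.Lat-a.Lat)/(b.Lat-a.Lat)+a.Lon {
			inside = !inside
		}
	}
	return inside
}

func main() {
	square := []point{{0, 0}, {0, 10}, {10, 10}, {10, 0}}
	fmt.Println(inPolygon(point{5, 5}, square))
	fmt.Println(inPolygon(point{15, 5}, square))
}
```

This treats latitude and longitude as planar coordinates, which is usually adequate for small regional polygons but is an approximation near the poles or the antimeridian.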
efiten	2329639f45	feat: scoped/unscoped transport-route statistics (#899 ) (#915 ) ## What this PR does Implements region-scoped transport-route packet tracking with two sub-features: ### Feature 1 — Scope statistics (`scope_name`) - At ingest, transport-route packets (route_type 0/3) with Code1 != `0000` are HMAC-matched against configured `hashRegions` keys (mirroring the `hashChannels` pattern). Matched region name (or `""` for unknown) stored in new `transmissions.scope_name` column via migration `scope_name_v1`. - New `GET /api/scope-stats?window=` endpoint (1h/24h/7d, 30s server-side TTL) returning transport totals, scoped/unscoped counts, per-region breakdown, and time-series. - New Scopes tab in Analytics with summary cards, per-region table, and two-line SVG chart. Auto-refreshes every 60s. ### Feature 2 — Node default scope (`default_scope`) - Per-node `default_scope` column on `nodes`/`inactive_nodes` (migration `nodes_default_scope_v1`) tracks the most recently matched region for each node, derived from transport-scoped ADVERT packets. - `GET /api/nodes` response includes `default_scope` field when column is present. - Node detail panel displays the default scope badge. - Async startup backfill (`BackfillDefaultScopeAsync`) populates the column for nodes with pre-existing ADVERT data. ### Config Add `hashRegions` to `config.json` (see `config.example.json`). One entry per region name (with or without leading `#`). --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Kpa-clawbot <kpaclawbot@outlook.com> Co-authored-by: openclaw-bot <bot@openclaw.local>	2026-05-21 14:00:06 -07:00
efiten	51f823bf7e	feat: one-click prune nodes outside geofilter (#669 M4) (#738 ) ## Summary - Adds `POST /api/admin/prune-geo-filter` endpoint — dry-run by default, `?confirm=true` to permanently delete nodes outside the current geofilter polygon + buffer. Requires `X-API-Key` header. - Adds Prune nodes section inside the GeoFilter customizer tab (write-access only, same `writeEnabled` gate as PUT). Preview lists affected nodes; Confirm delete removes them. - Adds `GetNodesForGeoPrune` and `DeleteNodesByPubkeys` DB helpers. - Updates `docs/user-guide/geofilter.md` — documents the UI button as primary workflow, CLI script as alternative. > Depends on M3 (`feat/geofilter-m3-customizer`, PR #736). Merge M3 first. ## Test plan - [x] `cd cmd/server && go test ./...` — all pass - [x] Customizer GeoFilter tab without `apiKey` — Prune section not visible - [x] With `apiKey` + polygon active — Prune section visible - [x] Preview returns list of nodes outside polygon (no deletions) - [x] Confirm delete removes nodes, list clears - [x] `POST /api/admin/prune-geo-filter` without `X-API-Key` → 401 - [x] `POST /api/admin/prune-geo-filter` with no polygon configured → 400 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 03:19:31 +00:00
Kpa-clawbot	1da2034341	refactor(db): move all writes from server to ingestor; server truly read-only (fixes #1283 ) (#1286 ) Red commit: `f6290b63` — CI run will appear at https://github.com/Kpa-clawbot/CoreScope/actions Fixes #1283. ## What Moves all four DB write operations out of `cmd/server/` into `cmd/ingestor/`, making the server truly read-only and eliminating the SQLITE_BUSY VACUUM bug at its root: the server can no longer race the ingestor for the write lock because the server has no write path. ## The four operations \| # \| Was in \| Now in \| \|---\|--------\|--------\| \| 1 \| `cmd/server/vacuum.go` (`checkAutoVacuum`, full VACUUM + `auto_vacuum=INCREMENTAL` migration) \| `cmd/ingestor/db.go` `Store.CheckAutoVacuum` (already existed; ingestor runs it at startup before the MQTT subscriber starts → no contention) \| \| 2 \| `cmd/server/db.go` `PruneOldPackets` (`DELETE FROM transmissions`) \| `cmd/ingestor/maintenance.go` `Store.PruneOldPackets` (new) + 24h ticker in `cmd/ingestor/main.go` \| \| 3 \| `cmd/server/db.go` `PruneOldMetrics` (`DELETE FROM observer_metrics`) \| `cmd/ingestor/db.go` `Store.PruneOldMetrics` (already existed) \| \| 4 \| `cmd/server/db.go` `RemoveStaleObservers` (`UPDATE observers SET inactive=1`) \| `cmd/ingestor/db.go` `Store.RemoveStaleObservers` (already existed) \| ## HTTP surface - Removed: `POST /api/admin/prune` (`handleAdminPrune`, route, openapi entry). Operators trigger an ad-hoc prune by restarting the ingestor. - Kept: `GET /api/backup` — uses `VACUUM INTO` which writes to a separate file, not the live DB; read-only-safe. ## Tests - `cmd/server/readonly_invariant_test.go` (RED gate) — reflect-asserts `PruneOldPackets`/`PruneOldMetrics`/`RemoveStaleObservers` are NOT methods on the server's `DB`. Fails on master, passes after this PR. - `cmd/ingestor/issue1283_test.go` — exercises `Store.PruneOldPackets` and the auto_vacuum=NONE → INCREMENTAL migration through `Store.CheckAutoVacuum` with `vacuumOnStartup=true`. 
## Why the bug is gone The SQLITE_BUSY VACUUM failure happened because supervisord launched both ingestor + server in one container; the ingestor took the write lock for INSERTs and the server's `checkAutoVacuum` then failed to acquire it within `busy_timeout=5000`. After this PR, only the ingestor ever opens a writable connection, and it runs `CheckAutoVacuum` before spawning the MQTT subscriber → no contention possible. ## Scope notes - `cachedRW()` still has three pre-existing callers in `cmd/server/` (`neighbor_persist.go`, `ensure_indexes.go`, `from_pubkey_migration.go`). These pre-date #1283 and are not in the issue's four-operation list. Leaving them for follow-up keeps this PR honest about scope; AGENTS.md documents the invariant so new write paths can't sneak in. - PII preflight reports false positives on the Go method name `requireAPIKey` in `routes.go` diff context — no real PII. - Server-side neighbor-edge prune (`PruneNeighborEdges`) intentionally left in place — out of scope of #1283. --------- Co-authored-by: MeshCore Bot <bot@meshcore.local>	2026-05-18 23:52:27 -07:00
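The reflect-based invariant check described under Tests can be approximated as below. The `DB` type here is a hypothetical stub with a single read method; the real test asserts against the server's actual `DB` type.

```go
package main

import (
	"fmt"
	"reflect"
)

// DB is a stub standing in for the server's read-only handle.
type DB struct{}

// QueryPackets represents a read path that is allowed to exist.
func (db *DB) QueryPackets() {}

// hasMethod reports whether v's method set contains an exported method
// with the given name.
func hasMethod(v any, name string) bool {
	_, ok := reflect.TypeOf(v).MethodByName(name)
	return ok
}

func main() {
	for _, name := range []string{"QueryPackets", "PruneOldPackets", "PruneOldMetrics", "RemoveStaleObservers"} {
		fmt.Println(name, hasMethod(&DB{}, name))
	}
}
```

Passing `&DB{}` rather than `DB{}` matters: pointer-receiver methods are only in the method set of the pointer type. A check like this fails the build's test stage if someone reintroduces a write method on the server side.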
Kpa-clawbot	b881a09f02	feat(#1188 ): show observer IATA on packets + filter grammar (#1189 ) Red commit: `4ed272761b` (CI run: https://github.com/Kpa-clawbot/CoreScope/actions/runs/25651898290) Fixes #1188 — observer IATA on packets in three UI surfaces + filter grammar. cross-stack: justified — feature spans API shape (Go), store, filter grammar (JS), three packets UI surfaces. ## Scope shipped - Packets table row: `.badge-iata` pill inline next to observer name - Expanded observation rows: per-observation IATA badge - Detail pane: Observer dd + per-observation list both render the badge - Filter grammar: `observer_iata` field + `iata` alias; `==`/`!=`/`contains`, plus a new `in (a, b, c)` list operator. Both names appear in autocomplete with descriptions. ## TDD red→green pairs 1. `271d72f` filter-grammar tests → `2c182eb` evaluator + suggest entries 2. `4ed2727` backend `observer_iata` API tests → `7856914` SQL join + struct/store wiring 3. `0e09371` display E2E → `7a3f45d` packets.js + style.css badge (E2E swapped for string-contract unit test in `ee414b4` — fixture `observations.observer_idx` stores text pubkeys, blocking the join the badge depends on) ## Backend - `cmd/server/db.go`: SELECT `obs.iata AS observer_iata` in `transmissionBaseSQL`, grouped query, observations-by-transmissions - `cmd/server/store.go`: `ObserverIATA` on `StoreTx`/`StoreObs`, load via all three ingest paths, surface in `txToMap`/`enrichObs`/`groupedTxsToPage` - `cmd/server/types.go`: field added to `TransmissionResp`/`ObservationResp`/`GroupedPacketResp` - Test fixture schemas declare `iata` on observers ## Perf Per #383, `obsIataBadge(packet)` reads `packet.observer_iata` directly (server-joined). Falls back to `observerMap.get(id).iata` only if absent — hot row-render loop avoids per-row Map lookup on fresh data. ## Display rules Missing IATA: nothing inline (Region column still shows `—`). No new hex — `.badge-iata` uses `var(--nav-bg)` / `var(--nav-text)`. 
E2E assertion added: test-observer-iata-1188.js:51 --------- Co-authored-by: OpenClaw Bot <bot@openclaw.dev> Co-authored-by: openclaw-bot <bot@openclaw.local>	2026-05-17 16:13:11 +00:00
Kpa-clawbot	b21badbcbd	fix(#1225 ): paginate channel messages at SQL level — 30s → <500ms (#1226 ) ## Summary Fixes #1225 — channel messages endpoint took ~30s on staging. ## Root cause `(DB).GetChannelMessages` SELECTed every observation row for the channel (one row per observation, not per transmission), JSON-unmarshalled each row into a Go map, dedupe-folded by `(sender, packetHash)`, then sliced the tail in Go for pagination. On staging `#wardriving`: - `transmissions` rows with `channel_hash='#wardriving' AND payload_type=5`: 5,703 - `observations` joined to those: 274,632 (~48× amplification) - `time curl /api/channels/%23wardriving/messages?limit=50`: 30.04s / 31.41s / 31.48s / 35.33s / 34.05s (5 calls before I killed the loop) `EXPLAIN QUERY PLAN` showed the index `idx_tx_channel_hash` was being used — the cost was entirely in fetching, unmarshalling, and folding the full observation set per request even for `limit=50`. Hypothesis #1 from the issue (full table scan on `messages/decoded`) is rejected; #2 (missing index) is rejected; the actual cause was pagination in Go instead of SQL — request cost was O(observations) not O(limit). ## Fix Move pagination into SQL on the `transmissions` table. Because `transmissions.hash` is `UNIQUE` and the original dedup key was `(sender, hash)`, each transmission collapses to exactly one logical message — paginating on transmissions is semantically equivalent to the prior in-Go dedup + tail slice. New shape: 1. `COUNT()` on transmissions for total (uses `idx_tx_channel_hash`). 2. `SELECT id FROM transmissions … ORDER BY first_seen DESC LIMIT ? OFFSET ?` to pick the page of newest transmissions. 3. `SELECT … FROM observations WHERE transmission_id IN (…page ids…)` — typically 50 ids → a few hundred observation rows. 4. Reassemble in pageIDs order, preserving the ASC-by-`first_seen` API contract. 
Region filtering, observation-count-as-`repeats`, and "first observation wins for hops/snr/observer" semantics are preserved (observations are scanned `ORDER BY o.id ASC`). ## Perf measurements Before (staging `#wardriving`, limit=50, 5 samples killed mid-loop): 30.04s, 31.41s, 31.48s, 35.33s, 34.05s. Synthetic regression test (`TestGetChannelMessagesPerfLargeChannel`): 3000 tx × 50 obs. - Broken impl: ~4.5s (test fails the 500ms budget — the RED commit). - Fixed impl: well under 500ms (test passes). After (staging): will measure post-deploy and post-comment on issue with numbers. Synthetic scaling: staging is ~2× the test's transmission count, fixed-path cost scales with `limit` (50) + `COUNT()` (~5k rows on index) — expect <100ms p99. ## TDD - RED: `697c290d` — perf test asserts <500ms on 3k×50 dataset; fails at ~4.5s. - GREEN: `3f1f82d3` — fix; full suite green, perf test passes. ## Hypotheses status \| # \| Hypothesis \| Verdict \| \|---\|---\|---\| \| 1 \| Endpoint slow on prod-sized data \| CONFIRMED (different mechanism — see root cause) \| \| 2 \| Missing channel_hash index \| Rejected (`idx_tx_channel_hash` exists & used) \| \| 3 \| Frontend re-render storm \| Not investigated (backend was clearly the bottleneck) \| \| 4 \| Decode in request path \| Rejected (decode is at ingest time; JSON unmarshal of cached `decoded_json` is the cost, addressed by reducing row count) \| \| 5 \| WS subscription failure \| Rejected \| \| 6 \| Staging artifact \| Rejected (reproducible) \| ## Out of scope - The in-memory `(*PacketStore).GetChannelMessages` path (used when `s.db == nil`) has the same shape but operates on bounded in-memory data; not touched. If we ever fall back to it in production we'll revisit. --------- Co-authored-by: clawbot <bot@corescope>	2026-05-16 17:28:40 +00:00
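Steps 3 and 4 of the new query shape (fetch observations for the page's ids, then reassemble) can be sketched without a database. This assumes the page ids arrive newest-first and are reversed to satisfy the ascending contract, which the description implies but does not spell out; `obsRow`, `message`, `placeholders`, and `reassemble` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// obsRow is a hypothetical observation row, pre-sorted by observation id ASC.
type obsRow struct {
	TxID     int64
	Observer string
	SNR      float64
}

// message is one logical channel message per transmission.
type message struct {
	TxID     int64
	Observer string
	SNR      float64
	Repeats  int
}

// placeholders builds the "?,?,?" list for an IN (...) clause of n ids.
func placeholders(n int) string {
	return strings.TrimSuffix(strings.Repeat("?,", n), ",")
}

// reassemble folds observation rows into one message per page id.
// The first observation supplies observer/SNR; every row bumps Repeats.
// pageIDs is newest-first; the result is oldest-first.
func reassemble(pageIDs []int64, rows []obsRow) []message {
	byTx := make(map[int64]*message, len(pageIDs))
	for _, r := range rows {
		m, ok := byTx[r.TxID]
		if !ok {
			m = &message{TxID: r.TxID, Observer: r.Observer, SNR: r.SNR}
			byTx[r.TxID] = m
		}
		m.Repeats++
	}
	out := make([]message, 0, len(pageIDs))
	for i := len(pageIDs) - 1; i >= 0; i-- {
		if m, ok := byTx[pageIDs[i]]; ok {
			out = append(out, *m)
		}
	}
	return out
}

func main() {
	pageIDs := []int64{30, 20, 10}
	rows := []obsRow{{10, "A", 1}, {20, "B", 2}, {10, "C", 3}, {30, "D", 4}}
	fmt.Println("IN (" + placeholders(len(pageIDs)) + ")")
	fmt.Printf("%+v\n", reassemble(pageIDs, rows))
}
```

The work done here is proportional to the page size and its observations, not to the channel's full history, which is the point of the fix.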
efiten	11d2026bb1	feat(startup): hot startup — load hotStartupHours synchronously, fill retentionHours in background (#1187 ) Closes #1183 ## Summary - Adds `packetStore.hotStartupHours` config key (float64, default 0 = disabled). When set, `Load()` loads only that many hours of data synchronously, reducing startup time on large DBs. - A background goroutine (`loadBackgroundChunks`) fills the remaining `retentionHours` window in daily chunks after startup completes. Analytics indexes are rebuilt once at the end. - `QueryPackets` and `QueryGroupedPackets` check `oldestLoaded` and fall back to `db.QueryPackets()` for any query whose `Since`/`Until` predates the in-memory window — covering days 8–30 permanently (beyond `retentionHours`) and the background-fill gap during startup. - `/api/perf` gains `hotStartupHours`, `backgroundLoadComplete`, and `backgroundLoadProgress` fields inside `packetStore` so operators can monitor the fill. ### Drive-by fixes - E2E: added `gotoPackets` navigation helper used across packet-related tests - E2E: rewrote stripe assertion to check per-row stripe parity rather than a fragile computed-style comparison - E2E: theme test updated to use `#/home` as the initial route (was `#/`) - `db.go`: removed the RFC3339→unix-timestamp subquery path in `buildTransmissionWhere`; `t.first_seen` is now always compared directly as a string for both RFC3339 and non-RFC3339 inputs ## Configuration ```json "packetStore": { "retentionHours": 168, "hotStartupHours": 24 } ``` `hotStartupHours: 0` (default) preserves existing behavior exactly: the full `retentionHours` window loads at startup. A non-zero value is recommended for large DBs to reduce startup time. 
## Test plan - [x] `TestHotStartupConfig_Clamp` — clamping when `hotStartupHours > retentionHours` - [x] `TestHotStartupConfig_ZeroIsDisabled` — zero leaves feature disabled - [x] `TestHotStartup_LoadsOnlyHotWindow` — only hot-window packets in memory after `Load()` - [x] `TestHotStartup_DisabledWhenZero` — all retention packets loaded when disabled - [x] `TestHotStartup_loadChunk_AddsOlderData` — chunk merges correctly, ASC order maintained - [x] `TestHotStartup_BackgroundFillsToRetention` — background goroutine fills to `retentionHours` - [x] `TestHotStartup_ChunkErrorRecovery` — chunk SQL failure logged and skipped, loop terminates - [x] `TestHotStartup_SQLFallback_TriggeredForOldDate` — query before `oldestLoaded` routes to SQL - [x] `TestHotStartup_SQLFallback_NotTriggeredForRecentDate` — recent query stays in-memory - [x] `TestHotStartup_PerfStats` — new fields present in `GetPerfStoreStats()` (backs the perf endpoint) - [x] `TestHotStartup_PerfStoreHTTP` — HTTP-level: GET /api/perf returns `hotStartupHours`, `backgroundLoadComplete`, `backgroundLoadProgress` in `packetStore` 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: openclaw-bot <bot@openclaw.local> Co-authored-by: CoreScope Bot <bot@corescope.local>	2026-05-15 22:46:25 -07:00
Kpa-clawbot	fb744d895f	fix(#1143 ): structural pubkey attribution via from_pubkey column (#1152 ) Fixes #1143. ## Summary Replaces the structurally unsound `decoded_json LIKE '%pubkey%'` (and `OR LIKE '%name%'`) attribution path with an exact-match lookup on a dedicated, indexed `transmissions.from_pubkey` column. This closes both holes documented in #1143: - Hole 1 — same-name false positives via `OR LIKE '%name%'` - Hole 2a — adversarial spoofing: a malicious node names itself with another node's pubkey and gets attributed to the victim - Hole 2b — accidental false positive when any free-text field (path elements, channel names, message bodies) contains a 64-char hex substring matching a real pubkey - Perf — query now uses an index instead of a full-table scan against `LIKE '%substring%'` ## TDD Two-commit history shows red-then-green: \| Commit \| Status \| Purpose \| \|---\|---\|---\| \| `7f0f08e` \| RED — tests assertion-fail on master behaviour \| Adversarial fixtures + spec \| \| `59327db` \| GREEN — schema + ingestor + server + migration \| Implementation \| The red commit's test schema includes the new column so the file compiles, but the production code still uses LIKE — the assertions fail because the malicious / same-name / free-text rows are returned. The green commit changes the query plus adds the migration/ingest path. ## Changes ### Schema - new column `transmissions.from_pubkey TEXT` - new index `idx_transmissions_from_pubkey` ### Ingestor (`cmd/ingestor/`) - `PacketData.FromPubkey` populated from decoded ADVERT `pubKey` at write time. Cheap — already parsing `decoded_json`. Non-ADVERTs stay NULL. - `stmtInsertTransmission` writes the column. - Migration `from_pubkey_v1` ALTERs legacy DBs to add the column + index. - Bonus: rewrote the recipe in the gated one-shot `advert_count_unique_v1` migration to use `from_pubkey` (already marked done on existing DBs; kept correct for fresh installs). 
### Server (`cmd/server/`) - `ensureFromPubkeyColumn` mirrors the ingestor migration so the server can boot against a DB the ingestor has never touched (e2e fixture, fresh installs). - `backfillFromPubkeyAsync` runs after HTTP starts. Scans `WHERE from_pubkey IS NULL AND payload_type = 4` in 5000-row chunks with a 100ms yield between chunks. Cannot block boot even on prod-sized DBs (100K+ transmissions). Queries handle NULL gracefully (return empty for that pubkey, same as today's unknown-pubkey path). - All in-scope LIKE call sites switched to exact match: \| Site \| Before \| After \| \|---\|---\|---\| \| `buildPacketWhere` (was db.go:582) \| `decoded_json LIKE '%pubkey%'` \| `from_pubkey = ?` \| \| `buildTransmissionWhere` (was db.go:626) \| `t.decoded_json LIKE '%pubkey%'` \| `t.from_pubkey = ?` \| \| `GetRecentTransmissionsForNode` (was db.go:910) \| `LIKE '%pubkey%' OR LIKE '%name%'` \| `t.from_pubkey = ?` \| \| `QueryMultiNodePackets` (was db.go:1785) \| `decoded_json LIKE '%pubkey%' OR ...` \| `t.from_pubkey IN (?, ?, ...)` \| \| `advert_count_unique_v1` (was ingestor/db.go:257) \| `decoded_json LIKE '%' \\|\\| nodes.public_key \\|\\| '%'` \| `t.from_pubkey = nodes.public_key` \| `GetRecentTransmissionsForNode` signature simplifies: the `name` parameter is gone (it was only ever used for the legacy `OR LIKE '%name%'` fallback). Sole caller in `routes.go:1243` updated. ### Tests - `cmd/server/from_pubkey_attribution_test.go` — adversarial fixtures + Hole 1/2a/2b/QueryMultiNodePackets exact-match assertions, EXPLAIN QUERY PLAN index check, migration backfill correctness. - `cmd/ingestor/from_pubkey_test.go` — write-time correctness (BuildPacketData populates FromPubkey for ADVERT only; InsertTransmission persists it; non-ADVERTs stay NULL). - Existing test schemas (server v2, server v3, coverage) get the new column plus a SQLite trigger that auto-populates `from_pubkey` from `decoded_json` on ADVERT inserts. 
This means existing fixtures (which only seed `decoded_json`) keep attributing correctly without per-test edits. - `seedTestData`'s ADVERTs explicitly set `from_pubkey`. ## Performance — index is used ``` $ EXPLAIN QUERY PLAN SELECT id FROM transmissions WHERE from_pubkey = ? SEARCH transmissions USING INDEX idx_transmissions_from_pubkey (from_pubkey=?) ``` Asserted in `TestFromPubkeyIndexUsed`. ## Migration approach - Sync at boot: `ALTER TABLE transmissions ADD COLUMN from_pubkey TEXT` is a metadata-only operation in SQLite — microseconds regardless of table size. `CREATE INDEX IF NOT EXISTS idx_transmissions_from_pubkey` is not metadata-only: it scans the table once. Empirically a few hundred ms on a 100K-row table; expect a few seconds on a 10M-row table (one-time cost, blocking boot during that window). Subsequent boots no-op via `IF NOT EXISTS`. If this boot delay becomes an operational concern at prod scale we can defer the `CREATE INDEX` to a goroutine — for now a few-second one-time delay is acceptable. - Async: row-level backfill of legacy NULL ADVERTs (chunked 5000 / 100ms yield). On a 100K-ADVERT prod DB, this completes in seconds in the background; HTTP is fully available throughout. - Safety: queries handle NULL gracefully — a node whose ADVERTs haven't backfilled yet returns empty, identical to today's behaviour for unknown pubkeys. No half-state regression. ## Out of scope (intentionally) The free-text `LIKE` paths the issue explicitly leaves alone (e.g. user-typed packet search) are untouched. Only the pubkey-attribution sites get the column treatment. 
## Cycle-3 review fixes \| Finding \| Status \| Commit \| \|---\|---\|---\| \| M1c — async-contract test was tautological (test's own `go`, not production's) \| Fixed \| `23ace71` (red) → `a05b50c` (green) \| \| m1c — package-global atomic resets unsafe under `t.Parallel()` \| Fixed (`// DO NOT t.Parallel` comment + `Reset()` helper) \| rolled into `23ace71` / `241ec69` \| \| m2c — `/api/healthz` read 3 atomics non-atomically (torn snapshot) \| Fixed (single RWMutex-guarded snapshot + race test) \| `241ec69` \| \| n3c.m1 — vestigial OR-scaffolding in `QueryMultiNodePackets` \| Fixed (cleanup) \| `5a53ceb` \| \| n3c.m2 — verify PR body language about `ALTER` vs `CREATE INDEX` \| Verified accurate (already corrected in cycle 2) \| (no change) \| \| n3c.m3 — `json.Unmarshal` per row in backfill → could use SQL `json_extract` \| Deferred as known followup — pure perf optimization (current per-row Unmarshal is correct, just slower); SQL rewrite would unwind the chunked-yield architecture and is non-trivial. Acceptable for one-time backfill at boot on legacy DBs. \| ### M1c implementation detail `startFromPubkeyBackfill(dbPath, chunkSize, yieldDuration)` is now the single production entry point used by `main.go`. It internally does `go backfillFromPubkeyAsync(...)`. The test calls `startFromPubkeyBackfill` (no `go` prefix) and asserts the dispatch returns within 50ms — so if anyone removes the `go` keyword inside the wrapper, the test fails. Manually verified: removing the `go` keyword causes `TestBackfillFromPubkey_DoesNotBlockBoot` to fail with "backfill dispatch took ~1s (>50ms): not async — would block boot." ### m2c implementation detail `fromPubkeyBackfillTotal/Processed/Done` are now plain `int64`/`bool` package globals guarded by a single `sync.RWMutex`. `fromPubkeyBackfillSnapshot()` returns all three under one RLock. 
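A minimal sketch of the single-lock snapshot idea, assuming a struct rather than the package globals the PR describes. The names `backfillProgress`, `Advance`, and `Snapshot` are hypothetical. The point is that all three fields are read under one `RLock`, so a reader can never observe `processed > total` or `done` without `processed == total`.

```go
package main

import (
	"fmt"
	"sync"
)

// backfillSnapshot is an immutable copy of the three progress fields.
type backfillSnapshot struct {
	Total, Processed int64
	Done             bool
}

// backfillProgress guards all three fields with a single RWMutex.
type backfillProgress struct {
	mu               sync.RWMutex
	total, processed int64
	done             bool
}

// Advance updates every field under one write lock.
func (p *backfillProgress) Advance(total, processed int64, done bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.total, p.processed, p.done = total, processed, done
}

// Snapshot copies every field under one read lock (no torn reads).
func (p *backfillProgress) Snapshot() backfillSnapshot {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return backfillSnapshot{Total: p.total, Processed: p.processed, Done: p.done}
}

func main() {
	var p backfillProgress
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // writer: lock-step updates, done flips on the last one
		defer wg.Done()
		for i := int64(1); i <= 1000; i++ {
			p.Advance(1000, i, i == 1000)
		}
	}()
	for i := 0; i < 1000; i++ { // reader: check both invariants on every read
		s := p.Snapshot()
		if s.Processed > s.Total || (s.Done && s.Processed != s.Total) {
			panic("torn snapshot")
		}
	}
	wg.Wait()
	fmt.Printf("%+v\n", p.Snapshot())
}
```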
`TestHealthzFromPubkeyBackfillConsistentSnapshot` races a writer (lock-step total/processed updates with periodic done flips) against 8 readers hammering `/api/healthz`, asserting `processed<=total` and `(done => processed==total)` on every response. Verified the test catches torn reads (manually injected a 3-RLock implementation; test failed within milliseconds with "processed>total" and "done=true but processed!=total" errors). --------- Co-authored-by: openclaw-bot <bot@openclaw.local> Co-authored-by: openclaw-bot <bot@openclaw.dev>	2026-05-06 23:50:44 -07:00
Kpa-clawbot	136e1d23c8	feat(#730 ): foreign-advert detection — flag instead of silent drop (#1084 ) ## Summary Partial fix for #730 (M1 only — M2 frontend and M3 alerting deferred). Today the ingestor silently drops ADVERTs whose GPS lies outside the configured `geo_filter` polygon. That's the wrong default for an analytics tool — operators get zero visibility into bridged or leaked meshes. This PR makes the new default flag, don't drop: foreign adverts are stored, the node row is tagged `foreign_advert=1`, and the API surfaces `"foreign": true` so dashboards / map overlays can be built on top. ## Behavior \| Mode \| What happens to an ADVERT outside `geo_filter` \| \|---\|---\| \| (default) flag \| Stored, marked `foreign_advert=1`, exposed via API \| \| drop (legacy) \| Silently dropped (preserves old behavior for ops who want it) \| ## What's done (M1 — Backend) - ingestor stores foreign adverts instead of dropping - `nodes.foreign_advert` column added (migration) - `/api/nodes` and `/api/nodes/{pk}` expose `foreign: true` field - Config: `geofilter.action: "flag"\|"drop"` (default `flag`) - Tests + config docs ## What's NOT done (deferred to M2 + M3) - M2 — Frontend: Map overlay showing foreign adverts as distinct markers, foreign-advert filter on packets/nodes pages, dedicated foreign-advert dashboard - M3 — Alerting: Time-series detection of bridging events, alert when foreign advert rate spikes, identify bridge entry-point nodes Issue #730 remains open for M2 and M3. --------- Co-authored-by: corescope-bot <bot@corescope>	2026-05-05 01:58:52 -07:00
Kpa-clawbot	1f4969c1a6	fix(#770 ): treat region 'All' as no-filter + document region behavior (#1026 ) ## Summary Fixes #770 — selecting "All" in the region filter dropdown produced an empty channel list. ## Root cause `normalizeRegionCodes` (cmd/server/db.go) treated any non-empty input as a literal IATA code. The frontend region filter labels its catch-all option "All"; while `region-filter.js` normally sends an empty string when "All" is selected, any code path that ends up sending `?region=All` (deep-link URLs, manual queries, future callers) caused the function to return `["ALL"]`. Downstream queries then filtered observers for `iata = 'ALL'`, which never matches anything → empty response. ## Fix `normalizeRegionCodes` now treats `All` / `ALL` / `all` (case-insensitive, with optional whitespace, mixed in CSV) as equivalent to an empty value, returning `nil` to signal "no filter". Real IATA codes (`SJC`, `PDX`, `sjc,PDX` → `[SJC PDX]`) still pass through unchanged. This is a defensive server-side fix: a single chokepoint that all region-aware endpoints already flow through (channels, packets, analytics, encrypted channels, observer ID resolution). ## Documentation Expanded `_comment_regions` in `config.example.json` to explain: - How IATA codes are resolved (payload > topic > source config — set in #1012) - What the `regions` map controls (display labels) vs runtime-discovered codes - That observers without an IATA tag only appear under "All Regions" - That the `All` sentinel is server-side safe ## TDD - Red commit (`4f65bf4`): `cmd/server/region_filter_test.go` — `TestNormalizeRegionCodes_AllIsNoFilter` asserts `All` / `ALL` / `all` / `""` / `"All,"` all collapse to `nil`. Compiles, runs, fails on assertion (`got [ALL], want nil`). Companion test `TestNormalizeRegionCodes_RealCodesPreserved` locks in that `sjc,PDX` still returns `[SJC PDX]`. - Green commit (`c9fb965`): two-line change in `normalizeRegionCodes` + docs update. 
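The normalization described under "Fix" might look roughly like the sketch below. It is written from the PR description, not copied from `cmd/server/db.go`. One assumption: an `All` token mixed into a CSV is skipped like an empty token, so other codes in the same list survive.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeRegionCodes upper-cases and trims each CSV token, dropping empty
// tokens and the "All" sentinel. A nil result means "no region filter".
func normalizeRegionCodes(raw string) []string {
	var codes []string
	for _, part := range strings.Split(raw, ",") {
		code := strings.ToUpper(strings.TrimSpace(part))
		if code == "" || code == "ALL" {
			continue
		}
		codes = append(codes, code)
	}
	return codes // stays nil when no real code was supplied
}

func main() {
	for _, in := range []string{"", "All", " all ", "All,", "sjc,PDX"} {
		fmt.Printf("%q -> %v (nil=%t)\n", in, normalizeRegionCodes(in), normalizeRegionCodes(in) == nil)
	}
}
```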
## Verification ``` $ go test -run TestNormalizeRegionCodes -count=1 ./cmd/server ok github.com/corescope/server 0.023s $ go test -count=1 ./cmd/server ok github.com/corescope/server 21.454s ``` Full suite green; no existing region tests regressed. Fixes #770 --------- Co-authored-by: Kpa-clawbot <bot@corescope>	2026-05-03 19:50:01 -07:00
Kpa-clawbot	df69a17718	feat(#772 ): short pubkey-prefix URLs for mesh sharing (#1016 ) ## Summary Fixes #772 — adds a short-URL form for node detail pages so operators can paste node links into a mesh chat without bringing along a 64-hex-char public key. ## Approach Pubkey-prefix resolution (no allocator, no lookup table). - The SPA hash route `#/nodes/<key>` already accepts whatever pubkey-shaped string the user pastes; the front end forwards it to `GET /api/nodes/<key>`. - When that lookup misses and the path is 8..63 hex chars, the backend now calls `DB.GetNodeByPrefix` and: - returns the matching node when exactly one node has that prefix, - returns 409 Conflict when multiple nodes share the prefix (with a "use a longer prefix" hint), - falls through to the existing 404 otherwise. - 8 hex chars = 32 bits of entropy, which is enough for fleets in the low thousands. Operators can extend to 10–12 chars if collisions become common. - The full-screen node detail card gets a new 📡 Copy short URL button that copies `…/#/nodes/<first 8 hex chars>`. ### Why not an opaque ID table (`/s/<id>`)? Considered and rejected: - Needs persistence + an allocator + cleanup story. - IDs aren't self-describing — operators can't sanity-check them. - IDs don't survive a DB rebuild. - 32 bits of pubkey already buys us collision resistance with zero moving parts. If the directory grows past the point where 8-char prefixes routinely collide, we can extend the minimum length without changing the URL shape. ## Changes - `cmd/server/db.go` — new `GetNodeByPrefix(prefix)` returning `(node, ambiguous, error)`. Validates hex; rejects <8 chars; `LIMIT 2` to detect collisions cheaply. - `cmd/server/routes.go` — `handleNodeDetail` falls back to prefix resolution; canonicalizes pubkey downstream; emits 409 on ambiguity; honors blacklist on the resolved pubkey. - `public/nodes.js` — adds 📡 Copy short URL button + handler on the full-screen node detail card. 
- `cmd/server/short_url_test.go` — Go tests (red-then-green). - `test-e2e-playwright.js` — E2E: navigates via prefix-only URL and asserts the new button surfaces. ## TDD evidence - Red commit: `2dea97a` — tests added with a stub `GetNodeByPrefix` returning `(nil, false, nil)`. All four assertions failed (assertion failures, not build errors): expected node got nil; expected ambiguous=true got false; route 404 vs expected 200/409. - Green commit: `9b8f146` — implementation lands; `go test ./...` passes locally in `cmd/server`. ## Compatibility - Existing 64-char pubkey URLs are untouched (exact lookup runs first). - Blacklist is enforced both on the raw input and on the resolved pubkey. - No new config knobs. ## What I did not touch - `cmd/server/db_test.go`, other route tests — unchanged. - Packet-detail short URLs (issue scopes nodes; revisit in a follow-up if asked). Fixes #772 --------- Co-authored-by: clawbot <bot@corescope.local>	2026-05-03 17:40:54 -07:00
Kpa-clawbot	dd2f044f2b	fix: cache RW SQLite connection + dedup DBConfig (closes #921 ) (#982 ) Closes #921 ## Summary Follow-up to #920 (incremental auto-vacuum). Addresses both items from the adversarial review: ### 1. RW connection caching Previously, every call to `openRW(dbPath)` opened a new SQLite RW connection and closed it after use. This happened in: - `runIncrementalVacuum` (~4x/hour) - `PruneOldPackets`, `PruneOldMetrics`, `RemoveStaleObservers` - `buildAndPersistEdges`, `PruneNeighborEdges` - All neighbor persist operations Now a single `*sql.DB` handle (with `MaxOpenConns(1)`) is cached process-wide via `cachedRW(dbPath)`. The underlying connection pool manages serialization. The original `openRW()` function is retained for one-shot test usage. ### 2. DBConfig dedup `DBConfig` was defined identically in both `cmd/server/config.go` and `cmd/ingestor/config.go`. Extracted to `internal/dbconfig/` as a shared package; both binaries now use a type alias (`type DBConfig = dbconfig.DBConfig`). ## Tests added \| Test \| File \| \|------\|------\| \| `TestCachedRW_ReturnsSameHandle` \| `cmd/server/rw_cache_test.go` \| \| `TestCachedRW_100Calls_SingleConnection` \| `cmd/server/rw_cache_test.go` \| \| `TestGetIncrementalVacuumPages_Default` \| `internal/dbconfig/dbconfig_test.go` \| \| `TestGetIncrementalVacuumPages_Configured` \| `internal/dbconfig/dbconfig_test.go` \| ## Verification ``` ok github.com/corescope/server 20.069s ok github.com/corescope/ingestor 47.117s ok github.com/meshcore-analyzer/dbconfig 0.003s ``` Both binaries build cleanly. 100 sequential `cachedRW()` calls return the same handle with exactly 1 entry in the cache map. --------- Co-authored-by: you <you@example.com>	2026-05-02 20:15:30 -07:00
Kpa-clawbot	3364eed303	feat: separate "Last Status Update" from "Last Packet Observation" for observers (v3 rebase) (#969 ) Rebased version of #968 (which was itself a rebase of #905) — resolves merge conflict with #906 (clock-skew UI) that landed on master. ## Conflict resolution `public/observers.js` — master (#906) added "Clock Offset" column to observer table; #968 split "Last Seen" into "Last Status" + "Last Packet" columns. Combined both: the table now has Status \| Name \| Region \| Last Status \| Last Packet \| Packets \| Packets/Hour \| Clock Offset \| Uptime. ## What this PR adds (unchanged from #968/#905) - `last_packet_at` column in observers DB table - Separate "Last Status Update" and "Last Packet Observation" display in observers list and detail page - Server-side migration to add the column automatically - Backfill heuristic for existing data - Tests for ingestor and server ## Verification - All Go tests pass (`cmd/server`, `cmd/ingestor`) - Frontend tests pass (`test-packets.js`, `test-hash-color.js`) - Built server, hit `/api/observers` — `last_packet_at` field present in JSON - Observer table header has all 9 columns including both Last Packet and Clock Offset ## Prior PRs - #905 — original (conflicts with master) - #968 — first rebase (conflicts after #906 landed) - This PR — second rebase, resolves #906 conflict Supersedes #968. Closes #905. --------- Co-authored-by: you <you@example.com>	2026-05-02 12:03:42 -07:00
Kpa-clawbot	568de4b441	fix(observers): exclude soft-deleted observers from /api/observers and totalObservers (#954)

## Bug

`/api/observers` returned soft-deleted (`inactive=1`) observers. Operators saw stale observers in the UI even after the auto-prune marked them inactive on schedule. Reproduced on staging: 14 observers older than 14 days returned by the API; all of them had `inactive=1` in the DB.

## Root cause

`DB.GetObservers()` (`cmd/server/db.go:974`) ran `SELECT ... FROM observers ORDER BY last_seen DESC` with no WHERE filter. The `RemoveStaleObservers` path correctly soft-deletes by setting `inactive=1`, but the read path didn't honor it. `statsRow` (`cmd/server/db.go:234`) had the same bug: the `totalObservers` count included soft-deleted rows.

## Fix

Add `WHERE inactive IS NULL OR inactive = 0` to both:

```go
// GetObservers
"SELECT ... FROM observers WHERE inactive IS NULL OR inactive = 0 ORDER BY last_seen DESC"

// statsRow.TotalObservers
"SELECT COUNT(*) FROM observers WHERE inactive IS NULL OR inactive = 0"
```

The `NULL` check preserves backward compatibility with rows from before the `inactive` migration.

## Tests

Added regression `TestGetObservers_ExcludesInactive`:

- Seed two observers, mark one inactive, assert `GetObservers()` returns only the other.
- Anti-tautology gate verified: reverting the WHERE clause causes the test to fail with `expected 1 observer, got 2` and `inactive observer obs2 should be excluded`.

`go test ./...` passes (19.6s).

## Out of scope

- The `GetObserverByID` lookup at line 1009 still returns inactive observers. This is intentional, so an old deep link to `/observers/<id>` shows "inactive" rather than 404.
- Frontend may also have its own caching layer; this fix is server-side only.

---------

Co-authored-by: Kpa-clawbot <bot@example.invalid> Co-authored-by: you <you@example.com> Co-authored-by: KpaBap <kpabap@gmail.com>	2026-05-01 17:51:08 +00:00
Kpa-clawbot	a605518d6d	fix(#881 ): per-observation raw_hex — each observer sees different bytes on air (#882 ) ## Problem Each MeshCore observer receives a physically distinct over-the-air byte sequence for the same transmission (different path bytes, flags/hops remaining). The `observations` table stored only `path_json` per observer — all observations pointed at one `transmissions.raw_hex`. This prevented the hex pane from updating when switching observations in the packet detail view. ## Changes \| Layer \| Change \| \|-------\|--------\| \| Schema \| `ALTER TABLE observations ADD COLUMN raw_hex TEXT` (nullable). Migration: `observations_raw_hex_v1` \| \| Ingestor \| `stmtInsertObservation` now stores per-observer `raw_hex` from MQTT payload \| \| View \| `packets_v` uses `COALESCE(o.raw_hex, t.raw_hex)` — backward compatible with NULL historical rows \| \| Server \| `enrichObs` prefers `obs.RawHex` when non-empty, falls back to `tx.RawHex` \| \| Frontend \| No changes — `effectivePkt.raw_hex` already flows through `renderDetail` \| ## Tests - Ingestor: `TestPerObservationRawHex` — two MQTT packets for same hash from different observers → both stored with distinct raw_hex - Server: `TestPerObservationRawHexEnrich` — enrichObs returns per-obs raw_hex when present, tx fallback when NULL - E2E: Playwright assertion in `test-e2e-playwright.js` for hex pane update on observation switch E2E assertion added: `test-e2e-playwright.js:1794` ## Scope - Historical observations: raw_hex stays NULL, UI falls back to transmission raw_hex silently - No backfill, no path_json reconstruction, no frontend changes Closes #881 --------- Co-authored-by: you <you@example.com>	2026-04-21 13:45:29 -07:00
Kpa-clawbot	886aabf0ae	fix(#827 ): /api/packets/{hash} falls back to DB when in-memory store misses (#831 ) Closes #827. ## Problem `/api/packets/{hash}` only consulted the in-memory `PacketStore`. When a packet aged out of memory, the handler 404'd — even though SQLite still had it and `/api/nodes/{pubkey}` `recentAdverts` (which reads from the DB) was actively surfacing the hash. Net effect: the Analyze → link on older adverts in the node detail page led to a dead "Not found". Two-store inconsistency: DB has the packet, in-memory doesn't, node detail surfaces it from DB → packet detail can't serve it. ## Fix In `handlePacketDetail`: - After in-memory miss, fall back to `db.GetPacketByHash` (already existed) for hash lookups, and `db.GetTransmissionByID` for numeric IDs. - Track when the result came from the DB; if so and the store has no observations, populate from DB via a new `db.GetObservationsForHash` so the response shows real observations instead of the misleading `observation_count = 1` fallback. ## Tests - `TestPacketDetailFallsBackToDBWhenStoreMisses` — insert a packet directly into the DB after `store.Load()`, confirm store doesn't have it, assert 200 + populated observations. - `TestPacketDetail404WhenAbsentFromBoth` — neither store nor DB → 404 (no false positives). - `TestPacketDetailPrefersStoreOverDB` — both have it; store result wins (no double-fetch). - `TestHandlePacketDetailNoStore` updated: it previously asserted the old buggy 404 behavior; now asserts the correct DB-fallback 200. All `go test ./... -run "PacketDetail\|Packet\|GetPacket"` and the full `cmd/server` suite pass. ## Out of scope The `/api/packets?hash=` filter is the live in-memory list endpoint and intentionally store-only for performance. Not touched here — happy to file a follow-up if you'd rather harmonise. 
## Repro context Verified against prod with a recently-adverting repeater whose recent advert hash lives in `recentAdverts` (DB) but had been evicted from the in-memory store; pre-fix 404, post-fix 200 with full observations. Co-authored-by: you <you@example.com>	2026-04-20 22:50:01 -07:00
Kpa-clawbot	d7fe24e2db	Fix channel filter on Packets page (UI + API) — #812 (#816 ) Closes #812 ## Root causes Server (`/api/packets?channel=…` returned identical totals): The handler in `cmd/server/routes.go` never read the `channel` query parameter into `PacketQuery`, so it was silently ignored by both the SQLite path (`db.go::buildTransmissionWhere`) and the in-memory path (`store.go::filterPackets`). The codebase already had everything else in place — the `channel_hash` column with an index from #762, decoded `channel` / `channelHashHex` fields on each packet — it just wasn't wired up. UI (`/#/packets` had no channel filter): `public/packets.js` rendered observer / type / time-window / region filters but no channel control, and didn't read `?channel=` from the URL. ## Fix ### Server - New `Channel` field on `PacketQuery`; `handlePackets` reads `r.URL.Query().Get("channel")`. - DB path filters by the indexed `channel_hash` column (exact match). - In-memory path: helper `packetMatchesChannel` matches `decoded.channel` (plaintext, e.g. `#test`, `public`) or `enc_<HEX>` against `channelHashHex` for undecryptable GRP_TXT. Uses cached `ParsedDecoded()` so it's O(1) after first parse. Fast-path index guards and the grouped-cache key updated to include channel. - Regression test (`channel_filter_test.go`): `channel=#test` returns ≥1 GRP_TXT packet and fewer than baseline; `channel=nonexistentchannel` returns `total=0`. ### UI - New `<select id="fChannel">` populated from `/api/channels`. - Round-trips via `?channel=…` on the URL hash (read on init, written on change). - Pre-seeds the current value as an option so encrypted hashes not in `/api/channels` still display as selected on reload. - On change, calls `loadPackets()` so the server-side filter applies before pagination. ## Perf Filter adds at most one cached map lookup per packet (DB path uses indexed column, store path uses `ParsedDecoded()` cache). 
Staging baseline 149–190 ms for `?channel=#test&limit=50`; the new comparison is negligible. Target ≤ 500 ms preserved. ## Tests `cd cmd/server && go test ./... -count=1 -timeout 120s` → PASS. --------- Co-authored-by: you <you@example.com>	2026-04-20 21:46:34 -07:00
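For the in-memory path in #816 above, a matching helper along the lines of `packetMatchesChannel` could look like this sketch. The `decoded` struct, its field names, and the case-insensitive comparison of the hex hash are assumptions; the PR only states that plaintext names match `decoded.channel` and that `enc_<HEX>` matches `channelHashHex`.

```go
package main

import (
	"fmt"
	"strings"
)

// decoded is a hypothetical view of the cached, parsed decoded_json.
type decoded struct {
	Channel        string // plaintext name, e.g. "#test" or "public"
	ChannelHashHex string // hash of the channel, set even when undecryptable
}

// packetMatchesChannel reports whether a packet belongs to the wanted channel.
// An empty filter matches everything.
func packetMatchesChannel(d decoded, want string) bool {
	if want == "" {
		return true
	}
	if strings.HasPrefix(want, "enc_") {
		return strings.EqualFold(d.ChannelHashHex, strings.TrimPrefix(want, "enc_"))
	}
	return d.Channel == want
}

func main() {
	plain := decoded{Channel: "#test", ChannelHashHex: "D9"}
	encrypted := decoded{ChannelHashHex: "4F"}
	fmt.Println(packetMatchesChannel(plain, "#test"))
	fmt.Println(packetMatchesChannel(encrypted, "enc_4F"))
	fmt.Println(packetMatchesChannel(plain, "nonexistentchannel"))
}
```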
Kpa-clawbot	bf674ebfa2	feat: validate advert signatures on ingest, reject corrupt packets (#794 ) ## Summary Validates ed25519 signatures on ADVERT packets during MQTT ingest. Packets with invalid signatures are rejected before storage, preventing corrupt/truncated adverts from polluting the database. ## Changes ### Ingestor (`cmd/ingestor/`) - Signature validation on ingest: After decoding an ADVERT, checks `SignatureValid` from the decoder. Invalid signatures → packet dropped, never stored. - Config flag: `validateSignatures` (default `true`). Set to `false` to disable validation for backward compatibility with existing installs. - `dropped_packets` table: New SQLite table recording every rejected packet with full attribution: - `hash`, `raw_hex`, `reason`, `observer_id`, `observer_name`, `node_pubkey`, `node_name`, `dropped_at` - Indexed on `observer_id` and `node_pubkey` for investigation queries - `SignatureDrops` counter: New atomic counter in `DBStats`, logged in periodic stats output as `sig_drops=N` - Retention: `dropped_packets` pruned alongside metrics on the same `retention.metricsDays` schedule ### Server (`cmd/server/`) - `GET /api/dropped-packets` (API key required): Returns recent drops with optional `?observer=` and `?pubkey=` filters, `?limit=` (default 100, max 500) - `signatureDrops` field added to `/api/stats` response (count from `dropped_packets` table) ### Tests (8 new) \| Test \| What it verifies \| \|------\|-----------------\| \| `TestSigValidation_ValidAdvertStored` \| Valid advert passes validation and is stored \| \| `TestSigValidation_TamperedSignatureDropped` \| Tampered signature → dropped, recorded in `dropped_packets` with correct fields \| \| `TestSigValidation_TruncatedAppdataDropped` \| Truncated appdata invalidates signature → dropped \| \| `TestSigValidation_DisabledByConfig` \| `validateSignatures: false` skips validation, stores tampered packet \| \| `TestSigValidation_DropCounterIncrements` \| Counter increments correctly 
across multiple drops \| \| `TestSigValidation_LogContainsFields` \| `dropped_packets` row contains hash, reason, observer, pubkey, name \| \| `TestPruneDroppedPackets` \| Old entries pruned, recent entries retained \| \| `TestShouldValidateSignatures_Default` \| Config helper returns correct defaults \| ### Config example ```json { "validateSignatures": true } ``` Fixes #793 --------- Co-authored-by: you <you@example.com>	2026-04-18 11:39:13 -07:00
Joel Claw	fa3f623bd6	feat: add observer retention — remove stale observers after configurable days (#764 ) ## Summary Observers that stop actively sending data now get removed after a configurable retention period (default 14 days). Previously, observers remained in the `observers` table forever. This meant nodes that were once observers for an instance but are no longer connected (even if still active in the mesh elsewhere) would continue appearing in the observer list indefinitely. ## Key Design Decisions - Active data requirement: `last_seen` is only updated when the observer itself sends packets (via `stmtUpdateObserverLastSeen`). Being seen by another node does NOT update this field. So an observer must actively send data to stay listed. - Default: 14 days — observers not seen in 14 days are removed - `-1` = keep forever — for users who want observers to never be removed - `0` = use default (14 days) — same as not setting the field - Runs on startup + daily ticker — staggered 3 minutes after metrics prune to avoid DB contention ## Changes \| File \| Change \| \|------\|--------\| \| `cmd/ingestor/config.go` \| Add `ObserverDays` to `RetentionConfig`, add `ObserverDaysOrDefault()` \| \| `cmd/ingestor/db.go` \| Add `RemoveStaleObservers()` — deletes observers with `last_seen` before cutoff \| \| `cmd/ingestor/main.go` \| Wire up startup + daily ticker for observer retention \| \| `cmd/server/config.go` \| Add `ObserverDays` to `RetentionConfig`, add `ObserverDaysOrDefault()` \| \| `cmd/server/db.go` \| Add `RemoveStaleObservers()` (server-side, uses read-write connection) \| \| `cmd/server/main.go` \| Wire up startup + daily ticker, shutdown cleanup \| \| `cmd/server/routes.go` \| Admin prune API now also removes stale observers \| \| `config.example.json` \| Add `observerDays: 14` with documentation \| \| `cmd/ingestor/coverage_boost_test.go` \| 4 tests: basic removal, empty store, keep forever (-1), default (0→14) \| \| `cmd/server/config_test.go` \| 4 tests: 
`ObserverDaysOrDefault` edge cases \| ## Config Example ```json { "retention": { "nodeDays": 7, "observerDays": 14, "packetDays": 30, "_comment": "observerDays: -1 = keep forever, 0 = use default (14)" } } ``` ## Admin API The `/api/admin/prune` endpoint now also removes stale observers (using `observerDays` from config) and reports `observers_removed` in the response alongside `packets_deleted`. ## Test Plan - [x] `TestRemoveStaleObservers` — old observer removed, recent observer kept - [x] `TestRemoveStaleObserversNone` — empty store, no errors - [x] `TestRemoveStaleObserversKeepForever` — `-1` keeps even year-old observers - [x] `TestRemoveStaleObserversDefault` — `0` defaults to 14 days - [x] `TestObserverDaysOrDefault` (ingestor) — nil/zero/positive/keep-forever - [x] `TestObserverDaysOrDefault` (server) — nil/zero/positive/keep-forever - [x] Both binaries compile cleanly (`go build`) - [ ] Manual: verify observer count decreases after retention period on a live instance	2026-04-17 09:24:40 -07:00
Kpa-clawbot	0e286d85fd	fix: channel query performance — add channel_hash column, SQL-level filtering (#762 ) (#763 ) ## Problem Channel API endpoints scan entire DB — 2.4s for channel list, 30s for messages. ## Fix - Added `channel_hash` column to transmissions (populated on ingest, backfilled on startup) - `GetChannels()` rewrites to GROUP BY channel_hash (one row per channel vs scanning every packet) - `GetChannelMessages()` filters by channel_hash at SQL level with proper LIMIT/OFFSET - 60s cache for channel list - Index: `idx_tx_channel_hash` for fast lookups Expected: 2.4s → <100ms for list, 30s → <500ms for messages. Fixes #762 --------- Co-authored-by: you <you@example.com>	2026-04-16 00:09:36 -07:00
Kpa-clawbot	84f03f4f41	fix: hide undecryptable channel messages by default (#727) (#728) ## Problem The Channels page shows 53K 'Unknown' messages: undecryptable GRP_TXT packets with no content, which is pure noise. ## Fix - Backend: channels API filters out undecrypted messages by default - `?includeEncrypted=true` param to include them - Frontend: 'Show encrypted' toggle in channels sidebar - Unknown channels grayed out with '(no key)' label - Toggle persists in localStorage Fixes #727 --------- Co-authored-by: you <you@example.com>	2026-04-13 19:40:20 +00:00
Kpa-clawbot	71be54f085	feat: DB-backed channel messages for full history (#725 M1) (#726) ## Summary Switches channel API endpoints to query SQLite instead of the in-memory packet store, giving users access to the full message history. Implements #725 (M1 only: DB-backed channel messages). Does NOT close #725; M2-M5 (custom channels, PSK, persistence, retroactive decryption) remain. ## Problem Channel endpoints (`/api/channels`, `/api/channels/{hash}/messages`) preferred the in-memory packet store when available. The store is bounded by `packetStore.maxMemoryMB`, so it typically holds only recent messages. The SQLite database has the complete history (weeks/months of channel messages) but was only used as a fallback when the store was nil (never in production). ## Fix Reversed the preference order: DB first, in-memory store fallback. Region filtering added to the DB path. Co-authored-by: you <you@example.com>	2026-04-12 23:22:52 -07:00
Kpa-clawbot	e893a1b3c4	fix: index relay hops in byNode for liveness tracking (#708 ) ## Problem Nodes that only appear as relay hops in packet paths (via `resolved_path`) were never indexed in `byNode`, so `last_heard` was never computed for them. This made relay-only nodes show as dead/stale even when actively forwarding traffic. Fixes #660 ## Root Cause `indexByNode()` only indexed pubkeys from decoded JSON fields (`pubKey`, `destPubKey`, `srcPubKey`). Relay nodes appearing in `resolved_path` were ignored entirely. ## Fix `indexByNode()` now also iterates: 1. `ResolvedPath` entries from each observation 2. `tx.ResolvedPath` (best observation's resolved path, used for DB-loaded packets) A per-call `indexed` set prevents double-indexing when the same pubkey appears in both decoded JSON and resolved path. Extracted `addToByNode()` helper to deduplicate the nodeHashes/byNode append logic. ## Scope Phase 1 only — server-side in-memory indexing. No DB changes, no ingestor changes. This makes `last_heard` reflect relay activity with zero risk to persistence. ## Tests 5 new test cases in `TestIndexByNodeResolvedPath`: - Resolved path pubkeys from observations get indexed - Null entries in resolved path are skipped - Relay-only nodes (no decoded JSON match) appear in `byNode` - Dedup between decoded JSON and resolved path - `tx.ResolvedPath` indexed when observations are empty All existing tests pass unchanged. ## Complexity O(observations × path_length) per packet — typically 1-3 observations × 1-3 hops. No hot-path regression. --------- Co-authored-by: you <you@example.com>	2026-04-11 21:25:42 -07:00
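The per-call dedup set described in #708 above can be sketched as follows. The types and field names are simplified assumptions rather than the server's real structures. `DecodedPubKeys` stands in for the `pubKey` / `destPubKey` / `srcPubKey` fields, and nil pointers model null `resolved_path` entries.

```go
package main

import "fmt"

type observation struct {
	ResolvedPath []*string // nil entries are unresolved hops
}

type transmission struct {
	Hash           string
	DecodedPubKeys []string
	Observations   []observation
	ResolvedPath   []*string // best observation's path, used for DB-loaded packets
}

// indexByNode appends the transmission hash to byNode once per distinct
// pubkey, whether the key came from decoded JSON or from a resolved path.
func indexByNode(byNode map[string][]string, t transmission) {
	indexed := map[string]bool{} // per-call set prevents double-indexing
	add := func(pk string) {
		if pk == "" || indexed[pk] {
			return
		}
		indexed[pk] = true
		byNode[pk] = append(byNode[pk], t.Hash)
	}
	addPath := func(path []*string) {
		for _, hop := range path {
			if hop != nil { // skip null entries
				add(*hop)
			}
		}
	}
	for _, pk := range t.DecodedPubKeys {
		add(pk)
	}
	for _, o := range t.Observations {
		addPath(o.ResolvedPath)
	}
	addPath(t.ResolvedPath)
}

func main() {
	sender, relay := "sender01", "relay01"
	byNode := map[string][]string{}
	indexByNode(byNode, transmission{
		Hash:           "h1",
		DecodedPubKeys: []string{sender},
		Observations:   []observation{{ResolvedPath: []*string{&relay, nil, &sender}}},
		ResolvedPath:   []*string{&relay},
	})
	fmt.Println(byNode)
}
```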
Kpa-clawbot	fcba2a9f3d	fix: set PRAGMA busy_timeout on all RW SQLite connections (#707 ) ## Problem `SQLITE_BUSY` contention between the ingestor and server's async persistence goroutine drops `resolved_path` and `neighbor_edges` updates. The DSN parameter `_busy_timeout=10000` may not be honored by the modernc/sqlite driver. ## Fix - `openRW()` now sets `PRAGMA busy_timeout = 5000` after opening the connection, guaranteeing SQLite retries for up to 5 seconds before returning `SQLITE_BUSY` - Refactored `PruneOldPackets` and `PruneOldMetrics` to use `openRW()` instead of duplicating connection setup — all RW connections now get consistent busy_timeout handling - Added test verifying the pragma is set correctly ## Changes \| File \| Change \| \|------\|--------\| \| `cmd/server/neighbor_persist.go` \| `openRW()` sets `PRAGMA busy_timeout = 5000` after open \| \| `cmd/server/db.go` \| `PruneOldPackets` and `PruneOldMetrics` use `openRW()` instead of inline `sql.Open` \| \| `cmd/server/neighbor_persist_test.go` \| `TestOpenRW_BusyTimeout` verifies pragma is set \| ## Performance No performance impact — `PRAGMA busy_timeout` is a connection-level setting with zero overhead on uncontended writes. Under contention, it converts immediate `SQLITE_BUSY` failures into brief retries (up to 5s), which is strictly better than dropping data. Fixes #705 --------- Co-authored-by: you <you@example.com>	2026-04-11 21:25:23 -07:00
Kpa-clawbot	232770a858	feat(rf-health): M2 — airtime, error rate, battery charts with delta computation (#605 ) ## M2: Airtime + Channel Quality + Battery Charts Implements M2 of #600 — server-side delta computation and three new charts in the RF Health detail view. ### Backend Changes Delta computation for cumulative counters (`tx_air_secs`, `rx_air_secs`, `recv_errors`): - Computes per-interval deltas between consecutive samples - Reboot handling: detects counter reset (current < previous), skips that delta, records reboot timestamp - Gap handling: if time between samples > 2× interval, inserts null (no interpolation) - Returns `tx_airtime_pct` and `rx_airtime_pct` as percentages (delta_secs / interval_secs × 100) - Returns `recv_error_rate` as delta_errors / (delta_recv + delta_errors) × 100 `resolution` query param on `/api/observers/{id}/metrics`: - `5m` (default) — raw samples - `1h` — hourly aggregates (GROUP BY hour with AVG/MAX) - `1d` — daily aggregates Schema additions: - `packets_sent` and `packets_recv` columns added to `observer_metrics` (migration) - Ingestor parses these fields from MQTT stats messages API response now includes: - `tx_airtime_pct`, `rx_airtime_pct`, `recv_error_rate` (computed deltas) - `reboots` array with timestamps of detected reboots - `is_reboot_sample` flag on affected samples ### Frontend Changes Three new charts in the RF Health detail view, stacked vertically below noise floor: 1. Airtime chart — TX (red) + RX (blue) as separate SVG lines, Y-axis 0-100%, direct labels at endpoints 2. Error Rate chart — `recv_error_rate` line, shown only when data exists 3. 
Battery chart — voltage line with 3.3V low reference, shown only when battery_mv > 0 All charts: - Share X-axis and time range (aligned vertically) - Reboot markers as vertical hairlines spanning all charts - Direct labels on data (no legends) - Resolution auto-selected: `1h` for 7d/30d ranges - Charts hidden when no data exists ### Tests - `TestComputeDeltas`: normal deltas, reboot detection, gap detection - `TestGetObserverMetricsResolution`: 5m/1h/1d downsampling verification - Updated `TestGetObserverMetrics` for new API signature --------- Co-authored-by: you <you@example.com>	2026-04-04 23:17:17 -07:00
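The delta rules described in the M2 commit above (per-interval deltas, reboot detection when a counter goes backwards, and a null for gaps longer than 2× the interval) can be sketched roughly as follows. This is a minimal illustration, not the repository's `TestComputeDeltas` subject: the `Sample` and `Delta` types, the `computeDeltas` name and signature, and the choice to divide by the nominal interval are all assumptions made for the example.

```go
package main

import "fmt"

// Sample is a hypothetical shape for one stored metrics row.
type Sample struct {
	TS        int64   // Unix seconds
	TxAirSecs float64 // cumulative counter reported by the observer
}

// Delta is a hypothetical per-interval result. TxPct is nil when no valid
// delta exists (reboot or gap), which would serialize as JSON null.
type Delta struct {
	TS     int64
	TxPct  *float64
	Reboot bool
}

// computeDeltas walks consecutive samples and applies the three rules from
// the commit message: skip the delta on a counter reset, emit nil across a
// gap longer than 2x the interval, otherwise delta_secs / interval_secs * 100.
func computeDeltas(samples []Sample, intervalSecs int64) []Delta {
	out := make([]Delta, 0, len(samples))
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1], samples[i]
		d := Delta{TS: cur.TS}
		switch {
		case cur.TxAirSecs < prev.TxAirSecs:
			// Counter went backwards: treat as a reboot and skip the delta.
			d.Reboot = true
		case cur.TS-prev.TS > 2*intervalSecs:
			// Gap in the data: leave TxPct nil, no interpolation.
		default:
			pct := (cur.TxAirSecs - prev.TxAirSecs) / float64(intervalSecs) * 100
			d.TxPct = &pct
		}
		out = append(out, d)
	}
	return out
}

func main() {
	samples := []Sample{
		{TS: 0, TxAirSecs: 10},
		{TS: 300, TxAirSecs: 25},
		{TS: 600, TxAirSecs: 40},
		{TS: 1500, TxAirSecs: 100}, // gap of 900s > 2 * 300s
		{TS: 1800, TxAirSecs: 3},   // counter reset
	}
	for _, d := range computeDeltas(samples, 300) {
		if d.TxPct == nil {
			fmt.Printf("t=%d tx_airtime_pct=null reboot=%v\n", d.TS, d.Reboot)
			continue
		}
		fmt.Printf("t=%d tx_airtime_pct=%.1f\n", d.TS, *d.TxPct)
	}
}
```

Using a pointer for the percentage keeps "no data" distinct from a genuine 0% reading, which is what lets a chart draw a break instead of a misleading dip.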
you	747aea37b7	fix(rf-health): add region filter support to metrics summary Frontend passes RegionFilter query string to summary API. Backend filters results by observer IATA region. Added iata field to MetricsSummaryRow.	2026-04-05 06:00:42 +00:00
Kpa-clawbot	6f35d4d417	feat: RF Health Dashboard M1 — observer metrics + small multiples grid (#604 ) ## RF Health Dashboard — M1: Observer Metrics Storage, API & Small Multiples Grid Implements M1 of #600. ### What this does Adds a complete RF health monitoring pipeline: MQTT stats ingestion → SQLite storage → REST API → interactive dashboard with small multiples grid. ### Backend Changes Ingestor (`cmd/ingestor/`) - New `observer_metrics` table via migration system (`_migrations` pattern) - Parse `tx_air_secs`, `rx_air_secs`, `recv_errors` from MQTT status messages (same pattern as existing `noise_floor` and `battery_mv`) - `INSERT OR REPLACE` with timestamps rounded to nearest 5-min interval boundary (using ingestor wall clock, not observer timestamps) - Missing fields stored as NULLs — partial data is always better than no data - Configurable retention pruning: `retention.metricsDays` (default 30), runs on startup + every 24h Server (`cmd/server/`) - `GET /api/observers/{id}/metrics?since=...&until=...` — per-observer time-series data - `GET /api/observers/metrics/summary?window=24h` — fleet summary with current NF, avg/max NF, sample count - `parseWindowDuration()` supports `1h`, `24h`, `3d`, `7d`, `30d` etc. 
- Server-side metrics retention pruning (same config, staggered 2min after packet prune) ### Frontend Changes RF Health tab (`public/analytics.js`, `public/style.css`) - Small multiples grid showing all observers simultaneously — anomalies pop out visually - Per-observer cell: name, current NF value, battery voltage, sparkline, avg/max stats - NF status coloring: warning (amber) at ≥-100 dBm, critical (red) at ≥-85 dBm — text color only, no background fills - Click any cell → expanded detail view with full noise floor line chart - Reference lines with direct text labels (`-100 warning`, `-85 critical`) — not color bands - Min/max points labeled directly on the chart - Time range selector: preset buttons (1h/3h/6h/12h/24h/3d/7d/30d) + custom from/to datetime picker - Deep linking: `#/analytics?tab=rf-health&observer=...&range=...` - All charts use SVG, matching existing analytics.js patterns - Responsive: 3-4 columns on desktop, 1 on mobile ### Design Decisions (from spec) - Labels directly on data, not in legends - Reference lines with text labels, not color bands - Small multiples grid, not card+accordion (Tufte: instant visual fleet comparison) - Ingestor wall clock for all timestamps (observer clocks may drift) ### Tests Added Ingestor tests: - `TestRoundToInterval` — 5 cases for rounding to 5-min boundaries - `TestInsertMetrics` — basic insertion with all fields - `TestInsertMetricsIdempotent` — INSERT OR REPLACE deduplication - `TestInsertMetricsNullFields` — partial data with NULLs - `TestPruneOldMetrics` — retention pruning - `TestExtractObserverMetaNewFields` — parsing tx_air_secs, rx_air_secs, recv_errors Server tests: - `TestGetObserverMetrics` — time-series query with since/until filters, NULL handling - `TestGetMetricsSummary` — fleet summary aggregation - `TestObserverMetricsAPIEndpoints` — DB query verification - `TestMetricsAPIEndpoints` — HTTP endpoint response shape - `TestParseWindowDuration` — duration parsing for h/d formats ### Test Results ``` 
cd cmd/ingestor && go test ./... → PASS (26s) cd cmd/server && go test ./... → PASS (5s) ``` ### What's NOT in this PR (deferred to M2+) - Server-side delta computation for cumulative counters - Airtime charts (TX/RX percentage lines) - Channel quality chart (recv_error_rate) - Battery voltage chart - Reboot detection and chart annotations - Resolution downsampling (1h, 1d aggregates) - Pattern detection / automated diagnosis --------- Co-authored-by: you <you@example.com>	2026-04-04 22:21:35 -07:00
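The M1 commit above stores samples with timestamps rounded to the nearest 5-minute boundary so that `INSERT OR REPLACE` deduplicates repeated status messages. A minimal sketch of that rounding, assuming Unix-second inputs; the `roundToInterval` signature here is hypothetical and may differ from the function exercised by `TestRoundToInterval`:

```go
package main

import "fmt"

// roundToInterval rounds a Unix timestamp (seconds) to the nearest multiple
// of intervalSecs. Exact midpoints round up. Assumes non-negative inputs.
func roundToInterval(unixSecs, intervalSecs int64) int64 {
	return (unixSecs + intervalSecs/2) / intervalSecs * intervalSecs
}

func main() {
	const fiveMin = 300
	for _, ts := range []int64{1000, 1050, 1199, 1200} {
		fmt.Println(ts, "->", roundToInterval(ts, fiveMin))
	}
}
```

Because every message arriving within the same bucket maps to the same rounded timestamp, a key on the observer and that timestamp would make repeated inserts idempotent.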
Kpa-clawbot	6ae62ce535	perf: make txToMap observations lazy via ExpandObservations flag (#595 ) ## Summary `txToMap()` previously always allocated observation sub-maps for every packet, even though the `/api/packets` handler immediately stripped them via `delete(p, "observations")` unless `expand=observations` was requested. A typical page of 50 packets with ~5 observations each caused 300+ unnecessary map allocations per request. ## Changes - `txToMap`: Add variadic `includeObservations bool` parameter. Observations are only built when `true` is passed, eliminating allocations when they'd just be discarded. - `PacketQuery`: Add `ExpandObservations bool` field to thread the caller's intent through the query pipeline. - `routes.go`: Set `ExpandObservations` based on `expand=observations` query param. Removed the post-hoc `delete(p, "observations")` loop — observations are simply never created when not requested. - Single-packet lookups (`GetPacketByID`, `GetPacketByHash`): Always pass `true` since detail views need observations. - Multi-node/analytics queries: Default (no flag) = no observations, matching prior behavior. ## Testing - Added `TestTxToMapLazyObservations` covering all three cases: no flag, `false`, and `true`. - All existing tests pass (`go test ./...`). ## Perf Impact Eliminates ~250 observation map allocations per /api/packets request (at default page size of 50 with ~5 observations each). This is a constant-factor improvement per request — no algorithmic complexity change. Fixes #374 Co-authored-by: you <you@example.com>	2026-04-04 10:39:30 -07:00
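The variadic-flag pattern described in the commit above can be illustrated as follows. This is a simplified sketch: the `Tx` and `Obs` types and the map keys are placeholders rather than the repository's `StoreTx` and `StoreObs`, and only the optional `includeObservations ...bool` parameter mirrors the commit message.

```go
package main

import "fmt"

// Obs and Tx are simplified stand-ins for the store's real types.
type Obs struct {
	ObserverID string
	SNR        float64
}

type Tx struct {
	Hash         string
	Observations []Obs
}

// txToMap builds the response map. Observation sub-maps are allocated only
// when the caller explicitly passes true; omitting the flag or passing false
// skips the allocation entirely.
func txToMap(tx Tx, includeObservations ...bool) map[string]interface{} {
	m := map[string]interface{}{"hash": tx.Hash}
	if len(includeObservations) > 0 && includeObservations[0] {
		obs := make([]map[string]interface{}, 0, len(tx.Observations))
		for _, o := range tx.Observations {
			obs = append(obs, map[string]interface{}{
				"observer_id": o.ObserverID,
				"snr":         o.SNR,
			})
		}
		m["observations"] = obs
	}
	return m
}

func main() {
	tx := Tx{Hash: "abc123", Observations: []Obs{{ObserverID: "obs1", SNR: 7.5}}}
	fmt.Println(len(txToMap(tx)))       // list view: no observations key
	fmt.Println(len(txToMap(tx, true))) // detail view: observations included
}
```

A variadic parameter keeps existing call sites compiling unchanged, which is presumably why it was preferred over adding a required argument.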
Kpa-clawbot	45d8116880	perf: query only matching node locations in handleObservers (#579 ) ## Summary `handleObservers()` in `routes.go` was calling `GetNodeLocations()` which fetches ALL nodes from the DB just to match ~10 observer IDs against node public keys. With 500+ nodes this is wasteful. ## Changes - `db.go`: Added `GetNodeLocationsByKeys(keys []string)` — queries only the rows matching the given public keys using a parameterized `WHERE LOWER(public_key) IN (?, ?, ...)` clause. - `routes.go`: `handleObservers` now collects observer IDs and calls the targeted method instead of the full-table scan. - `coverage_test.go`: Added `TestGetNodeLocationsByKeys` covering known key, empty keys, and unknown key cases. ## Performance With ~10 observers and 500+ nodes, the query goes from scanning all 500 rows to fetching only ~10. The original `GetNodeLocations()` is preserved for any other callers. Fixes #378 Co-authored-by: you <you@example.com>	2026-04-04 10:14:37 -07:00
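Building the parameterized `WHERE LOWER(public_key) IN (?, ?, ...)` clause mentioned in the commit above might look like the sketch below. The `buildKeysQuery` helper and the selected `lat`/`lon` columns are assumptions for illustration; only the `IN` clause shape and the `nodes.public_key` column come from the commit messages on this page.

```go
package main

import (
	"fmt"
	"strings"
)

// buildKeysQuery returns a query with one placeholder per key plus the
// matching argument slice. Keys are lowercased to match LOWER(public_key).
// An empty input returns an empty query so the caller can skip the DB call.
func buildKeysQuery(keys []string) (string, []interface{}) {
	if len(keys) == 0 {
		return "", nil
	}
	placeholders := strings.TrimSuffix(strings.Repeat("?, ", len(keys)), ", ")
	args := make([]interface{}, len(keys))
	for i, k := range keys {
		args[i] = strings.ToLower(k)
	}
	// Column list is hypothetical; adjust to the real schema.
	q := "SELECT public_key, lat, lon FROM nodes WHERE LOWER(public_key) IN (" + placeholders + ")"
	return q, args
}

func main() {
	q, args := buildKeysQuery([]string{"AB12", "cd34"})
	fmt.Println(q)
	fmt.Println(args...)
}
```

The returned pair can be passed to `db.Query(q, args...)`; keeping values in placeholders rather than concatenating them into the SQL string avoids injection and quoting problems.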
Kpa-clawbot	ae38cdefb4	feat: server-side hop resolution at ingest — resolved_path (#556 ) ## Summary Implements server-side hop prefix resolution at ingest time with a persisted neighbor graph. Hop prefixes in `path_json` are now resolved to full 64-char pubkeys at ingest and stored as `resolved_path` on each observation, eliminating the need for client-side resolution via `HopResolver`. Fixes #555 ## What changed ### New file: `cmd/server/neighbor_persist.go` SQLite persistence layer for the neighbor graph and resolved paths: - `neighbor_edges` table creation and management - Load/build/persist neighbor edges from/to SQLite - `resolved_path` column migration on observations - `resolvePathForObs()` — resolves hop prefixes using `resolveWithContext` with 4-tier priority (affinity → geo → GPS → first match) - Cold startup backfill for observations missing `resolved_path` - Async persistence of edges and resolved paths during ingest (non-blocking) ### Modified: `cmd/server/store.go` - `StoreObs` gains `ResolvedPath []string` field - `StoreTx` gains `ResolvedPath []string` (cached from best observation) - `Load()` dynamically includes `resolved_path` in SQL query when column exists - `IngestNewFromDB()` resolves paths at ingest time and persists asynchronously - `pickBestObservation()` propagates `ResolvedPath` to transmission - `txToMap()` and `enrichObs()` include `resolved_path` in API responses - All 7 `pm.resolve()` call sites migrated to `pm.resolveWithContext()` with the persisted graph - Broadcast maps include `resolved_path` per observation ### Modified: `cmd/server/db.go` - `DB` struct gains `hasResolvedPath bool` flag - `detectSchema()` checks for `resolved_path` column existence - Graceful degradation when column is absent (test DBs, old schemas) ### Modified: `cmd/server/main.go` - Startup sequence: ensure tables → load/build graph → backfill resolved paths → re-pick best observations ### Modified: `cmd/server/routes.go` - `mapSliceToTransmissions()` and 
`mapSliceToObservations()` propagate `resolved_path` - Node paths handler uses `resolveWithContext` with graph ### Modified: `cmd/server/types.go` - `TransmissionResp` and `ObservationResp` gain `ResolvedPath []string` with `omitempty` ### New file: `cmd/server/neighbor_persist_test.go` 16 tests covering: - Path resolution (unambiguous, empty, unresolvable prefixes) - Marshal/unmarshal of resolved_path JSON - SQLite table creation and column migration (idempotent) - Edge persistence and loading - Schema detection - Full Load() with resolved_path - API response serialization (present when set, omitted when nil) ## Design decisions 1. Async persistence — resolved paths and neighbor edges are written to SQLite in a goroutine to avoid blocking the ingest loop. The in-memory state is authoritative. 2. Schema compatibility — `DB.hasResolvedPath` flag allows the server to work with databases that don't yet have the `resolved_path` column. SQL queries dynamically include/exclude the column. 3. `pm.resolve()` retained — Not removed as dead code because existing tests use it directly. All production call sites now use `resolveWithContext` with the persisted graph. 4. Edge persistence is conservative — Only unambiguous edges (single candidate) are persisted to `neighbor_edges`. Ambiguous prefixes are handled by the in-memory `NeighborGraph` via Jaccard disambiguation. 5. `null` = unresolved — Ambiguous prefixes store `null` in the resolved_path array. Frontend falls back to prefix display. ## Performance - `resolveWithContext` per hop: ~1-5μs (map lookups, no DB queries) - Typical packet has 0-5 hops → <25μs total resolution overhead per packet - Edge/path persistence is async → zero impact on ingest latency - Backfill is one-time on first startup with the new column ## Test results ``` cd cmd/server && go test ./... -count=1 → ok (4.4s) cd cmd/ingestor && go test ./... -count=1 → ok (25.5s) ``` --------- Co-authored-by: you <you@example.com>	2026-04-04 00:20:59 -07:00
efiten	709e5a4776	fix: observer filter drops groups in grouped packets view (#464 ) (#531 ) ## Summary - When `groupByHash=true`, each group only carries its representative (best-path) `observer_id`. The client-side filter was checking only that field, silently dropping groups that were seen by the selected observer but had a different representative. - `loadPackets` now passes the `observer` param to the server so `filterPackets`/`buildGroupedWhere` do the correct "any observation matches" check. - Client-side observer filter in `renderTableRows` is skipped for grouped mode (server already filtered correctly). - Both `db.go` and `store.go` observer filtering extended to support comma-separated IDs (multi-select UI). ## Test plan - [ ] Set an observer filter on the Packets screen with grouping enabled — all groups that have any observation from the selected observer(s) should appear, not just groups where that observer is the representative - [ ] Multi-select two observers — groups seen by either should appear - [ ] Toggle to flat (ungrouped) mode — per-observation filter still works correctly - [ ] Existing grouped packets tests pass: `cd cmd/server && go test ./...` Fixes #464 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: you <you@example.com>	2026-04-03 09:22:37 -07:00
efiten	b1d89d7d9f	fix: apply region filter in GetNodes — was silently ignored (#496 ) (#497 ) ## Summary - `db.GetNodes` accepted a `region` param from the HTTP handler but never used it — every region-filter selection was silently ignored and all nodes were always returned - Added a subquery filtering `nodes.public_key` against ADVERT transmissions (payload_type=4) observed by observers with matching IATA codes - Handles both v2 (`observer_id TEXT`) and v3 (`observer_idx INT`) schemas ## Test plan - [x] 4 new subtests added to `TestGetNodesFiltering`: SJC (1 node), SFO (1 node), SJC,SFO multi (1 node deduped), AMS unknown (0 nodes) - [x] All existing Go tests still pass - [x] Deploy to staging, open `/nodes`, select a region in the filter bar — only nodes observed by observers in that region should appear Closes #496 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: you <you@example.com>	2026-04-02 17:49:57 -07:00
Kpa-clawbot	f87eb3601c	fix: graceful container shutdown for reliable deployments (#453 ) ## Summary Fixes #450 — staging deployment flaky due to container not shutting down cleanly. ## Root Causes 1. Server never closed DB on shutdown — SQLite WAL lock held indefinitely, blocking new container startup 2. `httpServer.Close()` instead of `Shutdown()` — abruptly kills connections instead of draining them 3. No `stop_grace_period` in compose configs — Docker sends SIGTERM then immediately SIGKILL (default 10s is often not enough for WAL checkpoint) 4. Supervisor didn't forward SIGTERM — missing `stopsignal`/`stopwaitsecs` meant Go processes got SIGKILL instead of graceful shutdown 5. Deploy scripts used default `docker stop` timeout — only 10s grace period ## Changes ### Go Server (`cmd/server/`) - Graceful HTTP shutdown: `httpServer.Shutdown(ctx)` with 15s context timeout — drains in-flight requests before closing - WebSocket cleanup: New `Hub.Close()` method sends `CloseGoingAway` frames to all connected clients - DB close on shutdown: Explicitly closes DB after HTTP server stops (was never closed before) - WAL checkpoint: `PRAGMA wal_checkpoint(TRUNCATE)` before DB close — flushes WAL to main DB file and removes WAL/SHM lock files ### Go Ingestor (`cmd/ingestor/`) - WAL checkpoint on shutdown: New `Store.Checkpoint()` method, called before `Close()` - Longer MQTT disconnect timeout: 5s (was 1s) to allow in-flight messages to drain ### Docker Compose (all 4 variants) - Added `stop_grace_period: 30s` and `stop_signal: SIGTERM` ### Supervisor Configs (both variants) - Added `stopsignal=TERM` and `stopwaitsecs=20` to server and ingestor programs ### Deploy Scripts - `deploy-staging.sh`: `docker stop -t 30` with explicit grace period - `deploy-live.sh`: `docker stop -t 30` with explicit grace period ## Shutdown Sequence (after fix) 1. Docker sends SIGTERM to supervisord (PID 1) 2. Supervisord forwards SIGTERM to server + ingestor (waits up to 20s each) 3. 
Server: stops poller → drains HTTP (15s) → closes WS clients → checkpoints WAL → closes DB 4. Ingestor: stops tickers → disconnects MQTT (5s) → checkpoints WAL → closes DB 5. Docker waits up to 30s total before SIGKILL ## Tests All existing tests pass: - `cd cmd/server && go test ./...` ✅ - `cd cmd/ingestor && go test ./...` ✅ --------- Co-authored-by: you <you@example.com> Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com>	2026-04-01 12:19:20 -07:00
efiten	fe314be3a8	feat: geo_filter enforcement, DB pruning, geofilter-builder tool, HB column (#215 ) ## Summary Several features and fixes from a live deployment of the Go v3.0.0 backend. ### geo_filter — full enforcement - Go backend config (`cmd/server/config.go`, `cmd/ingestor/config.go`): added `GeoFilterConfig` struct so `geo_filter.polygon` and `bufferKm` from `config.json` are parsed by both the server and ingestor - Ingestor (`cmd/ingestor/geo_filter.go`, `cmd/ingestor/main.go`): ADVERT packets from nodes outside the configured polygon + buffer are dropped before any DB write — no transmission, node, or observation data is stored - Server API (`cmd/server/geo_filter.go`, `cmd/server/routes.go`): `GET /api/config/geo-filter` endpoint returns the polygon + bufferKm to the frontend; `/api/nodes` responses filter out any out-of-area nodes already in the DB - Frontend (`public/map.js`, `public/live.js`): blue polygon overlay (solid inner + dashed buffer zone) on Map and Live pages, toggled via "Mesh live area" checkbox, state shared via localStorage ### Automatic DB pruning - Add `retention.packetDays` to `config.json` to delete transmissions + observations older than N days on a daily schedule (1 min after startup, then every 24h). Nodes and observers are never pruned. - `POST /api/admin/prune?days=N` for manual runs (requires `X-API-Key` header if `apiKey` is set) ```json "retention": { "nodeDays": 7, "packetDays": 30 } ``` ### tools/geofilter-builder.html Standalone HTML tool (no server needed) — open in browser, click to place polygon points on a Leaflet map, set `bufferKm`, copy the generated `geo_filter` JSON block into `config.json`. ### scripts/prune-nodes-outside-geo-filter.py Utility script to clean existing out-of-area nodes from the database (dry-run + confirm). Useful after first enabling geo_filter on a populated DB. ### HB column in packets table Shows the hop hash size in bytes (1–4) decoded from the path byte of each packet's raw hex. 
Displayed as HB between Size and Type columns, hidden on small screens. ## Test plan - [x] ADVERT from node outside polygon is not stored (no new row in nodes or transmissions) - [x] `GET /api/config/geo-filter` returns polygon + bufferKm when configured, `{polygon: null, bufferKm: 0}` when not - [x] `/api/nodes` excludes nodes outside polygon even if present in DB - [x] Map and Live pages show blue polygon overlay when configured; checkbox toggles it - [x] `retention.packetDays: 30` deletes old transmissions/observations on startup and daily - [x] `POST /api/admin/prune?days=30` returns `{deleted: N, days: 30}` - [x] `tools/geofilter-builder.html` opens standalone, draws polygon, copies valid JSON - [x] HB column shows 1–4 for all packets in grouped and flat view 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-31 01:10:56 -07:00
Kpa-clawbot	b51ced8655	Wire channel region filtering end-to-end Pass region through channel message routes, apply DB/store filtering, normalize IATA at read and write boundaries, and add regression coverage for routes/server/ingestor. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-30 23:03:56 -07:00
Kpa-clawbot	5aa4fbb600	chore: normalize all files to LF line endings	2026-03-30 22:52:46 -07:00
Kpa-clawbot	f5d0ce066b	refactor: remove packets_v SQL fallbacks — store handles all queries (#220 ) * refactor: remove all packets_v SQL fallbacks — store handles all queries Remove DB fallback paths from all route handlers. The in-memory PacketStore now handles all packet/node/analytics queries. Handlers return empty results or 404 when no store is available instead of falling back to direct DB queries. - Remove else-DB branches from handlePacketDetail, handleNodeHealth, handleNodeAnalytics, handleBulkHealth, handlePacketTimestamps, etc. - Remove unused DB methods (GetPacketByHash, GetTransmissionByID, GetPacketByID, GetObservationsForHash, GetTimestamps, GetNodeHealth, GetNodeAnalytics, GetBulkHealth, etc.) - Remove packets_v VIEW creation from schema - Update tests for new behavior (no-store returns 404/empty, not 500) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: address PR #220 review comments Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: KpaBap <kpabap@gmail.com>	2026-03-28 15:25:56 -07:00
Kpa-clawbot	54cbc648e0	feat: decode telemetry from adverts — battery voltage + temperature on nodes Sensor nodes embed telemetry (battery_mv, temperature_c) in their advert appdata after the null-terminated name. This commit adds decoding and storage for both the Go ingestor and Node.js backend. Changes: - decoder.go/decoder.js: Parse telemetry bytes from advert appdata (battery_mv as uint16 LE millivolts, temperature_c as int16 LE /100) - db.go/db.js: Add battery_mv INTEGER and temperature_c REAL columns to nodes and inactive_nodes tables, with migration for existing DBs - main.go/server.js: Update node telemetry on advert processing - server db.go: Include battery_mv/temperature_c in node API responses - Tests: Decoder telemetry tests (positive, negative temp, no telemetry), DB migration test, node telemetry update test, server API shape tests Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-28 12:07:42 -07:00
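The telemetry encoding named in the commit above (`battery_mv` as uint16 little-endian millivolts, `temperature_c` as int16 little-endian divided by 100, placed after the null-terminated name) can be decoded along these lines. This is a sketch under a strong assumption: the input slice is taken to start at the name, whereas the real advert appdata layout may carry other fields first. The `decodeTelemetry` name is hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// decodeTelemetry looks for the name's null terminator and reads four bytes
// after it: uint16 LE battery millivolts, then int16 LE temperature in
// hundredths of a degree Celsius. ok is false when no telemetry is present.
func decodeTelemetry(appdata []byte) (batteryMv uint16, tempC float64, ok bool) {
	i := bytes.IndexByte(appdata, 0)
	if i < 0 || len(appdata) < i+1+4 {
		return 0, 0, false
	}
	rest := appdata[i+1:]
	batteryMv = binary.LittleEndian.Uint16(rest[0:2])
	// Reinterpret the raw 16 bits as signed so negative temperatures decode.
	tempC = float64(int16(binary.LittleEndian.Uint16(rest[2:4]))) / 100
	return batteryMv, tempC, true
}

func main() {
	// "node" + NUL + 0x0E74 (3700 mV) + 0xFE0C (-500 => -5.00 C)
	appdata := []byte{'n', 'o', 'd', 'e', 0, 0x74, 0x0E, 0x0C, 0xFE}
	mv, temp, ok := decodeTelemetry(appdata)
	fmt.Println(mv, temp, ok)
}
```

The explicit `int16` conversion is the step that matters for the "negative temp" test case listed in the commit: reading the bytes as unsigned and dividing directly would produce a large positive value instead.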
Kpa-clawbot	f374a4a775	fix: enforce consistent types between Go ingestor writes and server reads Schema: - observers.noise_floor: INTEGER → REAL (dBm has decimals) - battery_mv, uptime_secs remain INTEGER (always whole numbers) Ingestor write side (cmd/ingestor/db.go): - UpsertObserver now accepts ObserverMeta with battery_mv (int), uptime_secs (int64), noise_floor (float64) - COALESCE preserves existing values when meta is nil - Added migration: cast integer noise_floor values to REAL Ingestor MQTT handler (cmd/ingestor/main.go — already updated): - extractObserverMeta extracts hardware fields from status messages - battery_mv/uptime_secs cast via math.Round to int on write Server read side (cmd/server/db.go): - Observer.BatteryMv: float64 → int (matches INTEGER storage) - Observer.UptimeSecs: float64 → int64 (matches INTEGER storage) - Observer.NoiseFloor: *float64 (unchanged, matches REAL storage) - GetObservers/GetObserverByID: use sql.NullInt64 intermediaries for battery_mv/uptime_secs, sql.NullFloat64 for noise_floor Proto (proto/observer.proto — already correct): - battery_mv: int32, uptime_secs: int64, noise_floor: double Tests: - TestUpsertObserverWithMeta: verifies correct SQLite types via typeof() - TestUpsertObserverMetaPreservesExisting: nil-meta preserves values - TestExtractObserverMeta: float-to-int rounding, empty message - TestSchemaNoiseFloorIsReal: PRAGMA table_info validation - TestObserverTypeConsistency: server reads typed values correctly - TestObserverTypesInGetObservers: list endpoint type consistency Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-28 11:22:14 -07:00
Kpa-clawbot	1619f4857e	fix: noise_floor/battery_mv/uptime_secs scanned as float64 to handle REAL values SQLite stores these as REAL on some instances. Go *int scan silently fails, dropping the entire observer row (404 on detail, missing from list). Reported for YC-Base-Repeater and YC-Work-Repeater. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-28 11:04:49 -07:00
Kpa-clawbot	9ebfd40aa0	fix: filter garbage channel names from /api/channels, fixes #201 Channels with garbage-decrypted names (pre-#197 data still in DB) are now filtered at the API level using the same non-printable character heuristic from #197. Applied in both Node.js server.js and Go server (store.go, db.go). No data is deleted — only filtered from API responses. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 22:49:45 -07:00
Kpa-clawbot	77988ded3e	fix: #184-#189 — sanitize names, packetsLast24h, ReadMemStats cache, dup name indicator, heatmap warning #184: Strip non-printable chars (<0x20 except tab/newline) from ADVERT names in Go server decoder, Go ingestor decoder, and Node decoder.js. #185: Add visual (N) badge next to node names when multiple nodes share the same display name (case-insensitive). Shows in list, side pane, and full detail page with 'also known as' links to other keys. #186: Add packetsLast24h field to /api/stats response. #187 #188: Cache runtime.ReadMemStats() with 5s TTL in Go server. #189: Temporarily patch HTMLCanvasElement.prototype.getContext during L.heatLayer().addTo(map) to pass { willReadFrequently: true }, preventing Chrome console warning about canvas readback performance. Tests: 10 new tests for buildDupNameMap + dupNameBadge (143 total frontend). Cache busters bumped. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 20:50:08 -07:00
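The sanitization rule from #184 in the commit above (strip characters below 0x20 except tab and newline) can be sketched with `strings.Map`, which drops a rune whenever the mapping function returns a negative value. The `sanitizeName` helper name is an assumption and not necessarily what the decoders use.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeName removes control characters below 0x20, keeping tab and
// newline, and leaves all other runes (including non-ASCII) untouched.
func sanitizeName(s string) string {
	return strings.Map(func(r rune) rune {
		if r < 0x20 && r != '\t' && r != '\n' {
			return -1 // negative return value drops the rune
		}
		return r
	}, s)
}

func main() {
	fmt.Printf("%q\n", sanitizeName("No\x01de\tA\x00"))
}
```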
Kpa-clawbot	2435f2eaaf	fix: observation timestamps, leaked fields, perf path normalization - #178: Use strftime ISO 8601 format instead of datetime() for observation timestamps in all SQL queries (v3 + v2 views). Add normalizeTimestamp() helper for non-v3 paths that may store space-separated timestamps. - #179: Strip internal fields (decoded_json, direction, payload_type, raw_hex, route_type, score, created_at) from ObservationResp. Only expose id, transmission_id, observer_id, observer_name, snr, rssi, path_json, timestamp — matching Node.js parity. - #180: Remove _parsedDecoded and _parsedPath from node detail recentAdverts response. These internal/computed fields were leaking to the API. Updated golden shapes.json accordingly. - #181: Use mux route template (GetPathTemplate) for perf stats path normalization, converting {param} to :param for Node.js parity. Fallback to hex regex for unmatched routes. Compile regexes once at package level instead of per-request. fixes #178, fixes #179, fixes #180, fixes #181 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 18:09:36 -07:00
Kpa-clawbot	df63efa78d	fix: poll new observations for existing transmissions (fixes #174) The poller only queried WHERE t.id > sinceID, which missed new observations added to transmissions already in the store. The trace page was correct because it always queries the DB directly. Add IngestNewObservations() that polls observations by o.id watermark, adds them to existing StoreTx entries, re-picks best observation, and invalidates analytics caches. The Poller now tracks both lastTxID and lastObsID watermarks. Includes tests for v3, v2, dedup, best-path re-pick, and GetMaxObservationID. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 17:26:26 -07:00
Kpa-clawbot	64bf3744e2	fix: channels stale latest message from observation-timestamp ordering, fixes #171 db.GetChannels() queried packets_v (observation-level rows) ordered by observation timestamp and always overwrote lastMessage. When an older message had a later re-observation, it would overwrite the correct latest message with stale data. Fix: query transmissions table directly (one row per unique message) ordered by first_seen. This ensures lastMessage always reflects the most recently sent message, not the most recently observed one. Also fix db.GetChannelMessages() to use first_seen ordering with schema-aware queries (v2/v3), and add missing distCache/subpathCache invalidation on packet ingestion. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 16:01:54 -07:00
Kpa-clawbot	f55a3454aa	feat(go): replace map[string]interface{} with typed Go structs in route handlers Phase 1: Create cmd/server/types.go with ~80 typed response structs matching all proto definitions. Every API response shape is now a compile-time checked struct. Phase 2: Rewire all route handlers in routes.go to construct typed structs instead of map[string]interface{} for response building: - /api/stats -> StatsResponse - /api/health -> HealthResponse - /api/perf -> PerfResponse - /api/config/* -> typed config responses - /api/nodes/* -> NodeListResponse, NodeDetailResponse, etc. - /api/packets/* -> PacketListResponse, PacketDetailResponse - /api/analytics/* -> RFAnalyticsResponse, TopologyResponse, etc. - /api/observers/* -> ObserverListResponse, ObserverResp - /api/channels/* -> ChannelListResponse, ChannelMessagesResponse - /api/traces/* -> TraceResponse - /api/resolve-hops -> ResolveHopsResponse - /api/iata-coords -> IataCoordsResponse (typed IataCoord) - /api/audio-lab/buckets -> AudioLabBucketsResponse - WebSocket broadcast -> WSMessage struct - SlowQuery tracking -> SlowQuery struct (was map) Phase 3 (partial): Add typed store/db methods: - PacketStore.GetCacheStatsTyped() -> CacheStats - PacketStore.GetPerfStoreStatsTyped() -> PerfPacketStoreStats - DB.GetDBSizeStatsTyped() -> SqliteStats Remaining map usage is in store/db data flow (PacketResult.Packets still uses maps) — these will be addressed in a follow-up. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 15:17:21 -07:00
Kpa-clawbot	2f5404edc3	fix: close last parity gaps in /api/perf and /api/nodes/:pubkey - db.go: Add freelistMB (PRAGMA freelist_count * page_size) and walPages (PRAGMA wal_checkpoint(PASSIVE)) to GetDBSizeStats - store.go: Add advertByObserver count to GetPerfStoreStats indexes (count distinct pubkeys with ADVERT observations) - db.go: Add getObservationsForTransmissions helper; enrich GetRecentTransmissionsForNode results with observations array, _parsedPath, and _parsedDecoded - db_test.go: Add second ADVERT with different hash_size to seed data so hash_sizes_seen is populated; enrich decoded_json with full ADVERT fields; update count assertions for new seed row fixes #151, fixes #152 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 11:57:35 -07:00

56 Commits