mirror of
https://github.com/Kpa-clawbot/meshcore-analyzer.git
synced 2026-05-24 23:26:18 +00:00
dbadef3e2fbe4e9372dfc39f29c15985d85ec63c
50 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
dbadef3e2f |
refactor(db): move all server writes to ingestor; server truly read-only (#1283)
Eliminates the SQLITE_BUSY VACUUM bug from #1283 by making cmd/server truly read-only. The bug surfaced when supervisord launched both ingestor + server in one container: the ingestor took the write lock for INSERTs, then the server's VACUUM-on-startup immediately failed with SQLITE_BUSY. Same race latently affected three other server-side writes. Four write operations moved out of cmd/server/: 1. VACUUM / auto_vacuum migration (cmd/server/vacuum.go, entire file) → cmd/ingestor/db.go Store.CheckAutoVacuum (already existed; ingestor runs it BEFORE the MQTT subscriber starts so there is no contention with concurrent writes). 2. PruneOldPackets (DELETE FROM transmissions) cmd/server/db.go → cmd/ingestor/maintenance.go (new file, Store.PruneOldPackets) + main.go scheduler. 3. PruneOldMetrics (DELETE FROM observer_metrics) cmd/server/db.go → cmd/ingestor/db.go Store.PruneOldMetrics (already existed). 4. RemoveStaleObservers (UPDATE observers SET inactive = 1) cmd/server/db.go → cmd/ingestor/db.go Store.RemoveStaleObservers (already existed). Server-side changes: - vacuum.go deleted; checkAutoVacuum / runIncrementalVacuum gone. - cmd/server/db.go: PruneOldPackets, PruneOldMetrics, RemoveStaleObservers deleted. - cmd/server/main.go: packet/metrics/observer prune schedulers removed; the neighbor-edge prune scheduler (PruneNeighborEdges) is intentionally left in place — outside scope of #1283, tracked separately. - routes.go + openapi.go: /api/admin/prune endpoint removed (prune is scheduled by the ingestor now; operators restart the ingestor for an ad-hoc pass). Ingestor changes: - New cmd/ingestor/maintenance.go with Store.PruneOldPackets. - cmd/ingestor/config.go gains RetentionConfig.PacketDays and Config.PacketDaysOrZero(). - cmd/ingestor/main.go runs PruneOldPackets at startup (if packetDays > 0) and on a 24h ticker. Docs: - AGENTS.md: documents the read/write separation invariant. 
- config.example.json: notes that retention + vacuumOnStartup are consumed by the ingestor. TDD: - Red: bb1d749a — invariant tests + Store.PruneOldPackets stub. - Green: this commit — real implementation + server-side removals. Note: cachedRW() still has three out-of-scope callers in cmd/server (neighbor_persist.go, ensure_indexes.go, from_pubkey_migration.go). Those are pre-existing write paths not covered by issue #1283 and are left untouched per the issue scope. Future work can relocate them under the same invariant. |
||
|
|
b881a09f02 |
feat(#1188): show observer IATA on packets + filter grammar (#1189)
Red commit:
|
||
|
|
b21badbcbd |
fix(#1225): paginate channel messages at SQL level — 30s → <500ms (#1226)
## Summary Fixes #1225 — channel messages endpoint took ~30s on staging. ## Root cause `(*DB).GetChannelMessages` SELECTed every observation row for the channel (one row per observation, not per transmission), JSON-unmarshalled each row into a Go map, dedupe-folded by `(sender, packetHash)`, then sliced the tail in Go for pagination. On staging `#wardriving`: - `transmissions` rows with `channel_hash='#wardriving' AND payload_type=5`: **5,703** - `observations` joined to those: **274,632** (~48× amplification) - `time curl /api/channels/%23wardriving/messages?limit=50`: **30.04s / 31.41s / 31.48s / 35.33s / 34.05s** (5 calls before I killed the loop) `EXPLAIN QUERY PLAN` showed the index `idx_tx_channel_hash` was being used — the cost was entirely in fetching, unmarshalling, and folding the full observation set per request even for `limit=50`. Hypothesis #1 from the issue (full table scan on `messages/decoded`) is rejected; #2 (missing index) is rejected; the actual cause was **pagination in Go instead of SQL** — request cost was O(observations) not O(limit). ## Fix Move pagination into SQL on the `transmissions` table. Because `transmissions.hash` is `UNIQUE` and the original dedup key was `(sender, hash)`, each transmission collapses to exactly one logical message — paginating on transmissions is semantically equivalent to the prior in-Go dedup + tail slice. New shape: 1. `COUNT(*)` on transmissions for total (uses `idx_tx_channel_hash`). 2. `SELECT id FROM transmissions … ORDER BY first_seen DESC LIMIT ? OFFSET ?` to pick the page of newest transmissions. 3. `SELECT … FROM observations WHERE transmission_id IN (…page ids…)` — typically 50 ids → a few hundred observation rows. 4. Reassemble in pageIDs order, preserving the ASC-by-`first_seen` API contract. Region filtering, observation-count-as-`repeats`, and "first observation wins for hops/snr/observer" semantics are preserved (observations are scanned `ORDER BY o.id ASC`). 
## Perf measurements **Before** (staging `#wardriving`, limit=50, 5 samples before the loop was killed): 30.04s, 31.41s, 31.48s, 35.33s, 34.05s. **Synthetic regression test** (`TestGetChannelMessagesPerfLargeChannel`): 3000 tx × 50 obs. - Broken impl: ~4.5s (test fails the 500ms budget; this is the RED commit). - Fixed impl: well under 500ms (test passes). **After (staging)**: to be measured after deploy; the numbers will be posted as a comment on the issue. Synthetic scaling: staging has ~2× the test's transmission count, and the fixed path's cost scales with `limit` (50) + `COUNT(*)` (~5k rows on index), so expect <100ms p99. ## TDD - RED: `697c290d`: perf test asserts <500ms on 3k×50 dataset; fails at ~4.5s. - GREEN: `3f1f82d3`: fix; full suite green, perf test passes. ## Hypotheses status | # | Hypothesis | Verdict | |---|---|---| | 1 | Endpoint slow on prod-sized data | **CONFIRMED** (different mechanism; see root cause) | | 2 | Missing channel_hash index | Rejected (`idx_tx_channel_hash` exists & used) | | 3 | Frontend re-render storm | Not investigated (backend was clearly the bottleneck) | | 4 | Decode in request path | Rejected (decode is at ingest time; JSON unmarshal of cached `decoded_json` is the cost, addressed by reducing row count) | | 5 | WS subscription failure | Rejected | | 6 | Staging artifact | Rejected (reproducible) | ## Out of scope - The in-memory `(*PacketStore).GetChannelMessages` path (used when `s.db == nil`) has the same shape but operates on bounded in-memory data; not touched. If we ever fall back to it in production we'll revisit. --------- Co-authored-by: clawbot <bot@corescope> |
||
|
|
11d2026bb1 |
feat(startup): hot startup — load hotStartupHours synchronously, fill retentionHours in background (#1187)
Closes #1183 ## Summary - Adds `packetStore.hotStartupHours` config key (float64, default 0 = disabled). When set, `Load()` loads only that many hours of data synchronously, reducing startup time on large DBs. - A background goroutine (`loadBackgroundChunks`) fills the remaining `retentionHours` window in daily chunks after startup completes. Analytics indexes are rebuilt once at the end. - `QueryPackets` and `QueryGroupedPackets` check `oldestLoaded` and fall back to `db.QueryPackets()` for any query whose `Since`/`Until` predates the in-memory window, covering days 8–30 permanently (beyond `retentionHours`) and the background-fill gap during startup. - `/api/perf` gains `hotStartupHours`, `backgroundLoadComplete`, and `backgroundLoadProgress` fields inside `packetStore` so operators can monitor the fill. ### Drive-by fixes - E2E: added `gotoPackets` navigation helper used across packet-related tests - E2E: rewrote stripe assertion to check per-row stripe parity rather than a fragile computed-style comparison - E2E: theme test updated to use `#/home` as the initial route (was `#/`) - `db.go`: removed the RFC3339→unix-timestamp subquery path in `buildTransmissionWhere`; `t.first_seen` is now always compared directly as a string for both RFC3339 and non-RFC3339 inputs ## Configuration ```json "packetStore": { "retentionHours": 168, "hotStartupHours": 24 } ``` `hotStartupHours: 0` (default) preserves existing behavior exactly. A nonzero value is recommended for large DBs to reduce startup time; 0 disables the feature (loads the full `retentionHours` at startup, the legacy behavior). 
## Test plan - [x] `TestHotStartupConfig_Clamp` — clamping when `hotStartupHours > retentionHours` - [x] `TestHotStartupConfig_ZeroIsDisabled` — zero leaves feature disabled - [x] `TestHotStartup_LoadsOnlyHotWindow` — only hot-window packets in memory after `Load()` - [x] `TestHotStartup_DisabledWhenZero` — all retention packets loaded when disabled - [x] `TestHotStartup_loadChunk_AddsOlderData` — chunk merges correctly, ASC order maintained - [x] `TestHotStartup_BackgroundFillsToRetention` — background goroutine fills to `retentionHours` - [x] `TestHotStartup_ChunkErrorRecovery` — chunk SQL failure logged and skipped, loop terminates - [x] `TestHotStartup_SQLFallback_TriggeredForOldDate` — query before `oldestLoaded` routes to SQL - [x] `TestHotStartup_SQLFallback_NotTriggeredForRecentDate` — recent query stays in-memory - [x] `TestHotStartup_PerfStats` — new fields present in `GetPerfStoreStats()` (backs the perf endpoint) - [x] `TestHotStartup_PerfStoreHTTP` — HTTP-level: GET /api/perf returns `hotStartupHours`, `backgroundLoadComplete`, `backgroundLoadProgress` in `packetStore` 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: openclaw-bot <bot@openclaw.local> Co-authored-by: CoreScope Bot <bot@corescope.local> |
||
|
|
fb744d895f |
fix(#1143): structural pubkey attribution via from_pubkey column (#1152)
Fixes #1143. ## Summary Replaces the structurally unsound `decoded_json LIKE '%pubkey%'` (and `OR LIKE '%name%'`) attribution path with an exact-match lookup on a dedicated, indexed `transmissions.from_pubkey` column. This closes both holes documented in #1143: - **Hole 1** — same-name false positives via `OR LIKE '%name%'` - **Hole 2a** — adversarial spoofing: a malicious node names itself with another node's pubkey and gets attributed to the victim - **Hole 2b** — accidental false positive when any free-text field (path elements, channel names, message bodies) contains a 64-char hex substring matching a real pubkey - **Perf** — query now uses an index instead of a full-table scan against `LIKE '%substring%'` ## TDD Two-commit history shows red-then-green: | Commit | Status | Purpose | |---|---|---| | `7f0f08e` | RED — tests assertion-fail on master behaviour | Adversarial fixtures + spec | | `59327db` | GREEN — schema + ingestor + server + migration | Implementation | The red commit's test schema includes the new column so the file compiles, but the production code still uses LIKE — the assertions fail because the malicious / same-name / free-text rows are returned. The green commit changes the query plus adds the migration/ingest path. ## Changes ### Schema - new column `transmissions.from_pubkey TEXT` - new index `idx_transmissions_from_pubkey` ### Ingestor (`cmd/ingestor/`) - `PacketData.FromPubkey` populated from decoded ADVERT `pubKey` at write time. Cheap — already parsing `decoded_json`. Non-ADVERTs stay NULL. - `stmtInsertTransmission` writes the column. - Migration `from_pubkey_v1` ALTERs legacy DBs to add the column + index. - Bonus: rewrote the recipe in the gated one-shot `advert_count_unique_v1` migration to use `from_pubkey` (already marked done on existing DBs; kept correct for fresh installs). 
### Server (`cmd/server/`) - `ensureFromPubkeyColumn` mirrors the ingestor migration so the server can boot against a DB the ingestor has never touched (e2e fixture, fresh installs). - `backfillFromPubkeyAsync` runs **after** HTTP starts. Scans `WHERE from_pubkey IS NULL AND payload_type = 4` in 5000-row chunks with a 100ms yield between chunks. Cannot block boot even on prod-sized DBs (100K+ transmissions). Queries handle NULL gracefully (return empty for that pubkey, same as today's unknown-pubkey path). - All in-scope LIKE call sites switched to exact match: | Site | Before | After | |---|---|---| | `buildPacketWhere` (was db.go:582) | `decoded_json LIKE '%pubkey%'` | `from_pubkey = ?` | | `buildTransmissionWhere` (was db.go:626) | `t.decoded_json LIKE '%pubkey%'` | `t.from_pubkey = ?` | | `GetRecentTransmissionsForNode` (was db.go:910) | `LIKE '%pubkey%' OR LIKE '%name%'` | `t.from_pubkey = ?` | | `QueryMultiNodePackets` (was db.go:1785) | `decoded_json LIKE '%pubkey%' OR ...` | `t.from_pubkey IN (?, ?, ...)` | | `advert_count_unique_v1` (was ingestor/db.go:257) | `decoded_json LIKE '%' \|\| nodes.public_key \|\| '%'` | `t.from_pubkey = nodes.public_key` | `GetRecentTransmissionsForNode` signature simplifies: the `name` parameter is gone (it was only ever used for the legacy `OR LIKE '%name%'` fallback). Sole caller in `routes.go:1243` updated. ### Tests - `cmd/server/from_pubkey_attribution_test.go` — adversarial fixtures + Hole 1/2a/2b/QueryMultiNodePackets exact-match assertions, EXPLAIN QUERY PLAN index check, migration backfill correctness. - `cmd/ingestor/from_pubkey_test.go` — write-time correctness (BuildPacketData populates FromPubkey for ADVERT only; InsertTransmission persists it; non-ADVERTs stay NULL). - Existing test schemas (server v2, server v3, coverage) get the new column **plus a SQLite trigger** that auto-populates `from_pubkey` from `decoded_json` on ADVERT inserts. 
This means existing fixtures (which only seed `decoded_json`) keep attributing correctly without per-test edits. - `seedTestData`'s ADVERTs explicitly set `from_pubkey`. ## Performance — index is used ``` $ EXPLAIN QUERY PLAN SELECT id FROM transmissions WHERE from_pubkey = ? SEARCH transmissions USING INDEX idx_transmissions_from_pubkey (from_pubkey=?) ``` Asserted in `TestFromPubkeyIndexUsed`. ## Migration approach - **Sync at boot**: `ALTER TABLE transmissions ADD COLUMN from_pubkey TEXT` is a metadata-only operation in SQLite — microseconds regardless of table size. `CREATE INDEX IF NOT EXISTS idx_transmissions_from_pubkey` is **not** metadata-only: it scans the table once. Empirically a few hundred ms on a 100K-row table; expect a few seconds on a 10M-row table (one-time cost, blocking boot during that window). Subsequent boots no-op via `IF NOT EXISTS`. If this boot delay becomes an operational concern at prod scale we can defer the `CREATE INDEX` to a goroutine — for now a few-second one-time delay is acceptable. - **Async**: row-level backfill of legacy NULL ADVERTs (chunked 5000 / 100ms yield). On a 100K-ADVERT prod DB, this completes in seconds in the background; HTTP is fully available throughout. - **Safety**: queries handle NULL gracefully — a node whose ADVERTs haven't backfilled yet returns empty, identical to today's behaviour for unknown pubkeys. No half-state regression. ## Out of scope (intentionally) The free-text `LIKE` paths the issue explicitly leaves alone (e.g. user-typed packet search) are untouched. Only the pubkey-attribution sites get the column treatment. 
## Cycle-3 review fixes | Finding | Status | Commit | |---|---|---| | **M1c** — async-contract test was tautological (test's own `go`, not production's) | Fixed | `23ace71` (red) → `a05b50c` (green) | | **m1c** — package-global atomic resets unsafe under `t.Parallel()` | Fixed (`// DO NOT t.Parallel` comment + `Reset()` helper) | rolled into `23ace71` / `241ec69` | | **m2c** — `/api/healthz` read 3 atomics non-atomically (torn snapshot) | Fixed (single RWMutex-guarded snapshot + race test) | `241ec69` | | **n3c.m1** — vestigial OR-scaffolding in `QueryMultiNodePackets` | Fixed (cleanup) | `5a53ceb` | | **n3c.m2** — verify PR body language about `ALTER` vs `CREATE INDEX` | Verified accurate (already corrected in cycle 2) | (no change) | | **n3c.m3** — `json.Unmarshal` per row in backfill → could use SQL `json_extract` | **Deferred as known followup** — pure perf optimization (current per-row Unmarshal is correct, just slower); SQL rewrite would unwind the chunked-yield architecture and is non-trivial. Acceptable for one-time backfill at boot on legacy DBs. | ### M1c implementation detail `startFromPubkeyBackfill(dbPath, chunkSize, yieldDuration)` is now the single production entry point used by `main.go`. It internally does `go backfillFromPubkeyAsync(...)`. The test calls `startFromPubkeyBackfill` (no `go` prefix) and asserts the dispatch returns within 50ms — so if anyone removes the `go` keyword inside the wrapper, the test fails. **Manually verified**: removing the `go` keyword causes `TestBackfillFromPubkey_DoesNotBlockBoot` to fail with "backfill dispatch took ~1s (>50ms): not async — would block boot." ### m2c implementation detail `fromPubkeyBackfillTotal/Processed/Done` are now plain `int64`/`bool` package globals guarded by a single `sync.RWMutex`. `fromPubkeyBackfillSnapshot()` returns all three under one RLock. 
`TestHealthzFromPubkeyBackfillConsistentSnapshot` races a writer (lock-step total/processed updates with periodic done flips) against 8 readers hammering `/api/healthz`, asserting `processed<=total` and `(done => processed==total)` on every response. Verified the test catches torn reads (manually injected a 3-RLock implementation; test failed within milliseconds with "processed>total" and "done=true but processed!=total" errors). --------- Co-authored-by: openclaw-bot <bot@openclaw.local> Co-authored-by: openclaw-bot <bot@openclaw.dev> |
||
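For commit fb744d895f above, the m2c fix (one lock guarding total, processed, and done so a reader never sees a torn combination) can be illustrated with the sketch below. The commit describes package globals behind a single `sync.RWMutex`; this sketch wraps the same idea in a hypothetical `backfillProgress` struct, and the rule that `done` flips when `processed` reaches `total` is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// backfillProgress keeps all three counters behind one lock so that a
// snapshot is always internally consistent.
type backfillProgress struct {
	mu        sync.RWMutex
	total     int64
	processed int64
	done      bool
}

// backfillSnapshot is a copy of the counters taken under a single RLock.
type backfillSnapshot struct {
	Total     int64
	Processed int64
	Done      bool
}

// Begin resets progress for a run over total rows.
func (p *backfillProgress) Begin(total int64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.total, p.processed, p.done = total, 0, false
}

// Advance records n more processed rows and marks the run done once the
// total is reached (assumed completion rule).
func (p *backfillProgress) Advance(n int64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.processed += n
	if p.processed >= p.total {
		p.processed = p.total
		p.done = true
	}
}

// Snapshot returns all three values under one read lock.
func (p *backfillProgress) Snapshot() backfillSnapshot {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return backfillSnapshot{Total: p.total, Processed: p.processed, Done: p.done}
}

func main() {
	var p backfillProgress
	p.Begin(10000)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // writer: two 5000-row chunks
		defer wg.Done()
		p.Advance(5000)
		p.Advance(5000)
	}()
	for i := 0; i < 8; i++ { // readers: check the invariants on every read
		wg.Add(1)
		go func() {
			defer wg.Done()
			s := p.Snapshot()
			if s.Processed > s.Total || (s.Done && s.Processed != s.Total) {
				panic("torn snapshot")
			}
		}()
	}
	wg.Wait()
	fmt.Printf("%+v\n", p.Snapshot())
}
```

Reading three separate atomics cannot give this guarantee, because a writer may update one of them between two of the reads.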
|
|
136e1d23c8 |
feat(#730): foreign-advert detection — flag instead of silent drop (#1084)
## Summary **Partial fix for #730 (M1 only — M2 frontend and M3 alerting deferred).** Today the ingestor **silently drops** ADVERTs whose GPS lies outside the configured `geo_filter` polygon. That's the wrong default for an analytics tool — operators get zero visibility into bridged or leaked meshes. This PR makes the new default **flag, don't drop**: foreign adverts are stored, the node row is tagged `foreign_advert=1`, and the API surfaces `"foreign": true` so dashboards / map overlays can be built on top. ## Behavior | Mode | What happens to an ADVERT outside `geo_filter` | |---|---| | (default) flag | Stored, marked `foreign_advert=1`, exposed via API | | drop (legacy) | Silently dropped (preserves old behavior for ops who want it) | ## What's done (M1 — Backend) - ingestor stores foreign adverts instead of dropping - `nodes.foreign_advert` column added (migration) - `/api/nodes` and `/api/nodes/{pk}` expose `foreign: true` field - Config: `geofilter.action: "flag"|"drop"` (default `flag`) - Tests + config docs ## What's NOT done (deferred to M2 + M3) - **M2 — Frontend:** Map overlay showing foreign adverts as distinct markers, foreign-advert filter on packets/nodes pages, dedicated foreign-advert dashboard - **M3 — Alerting:** Time-series detection of bridging events, alert when foreign advert rate spikes, identify bridge entry-point nodes Issue #730 remains open for M2 and M3. --------- Co-authored-by: corescope-bot <bot@corescope> |
||
|
|
1f4969c1a6 |
fix(#770): treat region 'All' as no-filter + document region behavior (#1026)
## Summary Fixes #770 — selecting "All" in the region filter dropdown produced an empty channel list. ## Root cause `normalizeRegionCodes` (cmd/server/db.go) treated any non-empty input as a literal IATA code. The frontend region filter labels its catch-all option **"All"**; while `region-filter.js` normally sends an empty string when "All" is selected, any code path that ends up sending `?region=All` (deep-link URLs, manual queries, future callers) caused the function to return `["ALL"]`. Downstream queries then filtered observers for `iata = 'ALL'`, which never matches anything → empty response. ## Fix `normalizeRegionCodes` now treats `All` / `ALL` / `all` (case-insensitive, with optional whitespace, mixed in CSV) as equivalent to an empty value, returning `nil` to signal "no filter". Real IATA codes (`SJC`, `PDX`, `sjc,PDX` → `[SJC PDX]`) still pass through unchanged. This is a defensive server-side fix: a single chokepoint that all region-aware endpoints already flow through (channels, packets, analytics, encrypted channels, observer ID resolution). ## Documentation Expanded `_comment_regions` in `config.example.json` to explain: - How IATA codes are resolved (payload > topic > source config — set in #1012) - What the `regions` map controls (display labels) vs runtime-discovered codes - That observers without an IATA tag only appear under "All Regions" - That the `All` sentinel is server-side safe ## TDD - **Red commit** (`4f65bf4`): `cmd/server/region_filter_test.go` — `TestNormalizeRegionCodes_AllIsNoFilter` asserts `All` / `ALL` / `all` / `""` / `"All,"` all collapse to `nil`. Compiles, runs, fails on assertion (`got [ALL], want nil`). Companion test `TestNormalizeRegionCodes_RealCodesPreserved` locks in that `sjc,PDX` still returns `[SJC PDX]`. - **Green commit** (`c9fb965`): two-line change in `normalizeRegionCodes` + docs update. 
## Verification ``` $ go test -run TestNormalizeRegionCodes -count=1 ./cmd/server ok github.com/corescope/server 0.023s $ go test -count=1 ./cmd/server ok github.com/corescope/server 21.454s ``` Full suite green; no existing region tests regressed. Fixes #770 --------- Co-authored-by: Kpa-clawbot <bot@corescope> |
||
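For commit 1f4969c1a6 above, the normalization rule it describes could be sketched as follows. This is not the project's `normalizeRegionCodes`; `normalizeRegionCodesSketch` is a hypothetical stand-in, and treating any `All` token inside a CSV as "no filter" for the whole value is one reading of "mixed in CSV", labeled here as an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeRegionCodesSketch upper-cases and trims comma-separated region
// codes. It returns nil (meaning "no filter") for empty input or when any
// token is the "All" sentinel (assumption: one sentinel disables filtering).
func normalizeRegionCodesSketch(raw string) []string {
	var codes []string
	for _, part := range strings.Split(raw, ",") {
		code := strings.ToUpper(strings.TrimSpace(part))
		if code == "" {
			continue // skip empty tokens such as the one after "All,"
		}
		if code == "ALL" {
			return nil
		}
		codes = append(codes, code)
	}
	return codes
}

func main() {
	fmt.Println(normalizeRegionCodesSketch("sjc,PDX")) // prints [SJC PDX]
	fmt.Println(normalizeRegionCodesSketch("All") == nil)
	fmt.Println(normalizeRegionCodesSketch(" all ,") == nil)
}
```

Handling the sentinel at this single chokepoint means every region-aware endpoint gets the same behavior without per-handler checks.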
|
|
df69a17718 |
feat(#772): short pubkey-prefix URLs for mesh sharing (#1016)
## Summary Fixes #772 — adds a short-URL form for node detail pages so operators can paste node links into a mesh chat without bringing along a 64-hex-char public key. ## Approach **Pubkey-prefix resolution** (no allocator, no lookup table). - The SPA hash route `#/nodes/<key>` already accepts whatever pubkey-shaped string the user pastes; the front end forwards it to `GET /api/nodes/<key>`. - When that lookup misses **and** the path is 8..63 hex chars, the backend now calls `DB.GetNodeByPrefix` and: - returns the matching node when exactly one node has that prefix, - returns **409 Conflict** when multiple nodes share the prefix (with a "use a longer prefix" hint), - falls through to the existing 404 otherwise. - 8 hex chars = 32 bits of entropy, which is enough for fleets in the low thousands. Operators can extend to 10–12 chars if collisions become common. - The full-screen node detail card gets a new **📡 Copy short URL** button that copies `…/#/nodes/<first 8 hex chars>`. ### Why not an opaque ID table (`/s/<id>`)? Considered and rejected: - Needs persistence + an allocator + cleanup story. - IDs aren't self-describing — operators can't sanity-check them. - IDs don't survive a DB rebuild. - 32 bits of pubkey already buys us collision resistance with zero moving parts. If the directory grows past the point where 8-char prefixes routinely collide, we can extend the minimum length without changing the URL shape. ## Changes - `cmd/server/db.go` — new `GetNodeByPrefix(prefix)` returning `(node, ambiguous, error)`. Validates hex; rejects <8 chars; `LIMIT 2` to detect collisions cheaply. - `cmd/server/routes.go` — `handleNodeDetail` falls back to prefix resolution; canonicalizes pubkey downstream; emits 409 on ambiguity; honors blacklist on the resolved pubkey. - `public/nodes.js` — adds **📡 Copy short URL** button + handler on the full-screen node detail card. - `cmd/server/short_url_test.go` — Go tests (red-then-green). 
- `test-e2e-playwright.js` — E2E: navigates via prefix-only URL and asserts the new button surfaces. ## TDD evidence - Red commit: `2dea97a` — tests added with a stub `GetNodeByPrefix` returning `(nil, false, nil)`. All four assertions failed (assertion failures, not build errors): expected node got nil; expected ambiguous=true got false; route 404 vs expected 200/409. - Green commit: `9b8f146` — implementation lands; `go test ./...` passes locally in `cmd/server`. ## Compatibility - Existing 64-char pubkey URLs are untouched (exact lookup runs first). - Blacklist is enforced both on the raw input and on the resolved pubkey. - No new config knobs. ## What I did **not** touch - `cmd/server/db_test.go`, other route tests — unchanged. - Packet-detail short URLs (issue scopes nodes; revisit in a follow-up if asked). Fixes #772 --------- Co-authored-by: clawbot <bot@corescope.local> |
||
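For commit df69a17718 above, the prefix-resolution rules (8..63 hex chars, one match resolves, two or more is ambiguous, none falls through to 404) can be sketched over an in-memory key list. `resolvePrefix` is a hypothetical stand-in for `DB.GetNodeByPrefix`; stopping at the second match mirrors the `LIMIT 2` trick, and the case-insensitive comparison is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errBadPrefix = errors.New("prefix must be 8..63 hex characters")

// isHex reports whether s consists only of hexadecimal digits.
func isHex(s string) bool {
	for _, c := range s {
		switch {
		case c >= '0' && c <= '9', c >= 'a' && c <= 'f', c >= 'A' && c <= 'F':
		default:
			return false
		}
	}
	return true
}

// resolvePrefix returns the single pubkey starting with prefix. It returns
// ambiguous=true when at least two keys match, and an empty match with no
// error when nothing matches (the caller would answer 404).
func resolvePrefix(pubkeys []string, prefix string) (match string, ambiguous bool, err error) {
	if len(prefix) < 8 || len(prefix) > 63 || !isHex(prefix) {
		return "", false, errBadPrefix
	}
	want := strings.ToLower(prefix)
	found := 0
	for _, pk := range pubkeys {
		if !strings.HasPrefix(strings.ToLower(pk), want) {
			continue
		}
		found++
		if found == 2 {
			return "", true, nil // stop early, like LIMIT 2 in SQL
		}
		match = pk
	}
	return match, false, nil
}

func main() {
	keys := []string{"aabbccdd11", "aabbccdd22", "1234567890ab"}
	for _, p := range []string{"12345678", "aabbccdd", "ffffffff", "abc"} {
		m, amb, err := resolvePrefix(keys, p)
		fmt.Printf("prefix=%s match=%q ambiguous=%v err=%v\n", p, m, amb, err)
	}
}
```

An HTTP handler built on this would map the three outcomes to 200, 409 Conflict, and 404 respectively, after trying the exact 64-character lookup first.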
|
|
dd2f044f2b |
fix: cache RW SQLite connection + dedup DBConfig (closes #921) (#982)
Closes #921 ## Summary Follow-up to #920 (incremental auto-vacuum). Addresses both items from the adversarial review: ### 1. RW connection caching Previously, every call to `openRW(dbPath)` opened a new SQLite RW connection and closed it after use. This happened in: - `runIncrementalVacuum` (~4x/hour) - `PruneOldPackets`, `PruneOldMetrics`, `RemoveStaleObservers` - `buildAndPersistEdges`, `PruneNeighborEdges` - All neighbor persist operations Now a single `*sql.DB` handle (with `MaxOpenConns(1)`) is cached process-wide via `cachedRW(dbPath)`. The underlying connection pool manages serialization. The original `openRW()` function is retained for one-shot test usage. ### 2. DBConfig dedup `DBConfig` was defined identically in both `cmd/server/config.go` and `cmd/ingestor/config.go`. Extracted to `internal/dbconfig/` as a shared package; both binaries now use a type alias (`type DBConfig = dbconfig.DBConfig`). ## Tests added | Test | File | |------|------| | `TestCachedRW_ReturnsSameHandle` | `cmd/server/rw_cache_test.go` | | `TestCachedRW_100Calls_SingleConnection` | `cmd/server/rw_cache_test.go` | | `TestGetIncrementalVacuumPages_Default` | `internal/dbconfig/dbconfig_test.go` | | `TestGetIncrementalVacuumPages_Configured` | `internal/dbconfig/dbconfig_test.go` | ## Verification ``` ok github.com/corescope/server 20.069s ok github.com/corescope/ingestor 47.117s ok github.com/meshcore-analyzer/dbconfig 0.003s ``` Both binaries build cleanly. 100 sequential `cachedRW()` calls return the same handle with exactly 1 entry in the cache map. --------- Co-authored-by: you <you@example.com> |
||
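For commit dd2f044f2b above, the process-wide handle cache behind `cachedRW(dbPath)` might be shaped like the sketch below. To stay runnable without a SQLite driver it caches a hypothetical `handle` type instead of `*sql.DB`; `rwCache`, `get`, and the injected `open` function are illustrative names, not the project's code.

```go
package main

import (
	"fmt"
	"sync"
)

// handle stands in for *sql.DB so the sketch needs no database driver.
type handle struct{ path string }

// rwCache hands out one shared handle per database path.
type rwCache struct {
	mu      sync.Mutex
	handles map[string]*handle
	open    func(path string) (*handle, error) // hypothetical opener
}

// get returns the cached handle for path, opening it on first use only.
func (c *rwCache) get(path string) (*handle, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if h, ok := c.handles[path]; ok {
		return h, nil
	}
	h, err := c.open(path)
	if err != nil {
		return nil, err // failures are not cached, so a later call can retry
	}
	if c.handles == nil {
		c.handles = make(map[string]*handle)
	}
	c.handles[path] = h
	return h, nil
}

func main() {
	opens := 0
	cache := &rwCache{open: func(path string) (*handle, error) {
		opens++
		return &handle{path: path}, nil
	}}
	first, _ := cache.get("meshcore.db")
	for i := 0; i < 99; i++ {
		h, _ := cache.get("meshcore.db")
		if h != first {
			panic("expected the same handle on every call")
		}
	}
	fmt.Println("opens:", opens)
}
```

With a real `*sql.DB`, limiting the pool to one open connection (as the commit does with `MaxOpenConns(1)`) would serialize writes through that single cached handle.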
|
|
3364eed303 |
feat: separate "Last Status Update" from "Last Packet Observation" for observers (v3 rebase) (#969)
Rebased version of #968 (which was itself a rebase of #905) — resolves merge conflict with #906 (clock-skew UI) that landed on master. ## Conflict resolution **`public/observers.js`** — master (#906) added "Clock Offset" column to observer table; #968 split "Last Seen" into "Last Status" + "Last Packet" columns. Combined both: the table now has Status | Name | Region | Last Status | Last Packet | Packets | Packets/Hour | Clock Offset | Uptime. ## What this PR adds (unchanged from #968/#905) - `last_packet_at` column in observers DB table - Separate "Last Status Update" and "Last Packet Observation" display in observers list and detail page - Server-side migration to add the column automatically - Backfill heuristic for existing data - Tests for ingestor and server ## Verification - All Go tests pass (`cmd/server`, `cmd/ingestor`) - Frontend tests pass (`test-packets.js`, `test-hash-color.js`) - Built server, hit `/api/observers` — `last_packet_at` field present in JSON - Observer table header has all 9 columns including both Last Packet and Clock Offset ## Prior PRs - #905 — original (conflicts with master) - #968 — first rebase (conflicts after #906 landed) - This PR — second rebase, resolves #906 conflict Supersedes #968. Closes #905. --------- Co-authored-by: you <you@example.com> |
||
|
|
568de4b441 |
fix(observers): exclude soft-deleted observers from /api/observers and totalObservers (#954)
## Bug `/api/observers` returned soft-deleted (inactive=1) observers. Operators saw stale observers in the UI even after the auto-prune marked them inactive on schedule. Reproduced on staging: 14 observers older than 14 days returned by the API; all of them had `inactive=1` in the DB. ## Root cause `DB.GetObservers()` (`cmd/server/db.go:974`) ran `SELECT ... FROM observers ORDER BY last_seen DESC` with no WHERE filter. The `RemoveStaleObservers` path correctly soft-deletes by setting `inactive=1`, but the read path didn't honor it. `statsRow` (`cmd/server/db.go:234`) had the same bug — `totalObservers` count included soft-deleted rows. ## Fix Add `WHERE inactive IS NULL OR inactive = 0` to both: ```go // GetObservers "SELECT ... FROM observers WHERE inactive IS NULL OR inactive = 0 ORDER BY last_seen DESC" // statsRow.TotalObservers "SELECT COUNT(*) FROM observers WHERE inactive IS NULL OR inactive = 0" ``` `NULL` check preserves backward compatibility with rows from before the `inactive` migration. ## Tests Added regression `TestGetObservers_ExcludesInactive`: - Seed two observers, mark one inactive, assert `GetObservers()` returns only the other. - **Anti-tautology gate verified**: reverting the WHERE clause causes the test to fail with `expected 1 observer, got 2` and `inactive observer obs2 should be excluded`. `go test ./...` passes (19.6s). ## Out of scope - `GetObserverByID` lookup at line 1009 still returns inactive observers — this is intentional, so an old deep link to `/observers/<id>` shows "inactive" rather than 404. - Frontend may also have its own caching layer; this fix is server-side only. --------- Co-authored-by: Kpa-clawbot <bot@example.invalid> Co-authored-by: you <you@example.com> Co-authored-by: KpaBap <kpabap@gmail.com> |
||
|
|
a605518d6d |
fix(#881): per-observation raw_hex — each observer sees different bytes on air (#882)
## Problem Each MeshCore observer receives a physically distinct over-the-air byte sequence for the same transmission (different path bytes, flags/hops remaining). The `observations` table stored only `path_json` per observer — all observations pointed at one `transmissions.raw_hex`. This prevented the hex pane from updating when switching observations in the packet detail view. ## Changes | Layer | Change | |-------|--------| | **Schema** | `ALTER TABLE observations ADD COLUMN raw_hex TEXT` (nullable). Migration: `observations_raw_hex_v1` | | **Ingestor** | `stmtInsertObservation` now stores per-observer `raw_hex` from MQTT payload | | **View** | `packets_v` uses `COALESCE(o.raw_hex, t.raw_hex)` — backward compatible with NULL historical rows | | **Server** | `enrichObs` prefers `obs.RawHex` when non-empty, falls back to `tx.RawHex` | | **Frontend** | No changes — `effectivePkt.raw_hex` already flows through `renderDetail` | ## Tests - **Ingestor**: `TestPerObservationRawHex` — two MQTT packets for same hash from different observers → both stored with distinct raw_hex - **Server**: `TestPerObservationRawHexEnrich` — enrichObs returns per-obs raw_hex when present, tx fallback when NULL - **E2E**: Playwright assertion in `test-e2e-playwright.js` for hex pane update on observation switch E2E assertion added: `test-e2e-playwright.js:1794` ## Scope - Historical observations: raw_hex stays NULL, UI falls back to transmission raw_hex silently - No backfill, no path_json reconstruction, no frontend changes Closes #881 --------- Co-authored-by: you <you@example.com> |
||
|
|
886aabf0ae |
fix(#827): /api/packets/{hash} falls back to DB when in-memory store misses (#831)
Closes #827. ## Problem `/api/packets/{hash}` only consulted the in-memory `PacketStore`. When a packet aged out of memory, the handler 404'd — even though SQLite still had it and `/api/nodes/{pubkey}` `recentAdverts` (which reads from the DB) was actively surfacing the hash. Net effect: the **Analyze →** link on older adverts in the node detail page led to a dead "Not found". Two-store inconsistency: DB has the packet, in-memory doesn't, node detail surfaces it from DB → packet detail can't serve it. ## Fix In `handlePacketDetail`: - After in-memory miss, fall back to `db.GetPacketByHash` (already existed) for hash lookups, and `db.GetTransmissionByID` for numeric IDs. - Track when the result came from the DB; if so and the store has no observations, populate from DB via a new `db.GetObservationsForHash` so the response shows real observations instead of the misleading `observation_count = 1` fallback. ## Tests - `TestPacketDetailFallsBackToDBWhenStoreMisses` — insert a packet directly into the DB after `store.Load()`, confirm store doesn't have it, assert 200 + populated observations. - `TestPacketDetail404WhenAbsentFromBoth` — neither store nor DB → 404 (no false positives). - `TestPacketDetailPrefersStoreOverDB` — both have it; store result wins (no double-fetch). - `TestHandlePacketDetailNoStore` updated: it previously asserted the old buggy 404 behavior; now asserts the correct DB-fallback 200. All `go test ./... -run "PacketDetail|Packet|GetPacket"` and the full `cmd/server` suite pass. ## Out of scope The `/api/packets?hash=` filter is the live in-memory list endpoint and intentionally store-only for performance. Not touched here — happy to file a follow-up if you'd rather harmonise. ## Repro context Verified against prod with a recently-adverting repeater whose recent advert hash lives in `recentAdverts` (DB) but had been evicted from the in-memory store; pre-fix 404, post-fix 200 with full observations. Co-authored-by: you <you@example.com> |
||
|
|
d7fe24e2db |
Fix channel filter on Packets page (UI + API) — #812 (#816)
Closes #812
## Root causes
**Server (`/api/packets?channel=…` returned identical totals):** The handler in `cmd/server/routes.go` never read the `channel` query parameter into `PacketQuery`, so it was silently ignored by both the SQLite path (`db.go::buildTransmissionWhere`) and the in-memory path (`store.go::filterPackets`). The codebase already had everything else in place (the `channel_hash` column with an index from #762, decoded `channel` / `channelHashHex` fields on each packet); it just wasn't wired up.
**UI (`/#/packets` had no channel filter):** `public/packets.js` rendered observer / type / time-window / region filters but no channel control, and didn't read `?channel=` from the URL.
## Fix
### Server
- New `Channel` field on `PacketQuery`; `handlePackets` reads `r.URL.Query().Get("channel")`.
- DB path filters by the indexed `channel_hash` column (exact match).
- In-memory path: helper `packetMatchesChannel` matches `decoded.channel` (plaintext, e.g. `#test`, `public`) or `enc_<HEX>` against `channelHashHex` for undecryptable GRP_TXT. Uses cached `ParsedDecoded()` so it's O(1) after first parse. Fast-path index guards and the grouped-cache key updated to include channel.
- Regression test (`channel_filter_test.go`): `channel=#test` returns ≥1 GRP_TXT packet and fewer than baseline; `channel=nonexistentchannel` returns `total=0`.
### UI
- New `<select id="fChannel">` populated from `/api/channels`.
- Round-trips via `?channel=…` on the URL hash (read on init, written on change).
- Pre-seeds the current value as an option so encrypted hashes not in `/api/channels` still display as selected on reload.
- On change, calls `loadPackets()` so the server-side filter applies before pagination.
## Perf
Filter adds at most one cached map lookup per packet (DB path uses indexed column, store path uses `ParsedDecoded()` cache). Staging baseline 149–190 ms for `?channel=#test&limit=50`; the new comparison is negligible. Target ≤ 500 ms preserved.
## Tests
`cd cmd/server && go test ./... -count=1 -timeout 120s` → PASS.
---------
Co-authored-by: you <you@example.com>
|
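For the in-memory path, the matching rule for `packetMatchesChannel` might look roughly like this. It is a sketch under assumptions: the `decodedPacket` struct is hypothetical, and the case-insensitive hex comparison is a guess rather than documented behavior.

```go
package main

import "strings"

// decodedPacket holds the two fields the filter needs. The field meanings
// follow the commit message; the struct itself is hypothetical.
type decodedPacket struct {
	Channel        string // plaintext name, e.g. "#test" or "public"
	ChannelHashHex string // hex channel hash, set for GRP_TXT packets
}

// packetMatchesChannel reports whether a packet belongs to the requested
// channel. A filter of the form "enc_<HEX>" is compared against the channel
// hash; anything else is compared against the plaintext channel name.
func packetMatchesChannel(d decodedPacket, want string) bool {
	if want == "" {
		return true // no filter requested
	}
	if strings.HasPrefix(want, "enc_") {
		hex := strings.TrimPrefix(want, "enc_")
		// Assumption: hex comparison ignores case.
		return hex != "" && strings.EqualFold(hex, d.ChannelHashHex)
	}
	return d.Channel == want
}
```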
||
|
|
bf674ebfa2 |
feat: validate advert signatures on ingest, reject corrupt packets (#794)
## Summary
Validates ed25519 signatures on ADVERT packets during MQTT ingest.
Packets with invalid signatures are rejected before storage, preventing
corrupt/truncated adverts from polluting the database.
## Changes
### Ingestor (`cmd/ingestor/`)
- **Signature validation on ingest**: After decoding an ADVERT, checks
`SignatureValid` from the decoder. Invalid signatures → packet dropped,
never stored.
- **Config flag**: `validateSignatures` (default `true`). Set to `false`
to disable validation for backward compatibility with existing installs.
- **`dropped_packets` table**: New SQLite table recording every rejected
packet with full attribution:
- `hash`, `raw_hex`, `reason`, `observer_id`, `observer_name`,
`node_pubkey`, `node_name`, `dropped_at`
- Indexed on `observer_id` and `node_pubkey` for investigation queries
- **`SignatureDrops` counter**: New atomic counter in `DBStats`, logged
in periodic stats output as `sig_drops=N`
- **Retention**: `dropped_packets` pruned alongside metrics on the same
`retention.metricsDays` schedule
### Server (`cmd/server/`)
- **`GET /api/dropped-packets`** (API key required): Returns recent
drops with optional `?observer=` and `?pubkey=` filters, `?limit=`
(default 100, max 500)
- **`signatureDrops`** field added to `/api/stats` response (count from
`dropped_packets` table)
### Tests (8 new)
| Test | What it verifies |
|------|-----------------|
| `TestSigValidation_ValidAdvertStored` | Valid advert passes validation
and is stored |
| `TestSigValidation_TamperedSignatureDropped` | Tampered signature →
dropped, recorded in `dropped_packets` with correct fields |
| `TestSigValidation_TruncatedAppdataDropped` | Truncated appdata
invalidates signature → dropped |
| `TestSigValidation_DisabledByConfig` | `validateSignatures: false`
skips validation, stores tampered packet |
| `TestSigValidation_DropCounterIncrements` | Counter increments
correctly across multiple drops |
| `TestSigValidation_LogContainsFields` | `dropped_packets` row contains
hash, reason, observer, pubkey, name |
| `TestPruneDroppedPackets` | Old entries pruned, recent entries
retained |
| `TestShouldValidateSignatures_Default` | Config helper returns correct
defaults |
### Config example
```json
{
"validateSignatures": true
}
```
Fixes #793
---------
Co-authored-by: you <you@example.com>
|
||
|
|
fa3f623bd6 |
feat: add observer retention — remove stale observers after configurable days (#764)
## Summary
Observers that stop actively sending data now get removed after a
configurable retention period (default 14 days).
Previously, observers remained in the `observers` table forever. This
meant nodes that were once observers for an instance but are no longer
connected (even if still active in the mesh elsewhere) would continue
appearing in the observer list indefinitely.
## Key Design Decisions
- **Active data requirement**: `last_seen` is only updated when the
observer itself sends packets (via `stmtUpdateObserverLastSeen`). Being
seen by another node does NOT update this field. So an observer must
actively send data to stay listed.
- **Default: 14 days** — observers not seen in 14 days are removed
- **`-1` = keep forever** — for users who want observers to never be
removed
- **`0` = use default (14 days)** — same as not setting the field
- **Runs on startup + daily ticker** — staggered 3 minutes after metrics
prune to avoid DB contention
## Changes
| File | Change |
|------|--------|
| `cmd/ingestor/config.go` | Add `ObserverDays` to `RetentionConfig`,
add `ObserverDaysOrDefault()` |
| `cmd/ingestor/db.go` | Add `RemoveStaleObservers()` — deletes
observers with `last_seen` before cutoff |
| `cmd/ingestor/main.go` | Wire up startup + daily ticker for observer
retention |
| `cmd/server/config.go` | Add `ObserverDays` to `RetentionConfig`, add
`ObserverDaysOrDefault()` |
| `cmd/server/db.go` | Add `RemoveStaleObservers()` (server-side, uses
read-write connection) |
| `cmd/server/main.go` | Wire up startup + daily ticker, shutdown
cleanup |
| `cmd/server/routes.go` | Admin prune API now also removes stale
observers |
| `config.example.json` | Add `observerDays: 14` with documentation |
| `cmd/ingestor/coverage_boost_test.go` | 4 tests: basic removal, empty
store, keep forever (-1), default (0→14) |
| `cmd/server/config_test.go` | 4 tests: `ObserverDaysOrDefault` edge
cases |
## Config Example
```json
{
"retention": {
"nodeDays": 7,
"observerDays": 14,
"packetDays": 30,
"_comment": "observerDays: -1 = keep forever, 0 = use default (14)"
}
}
```
## Admin API
The `/api/admin/prune` endpoint now also removes stale observers (using
`observerDays` from config) and reports `observers_removed` in the
response alongside `packets_deleted`.
## Test Plan
- [x] `TestRemoveStaleObservers` — old observer removed, recent observer
kept
- [x] `TestRemoveStaleObserversNone` — empty store, no errors
- [x] `TestRemoveStaleObserversKeepForever` — `-1` keeps even year-old
observers
- [x] `TestRemoveStaleObserversDefault` — `0` defaults to 14 days
- [x] `TestObserverDaysOrDefault` (ingestor) —
nil/zero/positive/keep-forever
- [x] `TestObserverDaysOrDefault` (server) —
nil/zero/positive/keep-forever
- [x] Both binaries compile cleanly (`go build`)
- [ ] Manual: verify observer count decreases after retention period on
a live instance
|
||
|
|
0e286d85fd |
fix: channel query performance — add channel_hash column, SQL-level filtering (#762) (#763)
## Problem Channel API endpoints scan entire DB — 2.4s for channel list, 30s for messages. ## Fix - Added `channel_hash` column to transmissions (populated on ingest, backfilled on startup) - `GetChannels()` rewrites to GROUP BY channel_hash (one row per channel vs scanning every packet) - `GetChannelMessages()` filters by channel_hash at SQL level with proper LIMIT/OFFSET - 60s cache for channel list - Index: `idx_tx_channel_hash` for fast lookups Expected: 2.4s → <100ms for list, 30s → <500ms for messages. Fixes #762 --------- Co-authored-by: you <you@example.com> |
||
|
|
84f03f4f41 |
fix: hide undecryptable channel messages by default (#727) (#728)
## Problem Channels page shows 53K 'Unknown' messages — undecryptable GRP_TXT packets with no content. Pure noise. ## Fix - Backend: channels API filters out undecrypted messages by default - `?includeEncrypted=true` param to include them - Frontend: 'Show encrypted' toggle in channels sidebar - Unknown channels grayed out with '(no key)' label - Toggle persists in localStorage Fixes #727 --------- Co-authored-by: you <you@example.com> |
||
|
|
71be54f085 |
feat: DB-backed channel messages for full history (#725 M1) (#726)
## Summary Switches channel API endpoints to query SQLite instead of the in-memory packet store, giving users access to the full message history. Implements #725 (M1 only — DB-backed channel messages). Does NOT close #725 — M2-M5 (custom channels, PSK, persistence, retroactive decryption) remain. ## Problem Channel endpoints (`/api/channels`, `/api/channels/{hash}/messages`) preferred the in-memory packet store when available. The store is bounded by `packetStore.maxMemoryMB` — typically showing only recent messages. The SQLite database has the complete history (weeks/months of channel messages) but was only used as a fallback when the store was nil (never in production). ## Fix Reversed the preference order: DB first, in-memory store fallback. Region filtering added to the DB path. Co-authored-by: you <you@example.com> |
||
|
|
e893a1b3c4 |
fix: index relay hops in byNode for liveness tracking (#708)
## Problem Nodes that only appear as relay hops in packet paths (via `resolved_path`) were never indexed in `byNode`, so `last_heard` was never computed for them. This made relay-only nodes show as dead/stale even when actively forwarding traffic. Fixes #660 ## Root Cause `indexByNode()` only indexed pubkeys from decoded JSON fields (`pubKey`, `destPubKey`, `srcPubKey`). Relay nodes appearing in `resolved_path` were ignored entirely. ## Fix `indexByNode()` now also iterates: 1. `ResolvedPath` entries from each observation 2. `tx.ResolvedPath` (best observation's resolved path, used for DB-loaded packets) A per-call `indexed` set prevents double-indexing when the same pubkey appears in both decoded JSON and resolved path. Extracted `addToByNode()` helper to deduplicate the nodeHashes/byNode append logic. ## Scope **Phase 1 only** — server-side in-memory indexing. No DB changes, no ingestor changes. This makes `last_heard` reflect relay activity with zero risk to persistence. ## Tests 5 new test cases in `TestIndexByNodeResolvedPath`: - Resolved path pubkeys from observations get indexed - Null entries in resolved path are skipped - Relay-only nodes (no decoded JSON match) appear in `byNode` - Dedup between decoded JSON and resolved path - `tx.ResolvedPath` indexed when observations are empty All existing tests pass unchanged. ## Complexity O(observations × path_length) per packet — typically 1-3 observations × 1-3 hops. No hot-path regression. --------- Co-authored-by: you <you@example.com> |
||
|
|
fcba2a9f3d |
fix: set PRAGMA busy_timeout on all RW SQLite connections (#707)
## Problem `SQLITE_BUSY` contention between the ingestor and server's async persistence goroutine drops `resolved_path` and `neighbor_edges` updates. The DSN parameter `_busy_timeout=10000` may not be honored by the modernc/sqlite driver. ## Fix - **`openRW()` now sets `PRAGMA busy_timeout = 5000`** after opening the connection, guaranteeing SQLite retries for up to 5 seconds before returning `SQLITE_BUSY` - **Refactored `PruneOldPackets` and `PruneOldMetrics`** to use `openRW()` instead of duplicating connection setup — all RW connections now get consistent busy_timeout handling - Added test verifying the pragma is set correctly ## Changes | File | Change | |------|--------| | `cmd/server/neighbor_persist.go` | `openRW()` sets `PRAGMA busy_timeout = 5000` after open | | `cmd/server/db.go` | `PruneOldPackets` and `PruneOldMetrics` use `openRW()` instead of inline `sql.Open` | | `cmd/server/neighbor_persist_test.go` | `TestOpenRW_BusyTimeout` verifies pragma is set | ## Performance No performance impact — `PRAGMA busy_timeout` is a connection-level setting with zero overhead on uncontended writes. Under contention, it converts immediate `SQLITE_BUSY` failures into brief retries (up to 5s), which is strictly better than dropping data. Fixes #705 --------- Co-authored-by: you <you@example.com> |
||
|
|
232770a858 |
feat(rf-health): M2 — airtime, error rate, battery charts with delta computation (#605)
## M2: Airtime + Channel Quality + Battery Charts
Implements M2 of #600: server-side delta computation and three new charts in the RF Health detail view.
### Backend Changes
**Delta computation** for cumulative counters (`tx_air_secs`, `rx_air_secs`, `recv_errors`):
- Computes per-interval deltas between consecutive samples
- **Reboot handling:** detects counter reset (current < previous), skips that delta, records reboot timestamp
- **Gap handling:** if time between samples > 2× interval, inserts null (no interpolation)
- Returns `tx_airtime_pct` and `rx_airtime_pct` as percentages (delta_secs / interval_secs × 100)
- Returns `recv_error_rate` as delta_errors / (delta_recv + delta_errors) × 100
**`resolution` query param** on `/api/observers/{id}/metrics`:
- `5m` (default): raw samples
- `1h`: hourly aggregates (GROUP BY hour with AVG/MAX)
- `1d`: daily aggregates
**Schema additions:**
- `packets_sent` and `packets_recv` columns added to `observer_metrics` (migration)
- Ingestor parses these fields from MQTT stats messages
**API response** now includes:
- `tx_airtime_pct`, `rx_airtime_pct`, `recv_error_rate` (computed deltas)
- `reboots` array with timestamps of detected reboots
- `is_reboot_sample` flag on affected samples
### Frontend Changes
Three new charts in the RF Health detail view, stacked vertically below noise floor:
1. **Airtime chart**: TX (red) + RX (blue) as separate SVG lines, Y-axis 0-100%, direct labels at endpoints
2. **Error Rate chart**: `recv_error_rate` line, shown only when data exists
3. **Battery chart**: voltage line with 3.3V low reference, shown only when battery_mv > 0
All charts:
- Share X-axis and time range (aligned vertically)
- Reboot markers as vertical hairlines spanning all charts
- Direct labels on data (no legends)
- Resolution auto-selected: `1h` for 7d/30d ranges
- Charts hidden when no data exists
### Tests
- `TestComputeDeltas`: normal deltas, reboot detection, gap detection
- `TestGetObserverMetricsResolution`: 5m/1h/1d downsampling verification
- Updated `TestGetObserverMetrics` for new API signature
---------
Co-authored-by: you <you@example.com>
|
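The delta rules (reboot detection, gap handling, percentage conversion) could be sketched like this for a single counter. All type and function names are hypothetical, and one interpretation is assumed: `interval_secs` is taken to be the elapsed time between the two samples rather than the nominal 5-minute interval.

```go
package main

// sample is a hypothetical raw reading of one cumulative counter.
type sample struct {
	Unix      int64   // sample time, seconds
	TxAirSecs float64 // cumulative transmit airtime
}

// delta is one output point; a nil TxAirtimePct means "no value" (null).
type delta struct {
	Unix         int64
	TxAirtimePct *float64
	IsReboot     bool
}

// computeDeltas turns cumulative samples into per-interval percentages.
func computeDeltas(samples []sample, intervalSecs int64) []delta {
	out := make([]delta, 0, len(samples))
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1], samples[i]
		d := delta{Unix: cur.Unix}
		elapsed := cur.Unix - prev.Unix
		switch {
		case cur.TxAirSecs < prev.TxAirSecs:
			// Counter went backwards: treat as a reboot and skip the delta.
			d.IsReboot = true
		case elapsed > 2*intervalSecs:
			// Gap in the data: leave the value null, no interpolation.
		case elapsed > 0:
			pct := (cur.TxAirSecs - prev.TxAirSecs) / float64(elapsed) * 100
			d.TxAirtimePct = &pct
		}
		out = append(out, d)
	}
	return out
}
```

Using a pointer for the percentage lets a JSON encoder emit `null` for skipped points, which is one way to keep gaps visible in the charts.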
||
|
|
747aea37b7 |
fix(rf-health): add region filter support to metrics summary
Frontend passes RegionFilter query string to summary API. Backend filters results by observer IATA region. Added iata field to MetricsSummaryRow. |
||
|
|
6f35d4d417 |
feat: RF Health Dashboard M1 — observer metrics + small multiples grid (#604)
## RF Health Dashboard — M1: Observer Metrics Storage, API & Small Multiples Grid Implements M1 of #600. ### What this does Adds a complete RF health monitoring pipeline: MQTT stats ingestion → SQLite storage → REST API → interactive dashboard with small multiples grid. ### Backend Changes **Ingestor (`cmd/ingestor/`)** - New `observer_metrics` table via migration system (`_migrations` pattern) - Parse `tx_air_secs`, `rx_air_secs`, `recv_errors` from MQTT status messages (same pattern as existing `noise_floor` and `battery_mv`) - `INSERT OR REPLACE` with timestamps rounded to nearest 5-min interval boundary (using ingestor wall clock, not observer timestamps) - Missing fields stored as NULLs — partial data is always better than no data - Configurable retention pruning: `retention.metricsDays` (default 30), runs on startup + every 24h **Server (`cmd/server/`)** - `GET /api/observers/{id}/metrics?since=...&until=...` — per-observer time-series data - `GET /api/observers/metrics/summary?window=24h` — fleet summary with current NF, avg/max NF, sample count - `parseWindowDuration()` supports `1h`, `24h`, `3d`, `7d`, `30d` etc. 
- Server-side metrics retention pruning (same config, staggered 2min after packet prune)
### Frontend Changes
**RF Health tab (`public/analytics.js`, `public/style.css`)**
- Small multiples grid showing all observers simultaneously, so anomalies pop out visually
- Per-observer cell: name, current NF value, battery voltage, sparkline, avg/max stats
- NF status coloring: warning (amber) at ≥-100 dBm, critical (red) at ≥-85 dBm; text color only, no background fills
- Click any cell → expanded detail view with full noise floor line chart
- Reference lines with direct text labels (`-100 warning`, `-85 critical`), not color bands
- Min/max points labeled directly on the chart
- Time range selector: preset buttons (1h/3h/6h/12h/24h/3d/7d/30d) + custom from/to datetime picker
- Deep linking: `#/analytics?tab=rf-health&observer=...&range=...`
- All charts use SVG, matching existing analytics.js patterns
- Responsive: 3-4 columns on desktop, 1 on mobile
### Design Decisions (from spec)
- Labels directly on data, not in legends
- Reference lines with text labels, not color bands
- Small multiples grid, not card+accordion (Tufte: instant visual fleet comparison)
- Ingestor wall clock for all timestamps (observer clocks may drift)
### Tests Added
**Ingestor tests:**
- `TestRoundToInterval`: 5 cases for rounding to 5-min boundaries
- `TestInsertMetrics`: basic insertion with all fields
- `TestInsertMetricsIdempotent`: INSERT OR REPLACE deduplication
- `TestInsertMetricsNullFields`: partial data with NULLs
- `TestPruneOldMetrics`: retention pruning
- `TestExtractObserverMetaNewFields`: parsing tx_air_secs, rx_air_secs, recv_errors
**Server tests:**
- `TestGetObserverMetrics`: time-series query with since/until filters, NULL handling
- `TestGetMetricsSummary`: fleet summary aggregation
- `TestObserverMetricsAPIEndpoints`: DB query verification
- `TestMetricsAPIEndpoints`: HTTP endpoint response shape
- `TestParseWindowDuration`: duration parsing for h/d formats
### Test Results
```
cd cmd/ingestor && go test ./... → PASS (26s)
cd cmd/server && go test ./... → PASS (5s)
```
### What's NOT in this PR (deferred to M2+)
- Server-side delta computation for cumulative counters
- Airtime charts (TX/RX percentage lines)
- Channel quality chart (recv_error_rate)
- Battery voltage chart
- Reboot detection and chart annotations
- Resolution downsampling (1h, 1d aggregates)
- Pattern detection / automated diagnosis
---------
Co-authored-by: you <you@example.com>
|
||
|
|
6ae62ce535 |
perf: make txToMap observations lazy via ExpandObservations flag (#595)
## Summary
`txToMap()` previously always allocated observation sub-maps for every packet, even though the `/api/packets` handler immediately stripped them via `delete(p, "observations")` unless `expand=observations` was requested. A typical page of 50 packets with ~5 observations each caused roughly 250 unnecessary observation map allocations per request.
## Changes
- **`txToMap`**: Add variadic `includeObservations bool` parameter. Observations are only built when `true` is passed, eliminating allocations when they'd just be discarded.
- **`PacketQuery`**: Add `ExpandObservations bool` field to thread the caller's intent through the query pipeline.
- **`routes.go`**: Set `ExpandObservations` based on `expand=observations` query param. Removed the post-hoc `delete(p, "observations")` loop; observations are simply never created when not requested.
- **Single-packet lookups** (`GetPacketByID`, `GetPacketByHash`): Always pass `true` since detail views need observations.
- **Multi-node/analytics queries**: Default (no flag) = no observations, matching prior behavior.
## Testing
- Added `TestTxToMapLazyObservations` covering all three cases: no flag, `false`, and `true`.
- All existing tests pass (`go test ./...`).
## Perf Impact
Eliminates ~250 observation map allocations per /api/packets request (at default page size of 50 with ~5 observations each). This is a constant-factor improvement per request, with no algorithmic complexity change.
Fixes #374
Co-authored-by: you <you@example.com>
|
||
|
|
45d8116880 |
perf: query only matching node locations in handleObservers (#579)
## Summary `handleObservers()` in `routes.go` was calling `GetNodeLocations()` which fetches ALL nodes from the DB just to match ~10 observer IDs against node public keys. With 500+ nodes this is wasteful. ## Changes - **`db.go`**: Added `GetNodeLocationsByKeys(keys []string)` — queries only the rows matching the given public keys using a parameterized `WHERE LOWER(public_key) IN (?, ?, ...)` clause. - **`routes.go`**: `handleObservers` now collects observer IDs and calls the targeted method instead of the full-table scan. - **`coverage_test.go`**: Added `TestGetNodeLocationsByKeys` covering known key, empty keys, and unknown key cases. ## Performance With ~10 observers and 500+ nodes, the query goes from scanning all 500 rows to fetching only ~10. The original `GetNodeLocations()` is preserved for any other callers. Fixes #378 Co-authored-by: you <you@example.com> |
||
|
|
ae38cdefb4 |
feat: server-side hop resolution at ingest — resolved_path (#556)
## Summary Implements server-side hop prefix resolution at ingest time with a persisted neighbor graph. Hop prefixes in `path_json` are now resolved to full 64-char pubkeys at ingest and stored as `resolved_path` on each observation, eliminating the need for client-side resolution via `HopResolver`. Fixes #555 ## What changed ### New file: `cmd/server/neighbor_persist.go` SQLite persistence layer for the neighbor graph and resolved paths: - `neighbor_edges` table creation and management - Load/build/persist neighbor edges from/to SQLite - `resolved_path` column migration on observations - `resolvePathForObs()` — resolves hop prefixes using `resolveWithContext` with 4-tier priority (affinity → geo → GPS → first match) - Cold startup backfill for observations missing `resolved_path` - Async persistence of edges and resolved paths during ingest (non-blocking) ### Modified: `cmd/server/store.go` - `StoreObs` gains `ResolvedPath []*string` field - `StoreTx` gains `ResolvedPath []*string` (cached from best observation) - `Load()` dynamically includes `resolved_path` in SQL query when column exists - `IngestNewFromDB()` resolves paths at ingest time and persists asynchronously - `pickBestObservation()` propagates `ResolvedPath` to transmission - `txToMap()` and `enrichObs()` include `resolved_path` in API responses - All 7 `pm.resolve()` call sites migrated to `pm.resolveWithContext()` with the persisted graph - Broadcast maps include `resolved_path` per observation ### Modified: `cmd/server/db.go` - `DB` struct gains `hasResolvedPath bool` flag - `detectSchema()` checks for `resolved_path` column existence - Graceful degradation when column is absent (test DBs, old schemas) ### Modified: `cmd/server/main.go` - Startup sequence: ensure tables → load/build graph → backfill resolved paths → re-pick best observations ### Modified: `cmd/server/routes.go` - `mapSliceToTransmissions()` and `mapSliceToObservations()` propagate `resolved_path` - Node paths handler uses 
`resolveWithContext` with graph ### Modified: `cmd/server/types.go` - `TransmissionResp` and `ObservationResp` gain `ResolvedPath []*string` with `omitempty` ### New file: `cmd/server/neighbor_persist_test.go` 16 tests covering: - Path resolution (unambiguous, empty, unresolvable prefixes) - Marshal/unmarshal of resolved_path JSON - SQLite table creation and column migration (idempotent) - Edge persistence and loading - Schema detection - Full Load() with resolved_path - API response serialization (present when set, omitted when nil) ## Design decisions 1. **Async persistence** — resolved paths and neighbor edges are written to SQLite in a goroutine to avoid blocking the ingest loop. The in-memory state is authoritative. 2. **Schema compatibility** — `DB.hasResolvedPath` flag allows the server to work with databases that don't yet have the `resolved_path` column. SQL queries dynamically include/exclude the column. 3. **`pm.resolve()` retained** — Not removed as dead code because existing tests use it directly. All production call sites now use `resolveWithContext` with the persisted graph. 4. **Edge persistence is conservative** — Only unambiguous edges (single candidate) are persisted to `neighbor_edges`. Ambiguous prefixes are handled by the in-memory `NeighborGraph` via Jaccard disambiguation. 5. **`null` = unresolved** — Ambiguous prefixes store `null` in the resolved_path array. Frontend falls back to prefix display. ## Performance - `resolveWithContext` per hop: ~1-5μs (map lookups, no DB queries) - Typical packet has 0-5 hops → <25μs total resolution overhead per packet - Edge/path persistence is async → zero impact on ingest latency - Backfill is one-time on first startup with the new column ## Test results ``` cd cmd/server && go test ./... -count=1 → ok (4.4s) cd cmd/ingestor && go test ./... -count=1 → ok (25.5s) ``` --------- Co-authored-by: you <you@example.com> |
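The "`null` = unresolved" convention can be illustrated with a deliberately simplified resolver. This is not the 4-tier `resolveWithContext` logic: it resolves a prefix only when exactly one known pubkey matches, and it assumes pubkeys are stored as lowercase hex. Both function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"strings"
)

// resolvePath maps each hop prefix to a full pubkey when the match is
// unambiguous, and to nil otherwise (no match, several matches, or an
// empty prefix).
func resolvePath(prefixes []string, known []string) []*string {
	out := make([]*string, len(prefixes))
	for i, p := range prefixes {
		if p == "" {
			continue
		}
		prefix := strings.ToLower(p)
		var match *string
		count := 0
		for j := range known {
			if strings.HasPrefix(known[j], prefix) {
				count++
				match = &known[j]
			}
		}
		if count == 1 {
			out[i] = match
		}
	}
	return out
}

// marshalResolved shows how nil entries serialize as JSON null.
func marshalResolved(path []*string) (string, error) {
	b, err := json.Marshal(path)
	return string(b), err
}
```

Because the slice is `[]*string`, unresolved hops keep their position in the array, so a frontend can fall back to the prefix at the same index.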
||
|
|
709e5a4776 |
fix: observer filter drops groups in grouped packets view (#464) (#531)
## Summary - When `groupByHash=true`, each group only carries its representative (best-path) `observer_id`. The client-side filter was checking only that field, silently dropping groups that were seen by the selected observer but had a different representative. - `loadPackets` now passes the `observer` param to the server so `filterPackets`/`buildGroupedWhere` do the correct "any observation matches" check. - Client-side observer filter in `renderTableRows` is skipped for grouped mode (server already filtered correctly). - Both `db.go` and `store.go` observer filtering extended to support comma-separated IDs (multi-select UI). ## Test plan - [ ] Set an observer filter on the Packets screen with grouping enabled — all groups that have **any** observation from the selected observer(s) should appear, not just groups where that observer is the representative - [ ] Multi-select two observers — groups seen by either should appear - [ ] Toggle to flat (ungrouped) mode — per-observation filter still works correctly - [ ] Existing grouped packets tests pass: `cd cmd/server && go test ./...` Fixes #464 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: you <you@example.com> |
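The "any observation matches" check with comma-separated IDs could be sketched as below. Both function names are hypothetical; the real filtering lives in `filterPackets` and `buildGroupedWhere`.

```go
package main

import "strings"

// parseObserverFilter splits a comma-separated observer parameter into a
// set, ignoring blanks and surrounding whitespace.
func parseObserverFilter(param string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, id := range strings.Split(param, ",") {
		if id = strings.TrimSpace(id); id != "" {
			set[id] = struct{}{}
		}
	}
	return set
}

// groupMatchesObservers reports whether any observation in the group came
// from a wanted observer, not just the group's representative observation.
func groupMatchesObservers(observerIDs []string, wanted map[string]struct{}) bool {
	if len(wanted) == 0 {
		return true // no filter selected
	}
	for _, id := range observerIDs {
		if _, ok := wanted[id]; ok {
			return true
		}
	}
	return false
}
```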
||
|
|
b1d89d7d9f |
fix: apply region filter in GetNodes — was silently ignored (#496) (#497)
## Summary - `db.GetNodes` accepted a `region` param from the HTTP handler but never used it — every region-filter selection was silently ignored and all nodes were always returned - Added a subquery filtering `nodes.public_key` against ADVERT transmissions (payload_type=4) observed by observers with matching IATA codes - Handles both v2 (`observer_id TEXT`) and v3 (`observer_idx INT`) schemas ## Test plan - [x] 4 new subtests added to `TestGetNodesFiltering`: SJC (1 node), SFO (1 node), SJC,SFO multi (1 node deduped), AMS unknown (0 nodes) - [x] All existing Go tests still pass - [x] Deploy to staging, open `/nodes`, select a region in the filter bar — only nodes observed by observers in that region should appear Closes #496 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: you <you@example.com> |
||
|
|
f87eb3601c |
fix: graceful container shutdown for reliable deployments (#453)
## Summary
Fixes #450: staging deployment flaky due to container not shutting down cleanly.
## Root Causes
1. **Server never closed DB on shutdown**: SQLite WAL lock held indefinitely, blocking new container startup
2. **`httpServer.Close()` instead of `Shutdown()`**: abruptly kills connections instead of draining them
3. **No `stop_grace_period` in compose configs**: Docker sends SIGTERM, then SIGKILL once the default 10s grace period expires, which is often not enough for a WAL checkpoint
4. **Supervisor didn't forward SIGTERM**: missing `stopsignal`/`stopwaitsecs` meant Go processes got SIGKILL instead of graceful shutdown
5. **Deploy scripts used default `docker stop` timeout**: only 10s grace period
## Changes
### Go Server (`cmd/server/`)
- **Graceful HTTP shutdown**: `httpServer.Shutdown(ctx)` with 15s context timeout, which drains in-flight requests before closing
- **WebSocket cleanup**: New `Hub.Close()` method sends `CloseGoingAway` frames to all connected clients
- **DB close on shutdown**: Explicitly closes DB after HTTP server stops (was never closed before)
- **WAL checkpoint**: `PRAGMA wal_checkpoint(TRUNCATE)` before DB close, which flushes WAL to main DB file and removes WAL/SHM lock files
### Go Ingestor (`cmd/ingestor/`)
- **WAL checkpoint on shutdown**: New `Store.Checkpoint()` method, called before `Close()`
- **Longer MQTT disconnect timeout**: 5s (was 1s) to allow in-flight messages to drain
### Docker Compose (all 4 variants)
- Added `stop_grace_period: 30s` and `stop_signal: SIGTERM`
### Supervisor Configs (both variants)
- Added `stopsignal=TERM` and `stopwaitsecs=20` to server and ingestor programs
### Deploy Scripts
- `deploy-staging.sh`: `docker stop -t 30` with explicit grace period
- `deploy-live.sh`: `docker stop -t 30` with explicit grace period
## Shutdown Sequence (after fix)
1. Docker sends SIGTERM to supervisord (PID 1)
2. Supervisord forwards SIGTERM to server + ingestor (waits up to 20s each)
3. Server: stops poller → drains HTTP (15s) → closes WS clients → checkpoints WAL → closes DB
4. Ingestor: stops tickers → disconnects MQTT (5s) → checkpoints WAL → closes DB
5. Docker waits up to 30s total before SIGKILL
## Tests
All existing tests pass:
- `cd cmd/server && go test ./...` ✅
- `cd cmd/ingestor && go test ./...` ✅
---------
Co-authored-by: you <you@example.com>
Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com>
|
||
|
|
fe314be3a8 |
feat: geo_filter enforcement, DB pruning, geofilter-builder tool, HB column (#215)
## Summary
Several features and fixes from a live deployment of the Go v3.0.0
backend.
### geo_filter — full enforcement
- **Go backend config** (`cmd/server/config.go`,
`cmd/ingestor/config.go`): added `GeoFilterConfig` struct so
`geo_filter.polygon` and `bufferKm` from `config.json` are parsed by
both the server and ingestor
- **Ingestor** (`cmd/ingestor/geo_filter.go`, `cmd/ingestor/main.go`):
ADVERT packets from nodes outside the configured polygon + buffer are
dropped *before* any DB write — no transmission, node, or observation
data is stored
- **Server API** (`cmd/server/geo_filter.go`, `cmd/server/routes.go`):
`GET /api/config/geo-filter` endpoint returns the polygon + bufferKm to
the frontend; `/api/nodes` responses filter out any out-of-area nodes
already in the DB
- **Frontend** (`public/map.js`, `public/live.js`): blue polygon overlay
(solid inner + dashed buffer zone) on Map and Live pages, toggled via
"Mesh live area" checkbox, state shared via localStorage
### Automatic DB pruning
- Add `retention.packetDays` to `config.json` to delete transmissions +
observations older than N days on a daily schedule (1 min after startup,
then every 24h). Nodes and observers are never pruned.
- `POST /api/admin/prune?days=N` for manual runs (requires `X-API-Key`
header if `apiKey` is set)
```json
"retention": {
"nodeDays": 7,
"packetDays": 30
}
```
### tools/geofilter-builder.html
Standalone HTML tool (no server needed) — open in browser, click to
place polygon points on a Leaflet map, set `bufferKm`, copy the
generated `geo_filter` JSON block into `config.json`.
### scripts/prune-nodes-outside-geo-filter.py
Utility script to clean existing out-of-area nodes from the database
(dry-run + confirm). Useful after first enabling geo_filter on a
populated DB.
### HB column in packets table
Shows the hop hash size in bytes (1–4) decoded from the path byte of
each packet's raw hex. Displayed as **HB** between Size and Type
columns, hidden on small screens.
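A sketch of how such a value could be decoded from the raw hex follows. It assumes the two high bits of the path byte encode the hash size minus one, which is consistent with the 1–4 range but is not confirmed by this commit message. The function name and the caller-supplied `pathByteIndex` are hypothetical, since the path byte's offset depends on the packet layout.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// hopHashSize returns the per-hop hash size in bytes (1-4) for a packet given
// as a hex string. pathByteIndex is the offset of the path byte within the
// decoded packet; the caller must supply it (hypothetical parameter).
//
// Assumption: the top two bits of the path byte hold (hash size - 1).
func hopHashSize(rawHex string, pathByteIndex int) (int, error) {
	raw, err := hex.DecodeString(rawHex)
	if err != nil {
		return 0, fmt.Errorf("decode raw hex: %w", err)
	}
	if pathByteIndex < 0 || pathByteIndex >= len(raw) {
		return 0, fmt.Errorf("path byte index %d out of range for %d-byte packet",
			pathByteIndex, len(raw))
	}
	return int(raw[pathByteIndex]>>6) + 1, nil
}
```

Returning an error rather than a default lets the caller decide whether to render a blank HB cell for malformed packets.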
## Test plan
- [x] ADVERT from node outside polygon is not stored (no new row in
nodes or transmissions)
- [x] `GET /api/config/geo-filter` returns polygon + bufferKm when
configured, `{polygon: null, bufferKm: 0}` when not
- [x] `/api/nodes` excludes nodes outside polygon even if present in DB
- [x] Map and Live pages show blue polygon overlay when configured;
checkbox toggles it
- [x] `retention.packetDays: 30` deletes old transmissions/observations
on startup and daily
- [x] `POST /api/admin/prune?days=30` returns `{deleted: N, days: 30}`
- [x] `tools/geofilter-builder.html` opens standalone, draws polygon,
copies valid JSON
- [x] HB column shows 1–4 for all packets in grouped and flat view
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
|
||
|
|
b51ced8655 |
Wire channel region filtering end-to-end
Pass region through channel message routes, apply DB/store filtering, normalize IATA at read and write boundaries, and add regression coverage for routes/server/ingestor. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
5aa4fbb600 | chore: normalize all files to LF line endings | ||
|
|
f5d0ce066b |
refactor: remove packets_v SQL fallbacks — store handles all queries (#220)
* refactor: remove all packets_v SQL fallbacks — store handles all queries
Remove DB fallback paths from all route handlers. The in-memory PacketStore now handles all packet/node/analytics queries. Handlers return empty results or 404 when no store is available instead of falling back to direct DB queries.
- Remove else-DB branches from handlePacketDetail, handleNodeHealth, handleNodeAnalytics, handleBulkHealth, handlePacketTimestamps, etc.
- Remove unused DB methods (GetPacketByHash, GetTransmissionByID, GetPacketByID, GetObservationsForHash, GetTimestamps, GetNodeHealth, GetNodeAnalytics, GetBulkHealth, etc.)
- Remove packets_v VIEW creation from schema
- Update tests for new behavior (no-store returns 404/empty, not 500)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: address PR #220 review comments
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: KpaBap <kpabap@gmail.com> |
||
|
|
54cbc648e0 |
feat: decode telemetry from adverts — battery voltage + temperature on nodes
Sensor nodes embed telemetry (battery_mv, temperature_c) in their advert appdata after the null-terminated name. This commit adds decoding and storage for both the Go ingestor and Node.js backend.
Changes:
- decoder.go/decoder.js: Parse telemetry bytes from advert appdata (battery_mv as uint16 LE millivolts, temperature_c as int16 LE /100)
- db.go/db.js: Add battery_mv INTEGER and temperature_c REAL columns to nodes and inactive_nodes tables, with migration for existing DBs
- main.go/server.js: Update node telemetry on advert processing
- server db.go: Include battery_mv/temperature_c in node API responses
- Tests: Decoder telemetry tests (positive, negative temp, no telemetry), DB migration test, node telemetry update test, server API shape tests
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
f374a4a775 |
fix: enforce consistent types between Go ingestor writes and server reads
Schema:
- observers.noise_floor: INTEGER → REAL (dBm has decimals)
- battery_mv, uptime_secs remain INTEGER (always whole numbers)
Ingestor write side (cmd/ingestor/db.go):
- UpsertObserver now accepts ObserverMeta with battery_mv (int), uptime_secs (int64), noise_floor (float64)
- COALESCE preserves existing values when meta is nil
- Added migration: cast integer noise_floor values to REAL
Ingestor MQTT handler (cmd/ingestor/main.go — already updated):
- extractObserverMeta extracts hardware fields from status messages
- battery_mv/uptime_secs cast via math.Round to int on write
Server read side (cmd/server/db.go):
- Observer.BatteryMv: *float64 → *int (matches INTEGER storage)
- Observer.UptimeSecs: *float64 → *int64 (matches INTEGER storage)
- Observer.NoiseFloor: *float64 (unchanged, matches REAL storage)
- GetObservers/GetObserverByID: use sql.NullInt64 intermediaries for battery_mv/uptime_secs, sql.NullFloat64 for noise_floor
Proto (proto/observer.proto — already correct):
- battery_mv: int32, uptime_secs: int64, noise_floor: double
Tests:
- TestUpsertObserverWithMeta: verifies correct SQLite types via typeof()
- TestUpsertObserverMetaPreservesExisting: nil-meta preserves values
- TestExtractObserverMeta: float-to-int rounding, empty message
- TestSchemaNoiseFloorIsReal: PRAGMA table_info validation
- TestObserverTypeConsistency: server reads typed values correctly
- TestObserverTypesInGetObservers: list endpoint type consistency
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
1619f4857e |
fix: noise_floor/battery_mv/uptime_secs scanned as float64 to handle REAL values
SQLite stores these as REAL on some instances. Go *int scan silently fails, dropping the entire observer row (404 on detail, missing from list). Reported for YC-Base-Repeater and YC-Work-Repeater. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
9ebfd40aa0 |
fix: filter garbage channel names from /api/channels, fixes #201
Channels with garbage-decrypted names (pre-#197 data still in DB) are now filtered at the API level using the same non-printable character heuristic from #197. Applied in both Node.js server.js and Go server (store.go, db.go). No data is deleted — only filtered from API responses. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
77988ded3e |
fix: #184-#189 — sanitize names, packetsLast24h, ReadMemStats cache, dup name indicator, heatmap warning
#184: Strip non-printable chars (<0x20 except tab/newline) from ADVERT names in Go server decoder, Go ingestor decoder, and Node decoder.js.
#185: Add visual (N) badge next to node names when multiple nodes share the same display name (case-insensitive). Shows in list, side pane, and full detail page with 'also known as' links to other keys.
#186: Add packetsLast24h field to /api/stats response.
#187 #188: Cache runtime.ReadMemStats() with 5s TTL in Go server.
#189: Temporarily patch HTMLCanvasElement.prototype.getContext during L.heatLayer().addTo(map) to pass { willReadFrequently: true }, preventing Chrome console warning about canvas readback performance.
Tests: 10 new tests for buildDupNameMap + dupNameBadge (143 total frontend). Cache busters bumped.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
2435f2eaaf |
fix: observation timestamps, leaked fields, perf path normalization
- #178: Use strftime ISO 8601 format instead of datetime() for observation timestamps in all SQL queries (v3 + v2 views). Add normalizeTimestamp() helper for non-v3 paths that may store space-separated timestamps.
- #179: Strip internal fields (decoded_json, direction, payload_type, raw_hex, route_type, score, created_at) from ObservationResp. Only expose id, transmission_id, observer_id, observer_name, snr, rssi, path_json, timestamp — matching Node.js parity.
- #180: Remove _parsedDecoded and _parsedPath from node detail recentAdverts response. These internal/computed fields were leaking to the API. Updated golden shapes.json accordingly.
- #181: Use mux route template (GetPathTemplate) for perf stats path normalization, converting {param} to :param for Node.js parity. Fallback to hex regex for unmatched routes. Compile regexes once at package level instead of per-request.
fixes #178, fixes #179, fixes #180, fixes #181
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
df63efa78d |
fix: poll new observations for existing transmissions (fixes #174)
The poller only queried WHERE t.id > sinceID, which missed new observations added to transmissions already in the store. The trace page was correct because it always queries the DB directly. Add IngestNewObservations() that polls observations by o.id watermark, adds them to existing StoreTx entries, re-picks best observation, and invalidates analytics caches. The Poller now tracks both lastTxID and lastObsID watermarks. Includes tests for v3, v2, dedup, best-path re-pick, and GetMaxObservationID. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
64bf3744e2 |
fix: channels stale latest message from observation-timestamp ordering, fixes #171
db.GetChannels() queried packets_v (observation-level rows) ordered by observation timestamp and always overwrote lastMessage. When an older message had a later re-observation, it would overwrite the correct latest message with stale data.
Fix: query transmissions table directly (one row per unique message) ordered by first_seen. This ensures lastMessage always reflects the most recently sent message, not the most recently observed one.
Also fix db.GetChannelMessages() to use first_seen ordering with schema-aware queries (v2/v3), and add missing distCache/subpathCache invalidation on packet ingestion.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
f55a3454aa |
feat(go): replace map[string]interface{} with typed Go structs in route handlers
Phase 1: Create cmd/server/types.go with ~80 typed response structs
matching all proto definitions. Every API response shape is now a
compile-time checked struct.
Phase 2: Rewire all route handlers in routes.go to construct typed
structs instead of map[string]interface{} for response building:
- /api/stats -> StatsResponse
- /api/health -> HealthResponse
- /api/perf -> PerfResponse
- /api/config/* -> typed config responses
- /api/nodes/* -> NodeListResponse, NodeDetailResponse, etc.
- /api/packets/* -> PacketListResponse, PacketDetailResponse
- /api/analytics/* -> RFAnalyticsResponse, TopologyResponse, etc.
- /api/observers/* -> ObserverListResponse, ObserverResp
- /api/channels/* -> ChannelListResponse, ChannelMessagesResponse
- /api/traces/* -> TraceResponse
- /api/resolve-hops -> ResolveHopsResponse
- /api/iata-coords -> IataCoordsResponse (typed IataCoord)
- /api/audio-lab/buckets -> AudioLabBucketsResponse
- WebSocket broadcast -> WSMessage struct
- SlowQuery tracking -> SlowQuery struct (was map)
Phase 3 (partial): Add typed store/db methods:
- PacketStore.GetCacheStatsTyped() -> CacheStats
- PacketStore.GetPerfStoreStatsTyped() -> PerfPacketStoreStats
- DB.GetDBSizeStatsTyped() -> SqliteStats
Remaining map usage is in store/db data flow (PacketResult.Packets
still uses maps) — these will be addressed in a follow-up.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
||
|
|
2f5404edc3 |
fix: close last parity gaps in /api/perf and /api/nodes/:pubkey
- db.go: Add freelistMB (PRAGMA freelist_count * page_size) and walPages (PRAGMA wal_checkpoint(PASSIVE)) to GetDBSizeStats
- store.go: Add advertByObserver count to GetPerfStoreStats indexes (count distinct pubkeys with ADVERT observations)
- db.go: Add getObservationsForTransmissions helper; enrich GetRecentTransmissionsForNode results with observations array, _parsedPath, and _parsedDecoded
- db_test.go: Add second ADVERT with different hash_size to seed data so hash_sizes_seen is populated; enrich decoded_json with full ADVERT fields; update count assertions for new seed row
fixes #151, fixes #152
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
93dbe0e909 |
fix(go): add runtime stats to /api/perf and /api/health, fixes #143
- /api/perf: add goRuntime (heap, GC, goroutines, CPU), packetStore stats (totalLoaded, observations, index sizes, estimatedMB), sqlite stats (dbSizeMB, walSizeMB, row counts), real RF cache hit/miss tracking, and endpoint sorting by total time spent
- /api/health: add memory.heapMB, goRuntime (goroutines, gcPauses, numCPU), real packetStore packet count and estimatedMB, real cache stats from RF cache; remove hardcoded-zero eventLoop
- store.go: add cacheHits/cacheMisses tracking in GetAnalyticsRF, GetPerfStoreStats() and GetCacheStats() methods
- db.go: add path field to DB struct, GetDBSizeStats() for file sizes and row counts
- Tests: verify new fields in health/perf endpoints, add TestGetDBSizeStats, wire up PacketStore in test server setup
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
dbac8e9d52 |
fix: Go server since/until filter uses observation timestamp, not first_seen
The frontend sends ISO timestamps to filter by observation time. Go was filtering by transmission first_seen which missed packets with recent observations but old first_seen. Now converts ISO to unix epoch and queries the observations table directly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
5c68605f2c |
feat(go-server): full API parity with Node.js server
Performance:
- QueryGroupedPackets: 8s → <100ms (transmissions table, not packets_v VIEW)
Field parity:
- /api/stats: totalNodes uses 7-day window, added totalNodesAllTime
- /api/stats: role counts filtered by 7-day (matching Node.js)
- /api/nodes: role counts use all-time (matching Node.js)
- /api/packets/:id: path field returns parsed path_json hops
- /api/packets: added multi-node filter (?nodes=pk1,pk2)
- /api/observers: packetsLastHour, lat, lon, nodeRole computed
- /api/observers/:id: packetsLastHour computed
- /api/nodes/bulk-health: per-node stats from SQL
Tests updated with dynamic timestamps for 7-day filter compat. All tests pass, go vet clean.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
e18a73e1f2 |
feat: Go server API parity with Node.js — response shapes, perf, computed fields
- Packets query rewired from packets_v VIEW (9s) to direct table joins (~50ms)
- Packet response: added first_seen, observation_count; removed created_at, score
- Node response: added last_heard, hash_size, hash_size_inconsistent
- Schema-aware v2/v3 detection for observer_idx vs observer_id
- All Go tests passing
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
842b49e8c4 |
perf: fast-path count for unfiltered /api/packets (skip packets_v scan)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
742ed86596 |
feat: add Go web server (cmd/server/) — full API + WebSocket + static files
35+ REST endpoints matching Node.js server, WebSocket broadcast, static file serving with SPA fallback, config.json support. Uses modernc.org/sqlite (pure Go, no CGO required). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |