mirror of
https://github.com/Kpa-clawbot/meshcore-analyzer.git
synced 2026-06-04 04:51:17 +00:00
bfebf200b754cf3dddffc7ca5aa3908962ba725f
44 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
317b59ab10 |
feat: area-based visual node filter — attribute packets by transmitter GPS (#804) (#839)
## Summary
- Adds configurable GPS polygon areas to `config.json`; nodes are attributed to an area if their last-known position falls inside the polygon
- New `Area: …` dropdown filter (matching the existing region filter style) appears on all analytics, nodes, packets, map, and live screens when areas are configured
- Backend resolves area membership with a 30s TTL cache; the area filter bypasses the 500-node cap on `/api/bulk-health` so all area nodes are always returned
- Includes a polygon builder tool (`/area-map.html`) for drawing and exporting area boundaries
## Changes
**Backend**
- `AreaEntry` type + `Areas` config field
- `GetNodePubkeysInArea` DB query + `resolveAreaNodes` (30s TTL, `areaNodeMu` RWMutex)
- `PacketQuery.Area` + `filterPackets` polygon check
- `?area=` param propagated through all analytics, topology, clock-health, and bulk-health routes
- `/api/config/areas` endpoint
**Frontend**
- `area-filter.js`: single-select dropdown, persists to localStorage, cleans up stale keys on load
- Wired into analytics, nodes, packets, channels, map, and live pages
- Live map clears node markers on area change
**Docs & tools**
- `docs/user-guide/area-filter.md`: configuration and usage guide
- `docs/api-spec.md`: updated with new endpoint and `?area=` param table
- `tools/area-map.html`: polygon builder for defining area boundaries
- Demo areas added to `config.example.json`
## Test plan
- [x] No areas configured → filter dropdown does not appear on any page
- [x] Areas configured → dropdown appears, "All" selected by default
- [x] Selecting an area filters nodes/packets/topology/map correctly
- [x] Selecting "All" restores unfiltered view
- [x] Selection persists across page reloads (localStorage)
- [x] Stale localStorage key (area removed from config) is cleared on load
- [x] `/api/bulk-health?area=X` returns all nodes in area (no 500-node cap)
- [x] `/api/config/areas` returns correct list
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Kpa-clawbot <kpaclawbot@outlook.com>
Co-authored-by: openclaw-bot <bot@openclaw.local>
|
||
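The polygon check that attributes a node to an area can be sketched with the standard ray-casting test. This is only an illustration: the `Point` type and the `pointInPolygon` name are hypothetical, and the commit does not say which algorithm `filterPackets` actually uses.

```go
package main

// Point is a GPS coordinate in decimal degrees (hypothetical type for this sketch).
type Point struct {
	Lat, Lon float64
}

// pointInPolygon reports whether p lies inside poly using ray casting:
// a ray is cast from p towards increasing longitude, and every polygon edge
// it crosses toggles the inside/outside state.
// Assumption: the polygon is small enough to treat lat/lon as a flat plane
// and does not cross the antimeridian.
func pointInPolygon(p Point, poly []Point) bool {
	n := len(poly)
	if n < 3 {
		return false // not a polygon
	}
	inside := false
	for i, j := 0, n-1; i < n; j, i = i, i+1 {
		a, b := poly[i], poly[j]
		// Only edges that straddle p's latitude can be crossed by the ray.
		if (a.Lat > p.Lat) != (b.Lat > p.Lat) {
			// Longitude at which edge a-b reaches p's latitude.
			crossLon := a.Lon + (p.Lat-a.Lat)/(b.Lat-a.Lat)*(b.Lon-a.Lon)
			if p.Lon < crossLon {
				inside = !inside
			}
		}
	}
	return inside
}
```

A node would then belong to an area when `pointInPolygon(lastKnownPosition, area.Polygon)` is true; caching the resulting pubkey set, as the 30s TTL cache above does, avoids repeating the test on every request.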
|
|
caf3851ff8 |
feat(server): add opt-in HTTP gzip and WebSocket permessage-deflate compression (#934)
## Summary
- Adds `"compression": {"gzip": true, "websocket": true}` config option
(both `false` by default — no behavior change)
- HTTP gzip middleware wraps the entire router; skips WebSocket upgrade
requests and clients without `Accept-Encoding: gzip`
- WebSocket permessage-deflate enabled via
`hub.upgrader.EnableCompression` when `websocket: true`
- `CompressionConfig` struct and `GZipEnabled()` /
`WSCompressionEnabled()` helpers on `Config`
- `Hub.upgrader` moved from package-level var to struct field so tests
using `NewHub()` don't need changes
## Why opt-in / off by default
Operators behind a reverse proxy that already compresses (nginx, Caddy
with `encode gzip`) should leave this off to avoid double-compression.
Only enable when the proxy does **not** compress.
## Test plan
- [x] `TestCompressionConfigDefaults` — both helpers return false when
`Compression` is nil
- [x] `TestCompressionConfigExplicitFalse` — both helpers return false
when set to false
- [x] `TestCompressionConfigEnabled` — both helpers return true when set
to true
- [x] `TestGZipMiddlewareCompresses` — response body is valid gzip,
headers set correctly
- [x] `TestGZipMiddlewareSkipsNoAcceptEncoding` — passthrough when
client doesn't send Accept-Encoding: gzip
- [x] `TestGZipMiddlewareSkipsWebSocket` — WebSocket upgrades are never
gzip-wrapped
All 6 tests pass (`go test ./...` in `cmd/server`).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: OpenClaw Bot <bot@openclaw.local>
Co-authored-by: efiten-bot <bot@efiten.dev>
|
||
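The skip rules described in this commit (no gzip for WebSocket upgrades or for clients that do not advertise `Accept-Encoding: gzip`) can be sketched as a small `net/http` middleware. The names `shouldCompress`, `gzipResponseWriter`, and `gzipMiddleware` are assumptions for illustration, not the identifiers used in `cmd/server`.

```go
package main

import (
	"compress/gzip"
	"net/http"
	"strings"
)

// shouldCompress decides from the Accept-Encoding and Upgrade header values
// whether a response may be gzip-wrapped.
// Simplification: q-values are ignored, so "gzip;q=0" still counts as accepted.
func shouldCompress(acceptEncoding, upgrade string) bool {
	if strings.EqualFold(strings.TrimSpace(upgrade), "websocket") {
		return false // never wrap a WebSocket upgrade
	}
	for _, enc := range strings.Split(acceptEncoding, ",") {
		name, _, _ := strings.Cut(enc, ";") // drop any ";q=..." parameter
		if strings.EqualFold(strings.TrimSpace(name), "gzip") {
			return true
		}
	}
	return false
}

// gzipResponseWriter routes body writes through a gzip.Writer.
type gzipResponseWriter struct {
	http.ResponseWriter
	zw *gzip.Writer
}

func (w gzipResponseWriter) Write(b []byte) (int, error) { return w.zw.Write(b) }

// gzipMiddleware wraps next and compresses responses for eligible requests.
func gzipMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !shouldCompress(r.Header.Get("Accept-Encoding"), r.Header.Get("Upgrade")) {
			next.ServeHTTP(w, r) // passthrough, untouched
			return
		}
		w.Header().Set("Content-Encoding", "gzip")
		w.Header().Add("Vary", "Accept-Encoding")
		zw := gzip.NewWriter(w)
		defer zw.Close() // flushes the gzip trailer after the handler returns
		next.ServeHTTP(gzipResponseWriter{ResponseWriter: w, zw: zw}, r)
	})
}
```

A router would be wrapped once, for example `http.ListenAndServe(addr, gzipMiddleware(mux))`, and only when the config flag is on, which keeps the default behavior unchanged.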
|
|
51f823bf7e |
feat: one-click prune nodes outside geofilter (#669 M4) (#738)
## Summary
- Adds `POST /api/admin/prune-geo-filter` endpoint: dry-run by default, `?confirm=true` to permanently delete nodes outside the current geofilter polygon + buffer. Requires `X-API-Key` header.
- Adds **Prune nodes** section inside the GeoFilter customizer tab (write-access only, same `writeEnabled` gate as PUT). **Preview** lists affected nodes; **Confirm delete** removes them.
- Adds `GetNodesForGeoPrune` and `DeleteNodesByPubkeys` DB helpers.
- Updates `docs/user-guide/geofilter.md`: documents the UI button as primary workflow, CLI script as alternative.

> **Depends on M3** (`feat/geofilter-m3-customizer`, PR #736). Merge M3 first.
## Test plan
- [x] `cd cmd/server && go test ./...`: all pass
- [x] Customizer GeoFilter tab without `apiKey`: Prune section not visible
- [x] With `apiKey` + polygon active: Prune section visible
- [x] **Preview** returns list of nodes outside polygon (no deletions)
- [x] **Confirm delete** removes nodes, list clears
- [x] `POST /api/admin/prune-geo-filter` without `X-API-Key` → 401
- [x] `POST /api/admin/prune-geo-filter` with no polygon configured → 400
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
|
||
|
|
9383201c07 |
refactor(db): finish #1283 — Option 4: ingestor owns neighbor-graph + schema migrations; server is read-only (fixes #1287) (#1289)
Red commit:
https://github.com/Kpa-clawbot/CoreScope/commit/eae179b99b5fd34924547632aa8f8025c405aa53
(CI: pending — opens with this PR)
Finishes #1283. RED test `TestServerSourceHasNoCachedRWCalls` goes from
failing (13 writer call-sites) to GREEN (zero). Per #1287 Option 4
(https://github.com/Kpa-clawbot/CoreScope/issues/1287#issuecomment-4485099992):
ingestor owns the neighbor graph build + persist; server reads the
snapshot.
**Category A — Schema migrations** → new `internal/dbschema` package.
`dbschema.Apply(rw)` runs in `cmd/ingestor` startup (in `OpenStore`).
`dbschema.AssertReady(ro)` runs in `cmd/server/main.go` and
FATAL-LOG-EXITS if any expected column/index/table is missing — the
operator must restart the ingestor first. Covers indexes,
`neighbor_edges`, `observations.resolved_path`,
`observers.{inactive,last_packet_at,iata}`,
`(inactive_)nodes.foreign_advert`, `transmissions.from_pubkey`.
**Category B — Backfill** → ingestor.
`BackfillFromPubkey` and observer-blacklist soft-delete moved to
`cmd/ingestor/maintenance.go`. Server keeps an inert
`fromPubkeyBackfillSnapshot` stub for `/api/healthz` API compatibility.
**Category C — Neighbor-graph persistence (Option 4)** → ingestor
writes, server reads.
- Ingestor (`cmd/ingestor/neighbor_builder.go`): every 60s scans
`observations + transmissions`, extracts edges (originator↔first-hop for
ADVERTs; observer↔last-hop for all), resolves hop prefixes via a
node-table prefix index, upserts into `neighbor_edges`.
- Server (`cmd/server/neighbor_recomputer.go`): every 60s re-reads
`neighbor_edges` and atomic-swaps the resulting `NeighborGraph` into
`s.graph`. Initial load is synchronous on startup. All server-side
incremental edge writers (the two `asyncPersistResolvedPathsAndEdges`
paths in `cmd/server/store.go`) are gone.
- Neighbor-edge daily prune (`PruneNeighborEdges`) moved to ingestor.
**Why Option 4**: clean read/write separation, no startup CPU spike
(server loads existing snapshot instead of rebuilding from history), no
IPC/delta-protocol churn. Staleness budget ~60s — same model as the
analytics recomputers in #1240 / #1248 / #672 axis 2.
**Recomputer interval default for neighbor graph**: 60s
(`NeighborGraphRecomputerDefaultInterval`,
`NeighborEdgesBuilderInterval`).
**Invariants added**:
- `TestServerSourceHasNoCachedRWCalls` (RED commit
|
||
|
|
1da2034341 |
refactor(db): move all writes from server to ingestor; server truly read-only (fixes #1283) (#1286)
**Red commit:**
|
||
|
|
8bf7709970 |
feat(repeater): usefulness score — bridge axis (#672 axis 2 of 4) (#1275)
RED test commit: `fd661569`. CI will fail on this (stub returns empty map; assertions fail by design). GREEN: `bf4b8592`.
## What
Implements **axis 2 of 4** for the repeater usefulness score per #672 ([status comment](https://github.com/Kpa-clawbot/CoreScope/issues/672#issuecomment-4484635378)).
The Bridge axis measures *structural importance*: how many shortest paths between other nodes route through this one. A high-traffic redundant node and a low-traffic critical bridge will no longer look identical.
## Algorithm
**Brandes' weighted betweenness centrality** with Dijkstra for shortest paths (`cmd/server/bridge_score.go`).
- Nodes: pubkeys in the `neighbor_edges` graph
- Edge weight: `Score(now) * Confidence()`, per the convention from #1235 (count + recency decay scaled by observer-diversity confidence). Geo-rejected edges are already excluded at graph build time (#1230), so we don't re-filter here.
- Dijkstra distance: `1 / max(epsilon, weight)`, so high affinity means cheap cost.
- Normalize: divide by max observed centrality so output is in `[0, 1]`.

Cost: `O(V · (E + V log V))`. At staging scale (~600 nodes / ~2 000 edges) that is roughly 4.8M ops and completes in milliseconds.
## Where it lives
- `cmd/server/bridge_score.go`: pure algorithm, no locks
- `cmd/server/bridge_recomputer.go`: background recomputer (mirrors #1240/#1262 pattern), 5-min default interval, initial sync prewarm, snapshot stored in `s.bridgeScoreMap atomic.Pointer[map[string]float64]`
- `cmd/server/routes.go`: `handleNodes` adds `node["bridge_score"]` on repeater/room rows; node-detail handler adds it on the single-node path
- `public/nodes.js`: separate **Bridge** row in the node detail panel, alongside the existing **Usefulness** (Traffic) row. Distinct colour-coded bar.
## What's NOT in this PR (still pending for #672)
- **Coverage axis** (axis 3): unique observer-pair connectivity
- **Redundancy axis** (axis 4): simulated node-removal impact
- **Composite**: once all 4 axes ship, swap the `usefulness_score` formula from "traffic-only" to the weighted composite

`Refs #672` (not `Fixes`; the issue stays open until all 4 axes + composite ship).
## Tests
- `TestComputeBridgeScores_LineGraph`: 4-node line, middles non-zero, leaves zero, max normalized to 1.0
- `TestComputeBridgeScores_TriangleNoBridge`: clique has zero bridges
- `TestComputeBridgeScores_Empty`: defensive nil-safety
- `TestComputeBridgeScores_WeightSensitive`: mutation guard, revert the `1/w` inversion and this test fails
- `TestBridgeScore_HandleNodesSurface`: integration, `/api/nodes` returns `bridge_score` on repeater rows; middle nodes > 0, ends == 0
---------
Co-authored-by: clawbot <bot@meshcore.local>
|
||
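The bridge-axis algorithm described above (Brandes' betweenness with Dijkstra, cost `1 / max(epsilon, weight)`, normalized by the maximum) can be sketched as follows. This is a minimal reconstruction under stated assumptions, not the contents of `cmd/server/bridge_score.go`: the `Edge` type, the `bridgeScores` name, and the `epsilon` value are hypothetical, and ties between equal-cost paths are detected with exact float equality.

```go
package main

import (
	"container/heap"
	"math"
)

// Edge is an undirected neighbor link. Weight is an affinity: higher is stronger.
type Edge struct {
	A, B   string
	Weight float64
}

type arc struct {
	to   string
	cost float64
}

type item struct {
	node string
	dist float64
}

// minHeap orders pending nodes by tentative distance for Dijkstra.
type minHeap []item

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].dist < h[j].dist }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(item)) }
func (h *minHeap) Pop() any {
	old := *h
	last := old[len(old)-1]
	*h = old[:len(old)-1]
	return last
}

const epsilon = 1e-9 // assumed floor so a zero weight cannot divide by zero

// bridgeScores returns betweenness centrality per node, normalized to [0, 1].
func bridgeScores(edges []Edge) map[string]float64 {
	adj := map[string][]arc{}
	for _, e := range edges {
		cost := 1 / math.Max(epsilon, e.Weight) // high affinity = cheap hop
		adj[e.A] = append(adj[e.A], arc{e.B, cost})
		adj[e.B] = append(adj[e.B], arc{e.A, cost})
	}
	scores := make(map[string]float64, len(adj))
	for v := range adj {
		scores[v] = 0
	}
	for s := range adj {
		dist := map[string]float64{s: 0}
		sigma := map[string]float64{s: 1} // number of shortest paths from s
		preds := map[string][]string{}
		done := map[string]bool{}
		var order []string // nodes in non-decreasing distance from s
		pq := &minHeap{{node: s}}
		for pq.Len() > 0 {
			cur := heap.Pop(pq).(item)
			if done[cur.node] {
				continue // stale heap entry
			}
			done[cur.node] = true
			order = append(order, cur.node)
			for _, a := range adj[cur.node] {
				nd := cur.dist + a.cost
				old, seen := dist[a.to]
				if !seen || nd < old { // strictly shorter path: reset
					dist[a.to], sigma[a.to] = nd, sigma[cur.node]
					preds[a.to] = []string{cur.node}
					heap.Push(pq, item{a.to, nd})
				} else if nd == old { // another shortest path of equal cost
					sigma[a.to] += sigma[cur.node]
					preds[a.to] = append(preds[a.to], cur.node)
				}
			}
		}
		// Accumulate dependencies from the farthest node back towards s.
		delta := map[string]float64{}
		for i := len(order) - 1; i >= 0; i-- {
			w := order[i]
			for _, v := range preds[w] {
				delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
			}
			if w != s {
				scores[w] += delta[w]
			}
		}
	}
	maxScore := 0.0
	for _, v := range scores {
		maxScore = math.Max(maxScore, v)
	}
	if maxScore > 0 {
		for k := range scores {
			scores[k] /= maxScore
		}
	}
	return scores
}
```

Because the result is divided by the maximum, the usual factor of two from counting each undirected pair in both directions cancels out and is not corrected separately.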
|
|
ae17a2be12 |
perf(#1262): /api/nodes?limit=2000 cold-miss 15.7s → <100ms — prewarm repeater enrichment cache (#1263)
RED commit: `22ce5736066142583017cad7303fa48d9e00ccf0`. CI on red: https://github.com/Kpa-clawbot/CoreScope/actions?query=branch%3Afix%2Fissue-1262
## Problem
After #1260 added a 15s-TTL bulk cache for repeater enrichment in `handleNodes`, `/api/nodes` (default limit) dropped to ~500ms. But `/api/nodes?limit=2000`, called by `public/live.js` at SPA startup for hop resolution, still took **15.7s cold** on staging (75k tx, 600 nodes). Warm hits were ~40ms.
Root cause: the bulk cache was lazily populated on the first request after TTL expiry. The rebuild ran on the request-serving goroutine. Every cold SPA load triggered the rebuild and ate 15s.
## Fix
Add `StartRepeaterEnrichmentRecomputer`, a steady-state background recomputer that mirrors the `analytics_recomputer.go` pattern from #1240:
- **Prewarm**: initial synchronous compute on Start so the first request hits a populated cache.
- **Steady-state**: ticker refreshes the snapshot every 5min (configurable via the existing analytics recompute interval knob).
- **Panic-safe** + idempotent Start.

Wired into `main.go` right after `StartAnalyticsRecomputers`, using `cfg.GetHealthThresholds().RelayActiveHours` as the window.
## Test
`TestHandleNodesLimit2000ColdMiss` seeds 600 nodes + 150k non-advert tx with repeaters indexed under a shared 1-byte hop prefix (matches production hop-prefix collisions), starts the recomputer, then issues `/api/nodes?limit=2000` with **no HTTP warmup**.

| State | Latency |
|---|---|
| Before (master, on-thread rebuild) | 3.37s |
| After (prewarm + steady-state) | 56ms |
| Budget | 2s |

Staging end-to-end: 15.7s → expected sub-100ms on the same call path.
Red commit (`22ce5736066142583017cad7303fa48d9e00ccf0`) compiles with a no-op stub of the new method so the test fails on the latency **assertion**, not a missing symbol.
Fixes #1262
---------
Co-authored-by: corescope-bot <bot@corescope.local>
|
||
|
|
356f001027 |
perf(#1240): steady-state background recompute for analytics endpoints (#1248)
RED commit: `27630f6a` adds a latency test that fails on master (p99=225ms > 50ms budget) and a stub `StartAnalyticsRecomputers` that returns a no-op so the assertion (not a build error) gates the change.
GREEN commit: `20fbbceb` wires real background recompute infrastructure. Test passes at p99=~1µs.
## What changed
Replaces the on-request "compute-then-cache" pattern for the default-shape analytics queries with a steady-state background recompute loop. Reads always hit an `atomic.Value` snapshot in <1µs regardless of compute cost or writer contention.
Operator principle: serving slightly stale data quickly beats real-time data slowly.
## Endpoints converted (default 5min interval each)

| Endpoint | Cold compute | Recomputer interval |
|---|---|---|
| `/api/analytics/topology` | ~5s | 5 min |
| `/api/analytics/rf` | ~4s | 5 min |
| `/api/analytics/distance` | ~3s | 5 min |
| `/api/analytics/channels` | ~0.5s | 5 min |
| `/api/analytics/hash-collisions` | ~0.5s | 5 min |
| `/api/analytics/hash-sizes` | ~22ms | 5 min |

All intervals configurable per-endpoint via `analytics.recomputeIntervalSeconds.<name>` in `config.json`; documented in `config.example.json`. Default override via `analytics.defaultIntervalSeconds`.
## Scope: default query only
Only the canonical shape `(region="", window=zero)` is precomputed. Region- or window-filtered requests fall back to the legacy TTL cache + on-request compute, which keeps the recomputer count bounded (6, not 6×N×M).
## Latency
Test `TestAnalyticsRecomputerSteadyStateLatency`: 100 concurrent readers + 4 writers churning `s.mu.Lock` on 20k distHops.
- Before: p50=188ms p99=225ms (assertion failed)
- After: p50=240ns p99=1.1µs (atomic load + map return)
## Shutdown integration
`StartAnalyticsRecomputers` returns a stop closure invoked from `main.go`'s SIGTERM handler BEFORE `dbClose()` so any in-flight SQLite compute drains cleanly. `TestAnalyticsRecomputerShutdownNoLeak` confirms all 6 goroutines are reaped (Δ=6 within 2s).
## Safety details
- Initial compute is synchronous in `Start()`, so the first read after startup never sees nil.
- `recover()` inside `runOnce` keeps a compute panic from killing the goroutine; the previous snapshot remains valid.
- `analyticsRecomputerMu` is a sync.RWMutex; recomputer pointers are read-locked in the hot path. The atomic.Value swap inside `runOnce` is lock-free.

Fixes #1240.
---------
Co-authored-by: OpenClaw Bot <bot@openclaw.local>
|
||
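The steady-state recompute pattern used by this commit and by #1263 (synchronous prewarm, ticker refresh, `recover()` in `runOnce`, lock-free reads, stop closure) can be sketched generically. The `Recomputer` type and `Start` function here are hypothetical stand-ins, and the sketch uses `atomic.Pointer` where the commit text mentions `atomic.Value`.

```go
package main

import (
	"sync"
	"sync/atomic"
	"time"
)

// Recomputer keeps a precomputed snapshot of type T fresh in the background.
type Recomputer[T any] struct {
	snap    atomic.Pointer[T]
	compute func() T
}

// runOnce recomputes and publishes a new snapshot. A panic in compute is
// swallowed so the loop survives and the previous snapshot stays valid.
func (r *Recomputer[T]) runOnce() {
	defer func() { _ = recover() }()
	v := r.compute()
	r.snap.Store(&v) // lock-free swap; readers see the old or the new value
}

// Load returns the latest snapshot without taking a lock. ok is false only
// if no compute has ever succeeded.
func (r *Recomputer[T]) Load() (value T, ok bool) {
	p := r.snap.Load()
	if p == nil {
		return value, false
	}
	return *p, true
}

// Start computes once synchronously (prewarm), then refreshes on a ticker.
// The returned stop closure is idempotent and waits for the goroutine to exit.
// Assumption: a non-positive interval falls back to 300 seconds (5 min).
func Start[T any](intervalSeconds int, compute func() T) (*Recomputer[T], func()) {
	if intervalSeconds <= 0 {
		intervalSeconds = 300
	}
	r := &Recomputer[T]{compute: compute}
	r.runOnce() // first read after startup should not see an empty snapshot

	done := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		t := time.NewTicker(time.Duration(intervalSeconds) * time.Second)
		defer t.Stop()
		for {
			select {
			case <-t.C:
				r.runOnce()
			case <-done:
				return
			}
		}
	}()

	var once sync.Once
	stop := func() {
		once.Do(func() {
			close(done)
			wg.Wait() // drain any in-flight compute before returning
		})
	}
	return r, stop
}
```

A handler would call `Load()` on the hot path, and `main` would call `stop()` before closing the database so an in-flight compute can finish first.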
|
|
b881a09f02 |
feat(#1188): show observer IATA on packets + filter grammar (#1189)
Red commit:
|
||
|
|
eba9e89a72 |
fix(#1203): path-inspector — singleflight + stale-while-revalidate (#1208)
Red commit:
|
||
|
|
11d2026bb1 |
feat(startup): hot startup — load hotStartupHours synchronously, fill retentionHours in background (#1187)
Closes #1183
## Summary
- Adds `packetStore.hotStartupHours` config key (float64, default 0 = disabled). When set, `Load()` loads only that many hours of data synchronously, reducing startup time on large DBs.
- A background goroutine (`loadBackgroundChunks`) fills the remaining `retentionHours` window in daily chunks after startup completes. Analytics indexes are rebuilt once at the end.
- `QueryPackets` and `QueryGroupedPackets` check `oldestLoaded` and fall back to `db.QueryPackets()` for any query whose `Since`/`Until` predates the in-memory window, covering days 8–30 permanently (beyond `retentionHours`) and the background-fill gap during startup.
- `/api/perf` gains `hotStartupHours`, `backgroundLoadComplete`, and `backgroundLoadProgress` fields inside `packetStore` so operators can monitor the fill.
### Drive-by fixes
- E2E: added `gotoPackets` navigation helper used across packet-related tests
- E2E: rewrote stripe assertion to check per-row stripe parity rather than a fragile computed-style comparison
- E2E: theme test updated to use `#/home` as the initial route (was `#/`)
- `db.go`: removed the RFC3339→unix-timestamp subquery path in `buildTransmissionWhere`; `t.first_seen` is now always compared directly as a string for both RFC3339 and non-RFC3339 inputs
## Configuration
```json
"packetStore": {
  "retentionHours": 168,
  "hotStartupHours": 24
}
```
`hotStartupHours: 0` (default) preserves existing behavior exactly: the full `retentionHours` window is loaded at startup. A non-zero value is recommended for large DBs to reduce startup time.
## Test plan
- [x] `TestHotStartupConfig_Clamp`: clamping when `hotStartupHours > retentionHours`
- [x] `TestHotStartupConfig_ZeroIsDisabled`: zero leaves feature disabled
- [x] `TestHotStartup_LoadsOnlyHotWindow`: only hot-window packets in memory after `Load()`
- [x] `TestHotStartup_DisabledWhenZero`: all retention packets loaded when disabled
- [x] `TestHotStartup_loadChunk_AddsOlderData`: chunk merges correctly, ASC order maintained
- [x] `TestHotStartup_BackgroundFillsToRetention`: background goroutine fills to `retentionHours`
- [x] `TestHotStartup_ChunkErrorRecovery`: chunk SQL failure logged and skipped, loop terminates
- [x] `TestHotStartup_SQLFallback_TriggeredForOldDate`: query before `oldestLoaded` routes to SQL
- [x] `TestHotStartup_SQLFallback_NotTriggeredForRecentDate`: recent query stays in-memory
- [x] `TestHotStartup_PerfStats`: new fields present in `GetPerfStoreStats()` (backs the perf endpoint)
- [x] `TestHotStartup_PerfStoreHTTP`: HTTP-level, GET /api/perf returns `hotStartupHours`, `backgroundLoadComplete`, `backgroundLoadProgress` in `packetStore`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: openclaw-bot <bot@openclaw.local>
Co-authored-by: CoreScope Bot <bot@corescope.local>
|
||
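The three decisions in the hot-startup design (clamping the config value, planning daily background chunks, and routing old queries to SQL) can be sketched as pure functions. All three names are hypothetical, and clamping to `retentionHours` is an assumption: the test plan names a clamp but does not state the clamped value.

```go
package main

// effectiveHotHours normalizes the hotStartupHours setting.
// 0 (or negative) means the feature is disabled; values above retentionHours
// are clamped down to retentionHours (assumed clamp target).
func effectiveHotHours(hotStartupHours, retentionHours float64) float64 {
	if hotStartupHours <= 0 {
		return 0
	}
	if hotStartupHours > retentionHours {
		return retentionHours
	}
	return hotStartupHours
}

// backgroundChunks plans the background fill as [fromHoursAgo, toHoursAgo)
// windows of at most 24 hours, starting where the hot window ends and
// stopping at the retention boundary.
func backgroundChunks(hotHours, retentionHours float64) [][2]float64 {
	if hotHours <= 0 {
		return nil // feature disabled: everything was loaded synchronously
	}
	var chunks [][2]float64
	for from := hotHours; from < retentionHours; from += 24 {
		to := from + 24
		if to > retentionHours {
			to = retentionHours
		}
		chunks = append(chunks, [2]float64{from, to})
	}
	return chunks
}

// needsSQLFallback reports whether a query whose lower bound is sinceUnix
// reaches back before the oldest packet currently held in memory.
// Both arguments are Unix seconds.
func needsSQLFallback(sinceUnix, oldestLoadedUnix int64) bool {
	return sinceUnix < oldestLoadedUnix
}
```

With the example configuration (`retentionHours: 168`, `hotStartupHours: 24`), this plan yields six daily chunks covering hours 24 to 168 before now.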
|
|
fb744d895f |
fix(#1143): structural pubkey attribution via from_pubkey column (#1152)
Fixes #1143.
## Summary
Replaces the structurally unsound `decoded_json LIKE '%pubkey%'` (and `OR LIKE '%name%'`) attribution path with an exact-match lookup on a dedicated, indexed `transmissions.from_pubkey` column. This closes both holes documented in #1143:
- **Hole 1**: same-name false positives via `OR LIKE '%name%'`
- **Hole 2a**: adversarial spoofing, where a malicious node names itself with another node's pubkey and gets attributed to the victim
- **Hole 2b**: accidental false positive when any free-text field (path elements, channel names, message bodies) contains a 64-char hex substring matching a real pubkey
- **Perf**: query now uses an index instead of a full-table scan against `LIKE '%substring%'`
## TDD
Two-commit history shows red-then-green:

| Commit | Status | Purpose |
|---|---|---|
| `7f0f08e` | RED: tests assertion-fail on master behaviour | Adversarial fixtures + spec |
| `59327db` | GREEN: schema + ingestor + server + migration | Implementation |

The red commit's test schema includes the new column so the file compiles, but the production code still uses LIKE, so the assertions fail because the malicious / same-name / free-text rows are returned. The green commit changes the query plus adds the migration/ingest path.
## Changes
### Schema
- new column `transmissions.from_pubkey TEXT`
- new index `idx_transmissions_from_pubkey`
### Ingestor (`cmd/ingestor/`)
- `PacketData.FromPubkey` populated from decoded ADVERT `pubKey` at write time. Cheap, since `decoded_json` is already being parsed. Non-ADVERTs stay NULL.
- `stmtInsertTransmission` writes the column.
- Migration `from_pubkey_v1` ALTERs legacy DBs to add the column + index.
- Bonus: rewrote the recipe in the gated one-shot `advert_count_unique_v1` migration to use `from_pubkey` (already marked done on existing DBs; kept correct for fresh installs).
### Server (`cmd/server/`)
- `ensureFromPubkeyColumn` mirrors the ingestor migration so the server can boot against a DB the ingestor has never touched (e2e fixture, fresh installs).
- `backfillFromPubkeyAsync` runs **after** HTTP starts. Scans `WHERE from_pubkey IS NULL AND payload_type = 4` in 5000-row chunks with a 100ms yield between chunks. Cannot block boot even on prod-sized DBs (100K+ transmissions). Queries handle NULL gracefully (return empty for that pubkey, same as today's unknown-pubkey path).
- All in-scope LIKE call sites switched to exact match:

| Site | Before | After |
|---|---|---|
| `buildPacketWhere` (was db.go:582) | `decoded_json LIKE '%pubkey%'` | `from_pubkey = ?` |
| `buildTransmissionWhere` (was db.go:626) | `t.decoded_json LIKE '%pubkey%'` | `t.from_pubkey = ?` |
| `GetRecentTransmissionsForNode` (was db.go:910) | `LIKE '%pubkey%' OR LIKE '%name%'` | `t.from_pubkey = ?` |
| `QueryMultiNodePackets` (was db.go:1785) | `decoded_json LIKE '%pubkey%' OR ...` | `t.from_pubkey IN (?, ?, ...)` |
| `advert_count_unique_v1` (was ingestor/db.go:257) | `decoded_json LIKE '%' \|\| nodes.public_key \|\| '%'` | `t.from_pubkey = nodes.public_key` |

`GetRecentTransmissionsForNode` signature simplifies: the `name` parameter is gone (it was only ever used for the legacy `OR LIKE '%name%'` fallback). Sole caller in `routes.go:1243` updated.
### Tests
- `cmd/server/from_pubkey_attribution_test.go`: adversarial fixtures + Hole 1/2a/2b/QueryMultiNodePackets exact-match assertions, EXPLAIN QUERY PLAN index check, migration backfill correctness.
- `cmd/ingestor/from_pubkey_test.go`: write-time correctness (BuildPacketData populates FromPubkey for ADVERT only; InsertTransmission persists it; non-ADVERTs stay NULL).
- Existing test schemas (server v2, server v3, coverage) get the new column **plus a SQLite trigger** that auto-populates `from_pubkey` from `decoded_json` on ADVERT inserts. This means existing fixtures (which only seed `decoded_json`) keep attributing correctly without per-test edits.
- `seedTestData`'s ADVERTs explicitly set `from_pubkey`.
## Performance: index is used
```
$ EXPLAIN QUERY PLAN SELECT id FROM transmissions WHERE from_pubkey = ?
SEARCH transmissions USING INDEX idx_transmissions_from_pubkey (from_pubkey=?)
```
Asserted in `TestFromPubkeyIndexUsed`.
## Migration approach
- **Sync at boot**: `ALTER TABLE transmissions ADD COLUMN from_pubkey TEXT` is a metadata-only operation in SQLite, taking microseconds regardless of table size. `CREATE INDEX IF NOT EXISTS idx_transmissions_from_pubkey` is **not** metadata-only: it scans the table once. Empirically a few hundred ms on a 100K-row table; expect a few seconds on a 10M-row table (one-time cost, blocking boot during that window). Subsequent boots no-op via `IF NOT EXISTS`. If this boot delay becomes an operational concern at prod scale we can defer the `CREATE INDEX` to a goroutine; for now a few-second one-time delay is acceptable.
- **Async**: row-level backfill of legacy NULL ADVERTs (chunked 5000 / 100ms yield). On a 100K-ADVERT prod DB, this completes in seconds in the background; HTTP is fully available throughout.
- **Safety**: queries handle NULL gracefully. A node whose ADVERTs haven't backfilled yet returns empty, identical to today's behaviour for unknown pubkeys. No half-state regression.
## Out of scope (intentionally)
The free-text `LIKE` paths the issue explicitly leaves alone (e.g. user-typed packet search) are untouched. Only the pubkey-attribution sites get the column treatment.
## Cycle-3 review fixes

| Finding | Status | Commit |
|---|---|---|
| **M1c**: async-contract test was tautological (test's own `go`, not production's) | Fixed | `23ace71` (red) → `a05b50c` (green) |
| **m1c**: package-global atomic resets unsafe under `t.Parallel()` | Fixed (`// DO NOT t.Parallel` comment + `Reset()` helper) | rolled into `23ace71` / `241ec69` |
| **m2c**: `/api/healthz` read 3 atomics non-atomically (torn snapshot) | Fixed (single RWMutex-guarded snapshot + race test) | `241ec69` |
| **n3c.m1**: vestigial OR-scaffolding in `QueryMultiNodePackets` | Fixed (cleanup) | `5a53ceb` |
| **n3c.m2**: verify PR body language about `ALTER` vs `CREATE INDEX` | Verified accurate (already corrected in cycle 2) | (no change) |
| **n3c.m3**: `json.Unmarshal` per row in backfill → could use SQL `json_extract` | **Deferred as known followup**: pure perf optimization (current per-row Unmarshal is correct, just slower); SQL rewrite would unwind the chunked-yield architecture and is non-trivial. Acceptable for one-time backfill at boot on legacy DBs. |

### M1c implementation detail
`startFromPubkeyBackfill(dbPath, chunkSize, yieldDuration)` is now the single production entry point used by `main.go`. It internally does `go backfillFromPubkeyAsync(...)`. The test calls `startFromPubkeyBackfill` (no `go` prefix) and asserts the dispatch returns within 50ms, so if anyone removes the `go` keyword inside the wrapper, the test fails. **Manually verified**: removing the `go` keyword causes `TestBackfillFromPubkey_DoesNotBlockBoot` to fail with "backfill dispatch took ~1s (>50ms): not async — would block boot."
### m2c implementation detail
`fromPubkeyBackfillTotal/Processed/Done` are now plain `int64`/`bool` package globals guarded by a single `sync.RWMutex`. `fromPubkeyBackfillSnapshot()` returns all three under one RLock. `TestHealthzFromPubkeyBackfillConsistentSnapshot` races a writer (lock-step total/processed updates with periodic done flips) against 8 readers hammering `/api/healthz`, asserting `processed<=total` and `(done => processed==total)` on every response. Verified the test catches torn reads (manually injected a 3-RLock implementation; test failed within milliseconds with "processed>total" and "done=true but processed!=total" errors).
---------
Co-authored-by: openclaw-bot <bot@openclaw.local>
Co-authored-by: openclaw-bot <bot@openclaw.dev>
|
||
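The chunked backfill and the torn-read fix from the m2c finding can be sketched together: progress is updated under one lock, and a reader takes all three fields in a single `RLock`. The commit describes package globals; this sketch groups them in a hypothetical `backfillTracker` struct, and `processChunk` stands in for the real per-chunk SQL work.

```go
package main

import (
	"sync"
	"time"
)

// BackfillProgress is the consistent view a health endpoint could report.
type BackfillProgress struct {
	Total, Processed int64
	Done             bool
}

// backfillTracker guards the progress fields with a single RWMutex so a
// reader can never observe Processed > Total or Done with work remaining.
type backfillTracker struct {
	mu sync.RWMutex
	p  BackfillProgress
}

// Snapshot copies all three fields under one read lock.
func (t *backfillTracker) Snapshot() BackfillProgress {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return t.p
}

// run processes total rows in chunks, yielding between chunks so other
// database users are not starved. processChunk(offset, n) is a placeholder
// for the real UPDATE of n rows starting at offset.
func (t *backfillTracker) run(total, chunkSize int64, yieldMillis int, processChunk func(offset, n int64)) {
	if chunkSize <= 0 {
		chunkSize = 5000 // chunk size quoted in the commit message
	}
	t.mu.Lock()
	t.p = BackfillProgress{Total: total}
	t.mu.Unlock()

	for off := int64(0); off < total; off += chunkSize {
		n := chunkSize
		if remaining := total - off; remaining < n {
			n = remaining
		}
		processChunk(off, n)

		t.mu.Lock()
		t.p.Processed += n
		t.mu.Unlock()

		time.Sleep(time.Duration(yieldMillis) * time.Millisecond)
	}

	t.mu.Lock()
	t.p.Done = true
	t.mu.Unlock()
}
```

To keep boot non-blocking, a wrapper would launch it as `go tracker.run(total, 5000, 100, updateChunk)`, which is the property the M1c test guards in the real code.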
|
|
136e1d23c8 |
feat(#730): foreign-advert detection — flag instead of silent drop (#1084)
## Summary
**Partial fix for #730 (M1 only; M2 frontend and M3 alerting deferred).**
Today the ingestor **silently drops** ADVERTs whose GPS lies outside the configured `geo_filter` polygon. That's the wrong default for an analytics tool: operators get zero visibility into bridged or leaked meshes.
This PR makes the new default **flag, don't drop**: foreign adverts are stored, the node row is tagged `foreign_advert=1`, and the API surfaces `"foreign": true` so dashboards / map overlays can be built on top.
## Behavior

| Mode | What happens to an ADVERT outside `geo_filter` |
|---|---|
| (default) flag | Stored, marked `foreign_advert=1`, exposed via API |
| drop (legacy) | Silently dropped (preserves old behavior for ops who want it) |

## What's done (M1: Backend)
- ingestor stores foreign adverts instead of dropping
- `nodes.foreign_advert` column added (migration)
- `/api/nodes` and `/api/nodes/{pk}` expose `foreign: true` field
- Config: `geofilter.action: "flag"|"drop"` (default `flag`)
- Tests + config docs
## What's NOT done (deferred to M2 + M3)
- **M2 (Frontend):** Map overlay showing foreign adverts as distinct markers, foreign-advert filter on packets/nodes pages, dedicated foreign-advert dashboard
- **M3 (Alerting):** Time-series detection of bridging events, alert when foreign advert rate spikes, identify bridge entry-point nodes

Issue #730 remains open for M2 and M3.
---------
Co-authored-by: corescope-bot <bot@corescope>
|
||
|
|
d05e468598 |
feat(memlimit): GOMEMLIMIT support, derive from packetStore.maxMemoryMB (#836) (#1077)
## Summary
Implements **part 1** of #836: `GOMEMLIMIT` support so the Go runtime self-throttles GC under cgroup memory pressure instead of getting SIGKILLed. (Parts 2 & 3, bounded cold-load batching and README ops docs, land in follow-up PRs.)
## Behavior
On startup `cmd/server/main.go` now calls `applyMemoryLimit(maxMemoryMB, envSet)`:

| Condition | Action | Log |
|---|---|---|
| `GOMEMLIMIT` env set | Honor the runtime's parse, do nothing | `[memlimit] using GOMEMLIMIT from environment (...)` |
| env unset, `packetStore.maxMemoryMB > 0` | `debug.SetMemoryLimit(maxMB * 1.5 MiB)` | `[memlimit] derived from packetStore.maxMemoryMB=512 → 768 MiB (1.5x headroom)` |
| env unset, `maxMemoryMB == 0` | No-op | `[memlimit] no soft memory limit set ... recommend setting one to avoid container OOM-kill` |

The 1.5x headroom covers Go's NextGC trigger at ~2× live heap (per #836 heap profile: 680 MB live → 1.38 GB NextGC).
## Tests (TDD red→green visible in commit history)
- `TestApplyMemoryLimit_FromEnv`: env wins, function does not override
- `TestApplyMemoryLimit_DerivedFromMaxMemoryMB`: verifies bytes computation + `debug.SetMemoryLimit` actually applied at runtime
- `TestApplyMemoryLimit_None`: no env, no config → reports `"none"`, no side effect

Red commit: `7de3c62` (assertion failures, builds clean)
Green commit: `454516d`
## Config docs
`config.example.json` `packetStore._comment_gomemlimit` documents env/derived/override behavior.
## Out of scope
- Cold-load transient bounding (item 2 in #836)
- README container-size table (item 3)
- QA §1.1 rewrite

Closes part 1 of #836.
---------
Co-authored-by: corescope-bot <bot@corescope>
|
||
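The three-way decision in the behavior table can be sketched with `runtime/debug.SetMemoryLimit`. The function name and parameters follow the commit text, but the return values (`limitBytes`, `source`) and the `"env"` / `"derived"` labels are assumptions; only `"none"` is mentioned in the test list.

```go
package main

import (
	"os"
	"runtime/debug"
)

// gomemlimitEnvSet reports whether GOMEMLIMIT is present in the environment.
func gomemlimitEnvSet() bool {
	_, ok := os.LookupEnv("GOMEMLIMIT")
	return ok
}

// applyMemoryLimit picks the soft memory limit for the Go runtime.
//   - env set: the runtime already parsed GOMEMLIMIT, so nothing is changed
//   - maxMemoryMB > 0: the limit is derived as 1.5x the packet-store budget
//   - otherwise: no limit is applied
//
// It returns the limit it applied in bytes (0 if none) and a label for
// logging. The label values other than "none" are hypothetical.
func applyMemoryLimit(maxMemoryMB int, envSet bool) (limitBytes int64, source string) {
	switch {
	case envSet:
		return 0, "env"
	case maxMemoryMB > 0:
		const mib = 1024 * 1024
		limitBytes = int64(maxMemoryMB) * mib * 3 / 2 // 1.5x headroom
		debug.SetMemoryLimit(limitBytes)
		return limitBytes, "derived"
	default:
		return 0, "none"
	}
}
```

At startup this would be called as `applyMemoryLimit(cfg.PacketStore.MaxMemoryMB, gomemlimitEnvSet())`, where the config field path is a guess at the shape of the real struct. With `maxMemoryMB=512` the derived limit is 768 MiB, matching the log line in the table.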
|
|
dd2f044f2b |
fix: cache RW SQLite connection + dedup DBConfig (closes #921) (#982)
Closes #921
## Summary
Follow-up to #920 (incremental auto-vacuum). Addresses both items from the adversarial review:
### 1. RW connection caching
Previously, every call to `openRW(dbPath)` opened a new SQLite RW connection and closed it after use. This happened in:
- `runIncrementalVacuum` (~4x/hour)
- `PruneOldPackets`, `PruneOldMetrics`, `RemoveStaleObservers`
- `buildAndPersistEdges`, `PruneNeighborEdges`
- All neighbor persist operations

Now a single `*sql.DB` handle (with `SetMaxOpenConns(1)`) is cached process-wide via `cachedRW(dbPath)`. The underlying connection pool manages serialization. The original `openRW()` function is retained for one-shot test usage.
### 2. DBConfig dedup
`DBConfig` was defined identically in both `cmd/server/config.go` and `cmd/ingestor/config.go`. Extracted to `internal/dbconfig/` as a shared package; both binaries now use a type alias (`type DBConfig = dbconfig.DBConfig`).
## Tests added

| Test | File |
|------|------|
| `TestCachedRW_ReturnsSameHandle` | `cmd/server/rw_cache_test.go` |
| `TestCachedRW_100Calls_SingleConnection` | `cmd/server/rw_cache_test.go` |
| `TestGetIncrementalVacuumPages_Default` | `internal/dbconfig/dbconfig_test.go` |
| `TestGetIncrementalVacuumPages_Configured` | `internal/dbconfig/dbconfig_test.go` |

## Verification
```
ok github.com/corescope/server 20.069s
ok github.com/corescope/ingestor 47.117s
ok github.com/meshcore-analyzer/dbconfig 0.003s
```
Both binaries build cleanly. 100 sequential `cachedRW()` calls return the same handle with exactly 1 entry in the cache map.
---------
Co-authored-by: you <you@example.com>
|
||
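The process-wide handle cache behind `cachedRW(dbPath)` can be sketched as a mutex-guarded map keyed by path. To stay driver-independent the sketch is generic over the handle type; `handleCache`, `newHandleCache`, and the injected `open` function are hypothetical, and in the real code the handle would be a `*sql.DB` limited to one open connection.

```go
package main

import "sync"

// handleCache returns one shared handle per database path, opening it on
// first use only. T would be *sql.DB in a real server.
type handleCache[T any] struct {
	mu      sync.Mutex
	open    func(path string) (T, error)
	handles map[string]T
}

// newHandleCache builds a cache around the given opener.
func newHandleCache[T any](open func(path string) (T, error)) *handleCache[T] {
	return &handleCache[T]{open: open, handles: make(map[string]T)}
}

// get returns the cached handle for path, opening it if this is the first
// request. Failed opens are not cached, so a later call can retry.
func (c *handleCache[T]) get(path string) (T, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if h, ok := c.handles[path]; ok {
		return h, nil
	}
	h, err := c.open(path)
	if err != nil {
		var zero T
		return zero, err
	}
	c.handles[path] = h
	return h, nil
}
```

Holding the mutex across `open` keeps the sketch simple and guarantees a single open per path; with the real `database/sql` package the opener would call `sql.Open` followed by `db.SetMaxOpenConns(1)`.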
|
|
3364eed303 |
feat: separate "Last Status Update" from "Last Packet Observation" for observers (v3 rebase) (#969)
Rebased version of #968 (which was itself a rebase of #905). Resolves the merge conflict with #906 (clock-skew UI) that landed on master.
## Conflict resolution
**`public/observers.js`**: master (#906) added a "Clock Offset" column to the observer table; #968 split "Last Seen" into "Last Status" + "Last Packet" columns. Combined both: the table now has Status, Name, Region, Last Status, Last Packet, Packets, Packets/Hour, Clock Offset, Uptime.
## What this PR adds (unchanged from #968/#905)
- `last_packet_at` column in observers DB table
- Separate "Last Status Update" and "Last Packet Observation" display in observers list and detail page
- Server-side migration to add the column automatically
- Backfill heuristic for existing data
- Tests for ingestor and server
## Verification
- All Go tests pass (`cmd/server`, `cmd/ingestor`)
- Frontend tests pass (`test-packets.js`, `test-hash-color.js`)
- Built server, hit `/api/observers`: `last_packet_at` field present in JSON
- Observer table header has all 9 columns including both Last Packet and Clock Offset
## Prior PRs
- #905: original (conflicts with master)
- #968: first rebase (conflicts after #906 landed)
- This PR: second rebase, resolves #906 conflict

Supersedes #968. Closes #905.
---------
Co-authored-by: you <you@example.com>
|
||
|
|
b3a9677c52 |
feat(ingestor + server): observerBlacklist config (#962) (#963)
## Summary Implements `observerBlacklist` config — mirrors the existing `nodeBlacklist` pattern for observers. Drop observers by pubkey at ingest, with defense-in-depth filtering on the server side. Closes #962 ## Changes ### Ingestor (`cmd/ingestor/`) - **`config.go`**: Added `ObserverBlacklist []string` field + `IsObserverBlacklisted()` method (case-insensitive, whitespace-trimmed) - **`main.go`**: Early return in `handleMessage` when `parts[2]` (observer ID from MQTT topic) matches blacklist — before status handling, before IATA filter. No UpsertObserver, no observations, no metrics insert. Log line: `observer <pubkey-short> blacklisted, dropping` ### Server (`cmd/server/`) - **`config.go`**: Same `ObserverBlacklist` field + `IsObserverBlacklisted()` with `sync.Once` cached set (same pattern as `nodeBlacklist`) - **`routes.go`**: Defense-in-depth filtering in `handleObservers` (skip blacklisted in list) and `handleObserverDetail` (404 for blacklisted ID) - **`main.go`**: Startup `softDeleteBlacklistedObservers()` marks matching rows `inactive=1` so historical data is hidden - **`neighbor_persist.go`**: `softDeleteBlacklistedObservers()` implementation ### Tests - `cmd/ingestor/observer_blacklist_test.go`: config method tests (case-insensitive, empty, nil) - `cmd/server/observer_blacklist_test.go`: config tests + HTTP handler tests (list excludes blacklisted, detail returns 404, no-blacklist passes all, concurrent safety) ## Config ```json { "observerBlacklist": [ "EE550DE547D7B94848A952C98F585881FCF946A128E72905E95517475F83CFB1" ] } ``` ## Verification (Rule 18 — actual server output) **Before blacklist** (no config): ``` Total: 31 DUBLIN in list: True ``` **After blacklist** (DUBLIN Observer pubkey in `observerBlacklist`): ``` [observer-blacklist] soft-deleted 1 blacklisted observer(s) Total: 30 DUBLIN in list: False ``` Detail endpoint for blacklisted observer returns **404**. All existing tests pass (`go test ./...` for both server and ingestor). 
--------- Co-authored-by: you <you@example.com> |
||
|
|
e1a1be1735 |
fix(server): add observers.inactive column at startup if missing (root cause of CI flake) (#961)
## The actual root cause PR #954 added `WHERE inactive IS NULL OR inactive = 0` to the server's observer queries, but the `inactive` column is only added by the **ingestor** migration (`cmd/ingestor/db.go:344-354`). When the server runs against a DB the ingestor never touched (e.g. the e2e fixture), the column doesn't exist: ``` $ sqlite3 test-fixtures/e2e-fixture.db "SELECT COUNT(*) FROM observers WHERE inactive IS NULL OR inactive = 0;" Error: no such column: inactive ``` The server's `db.QueryRow().Scan()` swallows that error → `totalObservers` stays 0 → `/api/observers` returns empty → map test fails with "No map markers/overlays found". This explains all the failing CI runs since #954 merged. PR #957 (freshen fixture) helped with the `nodes` time-rot but couldn't fix the missing-column problem. PR #960 (freshen observers) added the right timestamps but the column was still missing. PR #959 (data-loaded in finally) fixed a different real bug. None of those touched the actual mechanism. ## Fix Mirror the existing `ensureResolvedPathColumn` pattern: add `ensureObserverInactiveColumn` that runs at server startup, checks if the column exists via `PRAGMA table_info`, adds it with `ALTER TABLE observers ADD COLUMN inactive INTEGER DEFAULT 0` if missing. Wired into `cmd/server/main.go` immediately after `ensureResolvedPathColumn`. ## Verification End-to-end on a freshened fixture: ``` $ sqlite3 /tmp/e2e-verify.db "PRAGMA table_info(observers);" | grep inactive (no output — column absent) $ ./cs-fixed -port 13702 -db /tmp/e2e-verify.db -public public & [store] Added inactive column to observers $ curl 'http://localhost:13702/api/observers' returned=31 # was 0 before fix ``` `go test ./...` passes (19.8s). ## Lessons I should have run `sqlite3 fixture "SELECT ... WHERE inactive ..."` directly the first time the map test failed after #954 instead of writing four "fix" PRs that didn't address the actual mechanism. Apologies for the wild goose chase. 
Co-authored-by: Kpa-clawbot <bot@example.invalid> |
||
|
|
57e272494d |
feat(server): /api/healthz readiness endpoint gated on store load (#955) (#956)
## Summary Fixes RCA #2 from #955: the HTTP listener and `/api/stats` go live before background goroutines (pickBestObservation, neighbor graph build) finish, causing CI readiness checks to pass prematurely. ## Changes 1. **`cmd/server/healthz.go`** — New `GET /api/healthz` endpoint: - Returns `503 {"ready":false,"reason":"loading"}` while background init is running - Returns `200 {"ready":true,"loadedTx":N,"loadedObs":N}` once ready 2. **`cmd/server/main.go`** — Added `sync.WaitGroup` tracking pickBestObservation and neighbor graph build goroutines. A coordinator goroutine sets `readiness.Store(1)` when all complete. `backfillResolvedPathsAsync` is NOT gated (async by design, can take 20+ min). 3. **`cmd/server/routes.go`** — Wired `/api/healthz` before system endpoints. 4. **`.github/workflows/deploy.yml`** — CI wait-for-ready loop now polls `/api/healthz` instead of `/api/stats`. 5. **`cmd/server/healthz_test.go`** — Tests for 503-before-ready, 200-after-ready, JSON shape, and anti-tautology gate. ## Rule 18 Verification Built and ran against `test-fixtures/e2e-fixture.db` (499 tx): - With the small fixture DB, init completes in <300ms so both immediate and delayed curls return 200 - Unit tests confirm 503 behavior when `readiness=0` (simulating slow init) - On production DBs with 100K+ txs, the 503 window would be 5-15s (pickBestObservation processes in 5000-tx chunks with 10ms yields) ## Test Results ``` === RUN TestHealthzNotReady --- PASS === RUN TestHealthzReady --- PASS === RUN TestHealthzAntiTautology --- PASS ok github.com/corescope/server 19.662s (full suite) ``` Co-authored-by: you <you@example.com> |
||
|
|
aeae7813bc |
fix: enable SQLite incremental auto-vacuum so DB shrinks after retention (#919) (#920)
Closes #919
## Summary
Enables SQLite incremental auto-vacuum so the database file actually shrinks after the retention reaper deletes old data. Previously, `DELETE` operations freed pages internally but never returned disk space to the OS.
## Changes
### 1. Auto-vacuum on new databases
- `PRAGMA auto_vacuum = INCREMENTAL` set via DSN pragma before `journal_mode(WAL)` in the ingestor's `OpenStoreWithInterval`
- Must be set before any tables are created; DSN ordering ensures this
### 2. Post-reaper incremental vacuum
- `PRAGMA incremental_vacuum(N)` runs after every retention reaper cycle (packets, metrics, observers, neighbor edges)
- N defaults to 1024 pages, configurable via `db.incrementalVacuumPages`
- Noop on `auto_vacuum=NONE` databases (safe before migration)
- Added to both server and ingestor
### 3. Opt-in full VACUUM for existing databases
- Startup check logs a clear warning if `auto_vacuum != INCREMENTAL`
- `db.vacuumOnStartup: true` config triggers one-time `PRAGMA auto_vacuum = INCREMENTAL; VACUUM`
- Logs start/end time for operator visibility
### 4. Documentation
- `docs/user-guide/configuration.md`: retention section notes that lowering retention doesn't immediately shrink the DB
- `docs/user-guide/database.md`: new guide covering WAL, auto-vacuum, migration, manual VACUUM
### 5. Tests
- `TestNewDBHasIncrementalAutoVacuum`: fresh DB gets `auto_vacuum=2`
- `TestExistingDBHasAutoVacuumNone`: old DB stays at `auto_vacuum=0`
- `TestVacuumOnStartupMigratesDB`: full VACUUM sets `auto_vacuum=2`
- `TestIncrementalVacuumReducesFreelist`: DELETE + vacuum shrinks freelist
- `TestCheckAutoVacuumLogs`: handles both modes without panic
- `TestConfigIncrementalVacuumPages`: config defaults and overrides
## Migration path for existing databases
1. On startup, CoreScope logs: `[db] auto_vacuum=NONE — DB needs one-time VACUUM...`
2. Set `db.vacuumOnStartup: true` in config.json
3. Restart: VACUUM runs (blocks startup, minutes on large DBs)
4. Remove `vacuumOnStartup` after migration
## Test results
```
ok github.com/corescope/server 19.448s
ok github.com/corescope/ingestor 30.682s
```
---------
Co-authored-by: you <you@example.com>
|
||
|
|
a8e1cea683 |
fix: use payload type bits only in content hash (not full header byte) (#787)
## Problem
The firmware computes packet content hash as:
```
SHA256(payload_type_byte + [path_len for TRACE] + payload)
```
Where `payload_type_byte = (header >> 2) & 0x0F`, i.e. just the payload type bits (2-5).
CoreScope was using the **full header byte** in its hash computation, which includes route type bits (0-1) and version bits (6-7). This meant the same logical packet produced different content hashes depending on route type, breaking dedup and packet lookup.
**Firmware reference:** `Packet.cpp::calculatePacketHash()` uses `getPayloadType()` which returns `(header >> PH_TYPE_SHIFT) & PH_TYPE_MASK`.
## Fix
- Extract only payload type bits: `payloadType := (headerByte >> 2) & 0x0F`
- Include `path_len` byte in hash for TRACE packets (matching firmware behavior)
- Applied to both `cmd/server/decoder.go` and `cmd/ingestor/decoder.go`
## Tests Added
- **Route type independence:** Same payload with FLOOD vs DIRECT route types produces identical hash
- **TRACE path_len inclusion:** TRACE packets with different `path_len` produce different hashes
- **Firmware compatibility:** Hash output matches manual computation of firmware algorithm
## Migration Impact
Existing packets in the DB have content hashes computed with the old (incorrect) formula. Options:
1. **Recompute hashes** via migration (recommended for clean state)
2. **Dual lookup**: check both old and new hash on queries (backward compat)
3. **Accept the break**: old hashes become stale, new packets get correct hashes
Recommend option 1 (migration) as a follow-up. The volume of affected packets depends on how many distinct route types were seen for the same logical packet.
Fixes #786
---------
Co-authored-by: you <you@example.com>
|
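The hash formula above could be sketched in Go as shown here. This is an illustration only: `contentHash` is a hypothetical helper, the numeric value of `payloadTypeTrace` is an assumption that should be checked against the firmware headers, and the sketch returns the full SHA-256 digest without modelling any truncation or encoding the real decoders may apply.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// payloadTypeTrace is an ASSUMED value for the TRACE payload type.
const payloadTypeTrace = 0x09

// contentHash hashes the payload type bits (header bits 2-5), the path_len
// byte for TRACE packets only, and then the payload.
func contentHash(header, pathLen byte, payload []byte) string {
	payloadType := (header >> 2) & 0x0F
	h := sha256.New()
	h.Write([]byte{payloadType})
	if payloadType == payloadTypeTrace {
		h.Write([]byte{pathLen})
	}
	h.Write(payload)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	payload := []byte("hello")
	// Same payload type (0x02), different route-type bits (bits 0-1).
	a := contentHash(0x02<<2|0x01, 0, payload)
	b := contentHash(0x02<<2|0x02, 0, payload)
	fmt.Println(a == b) // prints true: route type no longer affects the hash
}
```

With the old behaviour (hashing the full header byte) the two calls in `main` would have produced different digests, which is the dedup break the PR describes.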
||
|
|
b9ba447046 |
feat: add nodeBlacklist config to hide abusive/troll nodes (#742)
## Problem
Some mesh participants set offensive names, report deliberately false
GPS positions, or otherwise troll the network. Instance operators
currently have no way to hide these nodes from public-facing APIs
without deleting the underlying data.
## Solution
Add a `nodeBlacklist` array to `config.json` containing public keys of
nodes to exclude from all API responses.
### Blacklisted nodes are filtered from:
- `GET /api/nodes` — list endpoint
- `GET /api/nodes/search` — search results
- `GET /api/nodes/{pubkey}` — detail (returns 404)
- `GET /api/nodes/{pubkey}/health` — returns 404
- `GET /api/nodes/{pubkey}/paths` — returns 404
- `GET /api/nodes/{pubkey}/analytics` — returns 404
- `GET /api/nodes/{pubkey}/neighbors` — returns 404
- `GET /api/nodes/bulk-health` — filtered from results
### Config example
```json
{
"nodeBlacklist": [
"aabbccdd...",
"11223344..."
]
}
```
### Design decisions
- **Case-insensitive** — public keys normalized to lowercase
- **Whitespace trimming** — leading/trailing whitespace handled
- **Empty entries ignored** — `""` or `" "` do not cause false positives
- **Nil-safe** — `IsBlacklisted()` on nil Config returns false
- **Backward-compatible** — empty/missing `nodeBlacklist` has zero
effect
- **Lazy-cached set** — blacklist converted to `map[string]bool` on
first lookup
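A minimal sketch of how these design decisions might fit together is given below. The `Config` struct here is a trimmed-down stand-in, and the use of `sync.Once` for the lazy cache is an assumption for illustration; the real `IsBlacklisted()` may differ in detail.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Config is a simplified stand-in for the server config (illustrative only).
type Config struct {
	NodeBlacklist []string

	once sync.Once
	set  map[string]bool
}

// IsBlacklisted reports whether pubkey is in the blacklist. It is nil-safe,
// case-insensitive, trims whitespace, and ignores empty entries.
func (c *Config) IsBlacklisted(pubkey string) bool {
	if c == nil {
		return false
	}
	// Build the lookup set once, on first use.
	c.once.Do(func() {
		c.set = make(map[string]bool, len(c.NodeBlacklist))
		for _, k := range c.NodeBlacklist {
			k = strings.ToLower(strings.TrimSpace(k))
			if k != "" {
				c.set[k] = true
			}
		}
	})
	return c.set[strings.ToLower(strings.TrimSpace(pubkey))]
}

func main() {
	cfg := &Config{NodeBlacklist: []string{" AABBCCDD ", "", "  "}}
	fmt.Println(cfg.IsBlacklisted("aabbccdd")) // true
	fmt.Println(cfg.IsBlacklisted(""))         // false

	var none *Config
	fmt.Println(none.IsBlacklisted("aabbccdd")) // false
}
```

Because empty entries are never inserted into the set, an empty or whitespace-only lookup key can never match.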
### What this does NOT do (intentionally)
- Does **not** delete or modify database data — only filters API
responses
- Does **not** block packet ingestion — data still flows for analytics
- Does **not** filter `/api/packets` — only node-facing endpoints are
affected
## Testing
- Unit tests for `Config.IsBlacklisted()` (case sensitivity, whitespace,
empty entries, nil config)
- Integration tests for `/api/nodes`, `/api/nodes/{pubkey}`,
`/api/nodes/search`
- Full test suite passes with no regressions
|
||
|
|
fa3f623bd6 |
feat: add observer retention — remove stale observers after configurable days (#764)
## Summary
Observers that stop actively sending data now get removed after a
configurable retention period (default 14 days).
Previously, observers remained in the `observers` table forever. This
meant nodes that were once observers for an instance but are no longer
connected (even if still active in the mesh elsewhere) would continue
appearing in the observer list indefinitely.
## Key Design Decisions
- **Active data requirement**: `last_seen` is only updated when the
observer itself sends packets (via `stmtUpdateObserverLastSeen`). Being
seen by another node does NOT update this field. So an observer must
actively send data to stay listed.
- **Default: 14 days** — observers not seen in 14 days are removed
- **`-1` = keep forever** — for users who want observers to never be
removed
- **`0` = use default (14 days)** — same as not setting the field
- **Runs on startup + daily ticker** — staggered 3 minutes after metrics
prune to avoid DB contention
## Changes
| File | Change |
|------|--------|
| `cmd/ingestor/config.go` | Add `ObserverDays` to `RetentionConfig`,
add `ObserverDaysOrDefault()` |
| `cmd/ingestor/db.go` | Add `RemoveStaleObservers()` — deletes
observers with `last_seen` before cutoff |
| `cmd/ingestor/main.go` | Wire up startup + daily ticker for observer
retention |
| `cmd/server/config.go` | Add `ObserverDays` to `RetentionConfig`, add
`ObserverDaysOrDefault()` |
| `cmd/server/db.go` | Add `RemoveStaleObservers()` (server-side, uses
read-write connection) |
| `cmd/server/main.go` | Wire up startup + daily ticker, shutdown
cleanup |
| `cmd/server/routes.go` | Admin prune API now also removes stale
observers |
| `config.example.json` | Add `observerDays: 14` with documentation |
| `cmd/ingestor/coverage_boost_test.go` | 4 tests: basic removal, empty
store, keep forever (-1), default (0→14) |
| `cmd/server/config_test.go` | 4 tests: `ObserverDaysOrDefault` edge
cases |
## Config Example
```json
{
"retention": {
"nodeDays": 7,
"observerDays": 14,
"packetDays": 30,
"_comment": "observerDays: -1 = keep forever, 0 = use default (14)"
}
}
```
## Admin API
The `/api/admin/prune` endpoint now also removes stale observers (using
`observerDays` from config) and reports `observers_removed` in the
response alongside `packets_deleted`.
## Test Plan
- [x] `TestRemoveStaleObservers` — old observer removed, recent observer
kept
- [x] `TestRemoveStaleObserversNone` — empty store, no errors
- [x] `TestRemoveStaleObserversKeepForever` — `-1` keeps even year-old
observers
- [x] `TestRemoveStaleObserversDefault` — `0` defaults to 14 days
- [x] `TestObserverDaysOrDefault` (ingestor) —
nil/zero/positive/keep-forever
- [x] `TestObserverDaysOrDefault` (server) —
nil/zero/positive/keep-forever
- [x] Both binaries compile cleanly (`go build`)
- [ ] Manual: verify observer count decreases after retention period on
a live instance
|
||
|
|
dc5b5ce9a0 |
fix: reject weak/default API keys + startup warning (#532) (#628)
## Summary Hardens API key security for write endpoints (fixes #532): 1. **Constant-time comparison** — uses `crypto/subtle.ConstantTimeCompare` to prevent timing attacks on API key validation 2. **Weak key blocklist** — rejects known default/example keys (`test`, `password`, `change-me`, `your-secret-api-key-here`, etc.) 3. **Minimum length enforcement** — keys shorter than 16 characters are rejected 4. **Startup warning** — logs a clear warning if the configured key is weak or a known default 5. **Generic error messages** — HTTP 403 response uses opaque "forbidden" message to prevent information leakage about why a key was rejected ### Security Model - **Empty key** → all write endpoints disabled (403) - **Weak/default key** → all write endpoints disabled (403), startup warning logged - **Wrong key** → 401 unauthorized - **Strong correct key** → request proceeds ### Files Changed - `cmd/server/config.go` — `IsWeakAPIKey()` function + blocklist - `cmd/server/routes.go` — constant-time comparison via `constantTimeEqual()`, weak key rejection - `cmd/server/main.go` — startup warning for weak keys - `cmd/server/apikey_security_test.go` — comprehensive test coverage - `cmd/server/routes_test.go` — existing tests updated to use strong keys ### Reviews - ✅ Self-review: all security properties verified - ✅ djb Final Review: timing fix correct, blocklist pragmatic, error messages opaque, tests comprehensive. **Verdict: Ship it.** ### Test Results All existing + new tests pass. Coverage includes: weak key detection (blocklist + length + case-insensitive), empty key handling, strong key acceptance, wrong key rejection, and constant-time comparison. --------- Co-authored-by: you <you@example.com> |
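The security model above might be sketched as follows. The blocklist contents beyond the keys named in the description, the `writeAuthStatus` helper, and the exact normalisation are assumptions for illustration; only `crypto/subtle.ConstantTimeCompare`, the 16-character minimum, and the status codes come from the text.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"strings"
)

const minAPIKeyLen = 16

// weakKeys lists a few known default/example keys (illustrative subset).
var weakKeys = map[string]bool{
	"test":                     true,
	"password":                 true,
	"change-me":                true,
	"your-secret-api-key-here": true,
}

// isWeakAPIKey rejects short keys and blocklisted keys, case-insensitively.
func isWeakAPIKey(key string) bool {
	k := strings.ToLower(strings.TrimSpace(key))
	return len(k) < minAPIKeyLen || weakKeys[k]
}

// constantTimeEqual compares two keys without an early exit on the first
// differing byte. Note that ConstantTimeCompare returns immediately when the
// lengths differ, so the key length itself is not hidden.
func constantTimeEqual(a, b string) bool {
	return subtle.ConstantTimeCompare([]byte(a), []byte(b)) == 1
}

// writeAuthStatus (hypothetical) maps the configured and presented keys to
// the HTTP status described in the security model.
func writeAuthStatus(configured, presented string) int {
	if configured == "" || isWeakAPIKey(configured) {
		return http.StatusForbidden // write endpoints disabled
	}
	if !constantTimeEqual(configured, presented) {
		return http.StatusUnauthorized
	}
	return http.StatusOK
}

func main() {
	strong := "k3y-with-enough-length-123"
	fmt.Println(writeAuthStatus("", "anything"))         // 403
	fmt.Println(writeAuthStatus("password", "password")) // 403
	fmt.Println(writeAuthStatus(strong, "wrong"))        // 401
	fmt.Println(writeAuthStatus(strong, strong))         // 200
}
```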
||
|
|
05fbcb09dd |
fix: wire cacheTTL.analyticsHashSizes config to collision cache (#420) (#622)
## Summary Fixes #420 — wires `cacheTTL` config values to server-side cache durations that were previously hardcoded. ## Problem `collisionCacheTTL` was hardcoded at 60s in `store.go`. The config has `cacheTTL.analyticsHashSizes: 3600` (1 hour) but it was never read — the `/api/config/cache` endpoint just passed the raw map to the client without applying values server-side. ## Changes - **`store.go`**: Add `cacheTTLSec()` helper to safely extract duration values from the `cacheTTL` config map. `NewPacketStore` now accepts an optional `cacheTTL` map (variadic, backward-compatible) and wires: - `cacheTTL.analyticsHashSizes` → `collisionCacheTTL` - `cacheTTL.analyticsRF` → `rfCacheTTL` - **Default changed**: `collisionCacheTTL` default raised from 60s → 3600s (1 hour). Hash collision computation is expensive and data changes rarely — 60s was causing unnecessary recomputation. - **`main.go`**: Pass `cfg.CacheTTL` to `NewPacketStore`. - **Tests**: Added `TestCacheTTLFromConfig` and `TestCacheTTLDefaults` in eviction_test.go. Updated existing `TestHashCollisionsCacheTTL` for the new default. ## Audit of other cacheTTL values The remaining `cacheTTL` keys (`stats`, `nodeDetail`, `nodeHealth`, `nodeList`, `bulkHealth`, `networkStatus`, `observers`, `channels`, `channelMessages`, `analyticsTopology`, `analyticsChannels`, `analyticsSubpaths`, `analyticsSubpathDetail`, `nodeAnalytics`, `nodeSearch`, `invalidationDebounce`) are **client-side only** — served via `/api/config/cache` and consumed by the frontend. They don't have corresponding server-side caches to wire to. The only server-side caches (`rfCache`, `topoCache`, `hashCache`, `chanCache`, `distCache`, `subpathCache`, `collisionCache`) all use either `rfCacheTTL` or `collisionCacheTTL`, both now configurable. ## Complexity O(1) config lookup at store init time. No hot-path impact. Co-authored-by: you <you@example.com> |
||
|
|
767c8a5a3e |
perf: async chunked backfill — HTTP serves within 2 minutes (#612) (#614)
## Summary Adds two config knobs for controlling backfill scope and neighbor graph data retention, plus removes the dead synchronous backfill function. ## Changes ### Config knobs #### `resolvedPath.backfillHours` (default: 24) Controls how far back (in hours) the async backfill scans for observations with NULL `resolved_path`. Transmissions with `first_seen` older than this window are skipped, reducing startup time for instances with large historical datasets. #### `neighborGraph.maxAgeDays` (default: 30) Controls the maximum age of `neighbor_edges` entries. Edges with `last_seen` older than this are pruned from both SQLite and the in-memory graph. Pruning runs on startup (after a 4-minute stagger) and every 24 hours thereafter. ### Dead code removal - Removed the synchronous `backfillResolvedPaths` function that was replaced by the async version. ### Implementation details - `backfillResolvedPathsAsync` now accepts a `backfillHours` parameter and filters by `tx.FirstSeen` - `NeighborGraph.PruneOlderThan(cutoff)` removes stale edges from the in-memory graph - `PruneNeighborEdges(conn, graph, maxAgeDays)` prunes both DB and in-memory graph - Periodic pruning ticker follows the same pattern as metrics pruning (24h interval, staggered start) - Graceful shutdown stops the edge prune ticker ### Config example Both knobs added to `config.example.json` with `_comment` fields. ## Tests - Config default/override tests for both knobs - `TestGraphPruneOlderThan` — in-memory edge pruning - `TestPruneNeighborEdgesDB` — SQLite + in-memory pruning together - `TestBackfillRespectsHourWindow` — verifies old transmissions are excluded by backfill window --------- Co-authored-by: you <you@example.com> |
||
|
|
6f35d4d417 |
feat: RF Health Dashboard M1 — observer metrics + small multiples grid (#604)
## RF Health Dashboard — M1: Observer Metrics Storage, API & Small Multiples Grid Implements M1 of #600. ### What this does Adds a complete RF health monitoring pipeline: MQTT stats ingestion → SQLite storage → REST API → interactive dashboard with small multiples grid. ### Backend Changes **Ingestor (`cmd/ingestor/`)** - New `observer_metrics` table via migration system (`_migrations` pattern) - Parse `tx_air_secs`, `rx_air_secs`, `recv_errors` from MQTT status messages (same pattern as existing `noise_floor` and `battery_mv`) - `INSERT OR REPLACE` with timestamps rounded to nearest 5-min interval boundary (using ingestor wall clock, not observer timestamps) - Missing fields stored as NULLs — partial data is always better than no data - Configurable retention pruning: `retention.metricsDays` (default 30), runs on startup + every 24h **Server (`cmd/server/`)** - `GET /api/observers/{id}/metrics?since=...&until=...` — per-observer time-series data - `GET /api/observers/metrics/summary?window=24h` — fleet summary with current NF, avg/max NF, sample count - `parseWindowDuration()` supports `1h`, `24h`, `3d`, `7d`, `30d` etc. 
- Server-side metrics retention pruning (same config, staggered 2min after packet prune) ### Frontend Changes **RF Health tab (`public/analytics.js`, `public/style.css`)** - Small multiples grid showing all observers simultaneously — anomalies pop out visually - Per-observer cell: name, current NF value, battery voltage, sparkline, avg/max stats - NF status coloring: warning (amber) at ≥-100 dBm, critical (red) at ≥-85 dBm — text color only, no background fills - Click any cell → expanded detail view with full noise floor line chart - Reference lines with direct text labels (`-100 warning`, `-85 critical`) — not color bands - Min/max points labeled directly on the chart - Time range selector: preset buttons (1h/3h/6h/12h/24h/3d/7d/30d) + custom from/to datetime picker - Deep linking: `#/analytics?tab=rf-health&observer=...&range=...` - All charts use SVG, matching existing analytics.js patterns - Responsive: 3-4 columns on desktop, 1 on mobile ### Design Decisions (from spec) - Labels directly on data, not in legends - Reference lines with text labels, not color bands - Small multiples grid, not card+accordion (Tufte: instant visual fleet comparison) - Ingestor wall clock for all timestamps (observer clocks may drift) ### Tests Added **Ingestor tests:** - `TestRoundToInterval` — 5 cases for rounding to 5-min boundaries - `TestInsertMetrics` — basic insertion with all fields - `TestInsertMetricsIdempotent` — INSERT OR REPLACE deduplication - `TestInsertMetricsNullFields` — partial data with NULLs - `TestPruneOldMetrics` — retention pruning - `TestExtractObserverMetaNewFields` — parsing tx_air_secs, rx_air_secs, recv_errors **Server tests:** - `TestGetObserverMetrics` — time-series query with since/until filters, NULL handling - `TestGetMetricsSummary` — fleet summary aggregation - `TestObserverMetricsAPIEndpoints` — DB query verification - `TestMetricsAPIEndpoints` — HTTP endpoint response shape - `TestParseWindowDuration` — duration parsing for h/d formats ### Test 
Results ``` cd cmd/ingestor && go test ./... → PASS (26s) cd cmd/server && go test ./... → PASS (5s) ``` ### What's NOT in this PR (deferred to M2+) - Server-side delta computation for cumulative counters - Airtime charts (TX/RX percentage lines) - Channel quality chart (recv_error_rate) - Battery voltage chart - Reboot detection and chart annotations - Resolution downsampling (1h, 1d aggregates) - Pattern detection / automated diagnosis --------- Co-authored-by: you <you@example.com> |
||
|
|
b0862f7a41 |
fix: replace time.Tick with NewTicker in prune goroutine for graceful shutdown (#593)
## Summary
Replace `time.Tick()` with `time.NewTicker()` in the auto-prune goroutine so it stops cleanly during graceful shutdown.
## Problem
`time.Tick` creates a ticker that can never be garbage collected or stopped. While the prune goroutine runs for the process lifetime, it won't stop during graceful shutdown, so the goroutine leaks past the shutdown sequence.
## Fix
- Create a `time.NewTicker` and a done channel
- Use `select` to listen on both the ticker and done channel
- Stop the ticker and close the done channel in the shutdown path (after `poller.Stop()`)
- Pattern matches the existing `StartEvictionTicker()` approach
## Testing
- `go build ./...`: compiles cleanly
- `go test ./...`: all tests pass
Fixes #377
Co-authored-by: you <you@example.com>
|
||
|
|
0c340e1eb6 |
fix: set hasResolvedPath flag after ensuring column exists
detectSchema() runs at DB open time, before ensureResolvedPathColumn() adds the column during Load(). On first run (or any run where the column was just added), hasResolvedPath stayed false, causing Load() to skip reading resolved_path from SQLite. This forced a full backfill of all observations on every restart, burning CPU for minutes on large DBs.
Fix: set hasResolvedPath = true after ensureResolvedPathColumn succeeds.
|
||
|
|
ae38cdefb4 |
feat: server-side hop resolution at ingest — resolved_path (#556)
## Summary Implements server-side hop prefix resolution at ingest time with a persisted neighbor graph. Hop prefixes in `path_json` are now resolved to full 64-char pubkeys at ingest and stored as `resolved_path` on each observation, eliminating the need for client-side resolution via `HopResolver`. Fixes #555 ## What changed ### New file: `cmd/server/neighbor_persist.go` SQLite persistence layer for the neighbor graph and resolved paths: - `neighbor_edges` table creation and management - Load/build/persist neighbor edges from/to SQLite - `resolved_path` column migration on observations - `resolvePathForObs()` — resolves hop prefixes using `resolveWithContext` with 4-tier priority (affinity → geo → GPS → first match) - Cold startup backfill for observations missing `resolved_path` - Async persistence of edges and resolved paths during ingest (non-blocking) ### Modified: `cmd/server/store.go` - `StoreObs` gains `ResolvedPath []*string` field - `StoreTx` gains `ResolvedPath []*string` (cached from best observation) - `Load()` dynamically includes `resolved_path` in SQL query when column exists - `IngestNewFromDB()` resolves paths at ingest time and persists asynchronously - `pickBestObservation()` propagates `ResolvedPath` to transmission - `txToMap()` and `enrichObs()` include `resolved_path` in API responses - All 7 `pm.resolve()` call sites migrated to `pm.resolveWithContext()` with the persisted graph - Broadcast maps include `resolved_path` per observation ### Modified: `cmd/server/db.go` - `DB` struct gains `hasResolvedPath bool` flag - `detectSchema()` checks for `resolved_path` column existence - Graceful degradation when column is absent (test DBs, old schemas) ### Modified: `cmd/server/main.go` - Startup sequence: ensure tables → load/build graph → backfill resolved paths → re-pick best observations ### Modified: `cmd/server/routes.go` - `mapSliceToTransmissions()` and `mapSliceToObservations()` propagate `resolved_path` - Node paths handler uses 
`resolveWithContext` with graph ### Modified: `cmd/server/types.go` - `TransmissionResp` and `ObservationResp` gain `ResolvedPath []*string` with `omitempty` ### New file: `cmd/server/neighbor_persist_test.go` 16 tests covering: - Path resolution (unambiguous, empty, unresolvable prefixes) - Marshal/unmarshal of resolved_path JSON - SQLite table creation and column migration (idempotent) - Edge persistence and loading - Schema detection - Full Load() with resolved_path - API response serialization (present when set, omitted when nil) ## Design decisions 1. **Async persistence** — resolved paths and neighbor edges are written to SQLite in a goroutine to avoid blocking the ingest loop. The in-memory state is authoritative. 2. **Schema compatibility** — `DB.hasResolvedPath` flag allows the server to work with databases that don't yet have the `resolved_path` column. SQL queries dynamically include/exclude the column. 3. **`pm.resolve()` retained** — Not removed as dead code because existing tests use it directly. All production call sites now use `resolveWithContext` with the persisted graph. 4. **Edge persistence is conservative** — Only unambiguous edges (single candidate) are persisted to `neighbor_edges`. Ambiguous prefixes are handled by the in-memory `NeighborGraph` via Jaccard disambiguation. 5. **`null` = unresolved** — Ambiguous prefixes store `null` in the resolved_path array. Frontend falls back to prefix display. ## Performance - `resolveWithContext` per hop: ~1-5μs (map lookups, no DB queries) - Typical packet has 0-5 hops → <25μs total resolution overhead per packet - Edge/path persistence is async → zero impact on ingest latency - Backfill is one-time on first startup with the new column ## Test results ``` cd cmd/server && go test ./... -count=1 → ok (4.4s) cd cmd/ingestor && go test ./... -count=1 → ok (25.5s) ``` --------- Co-authored-by: you <you@example.com> |
||
|
|
bf2e721dd7 |
feat: auto-inject cache busters at server startup — eliminates merge conflicts (#481)
## Problem Every PR that touches `public/` files requires manually bumping cache buster timestamps in `index.html` (e.g. `?v=1775111407`). Since all PRs change the same lines in the same file, this causes **constant merge conflicts** — it's been the #1 source of unnecessary PR friction. ## Solution Replace all hardcoded `?v=TIMESTAMP` values in `index.html` with a `?v=__BUST__` placeholder. The Go server replaces `__BUST__` with the current Unix timestamp **once at startup** when it reads `index.html`, then serves the pre-processed HTML from memory. Every server restart automatically picks up fresh cache busters — no manual intervention needed. ## What changed | File | Change | |------|--------| | `public/index.html` | All `v=1775111407` → `v=__BUST__` (28 occurrences) | | `cmd/server/main.go` | `spaHandler` reads index.html at init, replaces `__BUST__` with Unix timestamp, serves from memory for `/`, `/index.html`, and SPA fallback | | `cmd/server/helpers_test.go` | New `TestSpaHandlerCacheBust` — verifies placeholder replacement works for root, SPA fallback, and direct `/index.html` requests. Also added tests for root `/` and `/index.html` routes | | `AGENTS.md` | Rule 3 updated: cache busters are now automatic, agents should not manually edit them | ## Testing - `go build ./...` — compiles cleanly - `go test ./...` — all tests pass (including new cache-bust tests) - `node test-frontend-helpers.js && node test-packet-filter.js && node test-aging.js` — all frontend tests pass - No hardcoded timestamps remain in `index.html` --------- Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: you <you@example.com> |
||
|
|
f87eb3601c |
fix: graceful container shutdown for reliable deployments (#453)
## Summary
Fixes #450: staging deployment was flaky because the container did not shut down cleanly.
## Root Causes
1. **Server never closed DB on shutdown**: SQLite WAL lock held indefinitely, blocking new container startup
2. **`httpServer.Close()` instead of `Shutdown()`**: abruptly kills connections instead of draining them
3. **No `stop_grace_period` in compose configs**: Docker sends SIGTERM, then SIGKILL once the default 10s grace period expires, which is often not enough for a WAL checkpoint
4. **Supervisor didn't forward SIGTERM**: missing `stopsignal`/`stopwaitsecs` meant Go processes got SIGKILL instead of graceful shutdown
5. **Deploy scripts used default `docker stop` timeout**: only 10s grace period
## Changes
### Go Server (`cmd/server/`)
- **Graceful HTTP shutdown**: `httpServer.Shutdown(ctx)` with 15s context timeout, which drains in-flight requests before closing
- **WebSocket cleanup**: New `Hub.Close()` method sends `CloseGoingAway` frames to all connected clients
- **DB close on shutdown**: Explicitly closes DB after HTTP server stops (was never closed before)
- **WAL checkpoint**: `PRAGMA wal_checkpoint(TRUNCATE)` before DB close, which flushes WAL to main DB file and removes WAL/SHM lock files
### Go Ingestor (`cmd/ingestor/`)
- **WAL checkpoint on shutdown**: New `Store.Checkpoint()` method, called before `Close()`
- **Longer MQTT disconnect timeout**: 5s (was 1s) to allow in-flight messages to drain
### Docker Compose (all 4 variants)
- Added `stop_grace_period: 30s` and `stop_signal: SIGTERM`
### Supervisor Configs (both variants)
- Added `stopsignal=TERM` and `stopwaitsecs=20` to server and ingestor programs
### Deploy Scripts
- `deploy-staging.sh`: `docker stop -t 30` with explicit grace period
- `deploy-live.sh`: `docker stop -t 30` with explicit grace period
## Shutdown Sequence (after fix)
1. Docker sends SIGTERM to supervisord (PID 1)
2. Supervisord forwards SIGTERM to server + ingestor (waits up to 20s each)
3. Server: stops poller → drains HTTP (15s) → closes WS clients → checkpoints WAL → closes DB
4. Ingestor: stops tickers → disconnects MQTT (5s) → checkpoints WAL → closes DB
5. Docker waits up to 30s total before SIGKILL
## Tests
All existing tests pass:
- `cd cmd/server && go test ./...` ✅
- `cd cmd/ingestor && go test ./...` ✅
---------
Co-authored-by: you <you@example.com>
Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com>
|
||
|
|
fe314be3a8 |
feat: geo_filter enforcement, DB pruning, geofilter-builder tool, HB column (#215)
## Summary
Several features and fixes from a live deployment of the Go v3.0.0
backend.
### geo_filter — full enforcement
- **Go backend config** (`cmd/server/config.go`,
`cmd/ingestor/config.go`): added `GeoFilterConfig` struct so
`geo_filter.polygon` and `bufferKm` from `config.json` are parsed by
both the server and ingestor
- **Ingestor** (`cmd/ingestor/geo_filter.go`, `cmd/ingestor/main.go`):
ADVERT packets from nodes outside the configured polygon + buffer are
dropped *before* any DB write — no transmission, node, or observation
data is stored
- **Server API** (`cmd/server/geo_filter.go`, `cmd/server/routes.go`):
`GET /api/config/geo-filter` endpoint returns the polygon + bufferKm to
the frontend; `/api/nodes` responses filter out any out-of-area nodes
already in the DB
- **Frontend** (`public/map.js`, `public/live.js`): blue polygon overlay
(solid inner + dashed buffer zone) on Map and Live pages, toggled via
"Mesh live area" checkbox, state shared via localStorage
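The polygon + buffer check described above can be sketched as follows. This is a minimal illustration and not the code in `cmd/ingestor/geo_filter.go`: the `Point` type and the `inPolygon`, `distToSegmentKm`, and `allowed` helpers are hypothetical names, the distance uses a flat-earth approximation, and treating a missing polygon as "allow everything" is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// Point is a hypothetical lat/lon pair in degrees.
type Point struct{ Lat, Lon float64 }

// kmPerDeg is a rough length of one degree of latitude in kilometres.
const kmPerDeg = 111.32

// inPolygon reports whether p lies inside poly using ray casting:
// count how many edges a ray going east from p crosses.
func inPolygon(p Point, poly []Point) bool {
	inside := false
	for i, j := 0, len(poly)-1; i < len(poly); j, i = i, i+1 {
		a, b := poly[i], poly[j]
		if (a.Lat > p.Lat) != (b.Lat > p.Lat) { // edge straddles p's latitude
			crossLon := a.Lon + (p.Lat-a.Lat)/(b.Lat-a.Lat)*(b.Lon-a.Lon)
			if p.Lon < crossLon {
				inside = !inside
			}
		}
	}
	return inside
}

// distToSegmentKm approximates the distance from p to segment a-b by
// projecting onto a local plane centred on p (fine for small areas only).
func distToSegmentKm(p, a, b Point) float64 {
	scale := math.Cos(p.Lat * math.Pi / 180)
	ax, ay := (a.Lon-p.Lon)*scale*kmPerDeg, (a.Lat-p.Lat)*kmPerDeg
	bx, by := (b.Lon-p.Lon)*scale*kmPerDeg, (b.Lat-p.Lat)*kmPerDeg
	dx, dy := bx-ax, by-ay
	t := 0.0
	if l2 := dx*dx + dy*dy; l2 > 0 {
		// Clamp the projection of the origin (p) onto the segment.
		t = math.Max(0, math.Min(1, -(ax*dx+ay*dy)/l2))
	}
	return math.Hypot(ax+t*dx, ay+t*dy)
}

// allowed reports whether p is inside the polygon or within bufferKm of it.
// Assumption: fewer than 3 vertices means no filter is configured.
func allowed(p Point, poly []Point, bufferKm float64) bool {
	if len(poly) < 3 || inPolygon(p, poly) {
		return true
	}
	for i, j := 0, len(poly)-1; i < len(poly); j, i = i, i+1 {
		if distToSegmentKm(p, poly[i], poly[j]) <= bufferKm {
			return true
		}
	}
	return false
}

func main() {
	square := []Point{{0, 0}, {0, 1}, {1, 1}, {1, 0}}
	fmt.Println(allowed(Point{0.5, 0.5}, square, 0))   // inside the polygon
	fmt.Println(allowed(Point{0.5, 1.05}, square, 10)) // outside, within the buffer
	fmt.Println(allowed(Point{0.5, 1.05}, square, 1))  // outside, beyond the buffer
}
```

An ingestor following this approach would call something like `allowed` on the advertised position before any DB write and drop the packet when it returns false.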
### Automatic DB pruning
- Add `retention.packetDays` to `config.json` to delete transmissions +
observations older than N days on a daily schedule (1 min after startup,
then every 24h). Nodes and observers are never pruned.
- `POST /api/admin/prune?days=N` for manual runs (requires `X-API-Key`
header if `apiKey` is set)
```json
"retention": {
"nodeDays": 7,
"packetDays": 30
}
```
### tools/geofilter-builder.html
Standalone HTML tool (no server needed) — open in browser, click to
place polygon points on a Leaflet map, set `bufferKm`, copy the
generated `geo_filter` JSON block into `config.json`.
### scripts/prune-nodes-outside-geo-filter.py
Utility script to clean existing out-of-area nodes from the database
(dry-run + confirm). Useful after first enabling geo_filter on a
populated DB.
### HB column in packets table
Shows the hop hash size in bytes (1–4) decoded from the path byte of
each packet's raw hex. Displayed as **HB** between Size and Type
columns, hidden on small screens.
## Test plan
- [x] ADVERT from node outside polygon is not stored (no new row in
nodes or transmissions)
- [x] `GET /api/config/geo-filter` returns polygon + bufferKm when
configured, `{polygon: null, bufferKm: 0}` when not
- [x] `/api/nodes` excludes nodes outside polygon even if present in DB
- [x] Map and Live pages show blue polygon overlay when configured;
checkbox toggles it
- [x] `retention.packetDays: 30` deletes old transmissions/observations
on startup and daily
- [x] `POST /api/admin/prune?days=30` returns `{deleted: N, days: 30}`
- [x] `tools/geofilter-builder.html` opens standalone, draws polygon,
copies valid JSON
- [x] HB column shows 1–4 for all packets in grouped and flat view
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
|
||
|
|
5aa4fbb600 | chore: normalize all files to LF line endings | ||
|
|
93f85dee6e |
Add API key auth to Go write endpoints (#283)
## Summary
- added API key middleware for write routes in cmd/server/routes.go
- protected all current non-GET API routes (POST /api/packets, POST /api/perf/reset, POST /api/decode)
- middleware enforces X-API-Key against cfg.APIKey and returns 401 JSON error on missing/wrong key
- preserves backward compatibility: if apiKey is empty, requests pass through
- added startup warning log in cmd/server/main.go when no API key is configured:
  - [security] WARNING: no apiKey configured — write endpoints are unprotected
- added route tests for missing/wrong/correct key and empty-apiKey compatibility
## Validation
- cd cmd/server && go test ./... ✅
## Notes
- config.example.json already contains apiKey, so no changes were required.
---------
Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
||
|
|
77d8f35a04 |
feat: implement packet store eviction/aging to prevent OOM (#273)
## Summary
The in-memory `PacketStore` had **no eviction or aging** — it grew
unbounded until OOM killed the process. At ~3K packets/hour and ~5KB per
packet (not the 450 bytes previously estimated), an 8GB VM would OOM in
a few days.
## Changes
### Time-based eviction
- Configurable via `config.json`: `"packetStore": { "retentionHours": 24
}`
- Packets older than the retention window are evicted from the head of
the sorted slice
### Memory-based cap
- Configurable via `"packetStore": { "maxMemoryMB": 1024 }`
- Hard ceiling — evicts oldest packets when estimated memory exceeds the
cap
### Index cleanup
When a `StoreTx` is evicted, ALL associated data is removed from:
- `byHash`, `byTxID`, `byObsID`, `byObserver`, `byNode`, `byPayloadType`
- `nodeHashes`, `distHops`, `distPaths`, `spIndex`
### Periodic execution
- Background ticker runs eviction every 60 seconds
- Analytics caches and hash size cache are invalidated after eviction
### Stats fixes
- `estimatedMB` now uses ~5KB/packet + ~500B/observation (was 430B +
200B)
- `evicted` counter reflects actual evictions (was hardcoded to 0)
- Removed fake `maxPackets: 2386092` and `maxMB: 1024` from stats
### Config example
```json
{
"packetStore": {
"retentionHours": 24,
"maxMemoryMB": 1024
}
}
```
Both values default to 0 (unlimited) for backward compatibility.
## Tests
- 7 new tests in `eviction_test.go` covering time-based, memory-based,
index cleanup, thread safety, config parsing, and no-op when disabled
- All existing tests pass unchanged
Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com>
|
||
|
|
cdcaa476f2 |
rename: MeshCore Analyzer → CoreScope (Phase 1 — backend + infra)
Rename product branding, binary names, Docker images, container names,
Go modules, proto go_package, CI, manage.sh, and documentation.
Preserved (backward compat):
- meshcore.db database filename
- meshcore-data / meshcore-staging-data directory paths
- MQTT topics (meshcore/#, meshcore/+/+/packets, etc.)
- proto package namespace (meshcore.v1)
- localStorage keys
Changes by category:
- Go modules: github.com/corescope/{server,ingestor}
- Binaries: corescope-server, corescope-ingestor
- Docker images: corescope:latest, corescope-go:latest
- Containers: corescope-prod, corescope-staging, corescope-staging-go
- Supervisord programs: corescope, corescope-server, corescope-ingestor
- Branding: siteName, heroTitle, startup logs, fallback HTML
- Proto go_package: github.com/corescope/proto/v1
- CI: container refs, deploy path
- Docs: 8 markdown files updated
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
||
|
|
b326e3f1a6 |
fix: pprof port conflict crashed Go server — non-fatal bind + separate ports
Server defaults to 6060, ingestor to 6061. Removed shared PPROF_PORT env var. Bind failure logs warning instead of log.Fatal killing the process. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
6d31cb2ad6 |
feat: add pprof profiling controlled by ENABLE_PPROF env var
Add net/http/pprof support to both Go server (default port 6060) and ingestor (default port 6061). Profiling is off by default — only starts the pprof HTTP listener when ENABLE_PPROF=true. PPROF_PORT env var overrides the default port for each binary. Enable on staging-go in docker-compose with exposed ports 6060/6061. Not enabled on prod. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
e300228874 |
perf: add TTL cache for subpaths API + build timestamp in stats/health
- Add 15s TTL cache to GetAnalyticsSubpaths with composite key (region|minLen|maxLen|limit), matching the existing cache pattern used by RF, topology, hash, channel, and distance analytics. Cache hits return instantly vs 900ms+ computation. fixes #168
- Add BuildTime to /api/stats and /api/health responses, injected via ldflags at build time. Dockerfile.go now accepts BUILD_TIME build arg. fixes #165
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
||
|
|
0d9b535451 |
feat: add version and git commit to /api/stats and /api/health
Node.js: reads version from package.json, commit from .git-commit file or git rev-parse --short HEAD at runtime, with unknown fallback.
Go: uses -ldflags build-time variables (Version, Commit) with fallback to .git-commit file and git command at runtime.
Dockerfile: copies .git-commit if present (CI bakes it before build).
Dockerfile.go: passes APP_VERSION and GIT_COMMIT as build args to ldflags.
deploy.yml: writes GITHUB_SHA to .git-commit before docker build steps.
docker-compose.yml: passes build args to Go staging build.
Tests updated to verify version and commit fields in both endpoints.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
||
|
|
afe16db960 |
feat(go-server): in-memory packet store — port of packet-store.js
Streams transmissions + observations from SQLite at startup into 5 indexed in-memory structures. QueryPackets and QueryGroupedPackets now serve from RAM (<10ms) instead of hitting SQLite (2.3s).
- store.go: PacketStore with byHash, byTxID, byObsID, byObserver, byNode indexes
- main.go: create + load store at startup
- routes.go: dispatch to store for packet/stats endpoints
- websocket.go: poller ingests new transmissions into store
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
||
|
|
b2e6c8105b |
fix: handle WebSocket upgrade at root path (client connects to ws://host/)
Node.js upgrades WS at /, Go was only at /ws. Now the static file handler checks for Upgrade header first and routes to WebSocket. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
742ed86596 |
feat: add Go web server (cmd/server/) — full API + WebSocket + static files
35+ REST endpoints matching Node.js server, WebSocket broadcast, static file serving with SPA fallback, config.json support. Uses modernc.org/sqlite (pure Go, no CGO required). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |