meshcore-analyzer

mirror of https://github.com/Kpa-clawbot/meshcore-analyzer.git synced 2026-05-11 09:46:54 +00:00

Author	SHA1	Message	Date
Kpa-clawbot	5a5df5d92b	revert: group commit M1 (#1117 ) — starves MQTT, refs #1129 (#1130 ) ## Why Diagnostic on #1129 shows PR #1117 (group commit M1 for #1115) is fundamentally broken: it starves the MQTT goroutine via `gcMu` lock contention, causing pingresp disconnects and lost packets at modest ingest rates. ## Three structural defects 1. Lock held across `sql.Stmt.Exec` — every concurrent `InsertTransmission` blocks for the full SQLite write latency, not just the brief queue mutation. 2. Lock held across `tx.Commit` — the WAL fsync runs under `gcMu`, so any backlog blocks all ingest writers AND the flusher ticker, snowballing under load. 3. Single-conn DB (`MaxOpenConns=1`) — the flusher and the ingest path serialise on one connection, turning the lock into a global ingest stall. Net effect: at modest packet rates the MQTT client loop misses its own pingresp deadline, the broker drops the connection, and packets received during the stall are lost. ## What this PR removes - `Store.SetGroupCommit`, `Store.FlushGroupTx`, `Store.flushLocked`, `Store.GroupCommitMs` - `gcMu`, `activeTx`, `pendingRows`, `groupCommitMs`, `groupCommitMaxRows` Store fields - `groupCommitMs` / `groupCommitMaxRows` config fields and `GroupCommitMsOrDefault` / `GroupCommitMaxRowsOrDefault` accessors - The flusher goroutine in `cmd/ingestor/main.go` - `cmd/ingestor/group_commit_test.go` - The `if s.activeTx != nil { … pendingRows … }` branch in `InsertTransmission` — reverts to plain prepared-stmt usage ## What this PR keeps (merged after #1117) - #1119 `BackfillPathJSON` `path_json='[]'` fix - #1120/#1123 perf metrics endpoints — `WALCommits` counter retained - `GroupCommitFlushes` JSON field on `/api/perf/write-sources` is kept as always-0 for API stability (server `perf_io.go` references it as a string field name; no client breakage) - `DBStats.GroupCommitFlushes` atomic field is removed from the Go struct ## Tests `cd cmd/ingestor && go test ./... -run "Test"` → `ok` (47.8s). 
`cd cmd/server && go build ./...` → clean.

## #1115 stays open

The group-commit idea is sound: batching observation INSERTs would meaningfully reduce WAL fsync rate. But it needs a redesign that does not hold a mutex across blocking SQLite calls. Suggested directions for a future M1:

- Channel-fed writer goroutine (single owner of the tx, ingest path is non-blocking enqueue)
- Per-batch DB handle so the flusher doesn't serialise the ingest connection
- Bounded queue with backpressure rather than a shared lock

Refs #1117 #1129	2026-05-05 19:02:43 -07:00
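The channel-fed writer direction listed in the revert can be pictured roughly as follows. This is a sketch, not code from the repository: `row`, `commitFunc`, `runBatchWriter`, and `collectBatches` are hypothetical names, and the commit step is a plain callback standing in for a SQLite transaction. What it tries to show is that producers only perform a bounded channel send, while one goroutine owns the batch, so no mutex is held across the blocking commit.

```go
package main

import (
	"fmt"
	"time"
)

// row is a hypothetical stand-in for one observation to persist.
type row struct{ hash string }

// commitFunc stands in for BEGIN / INSERT ... / COMMIT on a real database.
type commitFunc func(batch []row)

// runBatchWriter is the single owner of the pending batch. It commits when
// the batch reaches maxRows, when the window ticker fires, or when the input
// channel is closed. done is closed after the final flush.
func runBatchWriter(in <-chan row, window time.Duration, maxRows int, commit commitFunc, done chan<- struct{}) {
	defer close(done)
	ticker := time.NewTicker(window)
	defer ticker.Stop()

	batch := make([]row, 0, maxRows)
	flush := func() {
		if len(batch) == 0 {
			return
		}
		commit(batch)
		batch = make([]row, 0, maxRows) // fresh slice: commit may keep the old one
	}

	for {
		select {
		case r, ok := <-in:
			if !ok {
				flush()
				return
			}
			batch = append(batch, r)
			if len(batch) >= maxRows {
				flush()
			}
		case <-ticker.C:
			flush()
		}
	}
}

// collectBatches feeds n rows through the writer and returns the size of each
// committed batch. The window is set to one hour so only the row cap and the
// final close trigger flushes in this demo.
func collectBatches(n, maxRows int) []int {
	in := make(chan row, 64) // bounded queue: a full queue applies backpressure
	done := make(chan struct{})
	var sizes []int
	record := func(b []row) { sizes = append(sizes, len(b)) }
	go runBatchWriter(in, time.Hour, maxRows, record, done)
	for i := 0; i < n; i++ {
		in <- row{hash: fmt.Sprintf("tx-%d", i)}
	}
	close(in)
	<-done // the writer has finished its last flush, so sizes is safe to read
	return sizes
}

func main() {
	fmt.Println(collectBatches(7, 3))
}
```

With a cap of three, seven rows are committed as batches of 3, 3, and 1. A real implementation would still need to decide what the producer does when the queue is full (block, drop, or count), which this sketch leaves as a plain blocking send.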
Kpa-clawbot	45f2607f75	perf(ingestor): group commit observation INSERTs by time window (M1, refs #1115 ) (#1117 ) ## Summary Implements M1 from #1115: batches observation/transmission INSERTs into a single SQLite `BEGIN/COMMIT` window instead of fsyncing per packet. At ~250 obs/sec this drops WAL fsync rate from ~20/s to ~1/s and eliminates the `obs-persist skipped` / `SQLITE_BUSY` log spam that the issue documents. This is a partial fix — it ships the group-commit mechanism. Acceptance items 6–7 (measured fsync rate / measured `obs-persist skipped` rate at staging steady-state) require post-deploy observation, and M2 (per-`tx_hash` observation buffering) is intentionally deferred. The issue stays open for the user to verify on staging. > Partial fix for #1115 — does not auto-close. Refs #1115. ## Mechanism - `Store` gains an active `sql.Tx`, `pendingRows` counter, `gcMu`, and the `groupCommitMs` / `groupCommitMaxRows` knobs. `SetGroupCommit(ms, maxRows)` enables the mode; `FlushGroupTx()` commits the in-flight tx. - `InsertTransmission` lazily opens a tx on the first call after each flush, then issues all writes through `tx.Stmt()` bindings of the existing prepared statements. With `MaxOpenConns(1)` the connection is already serialized; `gcMu` serializes group-commit state without contention. - A goroutine in `cmd/ingestor/main.go` calls `FlushGroupTx()` every `groupCommitMs` ms. `pendingRows >= groupCommitMaxRows` triggers an eager flush. `Close()` flushes before the WAL checkpoint so no rows are lost on graceful shutdown. - `groupCommitMs == 0` short-circuits to the legacy per-call auto-commit path (statements bound to `s.db`, no tx) — current behavior preserved byte-for-byte for operators who opt out. ## Config Two new optional fields (ingestor-only), both documented in `config.example.json`: \| Field \| Default \| Effect \| \|---\|---\|---\| \| `groupCommitMs` \| `1000` \| Flush window in ms. `0` disables batching (legacy per-packet auto-commit). 
\| \| `groupCommitMaxRows` \| `1000` \| Safety cap; when exceeded the queue flushes immediately to bound memory and the crash-loss window. \| No DB schema change. No required config change on upgrade. ## Tests (TDD red → green visible in commits) `cmd/ingestor/group_commit_test.go` — three assertions, written first as the red commit: - `TestGroupCommit_BatchesInsertsIntoOneTx` — 50 `InsertTransmission` calls inside a wide window produce 0* commits until `FlushGroupTx`, then exactly 1; all 50 rows visible after flush. (This is the spec's "50 observations → 1 SQLite write transaction" assertion.) - `TestGroupCommit_Disabled` — `groupCommitMs=0` keeps every insert immediately visible and `GroupCommitFlushes` never advances. (Spec's "groupCommitMs=0 reverts to per-packet behavior" assertion.) - `TestGroupCommit_MaxRowsForcesEarlyFlush` — cap=3, 7 inserts → 2 auto-flushes from the cap + 1 final manual flush = 3 total. Red commit: `e2b0370` (stubs `SetGroupCommit` / `FlushGroupTx` so the tests compile and fail on assertions, not import errors). Green commit: `73f3559`. Full ingestor suite (`go test ./...` in `cmd/ingestor`) stays green, ~49 s. ## Performance This PR is the perf change itself. Local micro-test (the new `TestGroupCommit_BatchesInsertsIntoOneTx`) shows the structural property: 50 inserts → 1 commit. The fsync-rate measurement called out in the M1 acceptance criteria (`~20/s → ~1/s` at 250 obs/sec) requires staging deployment to confirm — that's the remaining open item that keeps #1115 open after this merges. No hot-path regressions: when `groupCommitMs > 0` we acquire one mutex per insert (uncontended in the steady state — the connection was already single-threaded via `MaxOpenConns(1)`). When `groupCommitMs == 0` the code path is identical to before plus one nil-tx check. ## What this PR does NOT do (per spec) - Does not collapse "30 observations of one packet" into 1 row write — that's M2. 
- Does not eliminate dual-writer contention with `cmd/server`'s `resolved_path` writes. - Does not change observation ordering or live broadcast latency. --------- Co-authored-by: corescope-bot <bot@corescope.local>	2026-05-05 16:38:43 -07:00
Kpa-clawbot	136e1d23c8	feat(#730 ): foreign-advert detection — flag instead of silent drop (#1084 ) ## Summary Partial fix for #730 (M1 only — M2 frontend and M3 alerting deferred). Today the ingestor silently drops ADVERTs whose GPS lies outside the configured `geo_filter` polygon. That's the wrong default for an analytics tool — operators get zero visibility into bridged or leaked meshes. This PR makes the new default flag, don't drop: foreign adverts are stored, the node row is tagged `foreign_advert=1`, and the API surfaces `"foreign": true` so dashboards / map overlays can be built on top. ## Behavior \| Mode \| What happens to an ADVERT outside `geo_filter` \| \|---\|---\| \| (default) flag \| Stored, marked `foreign_advert=1`, exposed via API \| \| drop (legacy) \| Silently dropped (preserves old behavior for ops who want it) \| ## What's done (M1 — Backend) - ingestor stores foreign adverts instead of dropping - `nodes.foreign_advert` column added (migration) - `/api/nodes` and `/api/nodes/{pk}` expose `foreign: true` field - Config: `geofilter.action: "flag"\|"drop"` (default `flag`) - Tests + config docs ## What's NOT done (deferred to M2 + M3) - M2 — Frontend: Map overlay showing foreign adverts as distinct markers, foreign-advert filter on packets/nodes pages, dedicated foreign-advert dashboard - M3 — Alerting: Time-series detection of bridging events, alert when foreign advert rate spikes, identify bridge entry-point nodes Issue #730 remains open for M2 and M3. --------- Co-authored-by: corescope-bot <bot@corescope>	2026-05-05 01:58:52 -07:00
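The flag-versus-drop table in the foreign-advert change reduces to a small decision function. The sketch below is illustrative only: `advertDecision` and `decideAdvert` are hypothetical names, the polygon test is assumed to have already produced `insideGeoFilter`, and treating any unrecognised action string as the default `flag` is an assumption, not documented behavior.

```go
package main

import "fmt"

// advertDecision says whether an ADVERT is stored and whether the node row
// should be tagged foreign_advert=1.
type advertDecision struct {
	Store   bool
	Foreign bool
}

// decideAdvert applies the geofilter action to one ADVERT.
// Assumption: anything other than "drop" behaves like the default "flag".
func decideAdvert(insideGeoFilter bool, action string) advertDecision {
	if insideGeoFilter {
		return advertDecision{Store: true}
	}
	if action == "drop" {
		return advertDecision{} // legacy behavior: silently dropped
	}
	return advertDecision{Store: true, Foreign: true}
}

func main() {
	fmt.Printf("%+v\n", decideAdvert(false, "flag"))
	fmt.Printf("%+v\n", decideAdvert(false, "drop"))
	fmt.Printf("%+v\n", decideAdvert(true, "drop"))
}
```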
Kpa-clawbot	3ab404b545	feat(node-battery): voltage trend chart + /api/nodes/{pubkey}/battery (#663 ) (#1082 ) ## Summary Closes #663 (Phase 2 + 3 partial — time-series tracking + thresholds for nodes that are also observers). Adds a per-node battery voltage trend chart and `/api/nodes/{pubkey}/battery` endpoint, sourced from the existing `observer_metrics.battery_mv` samples populated by observer status messages. No new ingest or schema changes — purely surfaces data we were already collecting. ## Scope (TDD red→green) RED commit: test(node-battery) — DB query, endpoint shape (200/404/no-data), and config getters all asserted. GREEN commit: feat(node-battery) — implementation only. ## Changes ### Backend - `cmd/server/node_battery.go` (new): - `DB.GetNodeBatteryHistory(pubkey, since)` — pulls `(timestamp, battery_mv)` rows from `observer_metrics WHERE LOWER(observer_id) = LOWER(public_key) AND battery_mv IS NOT NULL`. Case-insensitive join tolerates historical pubkey casing variation (observers persist uppercase, nodes lowercase in this DB). - `Server.handleNodeBattery` — `GET /api/nodes/{pubkey}/battery?days=N` (default 7, max 365). Returns `{public_key, days, samples[], latest_mv, latest_ts, status, thresholds}`. - `Config.LowBatteryMv()` / `CriticalBatteryMv()` — defaults 3300 / 3000 mV. - `cmd/server/config.go` — `BatteryThresholds *BatteryThresholdsConfig` field. - `cmd/server/routes.go` — route registration alongside existing `/health`, `/analytics`. ### Frontend - `public/node-analytics.js` — new "Battery Voltage" chart card with status badge (🔋 OK / ⚠️ Low / 🪫 Critical / No data). Renders dashed threshold lines at `lowMv` and `criticalMv`. Empty-state message when no samples in window. ### Config - `config.example.json` — `batteryThresholds: { lowMv: 3300, criticalMv: 3000 }` with `_comment` per Config Documentation Rule. 
## Status semantics

| latest_mv | status |
|---|---|
| no samples in window | `unknown` |
| `>= lowMv` | `ok` |
| `< lowMv`, `>= criticalMv` | `low` |
| `< criticalMv` | `critical` |

## What this PR does NOT do (deferred)

The issue's full Phase 1 (writing decoded sensor advert telemetry into `nodes.battery_mv` / `temperature_c` from server-side decoder) and Phase 4 (firmware/active polling for repeaters without observers) are out of scope here. This PR delivers the requested Phase 2/3 surfacing for the data path that already lands rows: `observer_metrics`. Repeaters that are also observers (i.e. publish status to MQTT) will get a voltage trend immediately; pure passive nodes won't until Phase 1 lands.

## Tests

- `TestGetNodeBatteryHistory_FromObserverMetrics`: case-insensitive join, NULL skipping, ordering.
- `TestNodeBatteryEndpoint`: full happy path with thresholds + status.
- `TestNodeBatteryEndpoint_NoData`: 200 + status=unknown.
- `TestNodeBatteryEndpoint_404`: unknown node.
- `TestBatteryThresholds_ConfigOverride`: config getters + defaults.

`cd cmd/server && go test ./...`: green.

## Performance

Endpoint is per-pubkey (called once on analytics page open), indexed by `(observer_id, timestamp)` PK on `observer_metrics`. No hot-path impact.

--------- Co-authored-by: bot <bot@corescope>	2026-05-05 01:41:00 -07:00
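The status semantics table maps directly onto a four-way switch. The following is a minimal sketch under stated assumptions: `batteryStatus` and the `hasSample` flag are hypothetical, and only the threshold defaults (3300 / 3000 mV) and the status strings come from the PR description.

```go
package main

import "fmt"

// batteryStatus maps the latest sample to a status string.
// hasSample is false when there are no samples in the requested window.
func batteryStatus(latestMv int, hasSample bool, lowMv, criticalMv int) string {
	switch {
	case !hasSample:
		return "unknown"
	case latestMv >= lowMv:
		return "ok"
	case latestMv >= criticalMv:
		return "low"
	default:
		return "critical"
	}
}

func main() {
	const lowMv, criticalMv = 3300, 3000 // defaults named in the PR
	for _, mv := range []int{3700, 3200, 2900} {
		fmt.Println(mv, batteryStatus(mv, true, lowMv, criticalMv))
	}
	fmt.Println("no samples:", batteryStatus(0, false, lowMv, criticalMv))
}
```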
Kpa-clawbot	d05e468598	feat(memlimit): GOMEMLIMIT support, derive from packetStore.maxMemoryMB (#836 ) (#1077 ) ## Summary Implements part 1 of #836 — `GOMEMLIMIT` support so the Go runtime self-throttles GC under cgroup memory pressure instead of getting SIGKILLed. (Parts 2 & 3 — bounded cold-load batching + README ops docs — land in follow-up PRs.) ## Behavior On startup `cmd/server/main.go` now calls `applyMemoryLimit(maxMemoryMB, envSet)`: \| Condition \| Action \| Log \| \|---\|---\|---\| \| `GOMEMLIMIT` env set \| Honor the runtime's parse, do nothing \| `[memlimit] using GOMEMLIMIT from environment (...)` \| \| env unset, `packetStore.maxMemoryMB > 0` \| `debug.SetMemoryLimit(maxMB * 1.5 MiB)` \| `[memlimit] derived from packetStore.maxMemoryMB=512 → 768 MiB (1.5x headroom)` \| \| env unset, `maxMemoryMB == 0` \| No-op \| `[memlimit] no soft memory limit set ... recommend setting one to avoid container OOM-kill` \| The 1.5x headroom covers Go's NextGC trigger at ~2× live heap (per #836 heap profile: 680 MB live → 1.38 GB NextGC). ## Tests (TDD red→green visible in commit history) - `TestApplyMemoryLimit_FromEnv` — env wins, function does not override - `TestApplyMemoryLimit_DerivedFromMaxMemoryMB` — verifies bytes computation + `debug.SetMemoryLimit` actually applied at runtime - `TestApplyMemoryLimit_None` — no env, no config → reports `"none"`, no side effect Red commit: `7de3c62` (assertion failures, builds clean) Green commit: `454516d` ## Config docs `config.example.json` `packetStore._comment_gomemlimit` documents env/derived/override behavior. ## Out of scope - Cold-load transient bounding (item 2 in #836) - README container-size table (item 3) - QA §1.1 rewrite Closes part 1 of #836. --------- Co-authored-by: corescope-bot <bot@corescope>	2026-05-05 01:33:23 -07:00
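The three rows of the memlimit behavior table could look something like the sketch below. It is not the repository's `applyMemoryLimit`: `applyLimitSketch`, `derivedLimitBytes`, `currentLimit`, and the returned source strings are hypothetical. The one real API used is `runtime/debug.SetMemoryLimit` (Go 1.19+), which sets the soft limit in bytes and, when given a negative value, returns the current limit without changing it.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
)

// derivedLimitBytes returns 1.5x maxMemoryMB in bytes, or 0 when unset.
func derivedLimitBytes(maxMemoryMB int) int64 {
	if maxMemoryMB <= 0 {
		return 0
	}
	return int64(maxMemoryMB) * 1024 * 1024 * 3 / 2
}

// currentLimit reads the active soft limit; a negative argument leaves it unchanged.
func currentLimit() int64 { return debug.SetMemoryLimit(-1) }

// applyLimitSketch follows the table: the environment wins, otherwise the
// limit is derived from the config value, otherwise nothing is set.
func applyLimitSketch(maxMemoryMB int, envSet bool) (source string, limit int64) {
	switch {
	case envSet:
		return "env", currentLimit() // runtime already parsed GOMEMLIMIT
	case maxMemoryMB > 0:
		limit = derivedLimitBytes(maxMemoryMB)
		debug.SetMemoryLimit(limit)
		return "derived", limit
	default:
		return "none", 0
	}
}

func main() {
	_, envSet := os.LookupEnv("GOMEMLIMIT")
	source, limit := applyLimitSketch(512, envSet)
	fmt.Printf("[memlimit] source=%s limit=%d MiB\n", source, limit/(1024*1024))
}
```

For `maxMemoryMB=512` with no environment override this derives 768 MiB, matching the log line quoted in the table.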
Kpa-clawbot	45f30fcadc	feat(repeater): liveness detection — distinguish actively relaying from advert-only (#662 ) (#1073 ) ## Summary Implements repeater liveness detection per #662 — distinguishes a repeater that is actively relaying traffic from one that is alive but idle (only sending its own adverts). ## Approach The backend already maintains a `byPathHop` index keyed by lowercase hop/pubkey for every transmission. Decode-window writes also key it by resolved pubkey for relay hops. We just weren't surfacing it. `GetRepeaterRelayInfo(pubkey, windowHours)`: - Reads `byPathHop[pubkey]`. - Skips packets whose `payload_type == 4` (advert) — a self-advert proves liveness, not relaying. - Returns the most recent `FirstSeen` as `lastRelayed`, plus `relayActive` (within window) and the `windowHours` actually used. ## Three states (per issue) \| State \| Indicator \| Condition \| \|---\|---\|---\| \| 🟢 Relaying \| green \| `last_relayed` within `relayActiveHours` \| \| 🟡 Alive (idle) \| yellow \| repeater is in the DB but `relay_active=false` (no recent path-hop appearance, or none ever) \| \| ⚪ Stale \| existing \| falls out of the existing `getNodeStatus` logic \| ## API - `GET /api/nodes` — repeater/room rows now include `last_relayed` (omitted if never observed) and `relay_active`. - `GET /api/nodes/{pubkey}` — same fields plus `relay_window_hours`. ## Config New optional field under `healthThresholds`: ```json "healthThresholds": { ..., "relayActiveHours": 24 } ``` Default 24h. Documented in `config.example.json`. ## Frontend Node detail page gains a Last Relayed row for repeaters/rooms with the 🟢/🟡 state badge. Tooltip explains the distinction from "Last Heard". ## TDD - Red commit `4445f91`: `repeater_liveness_test.go` + stub `GetRepeaterRelayInfo` returning zero. Active and Stale tests fail on assertion (LastRelayed empty / mismatched). Idle and IgnoresAdverts already match the desired behavior under the stub. 
Compiles, runs, fails on assertions — not on imports. - Green commit `5fcfb57`: Implementation. All four tests pass. Full `cmd/server` suite green (~22s). ## Performance `O(N)` over `byPathHop[pubkey]` per call. The index is bounded by store eviction; a single repeater has at most a few hundred entries on real data. The `/api/nodes` loop adds one map read + scan per repeater row — negligible against the existing enrichment work. ## Limitations (per issue body) 1. Observer coverage gaps — if no observer hears a repeater's relay, it'll show as idle even when actively relaying. This is inherent to passive observation. 2. Low-traffic networks — a repeater in a quiet area legitimately shows idle. The 🟡 indicator copy makes that explicit ("alive (idle)"). 3. Hash collisions are mitigated by the existing `resolveWithContext` path before pubkeys land in `byPathHop`. Fixes #662 --------- Co-authored-by: clawbot <bot@corescope.local>	2026-05-05 01:17:52 -07:00
Kpa-clawbot	1f4969c1a6	fix(#770 ): treat region 'All' as no-filter + document region behavior (#1026 ) ## Summary Fixes #770 — selecting "All" in the region filter dropdown produced an empty channel list. ## Root cause `normalizeRegionCodes` (cmd/server/db.go) treated any non-empty input as a literal IATA code. The frontend region filter labels its catch-all option "All"; while `region-filter.js` normally sends an empty string when "All" is selected, any code path that ends up sending `?region=All` (deep-link URLs, manual queries, future callers) caused the function to return `["ALL"]`. Downstream queries then filtered observers for `iata = 'ALL'`, which never matches anything → empty response. ## Fix `normalizeRegionCodes` now treats `All` / `ALL` / `all` (case-insensitive, with optional whitespace, mixed in CSV) as equivalent to an empty value, returning `nil` to signal "no filter". Real IATA codes (`SJC`, `PDX`, `sjc,PDX` → `[SJC PDX]`) still pass through unchanged. This is a defensive server-side fix: a single chokepoint that all region-aware endpoints already flow through (channels, packets, analytics, encrypted channels, observer ID resolution). ## Documentation Expanded `_comment_regions` in `config.example.json` to explain: - How IATA codes are resolved (payload > topic > source config — set in #1012) - What the `regions` map controls (display labels) vs runtime-discovered codes - That observers without an IATA tag only appear under "All Regions" - That the `All` sentinel is server-side safe ## TDD - Red commit (`4f65bf4`): `cmd/server/region_filter_test.go` — `TestNormalizeRegionCodes_AllIsNoFilter` asserts `All` / `ALL` / `all` / `""` / `"All,"` all collapse to `nil`. Compiles, runs, fails on assertion (`got [ALL], want nil`). Companion test `TestNormalizeRegionCodes_RealCodesPreserved` locks in that `sjc,PDX` still returns `[SJC PDX]`. - Green commit (`c9fb965`): two-line change in `normalizeRegionCodes` + docs update. 
## Verification

```
$ go test -run TestNormalizeRegionCodes -count=1 ./cmd/server
ok github.com/corescope/server 0.023s
$ go test -count=1 ./cmd/server
ok github.com/corescope/server 21.454s
```

Full suite green; no existing region tests regressed.

Fixes #770

--------- Co-authored-by: Kpa-clawbot <bot@corescope>	2026-05-03 19:50:01 -07:00
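The `All`-as-no-filter rule from this fix can be approximated in a few lines. This is a sketch, not the body of `normalizeRegionCodes` in `cmd/server/db.go`: the function name `normalizeRegions` is hypothetical, and skipping an `All` token that appears next to real codes (rather than discarding the whole filter) is an assumption about what "mixed in CSV" means.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeRegions upper-cases and trims comma-separated region codes.
// Empty tokens and the "All" sentinel are skipped; if nothing remains the
// result is nil, which callers read as "no filter".
func normalizeRegions(raw string) []string {
	var codes []string
	for _, part := range strings.Split(raw, ",") {
		code := strings.ToUpper(strings.TrimSpace(part))
		if code == "" || code == "ALL" {
			continue
		}
		codes = append(codes, code)
	}
	return codes
}

func main() {
	fmt.Println(normalizeRegions("All") == nil)
	fmt.Println(normalizeRegions("sjc,PDX"))
}
```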
Kpa-clawbot	4d043579f8	feat: geofilter draft save (localStorage) + downloadable config snippet (#1006 ) ## Issue Closes #819 ## Summary Adds Save Draft / Load Draft / Download buttons to `/geofilter-builder.html` so operators can: - Persist their work-in-progress polygon across sessions (localStorage) - Reload it later to continue editing - Download a ready-to-paste `geo_filter` JSON snippet for `config.json` ## Implementation - New module `public/geofilter-draft.js` exposes `GeofilterDraft` global with `saveDraft / loadDraft / clearDraft / buildConfigSnippet / downloadConfig`. - Builder HTML wires three new buttons; updates the help text to document the new flow. ## TDD - Red commit: `b0a1a4c` (tests fail — module doesn't exist) - Green commit: `a717f33` (implementation added, all tests pass) ## How to test 1. Open `/geofilter-builder.html` 2. Click 3+ points on the map 3. Click "Save Draft" — reload page — click "Load Draft" → polygon restored 4. Click "Download" → `geofilter-config-snippet.json` downloaded with correct format --- E2E assertion added: test-e2e-playwright.js:2264 --------- Co-authored-by: you <you@example.com> Co-authored-by: openclaw-bot <bot@openclaw.local>	2026-05-03 18:24:08 +00:00
Kpa-clawbot	b0e4d2fa18	feat: add optional MQTT region field (#788) (#1012)

## Summary

Add optional `region` field to MQTT source config and JSON payload, enabling publishers to explicitly provide region data without relying solely on topic path structure.

## Changes

- `MQTTSource.Region`: new optional config field. When set, acts as default region for all messages from that source (useful when a broker serves a single region).
- `MQTTPacketMessage.Region`: new optional JSON payload field. Publishers can include `"region": "PDX"` in their MQTT messages.
- `PacketData.Region`: carries the resolved region through to storage.
- Priority resolution: payload `region` > topic-derived region > source config `region`
- Observer IATA is updated with the effective region on every packet.

## Config example

```json
{
  "mqttSources": [
    {
      "name": "cascadia",
      "broker": "tcp://cascadia-broker:1883",
      "topics": ["meshcore/#"],
      "region": "PDX"
    }
  ]
}
```

## Payload example

```json
{"raw": "0a1b2c...", "SNR": 5.2, "region": "PDX"}
```

## TDD

- Red commit: `980304c` (tests fail at compile: fields don't exist)
- Green commit: `4caf88b` (implementation, all tests pass)

## Unblocks

- #804, #770, #730 (all depend on region being available on observations)

Fixes #788

--------- Co-authored-by: you <you@example.com>	2026-05-03 11:21:54 -07:00
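The priority rule for the MQTT region field (payload, then topic, then source config) amounts to "first non-empty value wins". A minimal sketch, with `resolveRegion` as a hypothetical helper name and empty strings standing in for "not provided":

```go
package main

import "fmt"

// resolveRegion returns the first non-empty region in priority order:
// payload field, topic-derived value, then the source's configured default.
func resolveRegion(payloadRegion, topicRegion, sourceRegion string) string {
	for _, candidate := range []string{payloadRegion, topicRegion, sourceRegion} {
		if candidate != "" {
			return candidate
		}
	}
	return "" // no region known for this packet
}

func main() {
	fmt.Println(resolveRegion("PDX", "SJC", "SEA"))
	fmt.Println(resolveRegion("", "SJC", "SEA"))
	fmt.Println(resolveRegion("", "", "SEA"))
}
```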
Kpa-clawbot	153308134e	feat: add global observer IATA whitelist config (#1001 ) ## Summary Adds a global `observerIATAWhitelist` config field that restricts which observer IATA regions are processed by the ingestor. ## Problem Operators running regional instances (e.g., Sweden) want to ensure only observers physically in their region contribute data. The existing per-source `iataFilter` only filters packet messages but still allows status messages through, meaning observers from other regions appear in the database. ## Solution New top-level config field `observerIATAWhitelist`: - When non-empty, all messages (status + packets) from observers outside the whitelist are silently dropped - Case-insensitive matching - Empty list = all regions allowed (fully backwards compatible) - Lazy O(1) lookup via cached uppercase set (same pattern as `observerBlacklist`) ### Config example ```json { "observerIATAWhitelist": ["ARN", "GOT"] } ``` ## TDD - Red commit: `f19c2b2` — tests for `ObserverIATAWhitelist` field and `IsObserverIATAAllowed` method (build fails) - Green commit: `782f516` — implementation + integration test ## Files changed - `cmd/ingestor/config.go` — new field, new method `IsObserverIATAAllowed` - `cmd/ingestor/main.go` — whitelist check in `handleMessage` before status processing - `cmd/ingestor/config_test.go` — unit tests for config parsing and matching - `cmd/ingestor/main_test.go` — integration test for handleMessage filtering Fixes #914 --------- Co-authored-by: you <you@example.com>	2026-05-03 10:23:35 -07:00
Kpa-clawbot	5aa8f795cd	feat(ingestor): per-source MQTT connect timeout (#931 ) (#977 ) ## Summary Per-source MQTT connect timeout, correctly targeting the `WaitTimeout` startup gate (#931). ## What changed - Added `connectTimeoutSec` field to `MQTTSource` struct (per-source, not global) — `config.go:24` - Added `ConnectTimeoutOrDefault()` helper returning configured value or 30 (default from #926) — `config.go:29` - Replaced hardcoded `WaitTimeout(30 * time.Second)` with `WaitTimeout(time.Duration(connectTimeout) * time.Second)` — `main.go:173` - Updated `config.example.json` with field at source level - Unit tests for default (30) and custom values ## Why this supersedes #976 PR #976 made paho's `SetConnectTimeout` (per-TCP-dial, was 10s) configurable via a global `mqttConnectTimeoutSeconds` field. Issue #931 explicitly references the 30s timeout — which is `WaitTimeout(30s)`, the startup gate from #926. It also requests per-source config, not global. This PR targets the correct timeout at the correct granularity. ## Live verification (Rule 18) Two sources pointed at unreachable brokers: - `fast` (`connectTimeoutSec: 5`): timed out in 5s ✅ - `default` (unset): timed out in 30s ✅ ``` 19:00:35 MQTT [fast] connect timeout: 5s 19:00:40 MQTT [fast] initial connection timed out — retrying in background 19:00:40 MQTT [default] connect timeout: 30s 19:01:10 MQTT [default] initial connection timed out — retrying in background ``` Closes #931 Supersedes #976 Co-authored-by: you <you@example.com>	2026-05-02 12:08:25 -07:00
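The per-source timeout helper described in this change is small enough to sketch in full. The struct below is a cut-down, hypothetical `mqttSource` (the real `MQTTSource` has more fields), and `waitTimeout` is an illustrative wrapper; only the field name `connectTimeoutSec` and the default of 30 seconds come from the PR text.

```go
package main

import (
	"fmt"
	"time"
)

// mqttSource is a reduced stand-in for the ingestor's per-source config.
type mqttSource struct {
	Name              string `json:"name"`
	ConnectTimeoutSec int    `json:"connectTimeoutSec"`
}

// connectTimeoutOrDefault returns the configured value, or 30 when unset.
func (s mqttSource) connectTimeoutOrDefault() int {
	if s.ConnectTimeoutSec > 0 {
		return s.ConnectTimeoutSec
	}
	return 30
}

// waitTimeout converts the setting into the duration handed to the
// connection token's WaitTimeout at startup.
func (s mqttSource) waitTimeout() time.Duration {
	return time.Duration(s.connectTimeoutOrDefault()) * time.Second
}

func main() {
	for _, src := range []mqttSource{
		{Name: "fast", ConnectTimeoutSec: 5},
		{Name: "default"},
	} {
		fmt.Printf("MQTT [%s] connect timeout: %v\n", src.Name, src.waitTimeout())
	}
}
```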
efiten	e460932668	fix(store): apply retentionHours cutoff in Load() to prevent OOM on cold start (#917 ) ## Problem `Load()` loaded all transmissions from the DB regardless of `retentionHours`, so `buildSubpathIndex()` processed the full DB history on every startup. On a DB with ~280K paths this produces ~13.5M subpath index entries, OOM-killing the process before it ever starts listening — causing a supervisord crash loop with no useful error message. ## Fix Apply the same `retentionHours` cutoff to `Load()`'s SQL that `EvictStale()` already uses at runtime. Both conditions (`retentionHours` window and `maxPackets` cap) are combined with AND so neither safety limit is bypassed. Startup now builds indexes only over the retention window, making startup time and memory proportional to recent activity rather than total DB history. ## Docs - `config.example.json`: adds `retentionHours` to the `packetStore` block with recommended value `168` (7 days) and a warning about `0` on large DBs - `docs/user-guide/configuration.md`: documents the field and adds an explicit OOM warning ## Test plan - [x] `cd cmd/server && go test ./... -run TestRetentionLoad` — covers the retention-filtered load: verifies packets outside the window are excluded, and that `retentionHours: 0` still loads everything - [x] Deploy on an instance with a large DB (>100K paths) and `retentionHours: 168` — server reaches "listening" in seconds instead of OOM-crashing - [x] Verify `config.example.json` has `retentionHours: 168` in the `packetStore` block - [x] Verify `docs/user-guide/configuration.md` documents the field and warning 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Kpa-clawbot <kpaclawbot@outlook.com>	2026-05-01 06:47:55 +00:00
Kpa-clawbot	aeae7813bc	fix: enable SQLite incremental auto-vacuum so DB shrinks after retention (#919 ) (#920 ) Closes #919 ## Summary Enables SQLite incremental auto-vacuum so the database file actually shrinks after retention reaper deletes old data. Previously, `DELETE` operations freed pages internally but never returned disk space to the OS. ## Changes ### 1. Auto-vacuum on new databases - `PRAGMA auto_vacuum = INCREMENTAL` set via DSN pragma before `journal_mode(WAL)` in the ingestor's `OpenStoreWithInterval` - Must be set before any tables are created; DSN ordering ensures this ### 2. Post-reaper incremental vacuum - `PRAGMA incremental_vacuum(N)` runs after every retention reaper cycle (packets, metrics, observers, neighbor edges) - N defaults to 1024 pages, configurable via `db.incrementalVacuumPages` - Noop on `auto_vacuum=NONE` databases (safe before migration) - Added to both server and ingestor ### 3. Opt-in full VACUUM for existing databases - Startup check logs a clear warning if `auto_vacuum != INCREMENTAL` - `db.vacuumOnStartup: true` config triggers one-time `PRAGMA auto_vacuum = INCREMENTAL; VACUUM` - Logs start/end time for operator visibility ### 4. Documentation - `docs/user-guide/configuration.md`: retention section notes that lowering retention doesn't immediately shrink the DB - `docs/user-guide/database.md`: new guide covering WAL, auto-vacuum, migration, manual VACUUM ### 5. Tests - `TestNewDBHasIncrementalAutoVacuum` — fresh DB gets `auto_vacuum=2` - `TestExistingDBHasAutoVacuumNone` — old DB stays at `auto_vacuum=0` - `TestVacuumOnStartupMigratesDB` — full VACUUM sets `auto_vacuum=2` - `TestIncrementalVacuumReducesFreelist` — DELETE + vacuum shrinks freelist - `TestCheckAutoVacuumLogs` — handles both modes without panic - `TestConfigIncrementalVacuumPages` — config defaults and overrides ## Migration path for existing databases 1. On startup, CoreScope logs: `[db] auto_vacuum=NONE — DB needs one-time VACUUM...` 2. 
Set `db.vacuumOnStartup: true` in config.json
3. Restart: VACUUM runs (blocks startup; can take minutes on large DBs)
4. Remove `vacuumOnStartup` after migration

## Test results

```
ok github.com/corescope/server 19.448s
ok github.com/corescope/ingestor 30.682s
```

--------- Co-authored-by: you <you@example.com>	2026-04-30 23:45:00 -07:00
Joel Claw	b9ba447046	feat: add nodeBlacklist config to hide abusive/troll nodes (#742 ) ## Problem Some mesh participants set offensive names, report deliberately false GPS positions, or otherwise troll the network. Instance operators currently have no way to hide these nodes from public-facing APIs without deleting the underlying data. ## Solution Add a `nodeBlacklist` array to `config.json` containing public keys of nodes to exclude from all API responses. ### Blacklisted nodes are filtered from: - `GET /api/nodes` — list endpoint - `GET /api/nodes/search` — search results - `GET /api/nodes/{pubkey}` — detail (returns 404) - `GET /api/nodes/{pubkey}/health` — returns 404 - `GET /api/nodes/{pubkey}/paths` — returns 404 - `GET /api/nodes/{pubkey}/analytics` — returns 404 - `GET /api/nodes/{pubkey}/neighbors` — returns 404 - `GET /api/nodes/bulk-health` — filtered from results ### Config example ```json { "nodeBlacklist": [ "aabbccdd...", "11223344..." ] } ``` ### Design decisions - Case-insensitive — public keys normalized to lowercase - Whitespace trimming — leading/trailing whitespace handled - Empty entries ignored — `""` or `" "` do not cause false positives - Nil-safe — `IsBlacklisted()` on nil Config returns false - Backward-compatible — empty/missing `nodeBlacklist` has zero effect - Lazy-cached set — blacklist converted to `map[string]bool` on first lookup ### What this does NOT do (intentionally) - Does not delete or modify database data — only filters API responses - Does not block packet ingestion — data still flows for analytics - Does not filter `/api/packets` — only node-facing endpoints are affected ## Testing - Unit tests for `Config.IsBlacklisted()` (case sensitivity, whitespace, empty entries, nil config) - Integration tests for `/api/nodes`, `/api/nodes/{pubkey}`, `/api/nodes/search` - Full test suite passes with no regressions	2026-04-17 23:43:05 +00:00
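The design decisions listed for `nodeBlacklist` (case-insensitive, trimmed, empty entries ignored, nil-safe, lazily cached) fit in one method. The sketch below is an assumption-laden illustration rather than the server's `Config.IsBlacklisted`: `serverConfig` and its unexported fields are hypothetical, and `sync.Once` is one possible way to build the set on first lookup.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// serverConfig is a reduced stand-in holding only the blacklist.
type serverConfig struct {
	NodeBlacklist []string

	once sync.Once
	set  map[string]bool // built lazily on first lookup
}

// isBlacklisted reports whether pubkey is hidden from node-facing endpoints.
// It is safe to call on a nil *serverConfig.
func (c *serverConfig) isBlacklisted(pubkey string) bool {
	if c == nil {
		return false
	}
	c.once.Do(func() {
		c.set = make(map[string]bool, len(c.NodeBlacklist))
		for _, k := range c.NodeBlacklist {
			k = strings.ToLower(strings.TrimSpace(k))
			if k != "" { // "" and " " must not match anything
				c.set[k] = true
			}
		}
	})
	key := strings.ToLower(strings.TrimSpace(pubkey))
	return key != "" && c.set[key]
}

func main() {
	cfg := &serverConfig{NodeBlacklist: []string{" AABBCC ", "", " "}}
	fmt.Println(cfg.isBlacklisted("aabbcc"), cfg.isBlacklisted("ddeeff"))
}
```

A handler would call this before serving `/api/nodes/{pubkey}` and return 404 on a match, leaving the stored data untouched.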
Joel Claw	fa3f623bd6	feat: add observer retention — remove stale observers after configurable days (#764) ## Summary Observers that stop actively sending data now get removed after a configurable retention period (default 14 days). Previously, observers remained in the `observers` table forever. This meant nodes that were once observers for an instance but are no longer connected (even if still active in the mesh elsewhere) would continue appearing in the observer list indefinitely. ## Key Design Decisions - Active data requirement: `last_seen` is only updated when the observer itself sends packets (via `stmtUpdateObserverLastSeen`). Being seen by another node does NOT update this field. So an observer must actively send data to stay listed. - Default: 14 days — observers not seen in 14 days are removed - `-1` = keep forever — for users who want observers to never be removed - `0` = use default (14 days) — same as not setting the field - Runs on startup + daily ticker — staggered 3 minutes after metrics prune to avoid DB contention ## Changes \| File \| Change \| \|------\|--------\| \| `cmd/ingestor/config.go` \| Add `ObserverDays` to `RetentionConfig`, add `ObserverDaysOrDefault()` \| \| `cmd/ingestor/db.go` \| Add `RemoveStaleObservers()` — deletes observers with `last_seen` before cutoff \| \| `cmd/ingestor/main.go` \| Wire up startup + daily ticker for observer retention \| \| `cmd/server/config.go` \| Add `ObserverDays` to `RetentionConfig`, add `ObserverDaysOrDefault()` \| \| `cmd/server/db.go` \| Add `RemoveStaleObservers()` (server-side, uses read-write connection) \| \| `cmd/server/main.go` \| Wire up startup + daily ticker, shutdown cleanup \| \| `cmd/server/routes.go` \| Admin prune API now also removes stale observers \| \| `config.example.json` \| Add `observerDays: 14` with documentation \| \| `cmd/ingestor/coverage_boost_test.go` \| 4 tests: basic removal, empty store, keep forever (-1), default (0→14) \| \| `cmd/server/config_test.go` \| 4 tests: `ObserverDaysOrDefault` edge cases \| ## Config Example ```json { "retention": { "nodeDays": 7, "observerDays": 14, "packetDays": 30, "_comment": "observerDays: -1 = keep forever, 0 = use default (14)" } } ``` ## Admin API The `/api/admin/prune` endpoint now also removes stale observers (using `observerDays` from config) and reports `observers_removed` in the response alongside `packets_deleted`. ## Test Plan - [x] `TestRemoveStaleObservers` — old observer removed, recent observer kept - [x] `TestRemoveStaleObserversNone` — empty store, no errors - [x] `TestRemoveStaleObserversKeepForever` — `-1` keeps even year-old observers - [x] `TestRemoveStaleObserversDefault` — `0` defaults to 14 days - [x] `TestObserverDaysOrDefault` (ingestor) — nil/zero/positive/keep-forever - [x] `TestObserverDaysOrDefault` (server) — nil/zero/positive/keep-forever - [x] Both binaries compile cleanly (`go build`) - [ ] Manual: verify observer count decreases after retention period on a live instance	2026-04-17 09:24:40 -07:00
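The retention semantics described in the observer-retention commit above can be sketched in Go as follows. This is a minimal illustration, not the project's code: the struct shape, the nil-receiver handling, the treatment of every negative value as "keep forever", and the hypothetical `isStale` helper working on Unix-second timestamps are assumptions made for the example. Only the `ObserverDaysOrDefault` name, the 14-day default, and the meanings of `-1` and `0` come from the commit message.

```go
package main

import "fmt"

// RetentionConfig holds only the field discussed above; the real struct
// also carries other retention settings.
type RetentionConfig struct {
	ObserverDays int `json:"observerDays"`
}

const defaultObserverDays = 14

// ObserverDaysOrDefault returns 14 when the config is missing or the field
// is 0, and otherwise returns the configured value unchanged (so -1 passes
// through as "keep forever").
func (r *RetentionConfig) ObserverDaysOrDefault() int {
	if r == nil || r.ObserverDays == 0 {
		return defaultObserverDays
	}
	return r.ObserverDays
}

// isStale is a hypothetical helper: it reports whether an observer whose
// last_seen is lastSeenUnix falls before the cutoff at time nowUnix.
// Assumption: any negative days value disables removal.
func isStale(lastSeenUnix, nowUnix int64, days int) bool {
	if days < 0 {
		return false
	}
	cutoff := nowUnix - int64(days)*24*60*60
	return lastSeenUnix < cutoff
}

func main() {
	cfg := &RetentionConfig{ObserverDays: 0}
	days := cfg.ObserverDaysOrDefault()
	now := int64(1_776_000_000) // arbitrary reference time for the demo

	fmt.Println(days)                             // 14
	fmt.Println(isStale(now-15*86400, now, days)) // true
	fmt.Println(isStale(now-15*86400, now, -1))   // false
}
```

Keeping the default resolution in one accessor means the startup run, the daily ticker, and the admin prune endpoint can all share the same interpretation of `observerDays`.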
copelaje	d27a7a653e	fix case on channel key so Public decode/display works right (#761) Simple change. Before this change, the Public channel did not appear in the channels display because of a case mismatch in the channel key.	2026-04-16 00:14:47 -07:00
efiten	b7c2cb070c	docs: geofilter manual + config.example.json entry (#734) ## Summary - Add missing `geo_filter` block to `config.example.json` with polygon example, `bufferKm`, and inline `_comment` - Add `docs/user-guide/geofilter.md`: full operator guide covering config schema, GeoFilter Builder workflow, and prune script as one-time migration tool - Add Geographic filtering section to `docs/user-guide/configuration.md` with link to the full guide Closes #669 (M1: documentation) ## Test plan - [x] `config.example.json` parses cleanly (no JSON errors) - [x] `docs/user-guide/geofilter.md` renders correctly in GitHub preview - [x] Link from `configuration.md` to `geofilter.md` resolves 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 22:43:19 -07:00
Kpa-clawbot	767c8a5a3e	perf: async chunked backfill — HTTP serves within 2 minutes (#612 ) (#614 ) ## Summary Adds two config knobs for controlling backfill scope and neighbor graph data retention, plus removes the dead synchronous backfill function. ## Changes ### Config knobs #### `resolvedPath.backfillHours` (default: 24) Controls how far back (in hours) the async backfill scans for observations with NULL `resolved_path`. Transmissions with `first_seen` older than this window are skipped, reducing startup time for instances with large historical datasets. #### `neighborGraph.maxAgeDays` (default: 30) Controls the maximum age of `neighbor_edges` entries. Edges with `last_seen` older than this are pruned from both SQLite and the in-memory graph. Pruning runs on startup (after a 4-minute stagger) and every 24 hours thereafter. ### Dead code removal - Removed the synchronous `backfillResolvedPaths` function that was replaced by the async version. ### Implementation details - `backfillResolvedPathsAsync` now accepts a `backfillHours` parameter and filters by `tx.FirstSeen` - `NeighborGraph.PruneOlderThan(cutoff)` removes stale edges from the in-memory graph - `PruneNeighborEdges(conn, graph, maxAgeDays)` prunes both DB and in-memory graph - Periodic pruning ticker follows the same pattern as metrics pruning (24h interval, staggered start) - Graceful shutdown stops the edge prune ticker ### Config example Both knobs added to `config.example.json` with `_comment` fields. ## Tests - Config default/override tests for both knobs - `TestGraphPruneOlderThan` — in-memory edge pruning - `TestPruneNeighborEdgesDB` — SQLite + in-memory pruning together - `TestBackfillRespectsHourWindow` — verifies old transmissions are excluded by backfill window --------- Co-authored-by: you <you@example.com>	2026-04-05 09:49:39 -07:00
efiten	fe314be3a8	feat: geo_filter enforcement, DB pruning, geofilter-builder tool, HB column (#215 ) ## Summary Several features and fixes from a live deployment of the Go v3.0.0 backend. ### geo_filter — full enforcement - Go backend config (`cmd/server/config.go`, `cmd/ingestor/config.go`): added `GeoFilterConfig` struct so `geo_filter.polygon` and `bufferKm` from `config.json` are parsed by both the server and ingestor - Ingestor (`cmd/ingestor/geo_filter.go`, `cmd/ingestor/main.go`): ADVERT packets from nodes outside the configured polygon + buffer are dropped before any DB write — no transmission, node, or observation data is stored - Server API (`cmd/server/geo_filter.go`, `cmd/server/routes.go`): `GET /api/config/geo-filter` endpoint returns the polygon + bufferKm to the frontend; `/api/nodes` responses filter out any out-of-area nodes already in the DB - Frontend (`public/map.js`, `public/live.js`): blue polygon overlay (solid inner + dashed buffer zone) on Map and Live pages, toggled via "Mesh live area" checkbox, state shared via localStorage ### Automatic DB pruning - Add `retention.packetDays` to `config.json` to delete transmissions + observations older than N days on a daily schedule (1 min after startup, then every 24h). Nodes and observers are never pruned. - `POST /api/admin/prune?days=N` for manual runs (requires `X-API-Key` header if `apiKey` is set) ```json "retention": { "nodeDays": 7, "packetDays": 30 } ``` ### tools/geofilter-builder.html Standalone HTML tool (no server needed) — open in browser, click to place polygon points on a Leaflet map, set `bufferKm`, copy the generated `geo_filter` JSON block into `config.json`. ### scripts/prune-nodes-outside-geo-filter.py Utility script to clean existing out-of-area nodes from the database (dry-run + confirm). Useful after first enabling geo_filter on a populated DB. ### HB column in packets table Shows the hop hash size in bytes (1–4) decoded from the path byte of each packet's raw hex. 
Displayed as HB between Size and Type columns, hidden on small screens. ## Test plan - [x] ADVERT from node outside polygon is not stored (no new row in nodes or transmissions) - [x] `GET /api/config/geo-filter` returns polygon + bufferKm when configured, `{polygon: null, bufferKm: 0}` when not - [x] `/api/nodes` excludes nodes outside polygon even if present in DB - [x] Map and Live pages show blue polygon overlay when configured; checkbox toggles it - [x] `retention.packetDays: 30` deletes old transmissions/observations on startup and daily - [x] `POST /api/admin/prune?days=30` returns `{deleted: N, days: 30}` - [x] `tools/geofilter-builder.html` opens standalone, draws polygon, copies valid JSON - [x] HB column shows 1–4 for all packets in grouped and flat view 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-31 01:10:56 -07:00
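The polygon check behind the geo_filter enforcement in the commit above can be illustrated with a standard ray-casting test. This is a sketch under stated assumptions, not the project's `geo_filter.go`: the `Point` type, the `[lat, lon]` ordering, and the hypothetical `advertAllowed` helper are invented for the example, treating a missing polygon as "allow everything" is an assumption, and the `bufferKm` margin is deliberately left out to keep the example short.

```go
package main

import "fmt"

// Point is a [lat, lon] pair. The ordering is an assumption for this sketch.
type Point [2]float64

// pointInPolygon uses ray casting: it counts how many polygon edges a ray
// from the point crosses; an odd count means the point is inside.
func pointInPolygon(lat, lon float64, poly []Point) bool {
	inside := false
	n := len(poly)
	for i, j := 0, n-1; i < n; j, i = i, i+1 {
		yi, xi := poly[i][0], poly[i][1]
		yj, xj := poly[j][0], poly[j][1]
		// The edge straddles the point's latitude and the crossing lies
		// to the east of the point.
		if (yi > lat) != (yj > lat) && lon < (xj-xi)*(lat-yi)/(yj-yi)+xi {
			inside = !inside
		}
	}
	return inside
}

// advertAllowed is a hypothetical gate: with no usable polygon configured
// nothing is filtered; otherwise only positions inside the polygon pass.
// The buffer zone (bufferKm) is not modeled here.
func advertAllowed(lat, lon float64, poly []Point) bool {
	if len(poly) < 3 {
		return true
	}
	return pointInPolygon(lat, lon, poly)
}

func main() {
	square := []Point{{0, 0}, {0, 10}, {10, 10}, {10, 0}}
	fmt.Println(advertAllowed(5, 5, square))  // true
	fmt.Println(advertAllowed(5, 15, square)) // false
	fmt.Println(advertAllowed(5, 15, nil))    // true (no filter configured)
}
```

A full implementation would additionally accept points within `bufferKm` of the polygon boundary, which needs a point-to-segment distance on geographic coordinates.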
Kpa-clawbot	1e1fb298c2	Backend: timestamp config for client defaults (#292) ## Backend: Timestamp Config for Client Defaults Refs #286 — implements backend scope from the [final spec](https://github.com/Kpa-clawbot/CoreScope/issues/286#issuecomment-4158891089). ### What changed Config struct (`cmd/server/config.go`) - Added `TimestampConfig` struct with `defaultMode`, `timezone`, `formatPreset`, `customFormat`, `allowCustomFormat` - Added `Timestamps TimestampConfig` to main `Config` struct - Normalization method: invalid values fall back to safe defaults (`ago`/`local`/`iso`) Startup warnings (`cmd/server/main.go`) - Missing timestamps section: `[config] timestamps not configured — using defaults (ago/local/iso)` - Invalid values logged with what was normalized API endpoint (`cmd/server/routes.go`) - Timestamp config included in `GET /api/config/client` response via `ClientConfigResponse` - Frontend reads server defaults from this endpoint Config example (`config.example.json`) - Added `timestamps` section with documented defaults ### Tests (`cmd/server/`) - Config loads with timestamps section - Config loads without timestamps section (defaults applied) - Invalid values are normalized - `/api/config/client` returns timestamp config ### Validation - `cd cmd/server && go test ./...` ✅ --------- Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-30 17:41:45 -07:00
you	0f70cd1ac0	feat: make health thresholds configurable in hours Change healthThresholds config from milliseconds to hours for readability. Config keys: infraDegradedHours, infraSilentHours, nodeDegradedHours, nodeSilentHours. Defaults: infra degraded 24h, silent 72h; node degraded 1h, silent 24h. - Config stored in hours, converted to ms at comparison time - /api/config/client sends ms to frontend (backward compatible) - Frontend tooltips use dynamic thresholds instead of hardcoded strings - Added healthThresholds section to config.example.json - Updated Go and Node.js servers, tests	2026-03-29 09:50:32 -07:00
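The hours-to-milliseconds conversion described in the health-threshold commit above might look like the following Go sketch. The four config key names and the defaults (24h/72h for infra, 1h/24h for nodes) come from the commit message; the struct, the hypothetical `hoursToMs` and `classify` helpers, the `"healthy"` label, and the choice of `>=` at the boundaries are assumptions for illustration.

```go
package main

import "fmt"

// HealthThresholds stores thresholds in hours, as in the config file.
type HealthThresholds struct {
	InfraDegradedHours float64 `json:"infraDegradedHours"`
	InfraSilentHours   float64 `json:"infraSilentHours"`
	NodeDegradedHours  float64 `json:"nodeDegradedHours"`
	NodeSilentHours    float64 `json:"nodeSilentHours"`
}

// defaultThresholds returns the defaults listed in the commit message.
func defaultThresholds() HealthThresholds {
	return HealthThresholds{
		InfraDegradedHours: 24,
		InfraSilentHours:   72,
		NodeDegradedHours:  1,
		NodeSilentHours:    24,
	}
}

// hoursToMs converts a threshold to milliseconds at comparison time.
func hoursToMs(hours float64) int64 {
	return int64(hours * 60 * 60 * 1000)
}

// classify is a hypothetical helper that maps the time since a node was
// last heard (in ms) to a status label.
func classify(ageMs int64, degradedHours, silentHours float64) string {
	switch {
	case ageMs >= hoursToMs(silentHours):
		return "silent"
	case ageMs >= hoursToMs(degradedHours):
		return "degraded"
	default:
		return "healthy"
	}
}

func main() {
	t := defaultThresholds()
	twoHours := hoursToMs(2)
	fmt.Println(hoursToMs(t.InfraDegradedHours))                                // 86400000
	fmt.Println(classify(twoHours, t.NodeDegradedHours, t.NodeSilentHours))   // degraded
	fmt.Println(classify(twoHours, t.InfraDegradedHours, t.InfraSilentHours)) // healthy
}
```

Converting only at comparison time keeps the config readable in hours while still letting `/api/config/client` send millisecond values to the frontend.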
Kpa-clawbot	cdcaa476f2	rename: MeshCore Analyzer → CoreScope (Phase 1 — backend + infra) Rename product branding, binary names, Docker images, container names, Go modules, proto go_package, CI, manage.sh, and documentation. Preserved (backward compat): - meshcore.db database filename - meshcore-data / meshcore-staging-data directory paths - MQTT topics (meshcore/#, meshcore/+/+/packets, etc.) - proto package namespace (meshcore.v1) - localStorage keys Changes by category: - Go modules: github.com/corescope/{server,ingestor} - Binaries: corescope-server, corescope-ingestor - Docker images: corescope:latest, corescope-go:latest - Containers: corescope-prod, corescope-staging, corescope-staging-go - Supervisord programs: corescope, corescope-server, corescope-ingestor - Branding: siteName, heroTitle, startup logs, fallback HTML - Proto go_package: github.com/corescope/proto/v1 - CI: container refs, deploy path - Docs: 8 markdown files updated Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-28 14:08:15 -07:00
Kpa-clawbot	520adcc6ab	feat: move stale nodes to inactive_nodes table, fixes #202 - Create inactive_nodes table with identical schema to nodes - Add retention.nodeDays config (default 7) in Node.js and Go - On startup: move nodes not seen in N days to inactive_nodes - Daily timer (24h setInterval / goroutine ticker) repeats the move - Log 'Moved X nodes to inactive_nodes (not seen in N days)' - All existing queries unchanged — they only read nodes table - Add 14 new tests for moveStaleNodes in test-db.js - Both Node (db.js/server.js) and Go (ingestor/server) implemented Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 22:43:53 -07:00
you	0ed96539db	feat: config-driven customization system (Phase 1) Add GET /api/config/theme endpoint serving branding, theme colors, node colors, and home page content from config.json with sensible defaults so unconfigured instances look identical to before. Client-side (app.js): - Fetch theme config on page load, before first render - Override CSS variables from theme.* on document root - Override ROLE_COLORS/ROLE_STYLE from nodeColors.* - Replace nav brand text, logo, favicon from branding.* - Store config in window.SITE_CONFIG for other pages Home page (home.js): - Hero title/subtitle from config.home - Steps and checklist from config.home - Footer links from config.home.footerLinks - Chooser welcome text uses configured siteName Config example updated with all available theme options. No default appearance changes — all overrides are optional.	2026-03-23 00:37:48 +00:00
you	2170dd7743	security: require API key for POST /api/packets and /api/perf/reset - New config.apiKey field — when set, POST endpoints require X-Api-Key header - If apiKey not configured, endpoints remain open (dev/local mode) - GET endpoints and /api/decode (read-only) remain public - Closes the packet injection attack surface	2026-03-21 18:40:06 +00:00
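The API-key rule in the security commit above (protected POST endpoints require `X-Api-Key` only when `apiKey` is configured) can be sketched as a Go `net/http` wrapper. This is an illustration rather than the project's handler, which at this point in the history was implemented in the Node.js server: the `keyAccepted` and `requireAPIKey` names, the constant-time comparison, and the 401 response are assumptions.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
)

// keyAccepted reports whether a request may proceed. An empty configured
// key means the endpoint stays open (dev/local mode).
func keyAccepted(configured, presented string) bool {
	if configured == "" {
		return true
	}
	// Constant-time comparison avoids leaking key contents through timing.
	return subtle.ConstantTimeCompare([]byte(configured), []byte(presented)) == 1
}

// requireAPIKey wraps a handler that should be protected, such as the
// POST /api/packets route. GET routes would simply not be wrapped.
func requireAPIKey(configured string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !keyAccepted(configured, r.Header.Get("X-Api-Key")) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	fmt.Println(keyAccepted("", ""))             // true: no key configured
	fmt.Println(keyAccepted("s3cret", "s3cret")) // true: key matches
	fmt.Println(keyAccepted("s3cret", ""))       // false: header missing
}
```

Wrapping only the mutating routes leaves the read-only GET endpoints and `/api/decode` public, as the commit describes.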
you	7ac051d2a9	Add rainbow table of pre-computed channel keys for common MeshCore channels - channel-rainbow.json: 592 pre-computed SHA256-derived keys for common channel names (cities, topics, ham radio, emergency, etc.) - server.js: Load rainbow table at startup as lowest-priority key source - config.example.json: Add #LongFast to hashChannels list Key derivation verified against MeshCore source: SHA256('#name')[:16bytes]. Rainbow table boosted decryption from ~48% to ~88% in testing.	2026-03-21 06:35:14 +00:00
you	1a7145fc46	Use hashChannels for derived keys, keep only hardcoded public key in channelKeys Channel keys for #test, #sf, #wardrive, #yo, #bot, #queer, #bookclub, #shtf are all SHA256(channelName).slice(0,32) — no need to hardcode them. Move to hashChannels array for auto-derivation. Only the MeshCore default public key (8b3387e9c5cdea6ac9e5edbaa115cd72) needs explicit specification since it's not derived from its channel name.	2026-03-21 06:22:05 +00:00
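The two commits above describe the same derivation for hashed channels: the key is the first 16 bytes of the SHA-256 digest of the channel name including its leading `#`, which is 32 hex characters and matches the `.slice(0,32)` on the hex string. A minimal Go sketch follows; the function name `deriveChannelKey` is hypothetical, and the project's own implementation at this point was in `server.js`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// deriveChannelKey returns the hex-encoded first 16 bytes of
// SHA-256(channelName). The name is hashed as given, including the '#'.
func deriveChannelKey(channelName string) string {
	sum := sha256.Sum256([]byte(channelName))
	return hex.EncodeToString(sum[:16])
}

func main() {
	// Channel names taken from the commit messages above.
	for _, name := range []string{"#test", "#sf", "#LongFast"} {
		fmt.Printf("%-10s %s\n", name, deriveChannelKey(name))
	}
}
```

Because the key depends only on the name, a list such as `hashChannels` or a pre-computed table only has to store channel names. The default public key is the exception, since it is not derived from its channel name and must be configured explicitly.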
you	81275acff0	Make map default center/zoom configurable via config.json Adds mapDefaults config option with center and zoom properties. New /api/config/map endpoint serves the defaults. live.js and map.js fetch the config with fallback to hardcoded Bay Area defaults. Fixes Kpa-clawbot/meshcore-analyzer#115	2026-03-21 05:29:05 +00:00
you	02d00a17bb	fix: add hashChannels to config.example.json	2026-03-21 03:52:50 +00:00
you	d660c03833	feat: realistic packet propagation mode on live map	2026-03-21 01:41:55 +00:00
Kpa-clawbot	4a9e69b207	Merge pull request #105 from lincomatic/https add https support	2026-03-21 01:25:15 +00:00
lincomatic	209e17fcd4	add https support	2026-03-20 09:21:02 -07:00
you	9bf78bd28d	feat: add MRY (Monterey) to lincomatic MQTT topics	2026-03-20 15:29:37 +00:00
you	5fe275b3f8	fix: use region-specific MQTT topics instead of wildcards — saves bandwidth	2026-03-20 15:28:54 +00:00
you	db884f12eb	fix: 4 bugs - spark bars inline style, My Nodes filter field names, duplicate pin button, map dark mode 1. Spark bars: inline style override on td (max-width:none, min-width:80px) 2. My Nodes filter: pubkey→pubKey, to/from→srcPubKey/destPubKey/srcHash/destHash 3. Pin button: guard against duplicates in init, remove in destroy 4. Map page: CartoDB dark/light tiles with MutationObserver theme swap	2026-03-20 09:21:17 +00:00
you	2713d501b4	feat: MQTT topic arrays, IATA filtering, observer status parsing - mqttSources[].topics is now an array of topic patterns - mqttSources[].iataFilter optionally restricts to specific regions - meshcore/<region>/<id>/status topic parsed for observer metadata: name, model, firmware, client_version, radio, battery, uptime, noise_floor - New observer columns with auto-migration for existing DBs - Status updates don't inflate packet_count (separate updateObserverStatus)	2026-03-20 07:29:01 +00:00
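The topic layout and optional region filter from the commit above can be sketched as follows. The `meshcore/<region>/<id>/status` shape comes from the commit message; the `parseTopic` and `regionAllowed` names, the case-insensitive comparison, the example region and observer IDs, and the rule that an empty filter admits every region are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// parseTopic splits a topic of the form meshcore/<region>/<id>/<kind>.
// It returns ok=false for anything that does not match that shape.
func parseTopic(topic string) (region, id, kind string, ok bool) {
	parts := strings.Split(topic, "/")
	if len(parts) != 4 || parts[0] != "meshcore" {
		return "", "", "", false
	}
	return parts[1], parts[2], parts[3], true
}

// regionAllowed applies an optional IATA filter. Assumption: an empty
// filter means no restriction.
func regionAllowed(region string, iataFilter []string) bool {
	if len(iataFilter) == 0 {
		return true
	}
	for _, code := range iataFilter {
		if strings.EqualFold(code, region) {
			return true
		}
	}
	return false
}

func main() {
	filter := []string{"SJC", "MRY"} // illustrative region codes
	topics := []string{
		"meshcore/MRY/obs1/status",
		"meshcore/SEA/obs2/packets",
		"other/MRY/obs3/status",
	}
	for _, topic := range topics {
		region, id, kind, ok := parseTopic(topic)
		if !ok || !regionAllowed(region, filter) {
			fmt.Println("skip", topic)
			continue
		}
		fmt.Println("accept", region, id, kind)
	}
}
```

Routing on the final segment lets status messages update observer metadata through a separate path, so they do not inflate `packet_count`.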
you	4ff72935ca	feat: multi-broker MQTT support config.mqttSources array allows multiple MQTT brokers with independent topics, credentials, and TLS settings. Legacy config.mqtt still works for backward compatibility.	2026-03-20 07:18:49 +00:00
you	d8d0572abb	perf: in-memory packet store — all reads from RAM, SQLite write-only - PacketStore loads all packets into memory on startup (~11MB for 27K packets) - Indexed by id, hash, observer, and node pubkey for fast lookups - /api/packets, /api/packets/timestamps, /api/packets/:id all served from RAM - MQTT ingest writes to both RAM + SQLite - Configurable maxMemoryMB (default 1024MB) in config.json packetStore section - groupByHash queries computed in-memory - Packet store stats exposed in /api/perf - Expected: /api/packets goes from 77ms to <1ms	2026-03-20 03:38:37 +00:00
you	de658bfb0d	perf: configurable cache TTLs via config.json — server + client fetch from /api/config/cache All cache TTLs now read from config.json cacheTTL section (seconds). Client fetches config on load via GET /api/config/cache. config.example.json updated with defaults. Edit config.json, restart server — no code changes needed to tweak TTLs.	2026-03-20 03:23:58 +00:00
you	e525566080	Add config.example.json	2026-03-18 19:55:28 +00:00

40 Commits