Eliminates the SQLITE_BUSY VACUUM bug from #1283 by making cmd/server
truly read-only. The bug surfaced when supervisord launched both
ingestor + server in one container: the ingestor took the write lock for
INSERTs, then the server's VACUUM-on-startup immediately failed with
SQLITE_BUSY. Same race latently affected three other server-side writes.
Four write operations moved out of cmd/server/:
1. VACUUM / auto_vacuum migration (cmd/server/vacuum.go, entire file)
→ cmd/ingestor/db.go Store.CheckAutoVacuum (already existed;
ingestor runs it BEFORE the MQTT subscriber starts so there is
no contention with concurrent writes).
2. PruneOldPackets (DELETE FROM transmissions)
cmd/server/db.go → cmd/ingestor/maintenance.go (new file,
Store.PruneOldPackets) + main.go scheduler.
3. PruneOldMetrics (DELETE FROM observer_metrics)
cmd/server/db.go → cmd/ingestor/db.go Store.PruneOldMetrics
(already existed).
4. RemoveStaleObservers (UPDATE observers SET inactive = 1)
cmd/server/db.go → cmd/ingestor/db.go Store.RemoveStaleObservers
(already existed).
Server-side changes:
- vacuum.go deleted; checkAutoVacuum / runIncrementalVacuum gone.
- cmd/server/db.go: PruneOldPackets, PruneOldMetrics, RemoveStaleObservers
deleted.
- cmd/server/main.go: packet/metrics/observer prune schedulers removed;
the neighbor-edge prune scheduler (PruneNeighborEdges) is intentionally
left in place — outside scope of #1283, tracked separately.
- routes.go + openapi.go: /api/admin/prune endpoint removed (prune is
scheduled by the ingestor now; operators restart the ingestor for an
ad-hoc pass).
Ingestor changes:
- New cmd/ingestor/maintenance.go with Store.PruneOldPackets.
- cmd/ingestor/config.go gains RetentionConfig.PacketDays and
Config.PacketDaysOrZero().
- cmd/ingestor/main.go runs PruneOldPackets at startup (if
packetDays > 0) and on a 24h ticker.
Docs:
- AGENTS.md: documents the read/write separation invariant.
- config.example.json: notes that retention + vacuumOnStartup are
consumed by the ingestor.
TDD:
- Red: bb1d749a — invariant tests + Store.PruneOldPackets stub.
- Green: this commit — real implementation + server-side removals.
Note: cachedRW() still has three out-of-scope callers in cmd/server
(neighbor_persist.go, ensure_indexes.go, from_pubkey_migration.go).
Those are pre-existing write paths not covered by issue #1283 and are
left untouched per the issue scope. Future work can relocate them
under the same invariant.
RED: 97f49a0c · CI:
https://github.com/Kpa-clawbot/CoreScope/actions/runs/26046530920Fixes#1265.
## Problem
On staging two clock-skew endpoints serve compute-on-request:
- `/api/observers/clock-skew` — 3.3s
- `/api/nodes/clock-skew` — 8.9s
Both drive a full `clockSkew.Recompute` over 100k+ adverts while holding
`s.mu.RLock`, blocking under concurrent reader load.
## Fix
Wire both endpoints into the established `analytics_recomputer.go`
pattern (PRs #1248 / #1259 / #1263). Two new slots:
- `recompObserversClockSkew` — wraps `computeObserverCalibrations()`
- `recompNodesClockSkew` — wraps `computeFleetClockSkew()`
Accessors `GetObserverCalibrations` / `GetFleetClockSkew` now prefer the
atomic-pointer snapshot; on-request compute is fallback-only for the
brief window before initial sync compute lands (and for tests that skip
the recomputer).
Default interval **300s**, overridable via:
```json
"analytics": {
"recomputeIntervalSeconds": {
"observersClockSkew": 300,
"nodesClockSkew": 300
}
}
```
`config.example.json` + the `_comment_analytics` doc updated.
## TDD
- RED `97f49a0c` — `TestClockSkewRecomputersRegistered` +
`TestClockSkewHandlersSteadyStateLatency` (8 concurrent readers × 25
reqs per endpoint, p99 < 100ms gate). Fails on master: recomputer slots
nil.
- GREEN `19599375` — wire + accessor switch. p99 well under 5ms on the
test fixture.
## Verification
```
cd cmd/server && go test ./... -count=1 # ok 42s
bash ~/.openclaw/skills/pr-preflight/scripts/run-all.sh origin/master # all gates pass
```
---------
Co-authored-by: CoreScope Bot <bot@corescope.local>
RED commit: `0190466d` — failing CI:
https://github.com/Kpa-clawbot/CoreScope/actions (will populate after PR
creation)
## Problem
On staging (commit `d69d9fb`, 78k tx, 2.3M obs), `curl
http://localhost/api/analytics/roles` times out at 60s with 0 bytes —
the Roles tab is unusable. Issue #1256.
PR #1248's steady-state recomputer fan-out (topology / rf / distance /
channels / hash-collisions / hash-sizes) **didn't include roles**. The
legacy handler:
1. Holds `s.mu.RLock` for the entire compute.
2. Calls `GetFleetClockSkew()`, which drives `clockSkew.Recompute(s)`
over all ADVERT transmissions — O(78k) per request.
3. Concurrent ingest writers compound the latency through
writer-starvation.
Result: every request hits the cold path; the response never comes back
inside the 60 s HTTP budget.
## Fix
Add `roles` as the 7th endpoint in the recomputer fan-out — same pattern
as #1248:
- `PacketStore.recompRoles` slot, registered in
`StartAnalyticsRecomputers` with default 5-min interval.
- `PacketStore.GetAnalyticsRoles()` → atomic-pointer load from the
snapshot (sub-ms), with a `computeAnalyticsRoles()` fallback only for
the brief startup window before the initial sync compute completes.
- Handler is now a thin wrapper — no lock-held work on the request path.
- New optional `roles` key under `analytics.recomputeIntervalSeconds` in
config; `config.example.json` and `_comment_analytics` updated.
## Latency (unit-scope benchmark)
- Worst-of-50 handler latency: **<100 ms** (test budget; well under the
2 s p99 acceptance).
- Compute itself is bounded by the existing 5-min recompute window — it
runs once in the background, never on the request path.
## Tests
- RED `0190466d`: asserts `recompRoles` is registered and the handler
returns under the latency budget. Fails on master with `recompRoles not
registered`.
- GREEN `d7784f76`: registers the recomputer + snapshot accessor — both
tests pass.
Fixes#1256
---------
Co-authored-by: openclaw-bot <bot@openclaw.local>
RED commit: `27630f6a` — adds latency test that fails on master
(p99=225ms > 50ms budget) and a stub `StartAnalyticsRecomputers` that
returns a no-op so the assertion (not a build error) gates the change.
GREEN commit: `20fbbceb` — wires real background recompute
infrastructure. Test passes at p99=~1µs.
## What changed
Replaces the on-request "compute-then-cache" pattern for the
default-shape analytics queries with a steady-state background recompute
loop. Reads always hit an `atomic.Value` snapshot in <1µs regardless of
compute cost or writer contention. Operator principle: serving slightly
stale data quickly beats real-time data slowly.
## Endpoints converted (default 5min interval each)
| Endpoint | Cold compute | Recomputer interval |
|---|---|---|
| `/api/analytics/topology` | ~5s | 5 min |
| `/api/analytics/rf` | ~4s | 5 min |
| `/api/analytics/distance` | ~3s | 5 min |
| `/api/analytics/channels` | ~0.5s | 5 min |
| `/api/analytics/hash-collisions` | ~0.5s | 5 min |
| `/api/analytics/hash-sizes` | ~22ms | 5 min |
All intervals configurable per-endpoint via
`analytics.recomputeIntervalSeconds.<name>` in `config.json`; documented
in `config.example.json`. Default override via
`analytics.defaultIntervalSeconds`.
## Scope: default query only
Only the canonical shape `(region="", window=zero)` is precomputed.
Region- or window-filtered requests fall back to the legacy TTL cache +
on-request compute — keeps recomputer count bounded (6, not 6×N×M).
## Latency
Test `TestAnalyticsRecomputerSteadyStateLatency`: 100 concurrent readers
+ 4 writers churning `s.mu.Lock` on 20k distHops.
- Before: p50=188ms p99=225ms (assertion failed)
- After: p50=240ns p99=1.1µs (atomic load + map return)
## Shutdown integration
`StartAnalyticsRecomputers` returns a stop closure invoked from
`main.go`'s SIGTERM handler BEFORE `dbClose()` so any in-flight SQLite
compute drains cleanly. `TestAnalyticsRecomputerShutdownNoLeak` confirms
all 6 goroutines are reaped (Δ=6 within 2s).
## Safety details
- Initial compute is synchronous in `Start()` — first read after startup
never sees nil.
- `recover()` inside `runOnce` keeps a compute panic from killing the
goroutine; previous snapshot remains valid.
- `analyticsRecomputerMu` is a sync.RWMutex; recomputer pointers are
read-locked in the hot path. The atomic.Value swap inside `runOnce` is
lock-free.
Fixes#1240.
---------
Co-authored-by: OpenClaw Bot <bot@openclaw.local>
## Summary
Fixes#1239 — `/api/analytics/distance` 15s cold on staging under heavy
ingest. Two independent fixes.
First commit on this branch is the RED test for Fix B (`a539882`),
demonstrating reader/writer contention against the main store lock. CI:
see Actions tab for the run on the test-only commit — it asserts >150µs
avg writer cycle and fails at 82367µs pre-fix. GREEN commit (`d3938f1`)
brings it to 1µs.
## Fix A — TTL bump 15s → 60s (`5eae1e0`)
- `rfCacheTTL` default in `cmd/server/store.go` changed from `15 *
time.Second` to `60 * time.Second`. This is the shared TTL for RF /
topology / distance / hash-sizes / subpath / channel analytics caches.
- Per operator clarification (issue thread): distance analytics IS
viewed live during analysis sessions, not background-glanced. 60s
smooths the cold-miss churn during heavy ingest without freezing data.
- `config.example.json`: documented `cacheTTL.analyticsRF` with new
default + caveat.
- Existing assertions (`TestCacheTTLDefaults`,
`TestHashCollisionsCacheTTL`) updated to the new default.
## Fix B — Drop main RLock around compute (`a539882` red, `d3938f1`
green)
`computeAnalyticsDistance` previously held `s.mu.RLock()` for the entire
iteration: region match-set construction, hop/path filtering, sort,
dedup, histogram, category stats, time series. Readers serialized
writers (ingest, `buildDistanceIndex`).
Refactor: hold the RLock only long enough to snapshot the
`distHops`/`distPaths` slice headers AND build the region match-set
(which reads `tx.Observations`, mutated under `s.mu.Lock`). For
`region=""` (the hot cold-call path) the lock hold is just the header
snapshot — microseconds. Everything else runs on the locally-captured
slices outside the lock.
Safety: `distHops`/`distPaths` are append-only via re-slice in
`buildDistanceIndex` / `updateDistanceIndexForTxs` (both under
`s.mu.Lock`). If the backing array reallocates after the snapshot, the
snapshot still references the prior array (GC-pinned) at the consistent
length captured under the lock. Records are value types — no torn
writes.
## Test results
`cmd/server/distance_lock_contention_test.go` (8 reader goroutines × 20k
synthetic distHops × 200 writer Lock/Unlock cycles):
- pre-fix avg writer cycle: **82367µs** (16.5s for 200 cycles)
- post-fix avg writer cycle: **1µs** (279µs for 200 cycles)
- ~82000× reduction in writer contention; reader result shape unchanged
Full `go test ./cmd/server/...` green with `-race`.
## Out of scope (per issue)
- Same lock pattern in topology / RF / hash / subpath analytics — file
separately if needed.
- Per-region cache key sharding.
- WebSocket-driven cache invalidation.
---------
Co-authored-by: openclaw-bot <bot@openclaw.local>
Fixes#1228 — geo-implausible neighbor-graph edges are rejected at build
time.
Red commit: `5a6d9660` — failing tests for 4 cases (reject SF↔Berlin,
accept local CA, accept no-GPS endpoint, counter increments). Live CI
run (latest commit):
https://github.com/Kpa-clawbot/CoreScope/actions?query=branch%3Afix%2Fissue-1228
## Why
The disambiguator's tier-1 affinity graph is built blindly from path
co-occurrence. On wide-geo MQTT deployments, a single bad hop
disambiguation seeds an edge across geographically impossible distances
(e.g. Bay Area ↔ Berlin), which then reinforces the same wrong
resolution next time. Self-poisoning spiral.
## What changed
- `upsertEdge` now consults a per-graph GPS index. When **both**
endpoints have known GPS and their haversine distance exceeds the
threshold, the edge is dropped and `NeighborGraph.RejectedEdgesGeoFar`
(atomic) is incremented.
- Either endpoint missing GPS ⇒ accept (no signal to reject), per
acceptance criteria.
- Threshold is configurable via `neighborGraph.maxEdgeKm` (default **500
km** — well above any plausible terrestrial LoRa hop, including
satellite-assisted). 0 ⇒ use default; negative ⇒ disable the filter.
Exposed via `Config.NeighborMaxEdgeKm()`.
- New `BuildFromStoreWithOptions` carrying the threshold;
`BuildFromStore` and `BuildFromStoreWithLog` are kept as thin wrappers.
- Stats are surfaced under `GET /api/analytics/neighbor-graph` as
`stats.rejected_edges_geo_far`.
- All rejection logs PII-truncate pubkeys to 8 hex chars (public repo
discipline).
- `config.example.json` updated with the new field + comment.
## Follow-up
#1229 (per-region scoped affinity graphs) depends on this landing first.
---------
Co-authored-by: corescope-bot <bot@corescope.local>
Closes#1183
## Summary
- Adds `packetStore.hotStartupHours` config key (float64, default 0 =
disabled). When set, `Load()` loads only that many hours of data
synchronously, reducing startup time on large DBs. Background goroutine
fills the remaining `retentionHours` window in daily chunks after
startup completes.
- A background goroutine (`loadBackgroundChunks`) fills the remaining
`retentionHours` window in daily chunks after startup completes.
Analytics indexes are rebuilt once at the end.
- `QueryPackets` and `QueryGroupedPackets` check `oldestLoaded` and fall
back to `db.QueryPackets()` for any query whose `Since`/`Until` predates
the in-memory window — covering days 8–30 permanently (beyond
`retentionHours`) and the background-fill gap during startup.
- `/api/perf` gains `hotStartupHours`, `backgroundLoadComplete`, and
`backgroundLoadProgress` fields inside `packetStore` so operators can
monitor the fill.
### Drive-by fixes
- E2E: added `gotoPackets` navigation helper used across packet-related
tests
- E2E: rewrote stripe assertion to check per-row stripe parity rather
than a fragile computed-style comparison
- E2E: theme test updated to use `#/home` as the initial route (was
`#/`)
- `db.go`: removed the RFC3339→unix-timestamp subquery path in
`buildTransmissionWhere`; `t.first_seen` is now always compared directly
as a string for both RFC3339 and non-RFC3339 inputs
## Configuration
```json
"packetStore": {
"retentionHours": 168,
"hotStartupHours": 24
}
```
`hotStartupHours: 0` (default) preserves existing behavior exactly.
Recommended for large DBs to reduce startup time; set to 0 to disable
(loads full retentionHours at startup, legacy behavior).
## Test plan
- [x] `TestHotStartupConfig_Clamp` — clamping when `hotStartupHours >
retentionHours`
- [x] `TestHotStartupConfig_ZeroIsDisabled` — zero leaves feature
disabled
- [x] `TestHotStartup_LoadsOnlyHotWindow` — only hot-window packets in
memory after `Load()`
- [x] `TestHotStartup_DisabledWhenZero` — all retention packets loaded
when disabled
- [x] `TestHotStartup_loadChunk_AddsOlderData` — chunk merges correctly,
ASC order maintained
- [x] `TestHotStartup_BackgroundFillsToRetention` — background goroutine
fills to `retentionHours`
- [x] `TestHotStartup_ChunkErrorRecovery` — chunk SQL failure logged and
skipped, loop terminates
- [x] `TestHotStartup_SQLFallback_TriggeredForOldDate` — query before
`oldestLoaded` routes to SQL
- [x] `TestHotStartup_SQLFallback_NotTriggeredForRecentDate` — recent
query stays in-memory
- [x] `TestHotStartup_PerfStats` — new fields present in
`GetPerfStoreStats()` (backs the perf endpoint)
- [x] `TestHotStartup_PerfStoreHTTP` — HTTP-level: GET /api/perf returns
`hotStartupHours`, `backgroundLoadComplete`, `backgroundLoadProgress` in
`packetStore`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: openclaw-bot <bot@openclaw.local>
Co-authored-by: CoreScope Bot <bot@corescope.local>
Red commit: e964ec9c46 (CI run: pending —
workflow only triggers on PR open)
Partial fix for #1120 — finishes the four follow-up items left open
after PR #1123 (cancelled writes, ingestor I/O, threshold-flag tests,
docs).
## What's done
- **`cancelledWriteBytesPerSec`** — server `/proc/self/io` parser
handles `cancelled_write_bytes`; `/api/perf/io` exposes the per-second
rate; Perf page renders it next to Read/Write with ⚠️ when sustained >1
MB/s.
- **Ingestor `/proc/<pid>/io`** — `cmd/ingestor/stats_file.go` samples
its own `/proc/self/io` each tick and includes `procIO` in the snapshot.
The server's `/api/perf/io` reads it and surfaces `.ingestor`. Frontend
renders an `Ingestor process` Disk I/O block alongside the existing
`server process` block (issue mockup: "Both ingestor and server").
- **Threshold + anomaly tests** — `test-perf-disk-io-1120.js` now
asserts ⚠️ fires/suppresses on WAL>100MB, cache_hit<90%, and the
backfill-rate-vs-tx-rate guard with the `tx_inserted >= 100` baseline
floor. Drops the tautological `|| ... === false` short-circuits flagged
in MINOR m4.
- **Docs (m8)** — `config.example.json` adds `_comment_ingestorStats`
(env var, default path, shared-tmp security note);
`cmd/ingestor/README.md` adds `CORESCOPE_INGESTOR_STATS` to the env-var
table plus a `Stats file` section.
## What's NOT done (deferred)
m1 sync.Map → map+RWMutex, m2 perfIOMu rate caching, m3 negative
cacheSize translation, m5 deterministic-write test, m7 ctx-aware
shutdown — pure polish; will file a follow-up issue if the operator
wants them tracked.
## TDD
- Red: `e964ec9` — adds failing tests + stub field/handler shape
(cancelled missing from struct, ingestor stub returns nil, ingestor
procIO absent).
- Green: `1240703` — wires up the parser case, ingestor sampler,
frontend rendering, docs.
E2E assertion added: test-perf-disk-io-1120.js:108
---------
Co-authored-by: clawbot <clawbot@users.noreply.github.com>
Co-authored-by: Kpa-clawbot <bot@kpa-clawbot.local>
Co-authored-by: Kpa-clawbot <bot@kpa-clawbot>
## Why
Diagnostic on #1129 shows PR #1117 (group commit M1 for #1115) is
fundamentally broken: it starves the MQTT goroutine via `gcMu` lock
contention, causing pingresp disconnects and lost packets at modest
ingest rates.
## Three structural defects
1. **Lock held across `sql.Stmt.Exec`** — every concurrent
`InsertTransmission` blocks for the full SQLite write latency, not just
the brief queue mutation.
2. **Lock held across `tx.Commit`** — the WAL fsync runs *under* `gcMu`,
so any backlog blocks all ingest writers AND the flusher ticker,
snowballing under load.
3. **Single-conn DB** (`MaxOpenConns=1`) — the flusher and the ingest
path serialise on one connection, turning the lock into a global ingest
stall.
Net effect: at modest packet rates the MQTT client loop misses its own
pingresp deadline, the broker drops the connection, and packets received
during the stall are lost.
## What this PR removes
- `Store.SetGroupCommit`, `Store.FlushGroupTx`, `Store.flushLocked`,
`Store.GroupCommitMs`
- `gcMu`, `activeTx`, `pendingRows`, `groupCommitMs`,
`groupCommitMaxRows` Store fields
- `groupCommitMs` / `groupCommitMaxRows` config fields and
`GroupCommitMsOrDefault` / `GroupCommitMaxRowsOrDefault` accessors
- The flusher goroutine in `cmd/ingestor/main.go`
- `cmd/ingestor/group_commit_test.go`
- The `if s.activeTx != nil { … pendingRows … }` branch in
`InsertTransmission` — reverts to plain prepared-stmt usage
## What this PR keeps (merged after #1117)
- #1119 `BackfillPathJSON` `path_json='[]'` fix
- #1120/#1123 perf metrics endpoints — `WALCommits` counter retained
- `GroupCommitFlushes` JSON field on `/api/perf/write-sources` is kept
as always-0 for API stability (server `perf_io.go` references it as a
string field name; no client breakage)
- `DBStats.GroupCommitFlushes` atomic field is removed from the Go
struct
## Tests
`cd cmd/ingestor && go test ./... -run "Test"` → `ok` (47.8s).
`cd cmd/server && go build ./...` → clean.
## #1115 stays open
The group-commit *idea* is sound — batching observation INSERTs would
meaningfully reduce WAL fsync rate. But it needs a redesign that does
**not** hold a mutex across blocking SQLite calls. Suggested directions
for a future M1:
- Channel-fed writer goroutine (single owner of the tx, ingest path is
non-blocking enqueue)
- Per-batch DB handle so the flusher doesn't serialise the ingest
connection
- Bounded queue with backpressure rather than a shared lock
Refs #1117#1129
## Summary
Implements **M1 from #1115**: batches observation/transmission INSERTs
into a single SQLite `BEGIN/COMMIT` window instead of fsyncing per
packet. At ~250 obs/sec this drops WAL fsync rate from ~20/s to ~1/s and
eliminates the `obs-persist skipped` / `SQLITE_BUSY` log spam that the
issue documents.
This is a **partial fix** — it ships the group-commit mechanism.
Acceptance items 6–7 (measured fsync rate / measured `obs-persist
skipped` rate at staging steady-state) require post-deploy observation,
and M2 (per-`tx_hash` observation buffering) is intentionally deferred.
The issue stays open for the user to verify on staging.
> Partial fix for #1115 — does not auto-close. Refs #1115.
## Mechanism
- `Store` gains an active `*sql.Tx`, `pendingRows` counter, `gcMu`, and
the `groupCommitMs` / `groupCommitMaxRows` knobs. `SetGroupCommit(ms,
maxRows)` enables the mode; `FlushGroupTx()` commits the in-flight tx.
- `InsertTransmission` lazily opens a tx on the first call after each
flush, then issues all writes through `tx.Stmt()` bindings of the
existing prepared statements. With `MaxOpenConns(1)` the connection is
already serialized; `gcMu` serializes group-commit state without
contention.
- A goroutine in `cmd/ingestor/main.go` calls `FlushGroupTx()` every
`groupCommitMs` ms. `pendingRows >= groupCommitMaxRows` triggers an
eager flush. `Close()` flushes before the WAL checkpoint so no rows are
lost on graceful shutdown.
- `groupCommitMs == 0` short-circuits to the legacy per-call auto-commit
path (statements bound to `s.db`, no tx) — current behavior preserved
byte-for-byte for operators who opt out.
## Config
Two new optional fields (ingestor-only), both documented in
`config.example.json`:
| Field | Default | Effect |
|---|---|---|
| `groupCommitMs` | `1000` | Flush window in ms. `0` disables batching
(legacy per-packet auto-commit). |
| `groupCommitMaxRows` | `1000` | Safety cap; when exceeded the queue
flushes immediately to bound memory and the crash-loss window. |
No DB schema change. No required config change on upgrade.
## Tests (TDD red → green visible in commits)
`cmd/ingestor/group_commit_test.go` — three assertions, written first as
the red commit:
- `TestGroupCommit_BatchesInsertsIntoOneTx` — 50 `InsertTransmission`
calls inside a wide window produce **0** commits until `FlushGroupTx`,
then exactly **1**; all 50 rows visible after flush. (This is the spec's
"50 observations → 1 SQLite write transaction" assertion.)
- `TestGroupCommit_Disabled` — `groupCommitMs=0` keeps every insert
immediately visible and `GroupCommitFlushes` never advances. (Spec's
"groupCommitMs=0 reverts to per-packet behavior" assertion.)
- `TestGroupCommit_MaxRowsForcesEarlyFlush` — cap=3, 7 inserts → 2
auto-flushes from the cap + 1 final manual flush = 3 total.
Red commit: `e2b0370` (stubs `SetGroupCommit` / `FlushGroupTx` so the
tests compile and fail on **assertions**, not import errors).
Green commit: `73f3559`.
Full ingestor suite (`go test ./...` in `cmd/ingestor`) stays green, ~49
s.
## Performance
This PR is the perf change itself. Local micro-test (the new
`TestGroupCommit_BatchesInsertsIntoOneTx`) shows the structural
property: 50 inserts → 1 commit. The fsync-rate measurement called out
in the M1 acceptance criteria (`~20/s → ~1/s` at 250 obs/sec) requires
staging deployment to confirm — that's the remaining open item that
keeps #1115 open after this merges.
No hot-path regressions: when `groupCommitMs > 0` we acquire one mutex
per insert (uncontended in the steady state — the connection was already
single-threaded via `MaxOpenConns(1)`). When `groupCommitMs == 0` the
code path is identical to before plus one nil-tx check.
## What this PR does NOT do (per spec)
- Does not collapse "30 observations of one packet" into 1 row write —
that's M2.
- Does not eliminate dual-writer contention with `cmd/server`'s
`resolved_path` writes.
- Does not change observation ordering or live broadcast latency.
---------
Co-authored-by: corescope-bot <bot@corescope.local>
## Summary
**Partial fix for #730 (M1 only — M2 frontend and M3 alerting
deferred).**
Today the ingestor **silently drops** ADVERTs whose GPS lies outside the
configured `geo_filter` polygon. That's the wrong default for an
analytics tool — operators get zero visibility into bridged or leaked
meshes.
This PR makes the new default **flag, don't drop**: foreign adverts are
stored, the node row is tagged `foreign_advert=1`, and the API surfaces
`"foreign": true` so dashboards / map overlays can be built on top.
## Behavior
| Mode | What happens to an ADVERT outside `geo_filter` |
|---|---|
| (default) flag | Stored, marked `foreign_advert=1`, exposed via API |
| drop (legacy) | Silently dropped (preserves old behavior for ops who
want it) |
## What's done (M1 — Backend)
- ingestor stores foreign adverts instead of dropping
- `nodes.foreign_advert` column added (migration)
- `/api/nodes` and `/api/nodes/{pk}` expose `foreign: true` field
- Config: `geofilter.action: "flag"|"drop"` (default `flag`)
- Tests + config docs
## What's NOT done (deferred to M2 + M3)
- **M2 — Frontend:** Map overlay showing foreign adverts as distinct
markers, foreign-advert filter on packets/nodes pages, dedicated
foreign-advert dashboard
- **M3 — Alerting:** Time-series detection of bridging events, alert
when foreign advert rate spikes, identify bridge entry-point nodes
Issue #730 remains open for M2 and M3.
---------
Co-authored-by: corescope-bot <bot@corescope>
## Summary
Closes#663 (Phase 2 + 3 partial — time-series tracking + thresholds for
nodes that are also observers).
Adds a per-node battery voltage trend chart and
`/api/nodes/{pubkey}/battery` endpoint, sourced from the existing
`observer_metrics.battery_mv` samples populated by observer status
messages. No new ingest or schema changes — purely surfaces data we were
already collecting.
## Scope (TDD red→green)
**RED commit:** test(node-battery) — DB query, endpoint shape
(200/404/no-data), and config getters all asserted.
**GREEN commit:** feat(node-battery) — implementation only.
## Changes
### Backend
- `cmd/server/node_battery.go` (new):
- `DB.GetNodeBatteryHistory(pubkey, since)` — pulls `(timestamp,
battery_mv)` rows from `observer_metrics WHERE LOWER(observer_id) =
LOWER(public_key) AND battery_mv IS NOT NULL`. Case-insensitive join
tolerates historical pubkey casing variation (observers persist
uppercase, nodes lowercase in this DB).
- `Server.handleNodeBattery` — `GET /api/nodes/{pubkey}/battery?days=N`
(default 7, max 365). Returns `{public_key, days, samples[], latest_mv,
latest_ts, status, thresholds}`.
- `Config.LowBatteryMv()` / `CriticalBatteryMv()` — defaults 3300 / 3000
mV.
- `cmd/server/config.go` — `BatteryThresholds *BatteryThresholdsConfig`
field.
- `cmd/server/routes.go` — route registration alongside existing
`/health`, `/analytics`.
### Frontend
- `public/node-analytics.js` — new "Battery Voltage" chart card with
status badge (🔋 OK / ⚠️ Low / 🪫 Critical / No data). Renders dashed
threshold lines at `lowMv` and `criticalMv`. Empty-state message when no
samples in window.
### Config
- `config.example.json` — `batteryThresholds: { lowMv: 3300, criticalMv:
3000 }` with `_comment` per Config Documentation Rule.
## Status semantics
| latest_mv | status |
|-----------------------|------------|
| no samples in window | `unknown` |
| `>= lowMv` | `ok` |
| `< lowMv`, `>= critMv`| `low` |
| `< criticalMv` | `critical` |
## What this PR does NOT do (deferred)
The issue's full Phase 1 (writing decoded sensor advert telemetry into
`nodes.battery_mv` / `temperature_c` from server-side decoder) and Phase
4 (firmware/active polling for repeaters without observers) are out of
scope here. This PR delivers the requested Phase 2/3 surfacing for the
data path that already lands rows: `observer_metrics`. Repeaters that
are also observers (i.e. publish status to MQTT) will get a voltage
trend immediately; pure passive nodes won't until Phase 1 lands.
## Tests
- `TestGetNodeBatteryHistory_FromObserverMetrics` — case-insensitive
join, NULL skipping, ordering.
- `TestNodeBatteryEndpoint` — full happy path with thresholds + status.
- `TestNodeBatteryEndpoint_NoData` — 200 + status=unknown.
- `TestNodeBatteryEndpoint_404` — unknown node.
- `TestBatteryThresholds_ConfigOverride` — config getters + defaults.
`cd cmd/server && go test ./...` — green.
## Performance
Endpoint is per-pubkey (called once on analytics page open), indexed by
`(observer_id, timestamp)` PK on `observer_metrics`. No hot-path impact.
---------
Co-authored-by: bot <bot@corescope>
## Summary
Implements repeater liveness detection per #662 — distinguishes a
repeater that is **actively relaying traffic** from one that is **alive
but idle** (only sending its own adverts).
## Approach
The backend already maintains a `byPathHop` index keyed by lowercase
hop/pubkey for every transmission. Decode-window writes also key it by
**resolved pubkey** for relay hops. We just weren't surfacing it.
`GetRepeaterRelayInfo(pubkey, windowHours)`:
- Reads `byPathHop[pubkey]`.
- Skips packets whose `payload_type == 4` (advert) — a self-advert
proves liveness, not relaying.
- Returns the most recent `FirstSeen` as `lastRelayed`, plus
`relayActive` (within window) and the `windowHours` actually used.
## Three states (per issue)
| State | Indicator | Condition |
|---|---|---|
| 🟢 Relaying | green | `last_relayed` within `relayActiveHours` |
| 🟡 Alive (idle) | yellow | repeater is in the DB but
`relay_active=false` (no recent path-hop appearance, or none ever) |
| ⚪ Stale | existing | falls out of the existing `getNodeStatus` logic |
## API
- `GET /api/nodes` — repeater/room rows now include `last_relayed`
(omitted if never observed) and `relay_active`.
- `GET /api/nodes/{pubkey}` — same fields plus `relay_window_hours`.
## Config
New optional field under `healthThresholds`:
```json
"healthThresholds": {
...,
"relayActiveHours": 24
}
```
Default 24h. Documented in `config.example.json`.
## Frontend
Node detail page gains a **Last Relayed** row for repeaters/rooms with
the 🟢/🟡 state badge. Tooltip explains the distinction from "Last Heard".
## TDD
- **Red commit** `4445f91`: `repeater_liveness_test.go` + stub
`GetRepeaterRelayInfo` returning zero. Active and Stale tests fail on
assertion (LastRelayed empty / mismatched). Idle and IgnoresAdverts
already match the desired behavior under the stub. Compiles, runs, fails
on assertions — not on imports.
- **Green commit** `5fcfb57`: Implementation. All four tests pass. Full
`cmd/server` suite green (~22s).
## Performance
`O(N)` over `byPathHop[pubkey]` per call. The index is bounded by store
eviction; a single repeater has at most a few hundred entries on real
data. The `/api/nodes` loop adds one map read + scan per repeater row —
negligible against the existing enrichment work.
## Limitations (per issue body)
1. Observer coverage gaps — if no observer hears a repeater's relay,
it'll show as idle even when actively relaying. This is inherent to
passive observation.
2. Low-traffic networks — a repeater in a quiet area legitimately shows
idle. The 🟡 indicator copy makes that explicit ("alive (idle)").
3. Hash collisions are mitigated by the existing `resolveWithContext`
path before pubkeys land in `byPathHop`.
Fixes#662
---------
Co-authored-by: clawbot <bot@corescope.local>
## Summary
Fixes#770 — selecting "All" in the region filter dropdown produced an
empty channel list.
## Root cause
`normalizeRegionCodes` (cmd/server/db.go) treated any non-empty input as
a literal IATA code. The frontend region filter labels its catch-all
option **"All"**; while `region-filter.js` normally sends an empty
string when "All" is selected, any code path that ends up sending
`?region=All` (deep-link URLs, manual queries, future callers) caused
the function to return `["ALL"]`. Downstream queries then filtered
observers for `iata = 'ALL'`, which never matches anything → empty
response.
## Fix
`normalizeRegionCodes` now treats `All` / `ALL` / `all`
(case-insensitive, with optional whitespace, mixed in CSV) as equivalent
to an empty value, returning `nil` to signal "no filter". Real IATA
codes (`SJC`, `PDX`, `sjc,PDX` → `[SJC PDX]`) still pass through
unchanged.
This is a defensive server-side fix: a single chokepoint that all
region-aware endpoints already flow through (channels, packets,
analytics, encrypted channels, observer ID resolution).
## Documentation
Expanded `_comment_regions` in `config.example.json` to explain:
- How IATA codes are resolved (payload > topic > source config — set in
#1012)
- What the `regions` map controls (display labels) vs runtime-discovered
codes
- That observers without an IATA tag only appear under "All Regions"
- That the `All` sentinel is server-side safe
## TDD
- **Red commit** (`4f65bf4`): `cmd/server/region_filter_test.go` —
`TestNormalizeRegionCodes_AllIsNoFilter` asserts `All` / `ALL` / `all` /
`""` / `"All,"` all collapse to `nil`. Compiles, runs, fails on
assertion (`got [ALL], want nil`). Companion test
`TestNormalizeRegionCodes_RealCodesPreserved` locks in that `sjc,PDX`
still returns `[SJC PDX]`.
- **Green commit** (`c9fb965`): two-line change in
`normalizeRegionCodes` + docs update.
## Verification
```
$ go test -run TestNormalizeRegionCodes -count=1 ./cmd/server
ok github.com/corescope/server 0.023s
$ go test -count=1 ./cmd/server
ok github.com/corescope/server 21.454s
```
Full suite green; no existing region tests regressed.
Fixes#770
---------
Co-authored-by: Kpa-clawbot <bot@corescope>
## Issue
Closes#819
## Summary
Adds Save Draft / Load Draft / Download buttons to
`/geofilter-builder.html` so operators can:
- Persist their work-in-progress polygon across sessions (localStorage)
- Reload it later to continue editing
- Download a ready-to-paste `geo_filter` JSON snippet for `config.json`
## Implementation
- New module `public/geofilter-draft.js` exposes `GeofilterDraft` global
with `saveDraft / loadDraft / clearDraft / buildConfigSnippet /
downloadConfig`.
- Builder HTML wires three new buttons; updates the help text to
document the new flow.
## TDD
- Red commit: `b0a1a4c` (tests fail — module doesn't exist)
- Green commit: `a717f33` (implementation added, all tests pass)
## How to test
1. Open `/geofilter-builder.html`
2. Click 3+ points on the map
3. Click "Save Draft" — reload page — click "Load Draft" → polygon
restored
4. Click "Download" → `geofilter-config-snippet.json` downloaded with
correct format
---
E2E assertion added: test-e2e-playwright.js:2264
---------
Co-authored-by: you <you@example.com>
Co-authored-by: openclaw-bot <bot@openclaw.local>
## Summary
Add optional `region` field to MQTT source config and JSON payload,
enabling publishers to explicitly provide region data without relying
solely on topic path structure.
## Changes
- **`MQTTSource.Region`** — new optional config field. When set, acts as
default region for all messages from that source (useful when a broker
serves a single region).
- **`MQTTPacketMessage.Region`** — new optional JSON payload field.
Publishers can include `"region": "PDX"` in their MQTT messages.
- **`PacketData.Region`** — carries the resolved region through to
storage.
- **Priority resolution**: payload `region` > topic-derived region >
source config `region`
- Observer IATA is updated with the effective region on every packet.
## Config example
```json
{
"mqttSources": [
{
"name": "cascadia",
"broker": "tcp://cascadia-broker:1883",
"topics": ["meshcore/#"],
"region": "PDX"
}
]
}
```
## Payload example
```json
{"raw": "0a1b2c...", "SNR": 5.2, "region": "PDX"}
```
## TDD
- Red commit: `980304c` (tests fail at compile — fields don't exist)
- Green commit: `4caf88b` (implementation, all tests pass)
## Unblocks
- #804, #770, #730 (all depend on region being available on
observations)
Fixes#788
---------
Co-authored-by: you <you@example.com>
## Summary
Adds a global `observerIATAWhitelist` config field that restricts which
observer IATA regions are processed by the ingestor.
## Problem
Operators running regional instances (e.g., Sweden) want to ensure only
observers physically in their region contribute data. The existing
per-source `iataFilter` only filters packet messages but still allows
status messages through, meaning observers from other regions appear in
the database.
## Solution
New top-level config field `observerIATAWhitelist`:
- When non-empty, **all** messages (status + packets) from observers
outside the whitelist are silently dropped
- Case-insensitive matching
- Empty list = all regions allowed (fully backwards compatible)
- Lazy O(1) lookup via cached uppercase set (same pattern as
`observerBlacklist`)
### Config example
```json
{
"observerIATAWhitelist": ["ARN", "GOT"]
}
```
## TDD
- **Red commit:** `f19c2b2` — tests for `ObserverIATAWhitelist` field
and `IsObserverIATAAllowed` method (build fails)
- **Green commit:** `782f516` — implementation + integration test
## Files changed
- `cmd/ingestor/config.go` — new field, new method
`IsObserverIATAAllowed`
- `cmd/ingestor/main.go` — whitelist check in `handleMessage` before
status processing
- `cmd/ingestor/config_test.go` — unit tests for config parsing and
matching
- `cmd/ingestor/main_test.go` — integration test for handleMessage
filtering
Fixes#914
---------
Co-authored-by: you <you@example.com>
## Summary
Per-source MQTT connect timeout, correctly targeting the `WaitTimeout`
startup gate (#931).
## What changed
- Added `connectTimeoutSec` field to `MQTTSource` struct (per-source,
not global) — `config.go:24`
- Added `ConnectTimeoutOrDefault()` helper returning configured value or
30 (default from #926) — `config.go:29`
- Replaced hardcoded `WaitTimeout(30 * time.Second)` with
`WaitTimeout(time.Duration(connectTimeout) * time.Second)` —
`main.go:173`
- Updated `config.example.json` with field at source level
- Unit tests for default (30) and custom values
## Why this supersedes #976
PR #976 made paho's `SetConnectTimeout` (per-TCP-dial, was 10s)
configurable via a **global** `mqttConnectTimeoutSeconds` field. Issue
#931 explicitly references the **30s timeout** — which is
`WaitTimeout(30s)`, the startup gate from #926. It also requests
**per-source** config, not global.
This PR targets the correct timeout at the correct granularity.
## Live verification (Rule 18)
Two sources pointed at unreachable brokers:
- `fast` (`connectTimeoutSec: 5`): timed out in 5s ✅
- `default` (unset): timed out in 30s ✅
```
19:00:35 MQTT [fast] connect timeout: 5s
19:00:40 MQTT [fast] initial connection timed out — retrying in background
19:00:40 MQTT [default] connect timeout: 30s
19:01:10 MQTT [default] initial connection timed out — retrying in background
```
Closes#931
Supersedes #976
Co-authored-by: you <you@example.com>
## Problem
`Load()` loaded all transmissions from the DB regardless of
`retentionHours`, so `buildSubpathIndex()` processed the full DB history
on every startup. On a DB with ~280K paths this produces ~13.5M subpath
index entries, OOM-killing the process before it ever starts listening —
causing a supervisord crash loop with no useful error message.
## Fix
Apply the same `retentionHours` cutoff to `Load()`'s SQL that
`EvictStale()` already uses at runtime. Both conditions
(`retentionHours` window and `maxPackets` cap) are combined with AND so
neither safety limit is bypassed.
Startup now builds indexes only over the retention window, making
startup time and memory proportional to recent activity rather than
total DB history.
## Docs
- `config.example.json`: adds `retentionHours` to the `packetStore`
block with recommended value `168` (7 days) and a warning about `0` on
large DBs
- `docs/user-guide/configuration.md`: documents the field and adds an
explicit OOM warning
## Test plan
- [x] `cd cmd/server && go test ./... -run TestRetentionLoad` — covers
the retention-filtered load: verifies packets outside the window are
excluded, and that `retentionHours: 0` still loads everything
- [x] Deploy on an instance with a large DB (>100K paths) and
`retentionHours: 168` — server reaches "listening" in seconds instead of
OOM-crashing
- [x] Verify `config.example.json` has `retentionHours: 168` in the
`packetStore` block
- [x] Verify `docs/user-guide/configuration.md` documents the field and
warning
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Kpa-clawbot <kpaclawbot@outlook.com>
Closes#919
## Summary
Enables SQLite incremental auto-vacuum so the database file actually
shrinks after retention reaper deletes old data. Previously, `DELETE`
operations freed pages internally but never returned disk space to the
OS.
## Changes
### 1. Auto-vacuum on new databases
- `PRAGMA auto_vacuum = INCREMENTAL` set via DSN pragma before
`journal_mode(WAL)` in the ingestor's `OpenStoreWithInterval`
- Must be set before any tables are created; DSN ordering ensures this
### 2. Post-reaper incremental vacuum
- `PRAGMA incremental_vacuum(N)` runs after every retention reaper cycle
(packets, metrics, observers, neighbor edges)
- N defaults to 1024 pages, configurable via `db.incrementalVacuumPages`
- Noop on `auto_vacuum=NONE` databases (safe before migration)
- Added to both server and ingestor
### 3. Opt-in full VACUUM for existing databases
- Startup check logs a clear warning if `auto_vacuum != INCREMENTAL`
- `db.vacuumOnStartup: true` config triggers one-time `PRAGMA
auto_vacuum = INCREMENTAL; VACUUM`
- Logs start/end time for operator visibility
### 4. Documentation
- `docs/user-guide/configuration.md`: retention section notes that
lowering retention doesn't immediately shrink the DB
- `docs/user-guide/database.md`: new guide covering WAL, auto-vacuum,
migration, manual VACUUM
### 5. Tests
- `TestNewDBHasIncrementalAutoVacuum` — fresh DB gets `auto_vacuum=2`
- `TestExistingDBHasAutoVacuumNone` — old DB stays at `auto_vacuum=0`
- `TestVacuumOnStartupMigratesDB` — full VACUUM sets `auto_vacuum=2`
- `TestIncrementalVacuumReducesFreelist` — DELETE + vacuum shrinks
freelist
- `TestCheckAutoVacuumLogs` — handles both modes without panic
- `TestConfigIncrementalVacuumPages` — config defaults and overrides
## Migration path for existing databases
1. On startup, CoreScope logs: `[db] auto_vacuum=NONE — DB needs
one-time VACUUM...`
2. Set `db.vacuumOnStartup: true` in config.json
3. Restart — VACUUM runs (blocks startup, minutes on large DBs)
4. Remove `vacuumOnStartup` after migration
## Test results
```
ok github.com/corescope/server 19.448s
ok github.com/corescope/ingestor 30.682s
```
---------
Co-authored-by: you <you@example.com>
## Problem
Some mesh participants set offensive names, report deliberately false
GPS positions, or otherwise troll the network. Instance operators
currently have no way to hide these nodes from public-facing APIs
without deleting the underlying data.
## Solution
Add a `nodeBlacklist` array to `config.json` containing public keys of
nodes to exclude from all API responses.
### Blacklisted nodes are filtered from:
- `GET /api/nodes` — list endpoint
- `GET /api/nodes/search` — search results
- `GET /api/nodes/{pubkey}` — detail (returns 404)
- `GET /api/nodes/{pubkey}/health` — returns 404
- `GET /api/nodes/{pubkey}/paths` — returns 404
- `GET /api/nodes/{pubkey}/analytics` — returns 404
- `GET /api/nodes/{pubkey}/neighbors` — returns 404
- `GET /api/nodes/bulk-health` — filtered from results
### Config example
```json
{
"nodeBlacklist": [
"aabbccdd...",
"11223344..."
]
}
```
### Design decisions
- **Case-insensitive** — public keys normalized to lowercase
- **Whitespace trimming** — leading/trailing whitespace handled
- **Empty entries ignored** — `""` or `" "` do not cause false positives
- **Nil-safe** — `IsBlacklisted()` on nil Config returns false
- **Backward-compatible** — empty/missing `nodeBlacklist` has zero
effect
- **Lazy-cached set** — blacklist converted to `map[string]bool` on
first lookup
### What this does NOT do (intentionally)
- Does **not** delete or modify database data — only filters API
responses
- Does **not** block packet ingestion — data still flows for analytics
- Does **not** filter `/api/packets` — only node-facing endpoints are
affected
## Testing
- Unit tests for `Config.IsBlacklisted()` (case sensitivity, whitespace,
empty entries, nil config)
- Integration tests for `/api/nodes`, `/api/nodes/{pubkey}`,
`/api/nodes/search`
- Full test suite passes with no regressions
## Summary
Observers that stop actively sending data now get removed after a
configurable retention period (default 14 days).
Previously, observers remained in the `observers` table forever. This
meant nodes that were once observers for an instance but are no longer
connected (even if still active in the mesh elsewhere) would continue
appearing in the observer list indefinitely.
## Key Design Decisions
- **Active data requirement**: `last_seen` is only updated when the
observer itself sends packets (via `stmtUpdateObserverLastSeen`). Being
seen by another node does NOT update this field. So an observer must
actively send data to stay listed.
- **Default: 14 days** — observers not seen in 14 days are removed
- **`-1` = keep forever** — for users who want observers to never be
removed
- **`0` = use default (14 days)** — same as not setting the field
- **Runs on startup + daily ticker** — staggered 3 minutes after metrics
prune to avoid DB contention
## Changes
| File | Change |
|------|--------|
| `cmd/ingestor/config.go` | Add `ObserverDays` to `RetentionConfig`,
add `ObserverDaysOrDefault()` |
| `cmd/ingestor/db.go` | Add `RemoveStaleObservers()` — deletes
observers with `last_seen` before cutoff |
| `cmd/ingestor/main.go` | Wire up startup + daily ticker for observer
retention |
| `cmd/server/config.go` | Add `ObserverDays` to `RetentionConfig`, add
`ObserverDaysOrDefault()` |
| `cmd/server/db.go` | Add `RemoveStaleObservers()` (server-side, uses
read-write connection) |
| `cmd/server/main.go` | Wire up startup + daily ticker, shutdown
cleanup |
| `cmd/server/routes.go` | Admin prune API now also removes stale
observers |
| `config.example.json` | Add `observerDays: 14` with documentation |
| `cmd/ingestor/coverage_boost_test.go` | 4 tests: basic removal, empty
store, keep forever (-1), default (0→14) |
| `cmd/server/config_test.go` | 4 tests: `ObserverDaysOrDefault` edge
cases |
## Config Example
```json
{
"retention": {
"nodeDays": 7,
"observerDays": 14,
"packetDays": 30,
"_comment": "observerDays: -1 = keep forever, 0 = use default (14)"
}
}
```
## Admin API
The `/api/admin/prune` endpoint now also removes stale observers (using
`observerDays` from config) and reports `observers_removed` in the
response alongside `packets_deleted`.
## Test Plan
- [x] `TestRemoveStaleObservers` — old observer removed, recent observer
kept
- [x] `TestRemoveStaleObserversNone` — empty store, no errors
- [x] `TestRemoveStaleObserversKeepForever` — `-1` keeps even year-old
observers
- [x] `TestRemoveStaleObserversDefault` — `0` defaults to 14 days
- [x] `TestObserverDaysOrDefault` (ingestor) —
nil/zero/positive/keep-forever
- [x] `TestObserverDaysOrDefault` (server) —
nil/zero/positive/keep-forever
- [x] Both binaries compile cleanly (`go build`)
- [ ] Manual: verify observer count decreases after retention period on
a live instance
## Summary
- Add missing `geo_filter` block to `config.example.json` with polygon
example, `bufferKm`, and inline `_comment`
- Add `docs/user-guide/geofilter.md`: full operator guide covering
config schema, GeoFilter Builder workflow, and prune script as one-time
migration tool
- Add Geographic filtering section to `docs/user-guide/configuration.md`
with link to the full guide
Closes#669 (M1: documentation)
## Test plan
- [x] `config.example.json` parses cleanly (no JSON errors)
- [x] `docs/user-guide/geofilter.md` renders correctly in GitHub preview
- [x] Link from `configuration.md` to `geofilter.md` resolves
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
Adds two config knobs for controlling backfill scope and neighbor graph
data retention, plus removes the dead synchronous backfill function.
## Changes
### Config knobs
#### `resolvedPath.backfillHours` (default: 24)
Controls how far back (in hours) the async backfill scans for
observations with NULL `resolved_path`. Transmissions with `first_seen`
older than this window are skipped, reducing startup time for instances
with large historical datasets.
#### `neighborGraph.maxAgeDays` (default: 30)
Controls the maximum age of `neighbor_edges` entries. Edges with
`last_seen` older than this are pruned from both SQLite and the
in-memory graph. Pruning runs on startup (after a 4-minute stagger) and
every 24 hours thereafter.
### Dead code removal
- Removed the synchronous `backfillResolvedPaths` function that was
replaced by the async version.
### Implementation details
- `backfillResolvedPathsAsync` now accepts a `backfillHours` parameter
and filters by `tx.FirstSeen`
- `NeighborGraph.PruneOlderThan(cutoff)` removes stale edges from the
in-memory graph
- `PruneNeighborEdges(conn, graph, maxAgeDays)` prunes both DB and
in-memory graph
- Periodic pruning ticker follows the same pattern as metrics pruning
(24h interval, staggered start)
- Graceful shutdown stops the edge prune ticker
### Config example
Both knobs added to `config.example.json` with `_comment` fields.
## Tests
- Config default/override tests for both knobs
- `TestGraphPruneOlderThan` — in-memory edge pruning
- `TestPruneNeighborEdgesDB` — SQLite + in-memory pruning together
- `TestBackfillRespectsHourWindow` — verifies old transmissions are
excluded by backfill window
---------
Co-authored-by: you <you@example.com>
## Summary
Several features and fixes from a live deployment of the Go v3.0.0
backend.
### geo_filter — full enforcement
- **Go backend config** (`cmd/server/config.go`,
`cmd/ingestor/config.go`): added `GeoFilterConfig` struct so
`geo_filter.polygon` and `bufferKm` from `config.json` are parsed by
both the server and ingestor
- **Ingestor** (`cmd/ingestor/geo_filter.go`, `cmd/ingestor/main.go`):
ADVERT packets from nodes outside the configured polygon + buffer are
dropped *before* any DB write — no transmission, node, or observation
data is stored
- **Server API** (`cmd/server/geo_filter.go`, `cmd/server/routes.go`):
`GET /api/config/geo-filter` endpoint returns the polygon + bufferKm to
the frontend; `/api/nodes` responses filter out any out-of-area nodes
already in the DB
- **Frontend** (`public/map.js`, `public/live.js`): blue polygon overlay
(solid inner + dashed buffer zone) on Map and Live pages, toggled via
"Mesh live area" checkbox, state shared via localStorage
### Automatic DB pruning
- Add `retention.packetDays` to `config.json` to delete transmissions +
observations older than N days on a daily schedule (1 min after startup,
then every 24h). Nodes and observers are never pruned.
- `POST /api/admin/prune?days=N` for manual runs (requires `X-API-Key`
header if `apiKey` is set)
```json
"retention": {
"nodeDays": 7,
"packetDays": 30
}
```
### tools/geofilter-builder.html
Standalone HTML tool (no server needed) — open in browser, click to
place polygon points on a Leaflet map, set `bufferKm`, copy the
generated `geo_filter` JSON block into `config.json`.
### scripts/prune-nodes-outside-geo-filter.py
Utility script to clean existing out-of-area nodes from the database
(dry-run + confirm). Useful after first enabling geo_filter on a
populated DB.
### HB column in packets table
Shows the hop hash size in bytes (1–4) decoded from the path byte of
each packet's raw hex. Displayed as **HB** between Size and Type
columns, hidden on small screens.
## Test plan
- [x] ADVERT from node outside polygon is not stored (no new row in
nodes or transmissions)
- [x] `GET /api/config/geo-filter` returns polygon + bufferKm when
configured, `{polygon: null, bufferKm: 0}` when not
- [x] `/api/nodes` excludes nodes outside polygon even if present in DB
- [x] Map and Live pages show blue polygon overlay when configured;
checkbox toggles it
- [x] `retention.packetDays: 30` deletes old transmissions/observations
on startup and daily
- [x] `POST /api/admin/prune?days=30` returns `{deleted: N, days: 30}`
- [x] `tools/geofilter-builder.html` opens standalone, draws polygon,
copies valid JSON
- [x] HB column shows 1–4 for all packets in grouped and flat view
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Change healthThresholds config from milliseconds to hours for readability.
Config keys: infraDegradedHours, infraSilentHours, nodeDegradedHours, nodeSilentHours.
Defaults: infra degraded 24h, silent 72h; node degraded 1h, silent 24h.
- Config stored in hours, converted to ms at comparison time
- /api/config/client sends ms to frontend (backward compatible)
- Frontend tooltips use dynamic thresholds instead of hardcoded strings
- Added healthThresholds section to config.example.json
- Updated Go and Node.js servers, tests
- Create inactive_nodes table with identical schema to nodes
- Add retention.nodeDays config (default 7) in Node.js and Go
- On startup: move nodes not seen in N days to inactive_nodes
- Daily timer (24h setInterval / goroutine ticker) repeats the move
- Log 'Moved X nodes to inactive_nodes (not seen in N days)'
- All existing queries unchanged — they only read nodes table
- Add 14 new tests for moveStaleNodes in test-db.js
- Both Node (db.js/server.js) and Go (ingestor/server) implemented
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add GET /api/config/theme endpoint serving branding, theme colors,
node colors, and home page content from config.json with sensible
defaults so unconfigured instances look identical to before.
Client-side (app.js):
- Fetch theme config on page load, before first render
- Override CSS variables from theme.* on document root
- Override ROLE_COLORS/ROLE_STYLE from nodeColors.*
- Replace nav brand text, logo, favicon from branding.*
- Store config in window.SITE_CONFIG for other pages
Home page (home.js):
- Hero title/subtitle from config.home
- Steps and checklist from config.home
- Footer links from config.home.footerLinks
- Chooser welcome text uses configured siteName
Config example updated with all available theme options.
No default appearance changes — all overrides are optional.
- New config.apiKey field — when set, POST endpoints require X-Api-Key header
- If apiKey not configured, endpoints remain open (dev/local mode)
- GET endpoints and /api/decode (read-only) remain public
- Closes the packet injection attack surface
- channel-rainbow.json: 592 pre-computed SHA256-derived keys for common
channel names (cities, topics, ham radio, emergency, etc.)
- server.js: Load rainbow table at startup as lowest-priority key source
- config.example.json: Add #LongFast to hashChannels list
Key derivation verified against MeshCore source: SHA256('#name')[:16bytes].
Rainbow table boosted decryption from ~48% to ~88% in testing.
Channel keys for #test, #sf, #wardrive, #yo, #bot, #queer, #bookclub, #shtf
are all SHA256(channelName).slice(0,32) — no need to hardcode them. Move to
hashChannels array for auto-derivation. Only the MeshCore default public key
(8b3387e9c5cdea6ac9e5edbaa115cd72) needs explicit specification since it's
not derived from its channel name.
Adds mapDefaults config option with center and zoom properties.
New /api/config/map endpoint serves the defaults. live.js and map.js
fetch the config with fallback to hardcoded Bay Area defaults.
FixesKpa-clawbot/meshcore-analyzer#115
- mqttSources[].topics is now an array of topic patterns
- mqttSources[].iataFilter optionally restricts to specific regions
- meshcore/<region>/<id>/status topic parsed for observer metadata:
name, model, firmware, client_version, radio, battery, uptime, noise_floor
- New observer columns with auto-migration for existing DBs
- Status updates don't inflate packet_count (separate updateObserverStatus)
config.mqttSources array allows multiple MQTT brokers with independent
topics, credentials, and TLS settings. Legacy config.mqtt still works
for backward compatibility.
- PacketStore loads all packets into memory on startup (~11MB for 27K packets)
- Indexed by id, hash, observer, and node pubkey for fast lookups
- /api/packets, /api/packets/timestamps, /api/packets/:id all served from RAM
- MQTT ingest writes to both RAM + SQLite
- Configurable maxMemoryMB (default 1024MB) in config.json packetStore section
- groupByHash queries computed in-memory
- Packet store stats exposed in /api/perf
- Expected: /api/packets goes from 77ms to <1ms
All cache TTLs now read from config.json cacheTTL section (seconds).
Client fetches config on load via GET /api/config/cache.
config.example.json updated with defaults.
Edit config.json, restart server — no code changes needed to tweak TTLs.