## Why
Diagnostic on #1129 shows PR #1117 (group commit M1 for #1115) is
fundamentally broken: it starves the MQTT goroutine via `gcMu` lock
contention, causing pingresp disconnects and lost packets at modest
ingest rates.
## Three structural defects
1. **Lock held across `sql.Stmt.Exec`** — every concurrent
`InsertTransmission` blocks for the full SQLite write latency, not just
the brief queue mutation.
2. **Lock held across `tx.Commit`** — the WAL fsync runs *under* `gcMu`,
so any backlog blocks all ingest writers AND the flusher ticker,
snowballing under load.
3. **Single-conn DB** (`MaxOpenConns=1`) — the flusher and the ingest
path serialise on one connection, turning the lock into a global ingest
stall.
Net effect: at modest packet rates the MQTT client loop misses its own
pingresp deadline, the broker drops the connection, and packets received
during the stall are lost.
## What this PR removes
- `Store.SetGroupCommit`, `Store.FlushGroupTx`, `Store.flushLocked`,
`Store.GroupCommitMs`
- `gcMu`, `activeTx`, `pendingRows`, `groupCommitMs`,
`groupCommitMaxRows` Store fields
- `groupCommitMs` / `groupCommitMaxRows` config fields and
`GroupCommitMsOrDefault` / `GroupCommitMaxRowsOrDefault` accessors
- The flusher goroutine in `cmd/ingestor/main.go`
- `cmd/ingestor/group_commit_test.go`
- The `if s.activeTx != nil { … pendingRows … }` branch in
`InsertTransmission` — reverts to plain prepared-stmt usage
## What this PR keeps (merged after #1117)
- #1119 `BackfillPathJSON` `path_json='[]'` fix
- #1120/#1123 perf metrics endpoints — `WALCommits` counter retained
- `GroupCommitFlushes` JSON field on `/api/perf/write-sources` is kept
as always-0 for API stability (server `perf_io.go` references it as a
string field name; no client breakage)
- `DBStats.GroupCommitFlushes` atomic field is removed from the Go
struct
## Tests
`cd cmd/ingestor && go test ./... -run "Test"` → `ok` (47.8s).
`cd cmd/server && go build ./...` → clean.
## #1115 stays open
The group-commit *idea* is sound — batching observation INSERTs would
meaningfully reduce WAL fsync rate. But it needs a redesign that does
**not** hold a mutex across blocking SQLite calls. Suggested directions
for a future M1:
- Channel-fed writer goroutine (single owner of the tx, ingest path is
non-blocking enqueue)
- Per-batch DB handle so the flusher doesn't serialise the ingest
connection
- Bounded queue with backpressure rather than a shared lock
Refs #1117#1129
## Summary
Implements **M1 from #1115**: batches observation/transmission INSERTs
into a single SQLite `BEGIN/COMMIT` window instead of fsyncing per
packet. At ~250 obs/sec this drops WAL fsync rate from ~20/s to ~1/s and
eliminates the `obs-persist skipped` / `SQLITE_BUSY` log spam that the
issue documents.
This is a **partial fix** — it ships the group-commit mechanism.
Acceptance items 6–7 (measured fsync rate / measured `obs-persist
skipped` rate at staging steady-state) require post-deploy observation,
and M2 (per-`tx_hash` observation buffering) is intentionally deferred.
The issue stays open for the user to verify on staging.
> Partial fix for #1115 — does not auto-close. Refs #1115.
## Mechanism
- `Store` gains an active `*sql.Tx`, `pendingRows` counter, `gcMu`, and
the `groupCommitMs` / `groupCommitMaxRows` knobs. `SetGroupCommit(ms,
maxRows)` enables the mode; `FlushGroupTx()` commits the in-flight tx.
- `InsertTransmission` lazily opens a tx on the first call after each
flush, then issues all writes through `tx.Stmt()` bindings of the
existing prepared statements. With `MaxOpenConns(1)` the connection is
already serialized; `gcMu` serializes group-commit state without
contention.
- A goroutine in `cmd/ingestor/main.go` calls `FlushGroupTx()` every
`groupCommitMs` ms. `pendingRows >= groupCommitMaxRows` triggers an
eager flush. `Close()` flushes before the WAL checkpoint so no rows are
lost on graceful shutdown.
- `groupCommitMs == 0` short-circuits to the legacy per-call auto-commit
path (statements bound to `s.db`, no tx) — current behavior preserved
byte-for-byte for operators who opt out.
## Config
Two new optional fields (ingestor-only), both documented in
`config.example.json`:
| Field | Default | Effect |
|---|---|---|
| `groupCommitMs` | `1000` | Flush window in ms. `0` disables batching
(legacy per-packet auto-commit). |
| `groupCommitMaxRows` | `1000` | Safety cap; when exceeded the queue
flushes immediately to bound memory and the crash-loss window. |
No DB schema change. No required config change on upgrade.
## Tests (TDD red → green visible in commits)
`cmd/ingestor/group_commit_test.go` — three assertions, written first as
the red commit:
- `TestGroupCommit_BatchesInsertsIntoOneTx` — 50 `InsertTransmission`
calls inside a wide window produce **0** commits until `FlushGroupTx`,
then exactly **1**; all 50 rows visible after flush. (This is the spec's
"50 observations → 1 SQLite write transaction" assertion.)
- `TestGroupCommit_Disabled` — `groupCommitMs=0` keeps every insert
immediately visible and `GroupCommitFlushes` never advances. (Spec's
"groupCommitMs=0 reverts to per-packet behavior" assertion.)
- `TestGroupCommit_MaxRowsForcesEarlyFlush` — cap=3, 7 inserts → 2
auto-flushes from the cap + 1 final manual flush = 3 total.
Red commit: `e2b0370` (stubs `SetGroupCommit` / `FlushGroupTx` so the
tests compile and fail on **assertions**, not import errors).
Green commit: `73f3559`.
Full ingestor suite (`go test ./...` in `cmd/ingestor`) stays green, ~49
s.
## Performance
This PR is the perf change itself. Local micro-test (the new
`TestGroupCommit_BatchesInsertsIntoOneTx`) shows the structural
property: 50 inserts → 1 commit. The fsync-rate measurement called out
in the M1 acceptance criteria (`~20/s → ~1/s` at 250 obs/sec) requires
staging deployment to confirm — that's the remaining open item that
keeps #1115 open after this merges.
No hot-path regressions: when `groupCommitMs > 0` we acquire one mutex
per insert (uncontended in the steady state — the connection was already
single-threaded via `MaxOpenConns(1)`). When `groupCommitMs == 0` the
code path is identical to before plus one nil-tx check.
## What this PR does NOT do (per spec)
- Does not collapse "30 observations of one packet" into 1 row write —
that's M2.
- Does not eliminate dual-writer contention with `cmd/server`'s
`resolved_path` writes.
- Does not change observation ordering or live broadcast latency.
---------
Co-authored-by: corescope-bot <bot@corescope.local>
## Summary
**Partial fix for #730 (M1 only — M2 frontend and M3 alerting
deferred).**
Today the ingestor **silently drops** ADVERTs whose GPS lies outside the
configured `geo_filter` polygon. That's the wrong default for an
analytics tool — operators get zero visibility into bridged or leaked
meshes.
This PR makes the new default **flag, don't drop**: foreign adverts are
stored, the node row is tagged `foreign_advert=1`, and the API surfaces
`"foreign": true` so dashboards / map overlays can be built on top.
## Behavior
| Mode | What happens to an ADVERT outside `geo_filter` |
|---|---|
| (default) flag | Stored, marked `foreign_advert=1`, exposed via API |
| drop (legacy) | Silently dropped (preserves old behavior for ops who
want it) |
## What's done (M1 — Backend)
- ingestor stores foreign adverts instead of dropping
- `nodes.foreign_advert` column added (migration)
- `/api/nodes` and `/api/nodes/{pk}` expose `foreign: true` field
- Config: `geofilter.action: "flag"|"drop"` (default `flag`)
- Tests + config docs
## What's NOT done (deferred to M2 + M3)
- **M2 — Frontend:** Map overlay showing foreign adverts as distinct
markers, foreign-advert filter on packets/nodes pages, dedicated
foreign-advert dashboard
- **M3 — Alerting:** Time-series detection of bridging events, alert
when foreign advert rate spikes, identify bridge entry-point nodes
Issue #730 remains open for M2 and M3.
---------
Co-authored-by: corescope-bot <bot@corescope>
## Summary
Closes#663 (Phase 2 + 3 partial — time-series tracking + thresholds for
nodes that are also observers).
Adds a per-node battery voltage trend chart and
`/api/nodes/{pubkey}/battery` endpoint, sourced from the existing
`observer_metrics.battery_mv` samples populated by observer status
messages. No new ingest or schema changes — purely surfaces data we were
already collecting.
## Scope (TDD red→green)
**RED commit:** test(node-battery) — DB query, endpoint shape
(200/404/no-data), and config getters all asserted.
**GREEN commit:** feat(node-battery) — implementation only.
## Changes
### Backend
- `cmd/server/node_battery.go` (new):
- `DB.GetNodeBatteryHistory(pubkey, since)` — pulls `(timestamp,
battery_mv)` rows from `observer_metrics WHERE LOWER(observer_id) =
LOWER(public_key) AND battery_mv IS NOT NULL`. Case-insensitive join
tolerates historical pubkey casing variation (observers persist
uppercase, nodes lowercase in this DB).
- `Server.handleNodeBattery` — `GET /api/nodes/{pubkey}/battery?days=N`
(default 7, max 365). Returns `{public_key, days, samples[], latest_mv,
latest_ts, status, thresholds}`.
- `Config.LowBatteryMv()` / `CriticalBatteryMv()` — defaults 3300 / 3000
mV.
- `cmd/server/config.go` — `BatteryThresholds *BatteryThresholdsConfig`
field.
- `cmd/server/routes.go` — route registration alongside existing
`/health`, `/analytics`.
### Frontend
- `public/node-analytics.js` — new "Battery Voltage" chart card with
status badge (🔋 OK / ⚠️ Low / 🪫 Critical / No data). Renders dashed
threshold lines at `lowMv` and `criticalMv`. Empty-state message when no
samples in window.
### Config
- `config.example.json` — `batteryThresholds: { lowMv: 3300, criticalMv:
3000 }` with `_comment` per Config Documentation Rule.
## Status semantics
| latest_mv | status |
|-----------------------|------------|
| no samples in window | `unknown` |
| `>= lowMv` | `ok` |
| `< lowMv`, `>= critMv`| `low` |
| `< criticalMv` | `critical` |
## What this PR does NOT do (deferred)
The issue's full Phase 1 (writing decoded sensor advert telemetry into
`nodes.battery_mv` / `temperature_c` from server-side decoder) and Phase
4 (firmware/active polling for repeaters without observers) are out of
scope here. This PR delivers the requested Phase 2/3 surfacing for the
data path that already lands rows: `observer_metrics`. Repeaters that
are also observers (i.e. publish status to MQTT) will get a voltage
trend immediately; pure passive nodes won't until Phase 1 lands.
## Tests
- `TestGetNodeBatteryHistory_FromObserverMetrics` — case-insensitive
join, NULL skipping, ordering.
- `TestNodeBatteryEndpoint` — full happy path with thresholds + status.
- `TestNodeBatteryEndpoint_NoData` — 200 + status=unknown.
- `TestNodeBatteryEndpoint_404` — unknown node.
- `TestBatteryThresholds_ConfigOverride` — config getters + defaults.
`cd cmd/server && go test ./...` — green.
## Performance
Endpoint is per-pubkey (called once on analytics page open), indexed by
`(observer_id, timestamp)` PK on `observer_metrics`. No hot-path impact.
---------
Co-authored-by: bot <bot@corescope>
## Summary
Implements repeater liveness detection per #662 — distinguishes a
repeater that is **actively relaying traffic** from one that is **alive
but idle** (only sending its own adverts).
## Approach
The backend already maintains a `byPathHop` index keyed by lowercase
hop/pubkey for every transmission. Decode-window writes also key it by
**resolved pubkey** for relay hops. We just weren't surfacing it.
`GetRepeaterRelayInfo(pubkey, windowHours)`:
- Reads `byPathHop[pubkey]`.
- Skips packets whose `payload_type == 4` (advert) — a self-advert
proves liveness, not relaying.
- Returns the most recent `FirstSeen` as `lastRelayed`, plus
`relayActive` (within window) and the `windowHours` actually used.
## Three states (per issue)
| State | Indicator | Condition |
|---|---|---|
| 🟢 Relaying | green | `last_relayed` within `relayActiveHours` |
| 🟡 Alive (idle) | yellow | repeater is in the DB but
`relay_active=false` (no recent path-hop appearance, or none ever) |
| ⚪ Stale | existing | falls out of the existing `getNodeStatus` logic |
## API
- `GET /api/nodes` — repeater/room rows now include `last_relayed`
(omitted if never observed) and `relay_active`.
- `GET /api/nodes/{pubkey}` — same fields plus `relay_window_hours`.
## Config
New optional field under `healthThresholds`:
```json
"healthThresholds": {
...,
"relayActiveHours": 24
}
```
Default 24h. Documented in `config.example.json`.
## Frontend
Node detail page gains a **Last Relayed** row for repeaters/rooms with
the 🟢/🟡 state badge. Tooltip explains the distinction from "Last Heard".
## TDD
- **Red commit** `4445f91`: `repeater_liveness_test.go` + stub
`GetRepeaterRelayInfo` returning zero. Active and Stale tests fail on
assertion (LastRelayed empty / mismatched). Idle and IgnoresAdverts
already match the desired behavior under the stub. Compiles, runs, fails
on assertions — not on imports.
- **Green commit** `5fcfb57`: Implementation. All four tests pass. Full
`cmd/server` suite green (~22s).
## Performance
`O(N)` over `byPathHop[pubkey]` per call. The index is bounded by store
eviction; a single repeater has at most a few hundred entries on real
data. The `/api/nodes` loop adds one map read + scan per repeater row —
negligible against the existing enrichment work.
## Limitations (per issue body)
1. Observer coverage gaps — if no observer hears a repeater's relay,
it'll show as idle even when actively relaying. This is inherent to
passive observation.
2. Low-traffic networks — a repeater in a quiet area legitimately shows
idle. The 🟡 indicator copy makes that explicit ("alive (idle)").
3. Hash collisions are mitigated by the existing `resolveWithContext`
path before pubkeys land in `byPathHop`.
Fixes#662
---------
Co-authored-by: clawbot <bot@corescope.local>
## Summary
Fixes#770 — selecting "All" in the region filter dropdown produced an
empty channel list.
## Root cause
`normalizeRegionCodes` (cmd/server/db.go) treated any non-empty input as
a literal IATA code. The frontend region filter labels its catch-all
option **"All"**; while `region-filter.js` normally sends an empty
string when "All" is selected, any code path that ends up sending
`?region=All` (deep-link URLs, manual queries, future callers) caused
the function to return `["ALL"]`. Downstream queries then filtered
observers for `iata = 'ALL'`, which never matches anything → empty
response.
## Fix
`normalizeRegionCodes` now treats `All` / `ALL` / `all`
(case-insensitive, with optional whitespace, mixed in CSV) as equivalent
to an empty value, returning `nil` to signal "no filter". Real IATA
codes (`SJC`, `PDX`, `sjc,PDX` → `[SJC PDX]`) still pass through
unchanged.
This is a defensive server-side fix: a single chokepoint that all
region-aware endpoints already flow through (channels, packets,
analytics, encrypted channels, observer ID resolution).
## Documentation
Expanded `_comment_regions` in `config.example.json` to explain:
- How IATA codes are resolved (payload > topic > source config — set in
#1012)
- What the `regions` map controls (display labels) vs runtime-discovered
codes
- That observers without an IATA tag only appear under "All Regions"
- That the `All` sentinel is server-side safe
## TDD
- **Red commit** (`4f65bf4`): `cmd/server/region_filter_test.go` —
`TestNormalizeRegionCodes_AllIsNoFilter` asserts `All` / `ALL` / `all` /
`""` / `"All,"` all collapse to `nil`. Compiles, runs, fails on
assertion (`got [ALL], want nil`). Companion test
`TestNormalizeRegionCodes_RealCodesPreserved` locks in that `sjc,PDX`
still returns `[SJC PDX]`.
- **Green commit** (`c9fb965`): two-line change in
`normalizeRegionCodes` + docs update.
## Verification
```
$ go test -run TestNormalizeRegionCodes -count=1 ./cmd/server
ok github.com/corescope/server 0.023s
$ go test -count=1 ./cmd/server
ok github.com/corescope/server 21.454s
```
Full suite green; no existing region tests regressed.
Fixes#770
---------
Co-authored-by: Kpa-clawbot <bot@corescope>
## Issue
Closes#819
## Summary
Adds Save Draft / Load Draft / Download buttons to
`/geofilter-builder.html` so operators can:
- Persist their work-in-progress polygon across sessions (localStorage)
- Reload it later to continue editing
- Download a ready-to-paste `geo_filter` JSON snippet for `config.json`
## Implementation
- New module `public/geofilter-draft.js` exposes `GeofilterDraft` global
with `saveDraft / loadDraft / clearDraft / buildConfigSnippet /
downloadConfig`.
- Builder HTML wires three new buttons; updates the help text to
document the new flow.
## TDD
- Red commit: `b0a1a4c` (tests fail — module doesn't exist)
- Green commit: `a717f33` (implementation added, all tests pass)
## How to test
1. Open `/geofilter-builder.html`
2. Click 3+ points on the map
3. Click "Save Draft" — reload page — click "Load Draft" → polygon
restored
4. Click "Download" → `geofilter-config-snippet.json` downloaded with
correct format
---
E2E assertion added: test-e2e-playwright.js:2264
---------
Co-authored-by: you <you@example.com>
Co-authored-by: openclaw-bot <bot@openclaw.local>
## Summary
Add optional `region` field to MQTT source config and JSON payload,
enabling publishers to explicitly provide region data without relying
solely on topic path structure.
## Changes
- **`MQTTSource.Region`** — new optional config field. When set, acts as
default region for all messages from that source (useful when a broker
serves a single region).
- **`MQTTPacketMessage.Region`** — new optional JSON payload field.
Publishers can include `"region": "PDX"` in their MQTT messages.
- **`PacketData.Region`** — carries the resolved region through to
storage.
- **Priority resolution**: payload `region` > topic-derived region >
source config `region`
- Observer IATA is updated with the effective region on every packet.
## Config example
```json
{
"mqttSources": [
{
"name": "cascadia",
"broker": "tcp://cascadia-broker:1883",
"topics": ["meshcore/#"],
"region": "PDX"
}
]
}
```
## Payload example
```json
{"raw": "0a1b2c...", "SNR": 5.2, "region": "PDX"}
```
## TDD
- Red commit: `980304c` (tests fail at compile — fields don't exist)
- Green commit: `4caf88b` (implementation, all tests pass)
## Unblocks
- #804, #770, #730 (all depend on region being available on
observations)
Fixes#788
---------
Co-authored-by: you <you@example.com>
## Summary
Adds a global `observerIATAWhitelist` config field that restricts which
observer IATA regions are processed by the ingestor.
## Problem
Operators running regional instances (e.g., Sweden) want to ensure only
observers physically in their region contribute data. The existing
per-source `iataFilter` only filters packet messages but still allows
status messages through, meaning observers from other regions appear in
the database.
## Solution
New top-level config field `observerIATAWhitelist`:
- When non-empty, **all** messages (status + packets) from observers
outside the whitelist are silently dropped
- Case-insensitive matching
- Empty list = all regions allowed (fully backwards compatible)
- Lazy O(1) lookup via cached uppercase set (same pattern as
`observerBlacklist`)
### Config example
```json
{
"observerIATAWhitelist": ["ARN", "GOT"]
}
```
## TDD
- **Red commit:** `f19c2b2` — tests for `ObserverIATAWhitelist` field
and `IsObserverIATAAllowed` method (build fails)
- **Green commit:** `782f516` — implementation + integration test
## Files changed
- `cmd/ingestor/config.go` — new field, new method
`IsObserverIATAAllowed`
- `cmd/ingestor/main.go` — whitelist check in `handleMessage` before
status processing
- `cmd/ingestor/config_test.go` — unit tests for config parsing and
matching
- `cmd/ingestor/main_test.go` — integration test for handleMessage
filtering
Fixes#914
---------
Co-authored-by: you <you@example.com>
## Summary
Per-source MQTT connect timeout, correctly targeting the `WaitTimeout`
startup gate (#931).
## What changed
- Added `connectTimeoutSec` field to `MQTTSource` struct (per-source,
not global) — `config.go:24`
- Added `ConnectTimeoutOrDefault()` helper returning configured value or
30 (default from #926) — `config.go:29`
- Replaced hardcoded `WaitTimeout(30 * time.Second)` with
`WaitTimeout(time.Duration(connectTimeout) * time.Second)` —
`main.go:173`
- Updated `config.example.json` with field at source level
- Unit tests for default (30) and custom values
## Why this supersedes #976
PR #976 made paho's `SetConnectTimeout` (per-TCP-dial, was 10s)
configurable via a **global** `mqttConnectTimeoutSeconds` field. Issue
#931 explicitly references the **30s timeout** — which is
`WaitTimeout(30s)`, the startup gate from #926. It also requests
**per-source** config, not global.
This PR targets the correct timeout at the correct granularity.
## Live verification (Rule 18)
Two sources pointed at unreachable brokers:
- `fast` (`connectTimeoutSec: 5`): timed out in 5s ✅
- `default` (unset): timed out in 30s ✅
```
19:00:35 MQTT [fast] connect timeout: 5s
19:00:40 MQTT [fast] initial connection timed out — retrying in background
19:00:40 MQTT [default] connect timeout: 30s
19:01:10 MQTT [default] initial connection timed out — retrying in background
```
Closes#931
Supersedes #976
Co-authored-by: you <you@example.com>
## Problem
`Load()` loaded all transmissions from the DB regardless of
`retentionHours`, so `buildSubpathIndex()` processed the full DB history
on every startup. On a DB with ~280K paths this produces ~13.5M subpath
index entries, OOM-killing the process before it ever starts listening —
causing a supervisord crash loop with no useful error message.
## Fix
Apply the same `retentionHours` cutoff to `Load()`'s SQL that
`EvictStale()` already uses at runtime. Both conditions
(`retentionHours` window and `maxPackets` cap) are combined with AND so
neither safety limit is bypassed.
Startup now builds indexes only over the retention window, making
startup time and memory proportional to recent activity rather than
total DB history.
## Docs
- `config.example.json`: adds `retentionHours` to the `packetStore`
block with recommended value `168` (7 days) and a warning about `0` on
large DBs
- `docs/user-guide/configuration.md`: documents the field and adds an
explicit OOM warning
## Test plan
- [x] `cd cmd/server && go test ./... -run TestRetentionLoad` — covers
the retention-filtered load: verifies packets outside the window are
excluded, and that `retentionHours: 0` still loads everything
- [x] Deploy on an instance with a large DB (>100K paths) and
`retentionHours: 168` — server reaches "listening" in seconds instead of
OOM-crashing
- [x] Verify `config.example.json` has `retentionHours: 168` in the
`packetStore` block
- [x] Verify `docs/user-guide/configuration.md` documents the field and
warning
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Kpa-clawbot <kpaclawbot@outlook.com>
Closes#919
## Summary
Enables SQLite incremental auto-vacuum so the database file actually
shrinks after retention reaper deletes old data. Previously, `DELETE`
operations freed pages internally but never returned disk space to the
OS.
## Changes
### 1. Auto-vacuum on new databases
- `PRAGMA auto_vacuum = INCREMENTAL` set via DSN pragma before
`journal_mode(WAL)` in the ingestor's `OpenStoreWithInterval`
- Must be set before any tables are created; DSN ordering ensures this
### 2. Post-reaper incremental vacuum
- `PRAGMA incremental_vacuum(N)` runs after every retention reaper cycle
(packets, metrics, observers, neighbor edges)
- N defaults to 1024 pages, configurable via `db.incrementalVacuumPages`
- Noop on `auto_vacuum=NONE` databases (safe before migration)
- Added to both server and ingestor
### 3. Opt-in full VACUUM for existing databases
- Startup check logs a clear warning if `auto_vacuum != INCREMENTAL`
- `db.vacuumOnStartup: true` config triggers one-time `PRAGMA
auto_vacuum = INCREMENTAL; VACUUM`
- Logs start/end time for operator visibility
### 4. Documentation
- `docs/user-guide/configuration.md`: retention section notes that
lowering retention doesn't immediately shrink the DB
- `docs/user-guide/database.md`: new guide covering WAL, auto-vacuum,
migration, manual VACUUM
### 5. Tests
- `TestNewDBHasIncrementalAutoVacuum` — fresh DB gets `auto_vacuum=2`
- `TestExistingDBHasAutoVacuumNone` — old DB stays at `auto_vacuum=0`
- `TestVacuumOnStartupMigratesDB` — full VACUUM sets `auto_vacuum=2`
- `TestIncrementalVacuumReducesFreelist` — DELETE + vacuum shrinks
freelist
- `TestCheckAutoVacuumLogs` — handles both modes without panic
- `TestConfigIncrementalVacuumPages` — config defaults and overrides
## Migration path for existing databases
1. On startup, CoreScope logs: `[db] auto_vacuum=NONE — DB needs
one-time VACUUM...`
2. Set `db.vacuumOnStartup: true` in config.json
3. Restart — VACUUM runs (blocks startup, minutes on large DBs)
4. Remove `vacuumOnStartup` after migration
## Test results
```
ok github.com/corescope/server 19.448s
ok github.com/corescope/ingestor 30.682s
```
---------
Co-authored-by: you <you@example.com>
## Problem
Some mesh participants set offensive names, report deliberately false
GPS positions, or otherwise troll the network. Instance operators
currently have no way to hide these nodes from public-facing APIs
without deleting the underlying data.
## Solution
Add a `nodeBlacklist` array to `config.json` containing public keys of
nodes to exclude from all API responses.
### Blacklisted nodes are filtered from:
- `GET /api/nodes` — list endpoint
- `GET /api/nodes/search` — search results
- `GET /api/nodes/{pubkey}` — detail (returns 404)
- `GET /api/nodes/{pubkey}/health` — returns 404
- `GET /api/nodes/{pubkey}/paths` — returns 404
- `GET /api/nodes/{pubkey}/analytics` — returns 404
- `GET /api/nodes/{pubkey}/neighbors` — returns 404
- `GET /api/nodes/bulk-health` — filtered from results
### Config example
```json
{
"nodeBlacklist": [
"aabbccdd...",
"11223344..."
]
}
```
### Design decisions
- **Case-insensitive** — public keys normalized to lowercase
- **Whitespace trimming** — leading/trailing whitespace handled
- **Empty entries ignored** — `""` or `" "` do not cause false positives
- **Nil-safe** — `IsBlacklisted()` on nil Config returns false
- **Backward-compatible** — empty/missing `nodeBlacklist` has zero
effect
- **Lazy-cached set** — blacklist converted to `map[string]bool` on
first lookup
### What this does NOT do (intentionally)
- Does **not** delete or modify database data — only filters API
responses
- Does **not** block packet ingestion — data still flows for analytics
- Does **not** filter `/api/packets` — only node-facing endpoints are
affected
## Testing
- Unit tests for `Config.IsBlacklisted()` (case sensitivity, whitespace,
empty entries, nil config)
- Integration tests for `/api/nodes`, `/api/nodes/{pubkey}`,
`/api/nodes/search`
- Full test suite passes with no regressions
## Summary
Observers that stop actively sending data now get removed after a
configurable retention period (default 14 days).
Previously, observers remained in the `observers` table forever. This
meant nodes that were once observers for an instance but are no longer
connected (even if still active in the mesh elsewhere) would continue
appearing in the observer list indefinitely.
## Key Design Decisions
- **Active data requirement**: `last_seen` is only updated when the
observer itself sends packets (via `stmtUpdateObserverLastSeen`). Being
seen by another node does NOT update this field. So an observer must
actively send data to stay listed.
- **Default: 14 days** — observers not seen in 14 days are removed
- **`-1` = keep forever** — for users who want observers to never be
removed
- **`0` = use default (14 days)** — same as not setting the field
- **Runs on startup + daily ticker** — staggered 3 minutes after metrics
prune to avoid DB contention
## Changes
| File | Change |
|------|--------|
| `cmd/ingestor/config.go` | Add `ObserverDays` to `RetentionConfig`,
add `ObserverDaysOrDefault()` |
| `cmd/ingestor/db.go` | Add `RemoveStaleObservers()` — deletes
observers with `last_seen` before cutoff |
| `cmd/ingestor/main.go` | Wire up startup + daily ticker for observer
retention |
| `cmd/server/config.go` | Add `ObserverDays` to `RetentionConfig`, add
`ObserverDaysOrDefault()` |
| `cmd/server/db.go` | Add `RemoveStaleObservers()` (server-side, uses
read-write connection) |
| `cmd/server/main.go` | Wire up startup + daily ticker, shutdown
cleanup |
| `cmd/server/routes.go` | Admin prune API now also removes stale
observers |
| `config.example.json` | Add `observerDays: 14` with documentation |
| `cmd/ingestor/coverage_boost_test.go` | 4 tests: basic removal, empty
store, keep forever (-1), default (0→14) |
| `cmd/server/config_test.go` | 4 tests: `ObserverDaysOrDefault` edge
cases |
## Config Example
```json
{
"retention": {
"nodeDays": 7,
"observerDays": 14,
"packetDays": 30,
"_comment": "observerDays: -1 = keep forever, 0 = use default (14)"
}
}
```
## Admin API
The `/api/admin/prune` endpoint now also removes stale observers (using
`observerDays` from config) and reports `observers_removed` in the
response alongside `packets_deleted`.
## Test Plan
- [x] `TestRemoveStaleObservers` — old observer removed, recent observer
kept
- [x] `TestRemoveStaleObserversNone` — empty store, no errors
- [x] `TestRemoveStaleObserversKeepForever` — `-1` keeps even year-old
observers
- [x] `TestRemoveStaleObserversDefault` — `0` defaults to 14 days
- [x] `TestObserverDaysOrDefault` (ingestor) —
nil/zero/positive/keep-forever
- [x] `TestObserverDaysOrDefault` (server) —
nil/zero/positive/keep-forever
- [x] Both binaries compile cleanly (`go build`)
- [ ] Manual: verify observer count decreases after retention period on
a live instance
## Summary
- Add missing `geo_filter` block to `config.example.json` with polygon
example, `bufferKm`, and inline `_comment`
- Add `docs/user-guide/geofilter.md`: full operator guide covering
config schema, GeoFilter Builder workflow, and prune script as one-time
migration tool
- Add Geographic filtering section to `docs/user-guide/configuration.md`
with link to the full guide
Closes#669 (M1: documentation)
## Test plan
- [x] `config.example.json` parses cleanly (no JSON errors)
- [x] `docs/user-guide/geofilter.md` renders correctly in GitHub preview
- [x] Link from `configuration.md` to `geofilter.md` resolves
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
Adds two config knobs for controlling backfill scope and neighbor graph
data retention, plus removes the dead synchronous backfill function.
## Changes
### Config knobs
#### `resolvedPath.backfillHours` (default: 24)
Controls how far back (in hours) the async backfill scans for
observations with NULL `resolved_path`. Transmissions with `first_seen`
older than this window are skipped, reducing startup time for instances
with large historical datasets.
#### `neighborGraph.maxAgeDays` (default: 30)
Controls the maximum age of `neighbor_edges` entries. Edges with
`last_seen` older than this are pruned from both SQLite and the
in-memory graph. Pruning runs on startup (after a 4-minute stagger) and
every 24 hours thereafter.
### Dead code removal
- Removed the synchronous `backfillResolvedPaths` function that was
replaced by the async version.
### Implementation details
- `backfillResolvedPathsAsync` now accepts a `backfillHours` parameter
and filters by `tx.FirstSeen`
- `NeighborGraph.PruneOlderThan(cutoff)` removes stale edges from the
in-memory graph
- `PruneNeighborEdges(conn, graph, maxAgeDays)` prunes both DB and
in-memory graph
- Periodic pruning ticker follows the same pattern as metrics pruning
(24h interval, staggered start)
- Graceful shutdown stops the edge prune ticker
### Config example
Both knobs added to `config.example.json` with `_comment` fields.
## Tests
- Config default/override tests for both knobs
- `TestGraphPruneOlderThan` — in-memory edge pruning
- `TestPruneNeighborEdgesDB` — SQLite + in-memory pruning together
- `TestBackfillRespectsHourWindow` — verifies old transmissions are
excluded by backfill window
---------
Co-authored-by: you <you@example.com>
## Summary
Several features and fixes from a live deployment of the Go v3.0.0
backend.
### geo_filter — full enforcement
- **Go backend config** (`cmd/server/config.go`,
`cmd/ingestor/config.go`): added `GeoFilterConfig` struct so
`geo_filter.polygon` and `bufferKm` from `config.json` are parsed by
both the server and ingestor
- **Ingestor** (`cmd/ingestor/geo_filter.go`, `cmd/ingestor/main.go`):
ADVERT packets from nodes outside the configured polygon + buffer are
dropped *before* any DB write — no transmission, node, or observation
data is stored
- **Server API** (`cmd/server/geo_filter.go`, `cmd/server/routes.go`):
`GET /api/config/geo-filter` endpoint returns the polygon + bufferKm to
the frontend; `/api/nodes` responses filter out any out-of-area nodes
already in the DB
- **Frontend** (`public/map.js`, `public/live.js`): blue polygon overlay
(solid inner + dashed buffer zone) on Map and Live pages, toggled via
"Mesh live area" checkbox, state shared via localStorage
### Automatic DB pruning
- Add `retention.packetDays` to `config.json` to delete transmissions +
observations older than N days on a daily schedule (1 min after startup,
then every 24h). Nodes and observers are never pruned.
- `POST /api/admin/prune?days=N` for manual runs (requires `X-API-Key`
header if `apiKey` is set)
```json
"retention": {
"nodeDays": 7,
"packetDays": 30
}
```
### tools/geofilter-builder.html
Standalone HTML tool (no server needed) — open in browser, click to
place polygon points on a Leaflet map, set `bufferKm`, copy the
generated `geo_filter` JSON block into `config.json`.
### scripts/prune-nodes-outside-geo-filter.py
Utility script to clean existing out-of-area nodes from the database
(dry-run + confirm). Useful after first enabling geo_filter on a
populated DB.
### HB column in packets table
Shows the hop hash size in bytes (1–4) decoded from the path byte of
each packet's raw hex. Displayed as **HB** between Size and Type
columns, hidden on small screens.
## Test plan
- [x] ADVERT from node outside polygon is not stored (no new row in
nodes or transmissions)
- [x] `GET /api/config/geo-filter` returns polygon + bufferKm when
configured, `{polygon: null, bufferKm: 0}` when not
- [x] `/api/nodes` excludes nodes outside polygon even if present in DB
- [x] Map and Live pages show blue polygon overlay when configured;
checkbox toggles it
- [x] `retention.packetDays: 30` deletes old transmissions/observations
on startup and daily
- [x] `POST /api/admin/prune?days=30` returns `{deleted: N, days: 30}`
- [x] `tools/geofilter-builder.html` opens standalone, draws polygon,
copies valid JSON
- [x] HB column shows 1–4 for all packets in grouped and flat view
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Change healthThresholds config from milliseconds to hours for readability.
Config keys: infraDegradedHours, infraSilentHours, nodeDegradedHours, nodeSilentHours.
Defaults: infra degraded 24h, silent 72h; node degraded 1h, silent 24h.
- Config stored in hours, converted to ms at comparison time
- /api/config/client sends ms to frontend (backward compatible)
- Frontend tooltips use dynamic thresholds instead of hardcoded strings
- Added healthThresholds section to config.example.json
- Updated Go and Node.js servers, tests
- Create inactive_nodes table with identical schema to nodes
- Add retention.nodeDays config (default 7) in Node.js and Go
- On startup: move nodes not seen in N days to inactive_nodes
- Daily timer (24h setInterval / goroutine ticker) repeats the move
- Log 'Moved X nodes to inactive_nodes (not seen in N days)'
- All existing queries unchanged — they only read nodes table
- Add 14 new tests for moveStaleNodes in test-db.js
- Both Node (db.js/server.js) and Go (ingestor/server) implemented
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add GET /api/config/theme endpoint serving branding, theme colors,
node colors, and home page content from config.json with sensible
defaults so unconfigured instances look identical to before.
Client-side (app.js):
- Fetch theme config on page load, before first render
- Override CSS variables from theme.* on document root
- Override ROLE_COLORS/ROLE_STYLE from nodeColors.*
- Replace nav brand text, logo, favicon from branding.*
- Store config in window.SITE_CONFIG for other pages
Home page (home.js):
- Hero title/subtitle from config.home
- Steps and checklist from config.home
- Footer links from config.home.footerLinks
- Chooser welcome text uses configured siteName
Config example updated with all available theme options.
No default appearance changes — all overrides are optional.
- New config.apiKey field — when set, POST endpoints require X-Api-Key header
- If apiKey not configured, endpoints remain open (dev/local mode)
- GET endpoints and /api/decode (read-only) remain public
- Closes the packet injection attack surface
- channel-rainbow.json: 592 pre-computed SHA256-derived keys for common
channel names (cities, topics, ham radio, emergency, etc.)
- server.js: Load rainbow table at startup as lowest-priority key source
- config.example.json: Add #LongFast to hashChannels list
Key derivation verified against MeshCore source: SHA256('#name')[:16bytes].
Rainbow table boosted decryption from ~48% to ~88% in testing.
Channel keys for #test, #sf, #wardrive, #yo, #bot, #queer, #bookclub, #shtf
are all SHA256(channelName).slice(0,32) — no need to hardcode them. Move to
hashChannels array for auto-derivation. Only the MeshCore default public key
(8b3387e9c5cdea6ac9e5edbaa115cd72) needs explicit specification since it's
not derived from its channel name.
Adds mapDefaults config option with center and zoom properties.
New /api/config/map endpoint serves the defaults. live.js and map.js
fetch the config with fallback to hardcoded Bay Area defaults.
FixesKpa-clawbot/meshcore-analyzer#115
- mqttSources[].topics is now an array of topic patterns
- mqttSources[].iataFilter optionally restricts to specific regions
- meshcore/<region>/<id>/status topic parsed for observer metadata:
name, model, firmware, client_version, radio, battery, uptime, noise_floor
- New observer columns with auto-migration for existing DBs
- Status updates don't inflate packet_count (separate updateObserverStatus)
config.mqttSources array allows multiple MQTT brokers with independent
topics, credentials, and TLS settings. Legacy config.mqtt still works
for backward compatibility.
- PacketStore loads all packets into memory on startup (~11MB for 27K packets)
- Indexed by id, hash, observer, and node pubkey for fast lookups
- /api/packets, /api/packets/timestamps, /api/packets/:id all served from RAM
- MQTT ingest writes to both RAM + SQLite
- Configurable maxMemoryMB (default 1024MB) in config.json packetStore section
- groupByHash queries computed in-memory
- Packet store stats exposed in /api/perf
- Expected: /api/packets goes from 77ms to <1ms
All cache TTLs now read from config.json cacheTTL section (seconds).
Client fetches config on load via GET /api/config/cache.
config.example.json updated with defaults.
Edit config.json, restart server — no code changes needed to tweak TTLs.