## RF Health Dashboard — M1: Observer Metrics Storage, API & Small
Multiples Grid
Implements M1 of #600.
### What this does
Adds a complete RF health monitoring pipeline: MQTT stats ingestion →
SQLite storage → REST API → interactive dashboard with small multiples
grid.
### Backend Changes
**Ingestor (`cmd/ingestor/`)**
- New `observer_metrics` table via migration system (`_migrations`
pattern)
- Parse `tx_air_secs`, `rx_air_secs`, `recv_errors` from MQTT status
messages (same pattern as existing `noise_floor` and `battery_mv`)
- `INSERT OR REPLACE` with timestamps rounded to nearest 5-min interval
boundary (using ingestor wall clock, not observer timestamps)
- Missing fields stored as NULLs — partial data is always better than no
data
- Configurable retention pruning: `retention.metricsDays` (default 30),
runs on startup + every 24h
**Server (`cmd/server/`)**
- `GET /api/observers/{id}/metrics?since=...&until=...` — per-observer
time-series data
- `GET /api/observers/metrics/summary?window=24h` — fleet summary with
current NF, avg/max NF, sample count
- `parseWindowDuration()` supports `1h`, `24h`, `3d`, `7d`, `30d` etc.
- Server-side metrics retention pruning (same config, staggered 2min
after packet prune)
### Frontend Changes
**RF Health tab (`public/analytics.js`, `public/style.css`)**
- Small multiples grid showing all observers simultaneously — anomalies
pop out visually
- Per-observer cell: name, current NF value, battery voltage, sparkline,
avg/max stats
- NF status coloring: warning (amber) at ≥-100 dBm, critical (red) at
≥-85 dBm — text color only, no background fills
- Click any cell → expanded detail view with full noise floor line chart
- Reference lines with direct text labels (`-100 warning`, `-85
critical`) — not color bands
- Min/max points labeled directly on the chart
- Time range selector: preset buttons (1h/3h/6h/12h/24h/3d/7d/30d) +
custom from/to datetime picker
- Deep linking: `#/analytics?tab=rf-health&observer=...&range=...`
- All charts use SVG, matching existing analytics.js patterns
- Responsive: 3-4 columns on desktop, 1 on mobile
### Design Decisions (from spec)
- Labels directly on data, not in legends
- Reference lines with text labels, not color bands
- Small multiples grid, not card+accordion (Tufte: instant visual fleet
comparison)
- Ingestor wall clock for all timestamps (observer clocks may drift)
### Tests Added
**Ingestor tests:**
- `TestRoundToInterval` — 5 cases for rounding to 5-min boundaries
- `TestInsertMetrics` — basic insertion with all fields
- `TestInsertMetricsIdempotent` — INSERT OR REPLACE deduplication
- `TestInsertMetricsNullFields` — partial data with NULLs
- `TestPruneOldMetrics` — retention pruning
- `TestExtractObserverMetaNewFields` — parsing tx_air_secs, rx_air_secs,
recv_errors
**Server tests:**
- `TestGetObserverMetrics` — time-series query with since/until filters,
NULL handling
- `TestGetMetricsSummary` — fleet summary aggregation
- `TestObserverMetricsAPIEndpoints` — DB query verification
- `TestMetricsAPIEndpoints` — HTTP endpoint response shape
- `TestParseWindowDuration` — duration parsing for h/d formats
### Test Results
```
cd cmd/ingestor && go test ./... → PASS (26s)
cd cmd/server && go test ./... → PASS (5s)
```
### What's NOT in this PR (deferred to M2+)
- Server-side delta computation for cumulative counters
- Airtime charts (TX/RX percentage lines)
- Channel quality chart (recv_error_rate)
- Battery voltage chart
- Reboot detection and chart annotations
- Resolution downsampling (1h, 1d aggregates)
- Pattern detection / automated diagnosis
---------
Co-authored-by: you <you@example.com>
## Summary
`txToMap()` previously always allocated observation sub-maps for every
packet, even though the `/api/packets` handler immediately stripped them
via `delete(p, "observations")` unless `expand=observations` was
requested. A typical page of 50 packets with ~5 observations each caused
300+ unnecessary map allocations per request.
## Changes
- **`txToMap`**: Add variadic `includeObservations bool` parameter.
Observations are only built when `true` is passed, eliminating
allocations when they'd just be discarded.
- **`PacketQuery`**: Add `ExpandObservations bool` field to thread the
caller's intent through the query pipeline.
- **`routes.go`**: Set `ExpandObservations` based on
`expand=observations` query param. Removed the post-hoc `delete(p,
"observations")` loop — observations are simply never created when not
requested.
- **Single-packet lookups** (`GetPacketByID`, `GetPacketByHash`): Always
pass `true` since detail views need observations.
- **Multi-node/analytics queries**: Default (no flag) = no observations,
matching prior behavior.
## Testing
- Added `TestTxToMapLazyObservations` covering all three cases: no flag,
`false`, and `true`.
- All existing tests pass (`go test ./...`).
## Perf Impact
Eliminates ~250 observation map allocations per /api/packets request (at
default page size of 50 with ~5 observations each). This is a
constant-factor improvement per request — no algorithmic complexity
change.
Fixes#374
Co-authored-by: you <you@example.com>
## Summary
Optimizes `QueryGroupedPackets()` in `store.go` to eliminate two major
inefficiencies on every grouped packet list request:
### Changes
1. **Cache `UniqueObserverCount` on `StoreTx`** — Instead of iterating
all observations to count unique observers on every query
(O(total_observations) per request), we now track unique observers at
ingest time via an `observerSet` map and pre-computed
`UniqueObserverCount` field. This is updated incrementally as
observations arrive.
2. **Defer map construction until after pagination** — Previously,
`map[string]interface{}` was built for ALL 30K+ filtered results before
sorting and paginating. Now the grouped cache stores sorted `[]*StoreTx`
pointers (lightweight), and `groupedTxsToPage()` builds maps only for
the requested page (typically 50 items). This eliminates ~30K map
allocations per cache miss.
3. **Lighter cache footprint** — The grouped cache now stores
`[]*StoreTx` instead of `*PacketResult` with pre-built maps, reducing
memory pressure and GC work.
### Complexity
- Observer counting: O(1) per query (was O(total_observations))
- Map construction: O(page_size) per query (was O(n) where n = all
filtered results)
- Sort remains O(n log n) on cache miss, but the cache (3s TTL) absorbs
repeated requests
### Testing
- `cd cmd/server && go test ./...` — all tests pass
- `cd cmd/ingestor && go build ./...` — builds clean
Fixes#370
---------
Co-authored-by: you <you@example.com>
## Summary
Replace `time.Tick()` with `time.NewTicker()` in the auto-prune
goroutine so it stops cleanly during graceful shutdown.
## Problem
`time.Tick` creates a ticker that can never be garbage collected or
stopped. While the prune goroutine runs for the process lifetime, it
won't stop during graceful shutdown — the goroutine leaks past the
shutdown sequence.
## Fix
- Create a `time.NewTicker` and a done channel
- Use `select` to listen on both the ticker and done channel
- Stop the ticker and close the done channel in the shutdown path (after
`poller.Stop()`)
- Pattern matches the existing `StartEvictionTicker()` approach
## Testing
- `go build ./...` — compiles cleanly
- `go test ./...` — all tests pass
Fixes#377
Co-authored-by: you <you@example.com>
## Summary
Combines the chained `filterTxSlice` calls in `filterPackets()` into a
single pass over the packet slice.
## Problem
When multiple filter parameters are specified (e.g.,
`type=4&route=1&since=...&until=...`), each filter created a new
intermediate `[]*StoreTx` slice. With N filters, this meant N separate
scans and N-1 unnecessary allocations.
## Fix
All filter predicates (type, route, observer, hash, since, until,
region, node) are pre-computed before the loop, then evaluated in a
single `filterTxSlice` call. This eliminates all intermediate
allocations.
**Preserved behavior:**
- Fast-path index lookups for hash-only and observer-only queries remain
unchanged
- Node-only fast-path via `byNode` index preserved
- All existing filter semantics maintained (same comparison operators,
same null checks)
**Complexity:** Single `O(n)` pass regardless of how many filters are
active, vs previous `O(n * k)` where k = number of active filters (each
pass is O(n) but allocates).
## Testing
All existing tests pass (`cd cmd/server && go test ./...`).
Fixes#373
Co-authored-by: you <you@example.com>
## Summary
Sort `snrVals` and `rssiVals` once upfront in `computeAnalyticsRF()` and
read min/max/median directly from the sorted slices, instead of copying
and sorting per stat call.
## Changes
- Sort both slices once before computing stats (2 sorts total instead of
4+ copy+sorts)
- Read `min` from `sorted[0]`, `max` from `sorted[len-1]`, `median` from
`sorted[len/2]`
- Remove the now-unused `sortedF64` and `medianF64` helper closures
## Performance impact
With 100K+ observations, this eliminates multiple O(n log n) copy+sort
operations. Previously each call to `medianF64` did a full copy + sort,
and `minF64`/`maxF64` did O(n) scans on the unsorted array. Now: 2
in-place sorts total, O(1) lookups for min/max/median.
Fixes#366
Co-authored-by: you <you@example.com>
## Summary
`EvictStale()` was doing O(n) linear scans per evicted item to remove
from secondary indexes (`byObserver`, `byPayloadType`, `byNode`).
Evicting 1000 packets from an observer with 50K observations meant 1000
× 50K = 50M comparisons — all under a write lock.
## Fix
Replace per-item removal with batch single-pass filtering:
1. **Collect phase**: Walk evicted packets once, building sets of
evicted tx IDs, observation IDs, and affected index keys
2. **Filter phase**: For each affected index slice, do a single pass
keeping only non-evicted entries
**Before**: O(evicted_count × index_slice_size) per index — quadratic in
practice
**After**: O(evicted_count + index_slice_size) per affected key — linear
## Changes
- `cmd/server/store.go`: Restructured `EvictStale()` eviction loop into
collect + batch-filter pattern
## Testing
- All existing tests pass (`cd cmd/server && go test ./...`)
Fixes#368
Co-authored-by: you <you@example.com>
## Summary
`QueryMultiNodePackets()` was scanning ALL packets with
`strings.Contains` on JSON blobs — O(packets × pubkeys × json_length).
With 30K+ packets and multiple pubkeys, this caused noticeable latency
on `/api/packets?nodes=...`.
## Fix
Replace the full scan with lookups into the existing `byNode` index,
which already maps pubkeys to their transmissions. Merge results with
hash-based deduplication, then apply time filters.
**Before:** O(N × P × J) where N=all packets, P=pubkeys, J=avg JSON
length
**After:** O(M × P) where M=packets per pubkey (typically small), plus
O(R log R) sort for pagination correctness
Results are sorted by `FirstSeen` after merging to maintain the
oldest-first ordering expected by the pagination logic.
Fixes#357
Co-authored-by: you <you@example.com>
## Problem
`GetNodeAnalytics()` in `store.go` scans ALL 30K+ packets doing
`strings.Contains` on every JSON blob when the node has a name, then
filters by time range *after* the full scan. This is `O(packets ×
json_length)` on every `/api/nodes/{pubkey}/analytics` request.
## Fix
Move the `fromISO` time check inside the scan loop so old packets are
skipped **before** the expensive `strings.Contains` matching. For the
non-name path (indexed-only), the time filter is also applied inline,
eliminating the separate `allPkts` intermediate slice.
### Before
1. Scan all packets → collect matches (including old ones) → `allPkts`
2. Filter `allPkts` by time → `packets`
### After
1. Scan packets, skip `tx.FirstSeen <= fromISO` immediately → `packets`
This avoids `strings.Contains` calls on packets outside the requested
time window (typically 7 days out of months of data).
## Complexity
- **Before:** `O(total_packets × avg_json_length)` for name matching
- **After:** `O(recent_packets × avg_json_length)` — only packets within
the time window are string-matched
## Testing
- `cd cmd/server && go test ./...` — all tests pass
Fixes#367
Co-authored-by: you <you@example.com>
## Summary
Consolidates the 4 parallel `/api/analytics/subpaths` calls in the Route
Patterns tab into a single `/api/analytics/subpaths-bulk` endpoint,
eliminating 3 redundant server-side scans of the subpath index on cache
miss.
## Changes
### Backend (`cmd/server/routes.go`, `cmd/server/store.go`)
- New `GET
/api/analytics/subpaths-bulk?groups=2-2:50,3-3:30,4-4:20,5-8:15`
endpoint
- Groups format: `minLen-maxLen:limit` comma-separated
- `GetAnalyticsSubpathsBulk()` iterates `spIndex` once, bucketing
entries into per-group accumulators by hop length
- Hop name resolution is done once per raw hop and shared across groups
- Results are cached per-group for compatibility with existing
single-key cache lookups
- Region-filtered queries fall back to individual
`GetAnalyticsSubpaths()` calls (region filtering requires
per-transmission observer checks)
### Frontend (`public/analytics.js`)
- `renderSubpaths()` now makes 1 API call instead of 4
- Response shape: `{ results: [{ subpaths, totalPaths }, ...] }` —
destructured into the same `[d2, d3, d4, d5]` variables
### Tests (`cmd/server/routes_test.go`)
- `TestAnalyticsSubpathsBulk`: validates 3-group response shape, missing
params error, invalid format error
## Performance
- **Before:** 4 API calls → 4 scans of `spIndex` + 4× hop resolution on
cache miss
- **After:** 1 API call → 1 scan of `spIndex` + 1× hop resolution
(shared cache)
- Cache miss cost reduced by ~75% for this tab
- No change on cache hit (individual group caching still works)
Fixes#398
Co-authored-by: you <you@example.com>
## Summary
Fixes the N+1 API call pattern when changing observation sort mode on
the packets page. Previously, switching sort to Path or Time fired
individual `/api/packets/{hash}` requests for **every**
multi-observation group without cached children — potentially 100+
concurrent requests.
## Changes
### Backend: Batch observations endpoint
- **New endpoint:** `POST /api/packets/observations` accepts `{"hashes":
["h1", "h2", ...]}` and returns all observations keyed by hash in a
single response
- Capped at 200 hashes per request to prevent abuse
- 4 test cases covering empty input, invalid JSON, too-many-hashes, and
valid requests
### Frontend: Use batch endpoint
- `packets.js` sort change handler now collects all hashes needing
observation data and sends a single POST request instead of N individual
GETs
- Same behavior, single round-trip
## Performance
- **Before:** Changing sort with 100 visible groups → 100 concurrent API
requests, browser connection queueing (6 per host), several seconds of
lag
- **After:** Single POST request regardless of group count, response
time proportional to store lookup (sub-millisecond per hash in memory)
Fixes#389
---------
Co-authored-by: you <you@example.com>
## Summary
`handleObservers()` in `routes.go` was calling `GetNodeLocations()`
which fetches ALL nodes from the DB just to match ~10 observer IDs
against node public keys. With 500+ nodes this is wasteful.
## Changes
- **`db.go`**: Added `GetNodeLocationsByKeys(keys []string)` — queries
only the rows matching the given public keys using a parameterized
`WHERE LOWER(public_key) IN (?, ?, ...)` clause.
- **`routes.go`**: `handleObservers` now collects observer IDs and calls
the targeted method instead of the full-table scan.
- **`coverage_test.go`**: Added `TestGetNodeLocationsByKeys` covering
known key, empty keys, and unknown key cases.
## Performance
With ~10 observers and 500+ nodes, the query goes from scanning all 500
rows to fetching only ~10. The original `GetNodeLocations()` is
preserved for any other callers.
Fixes#378
Co-authored-by: you <you@example.com>
## Summary
Replace N+1 per-hop DB queries in `handleResolveHops` with O(1) lookups
against the in-memory prefix map that already exists in the packet
store.
## Problem
Each hop in the `resolve-hops` API triggered a separate `SELECT ... LIKE
?` query against the nodes table. With 10 hops, that's 10 DB round-trips
— unnecessary when `getCachedNodesAndPM()` already maintains an
in-memory prefix map that can resolve hops instantly.
## Changes
- **routes.go**: Replace the per-hop DB query loop with `pm.m[hopLower]`
lookups from the prefix map. Convert `nodeInfo` → `HopCandidate` inline.
Remove unused `rows`/`sql.Scan` code.
- **store.go**: Add `InvalidateNodeCache()` method to force prefix map
rebuild (needed by tests that insert nodes after store initialization).
- **routes_test.go**: Give `TestResolveHopsAmbiguous` a proper store so
hops resolve via the prefix map.
- **resolve_context_test.go**: Call `InvalidateNodeCache()` after
inserting test nodes. Fix confidence assertion — with GPS candidates and
no affinity context, `resolveWithContext` correctly returns
`gps_preference` (previously masked because the prefix map didn't have
the test nodes).
## Complexity
O(1) per hop lookup via hash map vs O(n) DB scan per hop. No hot-path
impact — this endpoint is called on-demand, not in a render loop.
Fixes#369
---------
Co-authored-by: you <you@example.com>
## Summary
Replace full `buildDistanceIndex()` rebuild with incremental
`removeTxFromDistanceIndex`/`addTxToDistanceIndex` for only the
transmissions whose paths actually changed during
`IngestNewObservations`.
## Problem
When any transmission's best path changed during observation ingestion,
the **entire distance index was rebuilt** — iterating all 30K+ packets,
resolving all hops, and computing haversine distances. This
`O(total_packets × avg_hops)` operation ran under a write lock, blocking
all API readers.
A 30-second debounce (`distRebuildInterval`) was added in #557 to
mitigate this, but it only delayed the pain — the full rebuild still
happened, just less frequently.
## Fix
- Added `removeTxFromDistanceIndex(tx)` — filters out all
`distHopRecord` and `distPathRecord` entries for a specific transmission
- Added `addTxToDistanceIndex(tx)` — computes and appends new distance
records for a single transmission
- In `IngestNewObservations`, changed path-change handling to call
remove+add for each affected tx instead of marking dirty and waiting for
a full rebuild
- Removed `distDirty`, `distLast`, and `distRebuildInterval` since
incremental updates are cheap enough to apply immediately
## Complexity
- **Before:** `O(total_packets × avg_hops)` per rebuild (30K+ packets)
- **After:** `O(changed_txs × avg_hops + total_dist_records)` — the
remove is a linear scan of the distance slices, but only for affected
txs; the add is `O(hops)` per changed tx
The remove scan over `distHops`/`distPaths` slices is linear in slice
length, but this is still far cheaper than the full rebuild which also
does JSON parsing, hop resolution, and haversine math for every packet.
## Tests
- Updated `TestDistanceRebuildDebounce` →
`TestDistanceIncrementalUpdate` to verify incremental behavior and check
for duplicate path records
- All existing tests pass (`go test ./...` in both `cmd/server` and
`cmd/ingestor`)
Fixes#365
---------
Co-authored-by: you <you@example.com>
## Summary
Cache `resolveRegionObservers()` results with a 30-second TTL to
eliminate repeated database queries for region→observer ID mappings.
## Problem
`resolveRegionObservers()` queried the database on every call despite
the observers table changing infrequently (~20 rows). It's called from
10+ hot paths including `filterPackets()`, `GetChannels()`, and multiple
analytics compute functions. When analytics caches are cold, parallel
requests each hit the DB independently.
## Solution
- Added a dedicated `regionObsMu` mutex + `regionObsCache` map with 30s
TTL
- Uses a separate mutex (not `s.mu`) to avoid deadlocks — callers
already hold `s.mu.RLock()`
- Cache is lazily populated per-region and fully invalidated after TTL
expires
- Follows the same pattern as `getCachedNodesAndPM()` (30s TTL,
on-demand rebuild)
## Changes
- **`cmd/server/store.go`**: Added `regionObsMu`, `regionObsCache`,
`regionObsCacheTime` fields; rewrote `resolveRegionObservers()` to check
cache first; added `fetchAndCacheRegionObs()` helper
- **`cmd/server/coverage_test.go`**: Added
`TestResolveRegionObserversCaching` — verifies cache population, cache
hits, and nil handling for unknown regions
## Testing
- All existing Go tests pass (`go test ./...`)
- New test verifies caching behavior (population, hits, nil for unknown
regions)
Fixes#362
---------
Co-authored-by: you <you@example.com>
## Summary
`GetStoreStats()` ran 5 sequential DB queries on every call. This
combines them into **2 concurrent queries**:
1. **Node/observer counts** — single query using subqueries: `SELECT
(SELECT COUNT(*) FROM nodes WHERE ...), (SELECT COUNT(*) FROM nodes),
(SELECT COUNT(*) FROM observers)`
2. **Observation counts** — single query using conditional aggregation:
`SUM(CASE WHEN timestamp > ? THEN 1 ELSE 0 END)` scoped to the 24h
window, avoiding a full table scan for the 1h count
Both queries run concurrently via goroutines + `sync.WaitGroup`.
## What changed
- `cmd/server/store.go`: Rewrote `GetStoreStats()` — 5 sequential
`QueryRow` calls → 2 concurrent combined queries
- Error handling now propagates query errors instead of silently
ignoring them
## Performance justification
- **Before:** 5 sequential round-trips to SQLite, with 2 potentially
expensive `COUNT(*)` scans on the `observations` table
- **After:** 2 concurrent round-trips; the observation query scans the
24h window once instead of separately scanning for 1h and 24h
- The 10s cache (`statsTTL`) remains, so this fires at most once per 10s
— but when it does fire, it's ~2.5x fewer round-trips and the
observation scan is halved
## Tests
- `go test ./...` passes for both `cmd/server` and `cmd/ingestor`
Fixes#363
---------
Co-authored-by: you <you@example.com>
## Summary
`GetSubpathDetail()` iterated ALL packets to find those containing a
specific subpath — `O(packets × hops × subpath_length)`. With 30K+
packets this caused user-visible latency on every subpath detail click.
## Changes
### `cmd/server/store.go`
- Added `spTxIndex map[string][]*StoreTx` alongside existing `spIndex` —
tracks which transmissions contain each subpath key
- Extended `addTxToSubpathIndexFull()` and
`removeTxFromSubpathIndexFull()` to maintain both indexes simultaneously
- Original `addTxToSubpathIndex()`/`removeTxFromSubpathIndex()` wrappers
preserved for backward compatibility
- `buildSubpathIndex()` now populates both `spIndex` and `spTxIndex`
during `Load()`
- All incremental update sites (ingest, path change, eviction) use the
`Full` variants
- `GetSubpathDetail()` rewritten: direct `O(1)` map lookup on
`spTxIndex[key]` instead of scanning all packets
### `cmd/server/coverage_test.go`
- Added `TestSubpathTxIndexPopulated`: verifies `spTxIndex` is
populated, counts match `spIndex`, and `GetSubpathDetail` returns
correct results for both existing and non-existent subpaths
## Complexity
- **Before:** `O(total_packets × avg_hops × subpath_length)` per request
- **After:** `O(matched_txs)` per request (direct map lookup)
## Tests
All tests pass: `cmd/server` (4.6s), `cmd/ingestor` (25.6s)
Fixes#358
---------
Co-authored-by: you <you@example.com>
## Summary
`buildPrefixMap()` was generating map entries for every prefix length
from 2 to `len(pubkey)` (up to 64 chars), creating ~31 entries per node.
With 500 nodes that's ~15K map entries; with 1K+ nodes it balloons to
31K+.
## Changes
**`cmd/server/store.go`:**
- Added `maxPrefixLen = 8` constant — MeshCore path hops use 2–6 char
prefixes, 8 gives headroom
- Capped the prefix generation loop at `maxPrefixLen` instead of
`len(pk)`
- Added full pubkey as a separate map entry when key is longer than
`maxPrefixLen`, ensuring exact-match lookups (used by
`resolveWithContext`) still work
**`cmd/server/coverage_test.go`:**
- Added `TestPrefixMapCap` with subtests for:
- Short prefix resolution still works
- Full pubkey exact-match resolution still works
- Intermediate prefixes beyond the cap correctly return nil
- Short keys (≤8 chars) have all prefix entries
- Map size is bounded
## Impact
- Map entries per node: ~31 → ~8 (one per prefix length 2–8, plus one
full-key entry)
- Total map size for 500 nodes: ~15K entries → ~4K entries (~75%
reduction)
- No behavioral change for path hop resolution (2–6 char prefixes)
- No behavioral change for exact pubkey lookups
## Tests
All existing tests pass:
- `cmd/server`: ✅
- `cmd/ingestor`: ✅Fixes#364
---------
Co-authored-by: you <you@example.com>
## Summary
Index node path lookups in `handleNodePaths()` instead of scanning all
packets on every request.
## Problem
`handleNodePaths()` iterated ALL packets in the store (`O(total_packets
× avg_hops)`) with prefix string matching on every hop. This caused
user-facing latency on every node detail page load with 30K+ packets.
## Fix
Added a `byPathHop` index (`map[string][]*StoreTx`) that maps lowercase
hop prefixes and resolved full pubkeys to their transmissions. The
handler now does direct map lookups instead of a full scan.
### Index lifecycle
- **Built** during `Load()` via `buildPathHopIndex()`
- **Incrementally updated** during `IngestNewFromDB()` (new packets) and
`IngestNewObservations()` (path changes)
- **Cleaned up** during `EvictStale()` (packet removal)
### Query strategy
The handler looks up candidates from the index using:
1. Full pubkey (matches resolved hops from `resolved_path`)
2. 2-char prefix (matches short raw hops)
3. 4-char prefix (matches medium raw hops)
4. Any longer raw hops starting with the 4-char prefix
This reduces complexity from `O(total_packets × avg_hops)` to
`O(matching_txs + unique_hop_keys)`.
## Tests
- `TestNodePathsEndpointUsesIndex` — verifies the endpoint returns
correct results using the index
- `TestPathHopIndexIncrementalUpdate` — verifies add/remove operations
on the index
All existing tests pass.
Fixes#359
Co-authored-by: you <you@example.com>
## Summary
Fixes#566 — The "Inconsistent Hash Sizes" list on the Analytics page
included all node types and had no time window, causing false positives.
## Changes
### 1. Role filter on inconsistent nodes (`cmd/server/store.go`)
Added role filter to the `inconsistentNodes` loop in
`computeHashCollisions()` so only repeaters and room servers are
included. Companions are excluded since they were never affected by the
firmware bug. This matches the existing role filter on collision
bucketing from #441.
```go
// Before:
if cn.HashSizeInconsistent {
// After:
if cn.HashSizeInconsistent && (cn.Role == "repeater" || cn.Role == "room_server") {
```
### 2. 7-day time window on hash size computation
(`cmd/server/store.go`)
Added a 7-day recency cutoff to `computeNodeHashSizeInfo()`. Adverts
older than 7 days are now skipped, preventing legitimate historical
config changes (e.g., testing different byte sizes) from creating
permanent false positives.
### 3. Frontend description text (`public/analytics.js`)
Updated the description to reflect the filtered scope: now says
"Repeaters and room servers" instead of "Nodes", mentions the 7-day
window, and notes that companions are excluded.
## Tests
- `TestInconsistentNodesExcludesCompanions` — verifies companions are
excluded while repeaters and room servers are included
- `TestHashSizeInfoTimeWindow` — verifies adverts older than 7 days are
excluded from hash size computation
- Updated existing hash size tests to use recent timestamps (compatible
with the new time window)
- All existing tests pass: `cmd/server` ✅, `cmd/ingestor` ✅
## Perf justification
The time window filter adds a single string comparison per advert in the
scan loop — O(n) with a tiny constant. No impact on hot paths.
---------
Co-authored-by: you <you@example.com>
## Summary
Replace O(n) map iteration in `MaxTransmissionID()` and
`MaxObservationID()` with O(1) field lookups.
## What Changed
- Added `maxTxID` and `maxObsID` fields to `PacketStore`
- Updated `Load()`, `IngestNewFromDB()`, and `IngestNewObservations()`
to track max IDs incrementally as entries are added
- `MaxTransmissionID()` and `MaxObservationID()` now return the tracked
field directly instead of iterating the entire map
## Performance
Before: O(n) iteration over 30K+ map entries under a read lock
After: O(1) field return
## Tests
- Added `TestMaxTransmissionIDIncremental` verifying the incremental
field matches brute-force iteration over the maps
- All existing tests pass (`cmd/server` and `cmd/ingestor`)
Fixes#356
Co-authored-by: you <you@example.com>
## Problem
Closes#563. Addresses the *Packet store estimated memory* item in #559.
`estimatedMemoryMB()` used a hardcoded formula:
```go
return float64(len(s.packets)*5120+s.totalObs*500) / 1048576.0
```
This ignored three data structures that grow continuously with every
ingest cycle:
| Structure | Production size | Heap not counted |
|---|---|---|
| `distHops []distHopRecord` | 1,556,833 records | ~300 MB |
| `distPaths []distPathRecord` | 93,090 records | ~25 MB |
| `spIndex map[string]int` | 4,113,234 entries | ~400 MB |
Result: formula reported ~1.2 GB while actual heap was ~5 GB. With
`maxMemoryMB: 1024`, eviction calculated it only needed to shed ~200 MB,
removed a handful of packets, and stopped. Memory kept growing until the
OOM killer fired.
## Fix
Replace `estimatedMemoryMB()` with `runtime.ReadMemStats` so all data
structures are automatically counted:
```go
func (s *PacketStore) estimatedMemoryMB() float64 {
if s.memoryEstimator != nil {
return s.memoryEstimator()
}
var ms runtime.MemStats
runtime.ReadMemStats(&ms)
return float64(ms.HeapAlloc) / 1048576.0
}
```
Replace the eviction simulation loop (which re-used the same wrong
formula) with a proportional calculation: if heap is N× over budget,
evict enough packets to keep `(1/N) × 0.9` of the current count. The 0.9
factor adds a 10% buffer so the next ingest cycle doesn't immediately
re-trigger. All major data structures (distHops, distPaths, spIndex)
scale with packet count, so removing a fraction of packets frees roughly
the same fraction of total heap.
## Testing
- Updated `TestEvictStale_MemoryBasedEviction` to inject a deterministic
estimator via the new `memoryEstimator` field.
- Added `TestEvictStale_MemoryBasedEviction_UnderestimatedHeap`:
verifies that when actual heap is 5× over limit (the production failure
scenario), eviction correctly removes ~80%+ of packets.
```
=== RUN TestEvictStale_MemoryBasedEviction
[store] Evicted 538 packets (1076 obs)
--- PASS
=== RUN TestEvictStale_MemoryBasedEviction_UnderestimatedHeap
[store] Evicted 820 packets (1640 obs)
--- PASS
```
Full suite: `go test ./...` — ok (10.3s)
## Perf note
`runtime.ReadMemStats` runs once per eviction tick (every 60 s) and once
per `/api/perf/store` call. Cost is negligible.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
Fixes critical performance issue in neighbor graph computation that
consumed 65% of CPU (30+ seconds) on a 325K packet dataset.
## Changes
### Fix 1: Cache strings.ToLower results
- Added cachedToLower() helper that caches lowercased strings in a local
map
- Pubkeys repeat across hundreds of thousands of observations
- Pre-computes fromLower once per transaction instead of once per
observation
- **Impact:** Eliminates ~8.4s (25.3% CPU)
### Fix 2: Cache parsed DecodedJSON via StoreTx.ParsedDecoded()
- Added ParsedDecoded() method on StoreTx using sync.Once for
thread-safe lazy caching
- json.Unmarshal on decoded_json now runs at most once per packet
lifetime
- Result reused by extractFromNode, indexByNode, trackAdvertPubkey
- **Impact:** Eliminates ~8.8s (26.3% CPU)
### Fix 3: Extend neighbor graph TTL from 60s to 5 minutes
- The graph depends on traffic patterns, not individual packets
- Reduces rebuild frequency 5x
- **Impact:** ~80% reduction in sustained CPU from graph rebuilds
## Tests
- 7 new tests added, all 26+ existing neighbor graph tests pass
- BenchmarkBuildFromStore: 727us/op, 237KB/op, 6030 allocs/op
Related: #559
---------
Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: you <you@example.com>
detectSchema() runs at DB open time before ensureResolvedPathColumn()
adds the column during Load(). On first run (or any run where the column
was just added), hasResolvedPath stayed false, causing Load() to skip
reading resolved_path from SQLite. This forced a full backfill of all
observations on every restart, burning CPU for minutes on large DBs.
Fix: set hasResolvedPath = true after ensureResolvedPathColumn succeeds.
## Summary
Implements server-side hop prefix resolution at ingest time with a
persisted neighbor graph. Hop prefixes in `path_json` are now resolved
to full 64-char pubkeys at ingest and stored as `resolved_path` on each
observation, eliminating the need for client-side resolution via
`HopResolver`.
Fixes#555
## What changed
### New file: `cmd/server/neighbor_persist.go`
SQLite persistence layer for the neighbor graph and resolved paths:
- `neighbor_edges` table creation and management
- Load/build/persist neighbor edges from/to SQLite
- `resolved_path` column migration on observations
- `resolvePathForObs()` — resolves hop prefixes using
`resolveWithContext` with 4-tier priority (affinity → geo → GPS → first
match)
- Cold startup backfill for observations missing `resolved_path`
- Async persistence of edges and resolved paths during ingest
(non-blocking)
### Modified: `cmd/server/store.go`
- `StoreObs` gains `ResolvedPath []*string` field
- `StoreTx` gains `ResolvedPath []*string` (cached from best
observation)
- `Load()` dynamically includes `resolved_path` in SQL query when column
exists
- `IngestNewFromDB()` resolves paths at ingest time and persists
asynchronously
- `pickBestObservation()` propagates `ResolvedPath` to transmission
- `txToMap()` and `enrichObs()` include `resolved_path` in API responses
- All 7 `pm.resolve()` call sites migrated to `pm.resolveWithContext()`
with the persisted graph
- Broadcast maps include `resolved_path` per observation
### Modified: `cmd/server/db.go`
- `DB` struct gains `hasResolvedPath bool` flag
- `detectSchema()` checks for `resolved_path` column existence
- Graceful degradation when column is absent (test DBs, old schemas)
### Modified: `cmd/server/main.go`
- Startup sequence: ensure tables → load/build graph → backfill resolved
paths → re-pick best observations
### Modified: `cmd/server/routes.go`
- `mapSliceToTransmissions()` and `mapSliceToObservations()` propagate
`resolved_path`
- Node paths handler uses `resolveWithContext` with graph
### Modified: `cmd/server/types.go`
- `TransmissionResp` and `ObservationResp` gain `ResolvedPath []*string`
with `omitempty`
### New file: `cmd/server/neighbor_persist_test.go`
16 tests covering:
- Path resolution (unambiguous, empty, unresolvable prefixes)
- Marshal/unmarshal of resolved_path JSON
- SQLite table creation and column migration (idempotent)
- Edge persistence and loading
- Schema detection
- Full Load() with resolved_path
- API response serialization (present when set, omitted when nil)
## Design decisions
1. **Async persistence** — resolved paths and neighbor edges are written
to SQLite in a goroutine to avoid blocking the ingest loop. The
in-memory state is authoritative.
2. **Schema compatibility** — `DB.hasResolvedPath` flag allows the
server to work with databases that don't yet have the `resolved_path`
column. SQL queries dynamically include/exclude the column.
3. **`pm.resolve()` retained** — Not removed as dead code because
existing tests use it directly. All production call sites now use
`resolveWithContext` with the persisted graph.
4. **Edge persistence is conservative** — Only unambiguous edges (single
candidate) are persisted to `neighbor_edges`. Ambiguous prefixes are
handled by the in-memory `NeighborGraph` via Jaccard disambiguation.
5. **`null` = unresolved** — Ambiguous prefixes store `null` in the
resolved_path array. Frontend falls back to prefix display.
## Performance
- `resolveWithContext` per hop: ~1-5μs (map lookups, no DB queries)
- Typical packet has 0-5 hops → <25μs total resolution overhead per
packet
- Edge/path persistence is async → zero impact on ingest latency
- Backfill is one-time on first startup with the new column
## Test results
```
cd cmd/server && go test ./... -count=1 → ok (4.4s)
cd cmd/ingestor && go test ./... -count=1 → ok (25.5s)
```
---------
Co-authored-by: you <you@example.com>
## Problem
On busy meshes (325K+ transmissions, 50 observers), the distance index
rebuild runs on **every ingest poll** (~1s interval), computing
haversine distances for 1M+ hop records. Each rebuild takes 2-3 seconds
but new observations arrive faster than it can finish, creating a CPU
hot loop that starves the HTTP server.
Discovered on the Cascadia Mesh instance where `corescope-server` was
consuming 15 minutes of CPU time in 10 minutes of uptime, the API was
completely unresponsive, and health checks were timing out.
### Server logs showing the hot loop:
```
[store] Built distance index: 1797778 hop records, 207072 path records
[store] Built distance index: 1797806 hop records, 207075 path records
[store] Built distance index: 1797811 hop records, 207075 path records
[store] Built distance index: 1797820 hop records, 207075 path records
```
Every 2 seconds, nonstop.
## Root Cause
`IngestNewObservations` calls `buildDistanceIndex()` synchronously
whenever `pickBestObservation` selects a longer path. With 50 observers
sending observations every second, paths change on nearly every poll
cycle, triggering a full rebuild each time.
## Fix
- Mark distance index dirty on path changes instead of rebuilding inline
- Rebuild at most every **30 seconds** (configurable via `distLast`
timer)
- Set `distLast` after initial `Load()` to prevent immediate re-rebuild
on first ingest
- Distance data is at most 30s stale — acceptable for an analytics view
## Testing
- `go build`, `go vet`, `go test` all pass
- No behavioral change for the initial load or the analytics API
response shape
- Distance data freshness goes from real-time to 30s max staleness
---------
Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: you <you@example.com>
Fixes#441
## Summary
Hash collision analysis was including ALL node types, inflating
collision counts with irrelevant data. Per MeshCore firmware analysis,
**only repeaters matter for collision analysis** — they're the only role
that forwards packets and appears in routing `path[]` arrays.
## Root Causes Fixed
1. **`hash_size==0` nodes counted in all buckets** — nodes with unknown
hash size were included via `cn.HashSize == bytes || cn.HashSize == 0`,
polluting every bucket
2. **Non-repeater roles included** — companions, rooms, sensors, and
observers were counted even though their hash collisions never cause
routing ambiguity
## Fix
Changed `computeHashCollisions()` filter from:
```go
// Before: include everything except companions
if cn.HashSize == bytes && cn.Role != "companion" {
```
To:
```go
// After: only include repeaters (per firmware analysis)
if cn.HashSize == bytes && cn.Role == "repeater" {
```
## Why only repeaters?
From [MeshCore firmware
analysis](https://github.com/Kpa-clawbot/CoreScope/issues/441#issuecomment-4185218547):
- Only repeaters override `allowPacketForward()` to return `true`
- Only repeaters append their hash to `path[]` during relay
- Companions, rooms, sensors, observers never forward packets
- Cross-role collisions are benign (companion silently drops, real
repeater still forwards)
## Tests
- `TestHashCollisionsOnlyRepeaters` — verifies companions, rooms,
sensors, and hash_size==0 nodes are all excluded
---------
Co-authored-by: you <you@example.com>
## Summary
Fixes#533 — server cache hit rate always 0%.
## Root Cause
`invalidateCachesFor()` is called at the end of every
`IngestNewFromDB()` and `IngestNewObservations()` cycle (~2-5s). Since
new data arrives continuously, caches are cleared faster than any
analytics request can hit them, resulting in a permanent 0% cache hit
rate. The cache TTL (15s/60s) is irrelevant because entries are evicted
by invalidation long before they expire.
## Fix
Rate-limit cache invalidation with a 10-second cooldown:
- First call after cooldown goes through immediately
- Subsequent calls during cooldown accumulate dirty flags in
`pendingInv`
- Next call after cooldown merges pending + current flags and applies
them
- Eviction bypasses cooldown (data removal requires immediate clearing)
Analytics data may be at most ~10s stale, which is acceptable for a
dashboard.
## Changes
- **`store.go`**: Added `lastInvalidated`, `pendingInv`, `invCooldown`
fields. Refactored `invalidateCachesFor()` to rate-limit non-eviction
invalidation. Extracted `applyCacheInvalidation()` helper.
- **`cache_invalidation_test.go`**: Added 4 new tests:
- `TestInvalidationRateLimited` — verifies caches survive during
cooldown
- `TestInvalidationCooldownAccumulatesFlags` — verifies flag merging
- `TestEvictionBypassesCooldown` — verifies eviction always clears
immediately
- `BenchmarkCacheHitDuringIngestion` — confirms 100% hit rate during
rapid ingestion (was 0%)
## Perf Proof
```
BenchmarkCacheHitDuringIngestion-16 3467889 1018 ns/op 100.0 hit%
```
Before: 0% hit rate under continuous ingestion. After: 100% hit rate
during cooldown periods.
Co-authored-by: you <you@example.com>
## Summary
`GetPerfStoreStats()` and `GetPerfStoreStatsTyped()` iterated **all**
ADVERT packets and called `json.Unmarshal` on each one — under a read
lock — on every `/api/perf` and `/api/health` request. With 5K+ adverts,
each health check triggered thousands of JSON parses.
## Fix
Added a refcounted `advertPubkeys map[string]int` to `PacketStore` that
tracks distinct pubkeys incrementally during `Load()`,
`IngestNewFromDB()`, and eviction. The perf/health handlers now just
read `len(s.advertPubkeys)` — O(1) with zero allocations.
## Benchmark Results (5K adverts, 200 distinct pubkeys)
| Method | ns/op | allocs/op |
|--------|-------|-----------|
| `GetPerfStoreStatsTyped` | **78** | **0** |
| `GetPerfStoreStats` | **2,565** | **9** |
Before this change, both methods performed O(N) JSON unmarshals per
call.
## Tests Added
- `TestAdvertPubkeyTracking` — verifies incremental tracking through
add/evict lifecycle
- `TestAdvertPubkeyPublicKeyField` — covers the `public_key` JSON field
variant
- `TestAdvertPubkeyNonAdvert` — ensures non-ADVERT packets don't affect
count
- `BenchmarkGetPerfStoreStats` — 5K adverts benchmark
- `BenchmarkGetPerfStoreStatsTyped` — 5K adverts benchmark
Fixes#360
---------
Co-authored-by: you <you@example.com>
## Summary
Fixes#355 — replaces O(n²) observation dedup in `Load()`,
`IngestNewFromDB()`, and `IngestNewObservations()` with an O(1)
map-based lookup.
## Changes
- Added `obsKeys map[string]bool` field to `StoreTx` for O(1) dedup
keyed on `observerID + "|" + pathJSON`
- Replaced all 3 linear-scan dedup sites in `store.go` with map lookups
- Lazy-init `obsKeys` for transmissions created before this change (in
`IngestNewFromDB` and `IngestNewObservations`)
- Added regression test (`TestObsDedupCorrectness`) verifying dedup
correctness
- Added nil-map safety test (`TestObsDedupNilMapSafety`)
- Added benchmark comparing map vs linear scan
## Benchmark Results (ARM64, 16 cores)
| Observations | Map (O(1)) | Linear (O(n)) | Speedup |
|---|---|---|---|
| 10 | 34 ns/op | 41 ns/op | 1.2x |
| 50 | 34 ns/op | 186 ns/op | 5.5x |
| 100 | 34 ns/op | 361 ns/op | 10.6x |
| 500 | 34 ns/op | 4,903 ns/op | **146x** |
Map lookup is constant time regardless of observation count. The linear
scan degrades quadratically — at 500 observations per transmission
(realistic for popular packets seen by many observers), the old code is
146x slower per dedup check.
All existing tests pass.
---------
Co-authored-by: you <you@example.com>
## Summary
Fixes#354
Replaces the O(n²) selection sort in `sortedCopy()` with Go's built-in
`sort.Float64s()` (O(n log n)).
## Changes
- **`cmd/server/routes.go`**: Replaced manual nested-loop selection sort
with `sort.Float64s(cp)`
- **`cmd/server/helpers_test.go`**: Added regression test with
1000-element random input + benchmark
## Benchmark Results (ARM64)
```
BenchmarkSortedCopy/n=256 ~16μs/op 1 alloc
BenchmarkSortedCopy/n=1000 ~95μs/op 1 alloc
BenchmarkSortedCopy/n=10000 ~1.3ms/op 1 alloc
```
With the old O(n²) sort, n=10000 would take ~50ms+. The new
implementation scales as O(n log n).
## Testing
- All existing `TestSortedCopy` tests pass (unchanged behavior)
- New `TestSortedCopyLarge` validates correctness on 1000 random
elements
- `go test ./...` passes in `cmd/server`
Co-authored-by: you <you@example.com>
## Summary
- When `groupByHash=true`, each group only carries its representative
(best-path) `observer_id`. The client-side filter was checking only that
field, silently dropping groups that were seen by the selected observer
but had a different representative.
- `loadPackets` now passes the `observer` param to the server so
`filterPackets`/`buildGroupedWhere` do the correct "any observation
matches" check.
- Client-side observer filter in `renderTableRows` is skipped for
grouped mode (server already filtered correctly).
- Both `db.go` and `store.go` observer filtering extended to support
comma-separated IDs (multi-select UI).
## Test plan
- [ ] Set an observer filter on the Packets screen with grouping enabled
— all groups that have **any** observation from the selected observer(s)
should appear, not just groups where that observer is the representative
- [ ] Multi-select two observers — groups seen by either should appear
- [ ] Toggle to flat (ungrouped) mode — per-observation filter still
works correctly
- [ ] Existing grouped packets tests pass: `cd cmd/server && go test
./...`
Fixes#464🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: you <you@example.com>
## Summary
Fixes#525 — Customizer v2 home section shows empty fields and adding
FAQ kills steps.
## Root Cause
Server returned `home: null` from `/api/config/theme` when no home
config existed in config.json or theme.json. The customizer had no
built-in defaults, so all home fields appeared empty. When a user added
a single override (e.g. FAQ), `computeEffective` started from `home:
null`, created `home: {}`, and only applied the user's override — wiping
steps and everything else.
## Fix
### Server-side (primary)
In `handleConfigTheme()`, replaced the conditional `home` assignment
with `mergeMap` using built-in defaults matching what `home.js`
hardcodes:
- `heroTitle`: "CoreScope"
- `heroSubtitle`: "Real-time MeshCore LoRa mesh network analyzer"
- `steps`: 4 default getting-started steps
- `footerLinks`: Packets + Network Map links
Config/theme overrides merge on top, so customization still works.
### Client-side (defense-in-depth)
Added `DEFAULT_HOME` constant in `customize-v2.js`. `computeEffective()`
now falls back to these defaults when server returns `home: null`,
ensuring the customizer works even without server defaults.
## Tests
- **Go**: `TestConfigThemeHomeDefaults` — verifies `/api/config/theme`
returns non-null home with heroTitle, steps, footerLinks when no config
is set
- **JS**: Two new tests in `test-frontend-helpers.js` — verifies
`computeEffective` provides defaults when home is null, and that user
overrides merge correctly with defaults
Co-authored-by: you <you@example.com>
## Summary
Fixes the neighbor affinity graph returning empty results despite
abundant ADVERT data in the store.
**Root cause:** `extractFromNode()` in `neighbor_graph.go` only checked
for `"from_node"` and `"from"` fields in the decoded JSON, but real
ADVERT packets store the originator public key as `"pubKey"`. This meant
`fromNode` was always empty, so:
- Zero-hop edges (originator↔observer) were never created
- Originator↔path[0] edges were never created
- Only observer↔path[last] edges could be created (and only for
non-empty paths)
**Fix:** Check `"pubKey"` first in `extractFromNode()`, then fall
through to `"from_node"` and `"from"` for other packet types.
## Bugs Fixed
| Bug | Issue | Fix |
|-----|-------|-----|
| Empty graph results | #522 | `extractFromNode()` now reads `pubKey`
field from ADVERTs |
| 3-4s response time | #523 comment | Graph was rebuilding correctly
with 60s TTL cache — the slow response was due to iterating all packets
finding zero matches. With edges now being found, the cache works as
designed. |
| Incomplete visualization | #523 comment | Downstream of bug 1+2 —
fixed by fixing the builder |
| Accessibility | #523 comment | Added text-based neighbor list, dynamic
aria-label, keyboard focus CSS, dashed lines for ambiguous edges,
confidence symbols |
## Changes
- **`cmd/server/neighbor_graph.go`** — Fixed `extractFromNode()` to
check `pubKey` field (real ADVERT format)
- **`cmd/server/neighbor_graph_test.go`** — Added 2 new tests:
`TestBuildNeighborGraph_AdvertPubKeyField` (real ADVERT format) and
`TestBuildNeighborGraph_OneByteHashPrefixes` (1-byte prefix collision
scenario)
- **`public/analytics.js`** — Added accessible text-based neighbor list,
dynamic aria-label, dashed line pattern for ambiguous edges
- **`public/style.css`** — Added `:focus-visible` keyboard focus
indicator for canvas
## Testing
All Go tests pass (`go test ./... -count=1`). New tests verify the fix
prevents regression.
Fixes#523, Fixes#522
---------
Co-authored-by: you <you@example.com>
Fixes#518, Fixes#514, Fixes#515, Fixes#516
## Summary
Fixes all customizer v2 bugs from the consolidated tracker (#518). Both
server and client changes.
## Server Changes (`routes.go`)
- **typeColors defaults** — added all 10 type color defaults matching
`roles.js` `TYPE_COLORS`. Previously returned `{}`, causing all type
colors to render as black.
- **themeDark defaults** — added 22 dark mode color defaults matching
the Default preset. Previously returned `{}`, causing dark mode to have
no server-side defaults.
## Client Changes (`customize-v2.js`)
- [x] **P0: Phantom override cleanup on init** — new
`_cleanPhantomOverrides()` runs on startup, scanning
`cs-theme-overrides` and removing any values that match server defaults
(arrays via `JSON.stringify`, scalars via `===`).
- [x] **P1: `setOverride` auto-prunes matching defaults** — after
debounced write, iterates the delta and removes any key whose value
matches the server default. Prevents phantom overrides from
accumulating.
- [x] **P1: `_countOverrides` counts only real diffs** — now iterates
keys and calls `_isOverridden()` instead of blindly counting
`Object.keys().length`. Badge count reflects actual overrides only.
- [x] **P1: `_isOverridden` handles arrays/objects** — uses
`JSON.stringify` comparison for non-scalar values (home.steps,
home.checklist, etc.).
- [x] **P1: Type color fallback** — `_renderNodes()` falls back to
`window.TYPE_COLORS` when effective typeColors are empty, preventing
black color swatches.
- [x] **P1: Dark/light toggle re-renders panel** — MutationObserver on
`data-theme` now calls `_refreshPanel()` when panel is open, so
switching modes updates the Theme tab immediately.
## Tests
6 new unit tests added to `test-customizer-v2.js`:
- Phantom scalar overrides cleaned on init
- Phantom array overrides cleaned on init
- Real overrides preserved after cleanup
- `isOverridden` handles matching arrays (returns false)
- `isOverridden` handles differing arrays (returns true)
- `setOverride` prunes value matching server default
All 48 tests pass. Go tests pass.
---------
Co-authored-by: you <you@example.com>
## Summary
Milestone 4 of #482: adds affinity-aware hop resolution to improve
disambiguation accuracy across all hop resolution in the app.
### What changed
**Backend — `prefixMap.resolveWithContext()` (store.go)**
New method that applies a 4-tier disambiguation priority when multiple
nodes match a hop prefix:
| Priority | Strategy | When it wins |
|----------|----------|-------------|
| 1 | **Affinity graph score** | Neighbor graph has data, score ratio ≥
3× runner-up |
| 2 | **Geographic proximity** | Context nodes have GPS, pick closest
candidate |
| 3 | **GPS preference** | At least one candidate has coordinates |
| 4 | **First match** | No signal — current naive fallback |
The existing `resolve()` method is unchanged for backward compatibility.
New callers that have context (originator, observer, adjacent hops) can
use `resolveWithContext()` for better results.
**API — `handleResolveHops` (routes.go)**
Enhanced `/api/resolve-hops` endpoint:
- New query params: `from_node`, `observer` — provide context for
affinity scoring
- New response fields on `HopCandidate`: `affinityScore` (float,
0.0–1.0)
- New response fields on `HopResolution`: `bestCandidate` (pubkey when
confident), `confidence` (one of `unique_prefix`, `neighbor_affinity`,
`ambiguous`)
- Backward compatible: without context params, behavior is identical to
before (just adds `confidence` field)
**Types (types.go)**
- `HopCandidate.AffinityScore *float64`
- `HopResolution.BestCandidate *string`
- `HopResolution.Confidence string`
### Tests
- 7 unit tests for `resolveWithContext` covering all 4 priority tiers +
edge cases
- 2 unit tests for `geoDistApprox`
- 4 API tests for enhanced `/api/resolve-hops` response shape
- All existing tests pass (no regressions)
### Impact
This improves ALL hop resolution across the app — analytics, route
display, subpath analysis, and any future feature that resolves hop
prefixes. The affinity graph (from M1/M2) now feeds directly into
disambiguation decisions.
Part of #482
---------
Co-authored-by: you <you@example.com>
## Summary
- `db.GetNodes` accepted a `region` param from the HTTP handler but
never used it — every region-filter selection was silently ignored and
all nodes were always returned
- Added a subquery filtering `nodes.public_key` against ADVERT
transmissions (payload_type=4) observed by observers with matching IATA
codes
- Handles both v2 (`observer_id TEXT`) and v3 (`observer_idx INT`)
schemas
## Test plan
- [x] 4 new subtests added to `TestGetNodesFiltering`: SJC (1 node), SFO
(1 node), SJC,SFO multi (1 node deduped), AMS unknown (0 nodes)
- [x] All existing Go tests still pass
- [x] Deploy to staging, open `/nodes`, select a region in the filter
bar — only nodes observed by observers in that region should appear
Closes#496🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: you <you@example.com>
## Summary
- `indexByNode` was calling `json.Unmarshal` for every packet during
`Load()` and `IngestNewFromDB()`, even channel messages and other
payloads that can never contain node pubkey fields
- All three target fields (`"pubKey"`, `"destPubKey"`, `"srcPubKey"`)
share the common substring `"ubKey"` — added a `strings.Contains`
pre-check that skips the JSON parse entirely for packets that don't
match
- At 30K+ packets on startup, this eliminates the majority of
`json.Unmarshal` calls in `indexByNode` (channel messages, status
packets, etc. all bypass it)
## Test plan
- [x] 5 new subtests in `TestIndexByNodePreCheck`: ADVERT with pubKey
indexed, destPubKey indexed, channel message skipped, empty JSON
skipped, duplicate hash deduped
- [x] All existing Go tests pass
- [x] Deploy to staging and verify node-filtered packet queries still
work correctly
Closes#376🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: you <you@example.com>
## Summary
- `BuildBreakdown` was never ported from the deleted Node.js
`decoder.js` to Go — the server has returned `breakdown: {}` since the
Go migration (commit `742ed865`), so `createColoredHexDump()` and
`buildHexLegend()` in the frontend always received an empty `ranges`
array and rendered everything as monochrome
- Implemented `BuildBreakdown()` in `decoder.go` — computes labeled byte
ranges matching the frontend's `LABEL_CLASS` map: `Header`, `Transport
Codes`, `Path Length`, `Path`, `Payload`; ADVERT packets get sub-ranges:
`PubKey`, `Timestamp`, `Signature`, `Flags`, `Latitude`, `Longitude`,
`Name`
- Wired into `handlePacketDetail` (was `struct{}{}`)
- Also adds per-section color classes to the field breakdown table
(`section-header`, `section-transport`, `section-path`,
`section-payload`) so the table rows get matching background tints
## Test plan
- [x] Open any packet detail pane — hex dump should show color-coded
sections (red header, orange path length, blue transport codes, green
path hops, yellow/colored payload)
- [x] Legend below action buttons should appear with color swatches
- [x] ADVERT packets: PubKey/Timestamp/Signature/Flags each get their
own distinct color
- [x] Field breakdown table section header rows should be tinted per
section
- [x] 8 new Go tests: all pass
Closes#329🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Problem
Every PR that touches `public/` files requires manually bumping cache
buster timestamps in `index.html` (e.g. `?v=1775111407`). Since all PRs
change the same lines in the same file, this causes **constant merge
conflicts** — it's been the #1 source of unnecessary PR friction.
## Solution
Replace all hardcoded `?v=TIMESTAMP` values in `index.html` with a
`?v=__BUST__` placeholder. The Go server replaces `__BUST__` with the
current Unix timestamp **once at startup** when it reads `index.html`,
then serves the pre-processed HTML from memory.
Every server restart automatically picks up fresh cache busters — no
manual intervention needed.
## What changed
| File | Change |
|------|--------|
| `public/index.html` | All `v=1775111407` → `v=__BUST__` (28
occurrences) |
| `cmd/server/main.go` | `spaHandler` reads index.html at init, replaces
`__BUST__` with Unix timestamp, serves from memory for `/`,
`/index.html`, and SPA fallback |
| `cmd/server/helpers_test.go` | New `TestSpaHandlerCacheBust` —
verifies placeholder replacement works for root, SPA fallback, and
direct `/index.html` requests. Also added tests for root `/` and
`/index.html` routes |
| `AGENTS.md` | Rule 3 updated: cache busters are now automatic, agents
should not manually edit them |
## Testing
- `go build ./...` — compiles cleanly
- `go test ./...` — all tests pass (including new cache-bust tests)
- `node test-frontend-helpers.js && node test-packet-filter.js && node
test-aging.js` — all frontend tests pass
- No hardcoded timestamps remain in `index.html`
---------
Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: you <you@example.com>
## Summary
Fixes#433 — Replace the inaccurate Euclidean distance approximation in
`analytics.js` hop distances with proper haversine calculation, matching
the server-side computation introduced in PR #415.
## Problem
PR #415 moved collision analysis server-side and switched from the
frontend's Euclidean approximation (`dLat×111, dLon×85`) to proper
haversine. However, the **hop distance** calculation in `analytics.js`
(subpath detail panel) still used the old Euclidean formula. This
caused:
- **Inconsistent distances** between hop distances and collision
distances
- **Significant errors at high latitudes** — e.g., Oslo→Stockholm:
Euclidean gives ~627km, haversine gives ~415km (51% error)
- The `dLon×85` constant assumes ~40° latitude; at 60° latitude the real
scale factor is ~55.5km/degree, not 85
## Changes
| File | Change |
|------|--------|
| `public/analytics.js` | Replace `dLat*111, dLon*85` Euclidean with
`HopResolver.haversineKm()` (with inline fallback) |
| `public/hop-resolver.js` | Export `haversineKm` in the public API for
reuse |
| `test-frontend-helpers.js` | Add 4 tests: export check, zero distance,
SF→LA accuracy, Euclidean vs haversine divergence |
| `cmd/server/helpers_test.go` | Add `TestHaversineKm`: zero, SF→LA,
symmetry, Oslo→Stockholm accuracy |
| `public/index.html` | Cache buster bump |
## Performance
No performance impact — `haversineKm` replaces an inline arithmetic
expression with another inline arithmetic expression of identical O(1)
complexity. Only called per hop pair in the subpath detail panel
(typically <10 hops).
## Testing
- `node test-frontend-helpers.js` — 248 passed, 0 failed
- `go test -run TestHaversineKm` — PASS
Co-authored-by: you <you@example.com>
## Summary
The `/api/analytics/hash-collisions` endpoint always returned global
results, ignoring the active region filter. Every other analytics
endpoint (RF, topology, hash-sizes, channels, distance, subpaths)
respected the `?region=` query parameter — this was the only one that
didn't.
Fixes#438
## Changes
### Backend (`cmd/server/`)
- **routes.go**: Extract `region` query param and pass to
`GetAnalyticsHashCollisions(region)`
- **store.go**:
- `collisionCache` changed from `*cachedResult` →
`map[string]*cachedResult` (keyed by region, `""` = global) — consistent
with `rfCache`, `topoCache`, etc.
- `GetAnalyticsHashCollisions(region)` and
`computeHashCollisions(region)` now accept a region parameter
- When region is specified, resolves regional observers, scans packets
for nodes seen by those observers, and filters the node list before
computing collisions
- Cache invalidation updated to clear the map (not set to nil)
### Frontend (`public/`)
- **analytics.js**: The hash-collisions fetch was missing `+ sep` (the
region query string). All other fetches in the same `Promise.all` block
had it — this was simply overlooked in PR #415.
- **index.html**: Cache busters bumped
### Tests (`cmd/server/routes_test.go`)
- `TestHashCollisionsRegionParamIgnored` → renamed to
`TestHashCollisionsRegionParam` with updated comments reflecting that
region is now accepted (with no configured regional observers, results
match global — which the test verifies)
## Performance
No new hot-path work. Region filtering adds one scan of `s.packets`
(same as every other region-filtered analytics endpoint) only when
`?region=` is provided. Results are cached per-region with the existing
60s TTL. Without `?region=`, behavior is unchanged.
Co-authored-by: you <you@example.com>
## Summary
Fixes#361 — `perfMiddleware()` wrote to shared `PerfStats` fields
(`Requests`, `TotalMs`, `Endpoints` map, `SlowQueries` slice) without
any synchronization, causing data races under concurrent HTTP requests.
## Changes
### `cmd/server/routes.go`
- **Added `sync.Mutex` to `PerfStats` struct** — single mutex protects
all fields
- **`perfMiddleware`** — all shared state mutations (counter increments,
endpoint map access, slice appends) now happen under lock. Key
normalization (regex, mux route lookup) moved outside the lock since it
uses no shared state
- **`handleHealth`** — snapshots `Requests`, `TotalMs`, `SlowQueries`
under lock before building response
- **`handlePerf`** — copies all endpoint data and slow queries under
lock into local snapshots, then does expensive work (sorting, percentile
calculation) outside the lock
- **`handlePerfReset`** — resets fields in-place instead of replacing
the pointer (avoids unlocking a different mutex)
### `cmd/server/perfstats_race_test.go` (new)
- Regression test: 50 concurrent writer goroutines + 10 concurrent
reader goroutines hammering `PerfStats` simultaneously
- Verifies no race conditions (via `-race` flag) and counter consistency
## Design Decisions
- **Single mutex over atomics**: The issue suggested `atomic.Int64` for
counters, but since slices/maps need a mutex anyway, a single mutex is
simpler and the critical section is small (microseconds). No measurable
contention at CoreScope's scale.
- **Copy-under-lock pattern**: Expensive operations (sorting, percentile
computation) happen outside the lock to minimize hold time.
- **In-place reset**: `handlePerfReset` clears fields rather than
replacing the `PerfStats` pointer, ensuring the mutex remains valid for
concurrent goroutines.
## Testing
- `go test -race -count=1 ./cmd/server/...` — **PASS** (all existing
tests + new race test)
- New `TestPerfStatsConcurrentAccess` specifically validates concurrent
access patterns
Co-authored-by: you <you@example.com>