## Summary
Adds collapsible/minimizable UI panels on the live map page so overlay
panels don't block map content on medium-sized screens.
Fixes#279
## Changes
### Collapsible Legend Panel (all screen sizes)
- The legend toggle button (🎨/✕) is now visible at **all** screen sizes,
not just mobile
- Clicking it smoothly collapses/expands the legend with a CSS
transition
- Collapsed state persists in `localStorage` (`live-legend-hidden`)
- Feed panel already had hide/show with localStorage — no changes needed
there
### Medium Breakpoint (768px)
New `@media (max-width: 768px)` rules for tablet/small laptop screens:
- Feed panel: 360px → 280px wide, max-height 340px → 200px
- Node detail panel: 320px → 260px wide
- Legend: smaller font (10px) and tighter padding
- Header: reduced gap and padding
- Stats/toggles: smaller font sizes
### What's NOT changed
- Mobile (≤640px): existing behavior preserved (feed/legend hidden
entirely)
- Desktop (>768px): no changes — panels render at full size as before
## Testing
- `test-packet-filter.js`: 62 passed
- `test-aging.js`: 29 passed
- `test-frontend-helpers.js`: 445 passed
---------
Co-authored-by: you <you@example.com>
The button click handler used document.getElementById() which fails on
/packet/[ID] pages because renderDetail() runs before the container is
appended to the DOM. Changed to panel.querySelector() which searches
within the detached element tree.
Fixes#601
## M2: Airtime + Channel Quality + Battery Charts
Implements M2 of #600 — server-side delta computation and three new
charts in the RF Health detail view.
### Backend Changes
**Delta computation** for cumulative counters (`tx_air_secs`,
`rx_air_secs`, `recv_errors`):
- Computes per-interval deltas between consecutive samples
- **Reboot handling:** detects counter reset (current < previous), skips
that delta, records reboot timestamp
- **Gap handling:** if time between samples > 2× interval, inserts null
(no interpolation)
- Returns `tx_airtime_pct` and `rx_airtime_pct` as percentages
(delta_secs / interval_secs × 100)
- Returns `recv_error_rate` as delta_errors / (delta_recv +
delta_errors) × 100
**`resolution` query param** on `/api/observers/{id}/metrics`:
- `5m` (default) — raw samples
- `1h` — hourly aggregates (GROUP BY hour with AVG/MAX)
- `1d` — daily aggregates
**Schema additions:**
- `packets_sent` and `packets_recv` columns added to `observer_metrics`
(migration)
- Ingestor parses these fields from MQTT stats messages
**API response** now includes:
- `tx_airtime_pct`, `rx_airtime_pct`, `recv_error_rate` (computed
deltas)
- `reboots` array with timestamps of detected reboots
- `is_reboot_sample` flag on affected samples
### Frontend Changes
Three new charts in the RF Health detail view, stacked vertically below
noise floor:
1. **Airtime chart** — TX (red) + RX (blue) as separate SVG lines,
Y-axis 0-100%, direct labels at endpoints
2. **Error Rate chart** — `recv_error_rate` line, shown only when data
exists
3. **Battery chart** — voltage line with 3.3V low reference, shown only
when battery_mv > 0
All charts:
- Share X-axis and time range (aligned vertically)
- Reboot markers as vertical hairlines spanning all charts
- Direct labels on data (no legends)
- Resolution auto-selected: `1h` for 7d/30d ranges
- Charts hidden when no data exists
### Tests
- `TestComputeDeltas`: normal deltas, reboot detection, gap detection
- `TestGetObserverMetricsResolution`: 5m/1h/1d downsampling verification
- Updated `TestGetObserverMetrics` for new API signature
---------
Co-authored-by: you <you@example.com>
- Change RF Health detail view from bottom-of-page to a right-sliding side panel
- Grid stays visible and stable when detail is open (no layout shift)
- Click another observer updates panel in place; close button (×) dismisses
- On mobile (<640px): panel stacks below grid at full width
- Filter out observers with insufficient data (<2 sparkline points) from grid entirely
- Follows the same split-layout pattern used by the nodes page
## RF Health Dashboard — M1: Observer Metrics Storage, API & Small
Multiples Grid
Implements M1 of #600.
### What this does
Adds a complete RF health monitoring pipeline: MQTT stats ingestion →
SQLite storage → REST API → interactive dashboard with small multiples
grid.
### Backend Changes
**Ingestor (`cmd/ingestor/`)**
- New `observer_metrics` table via migration system (`_migrations`
pattern)
- Parse `tx_air_secs`, `rx_air_secs`, `recv_errors` from MQTT status
messages (same pattern as existing `noise_floor` and `battery_mv`)
- `INSERT OR REPLACE` with timestamps rounded to nearest 5-min interval
boundary (using ingestor wall clock, not observer timestamps)
- Missing fields stored as NULLs — partial data is always better than no
data
- Configurable retention pruning: `retention.metricsDays` (default 30),
runs on startup + every 24h
**Server (`cmd/server/`)**
- `GET /api/observers/{id}/metrics?since=...&until=...` — per-observer
time-series data
- `GET /api/observers/metrics/summary?window=24h` — fleet summary with
current NF, avg/max NF, sample count
- `parseWindowDuration()` supports `1h`, `24h`, `3d`, `7d`, `30d` etc.
- Server-side metrics retention pruning (same config, staggered 2min
after packet prune)
### Frontend Changes
**RF Health tab (`public/analytics.js`, `public/style.css`)**
- Small multiples grid showing all observers simultaneously — anomalies
pop out visually
- Per-observer cell: name, current NF value, battery voltage, sparkline,
avg/max stats
- NF status coloring: warning (amber) at ≥-100 dBm, critical (red) at
≥-85 dBm — text color only, no background fills
- Click any cell → expanded detail view with full noise floor line chart
- Reference lines with direct text labels (`-100 warning`, `-85
critical`) — not color bands
- Min/max points labeled directly on the chart
- Time range selector: preset buttons (1h/3h/6h/12h/24h/3d/7d/30d) +
custom from/to datetime picker
- Deep linking: `#/analytics?tab=rf-health&observer=...&range=...`
- All charts use SVG, matching existing analytics.js patterns
- Responsive: 3-4 columns on desktop, 1 on mobile
### Design Decisions (from spec)
- Labels directly on data, not in legends
- Reference lines with text labels, not color bands
- Small multiples grid, not card+accordion (Tufte: instant visual fleet
comparison)
- Ingestor wall clock for all timestamps (observer clocks may drift)
### Tests Added
**Ingestor tests:**
- `TestRoundToInterval` — 5 cases for rounding to 5-min boundaries
- `TestInsertMetrics` — basic insertion with all fields
- `TestInsertMetricsIdempotent` — INSERT OR REPLACE deduplication
- `TestInsertMetricsNullFields` — partial data with NULLs
- `TestPruneOldMetrics` — retention pruning
- `TestExtractObserverMetaNewFields` — parsing tx_air_secs, rx_air_secs,
recv_errors
**Server tests:**
- `TestGetObserverMetrics` — time-series query with since/until filters,
NULL handling
- `TestGetMetricsSummary` — fleet summary aggregation
- `TestObserverMetricsAPIEndpoints` — DB query verification
- `TestMetricsAPIEndpoints` — HTTP endpoint response shape
- `TestParseWindowDuration` — duration parsing for h/d formats
### Test Results
```
cd cmd/ingestor && go test ./... → PASS (26s)
cd cmd/server && go test ./... → PASS (5s)
```
### What's NOT in this PR (deferred to M2+)
- Server-side delta computation for cumulative counters
- Airtime charts (TX/RX percentage lines)
- Channel quality chart (recv_error_rate)
- Battery voltage chart
- Reboot detection and chart annotations
- Resolution downsampling (1h, 1d aggregates)
- Pattern detection / automated diagnosis
---------
Co-authored-by: you <you@example.com>
## Summary
- Adds a new **Prefix Tool** tab to the Analytics page (alongside Hash
Stats / Hash Issues)
- **Network Overview**: per-tier collision stats (1/2/3-byte) and a
network-size-based recommendation — collapsible, folded by default
- **Prefix Checker**: accepts a 1/2/3-byte hex prefix or full public
key; shows colliding nodes at each tier with severity badges (✅ / ⚠️ /
🔴); clicking a node navigates to its detail page
- **Prefix Generator**: picks a random collision-free prefix at the
chosen hash size; links to
[meshcore-web-keygen](https://agessaman.github.io/meshcore-web-keygen/)
with the prefix pre-filled
- **Hash Issues tab**: adds a "🔎 Check a prefix →" shortcut in the nav
- **Deep-link support**: `#/analytics?tab=prefix-tool&prefix=A3F1`
pre-fills and runs the checker; `?generate=2` pre-selects and runs the
generator
- **No new API endpoints** — 100% client-side using the existing
`/nodes` list
## Verification
Live on staging:
**https://staging.on8ar.eu/#/analytics?tab=prefix-tool**
## Test plan
- [x] Network Overview card is collapsed by default; expands on click;
stats are correct
- [x] Prefix Checker: 2-char input shows 1-byte results; 4-char shows
2-byte; 6-char shows 3-byte; 64-char pubkey shows all three tiers
- [x] Prefix Checker: invalid hex shows error; odd-length input shows
error
- [x] Prefix Generator: Generate picks an unused prefix; "Try another"
cycles; keygen link opens with prefix pre-filled
- [x] Deep link `?prefix=A3F1` pre-fills checker and scrolls to it
- [x] Deep link `?generate=2` pre-selects 2-byte and runs generator
- [x] Hash Issues tab shows "🔎 Check a prefix →" in the nav
- [x] FAQ link at bottom of generator opens correct MeshCore docs anchor
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
The error-state `<tbody>` row (shown when packet loading fails)
hardcoded `colspan="10"`, while the virtual scroll spacers and the
empty-state row both use `_getColCount()` (which reads from the actual
`<thead>` and falls back to 11). One-line fix: replace the hardcoded
value with `_getColCount()`.
Fixes#406
## Test plan
- [x] Trigger the error state (e.g. kill the backend mid-load) — error
row should span all columns with no gap on the right
- [x] `node test-packets.js` — 72 passed, 0 failed
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
- Replace full \`tbody\` teardown+rebuild on every scroll frame with a
range-diff that only adds/removes the delta rows at the edges of the
visible window
- \`buildFlatRowHtml\` / \`buildGroupRowHtml\` now accept an
\`entryIdx\` parameter and emit \`data-entry-idx\` on every \`<tr>\` so
the diff can target rows precisely (including expanded group children)
- Full rebuild is retained for initial render and large scroll jumps
past the buffer (no range overlap)
- Also loads \`packet-helpers.js\` in the test sandbox, fixing 7
pre-existing test failures for the builder functions; adds 4 new tests
covering \`data-entry-idx\` output
Fixes#414
## Test plan
- [x] Open packets page with 500+ packets, scroll rapidly — DOM
inspector should show incremental \`<tr>\` adds/removes rather than full
\`tbody\` teardown
- [x] Expand a grouped packet, scroll away and back — expanded children
re-render correctly
- [x] Large scroll jump (jump to bottom via scrollbar) — full rebuild
fires, no visual glitch
- [x] \`node test-packets.js\` — 72 passed, 0 failed
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: you <you@example.com>
Weak passphrases with no KDF stretching are the #1 practical threat.
Timestamp in plaintext block 0 serves as known-plaintext oracle for
instant key verification from a single captured packet.
Key findings:
- decode_base64() output used directly as AES key, no KDF
- Short passphrases produce <16 byte keys (reduced key space)
- No salt means global precomputed attacks work
- 3-word passphrase crackable in ~2 min on commodity GPU
Reviewed by djb and Dijkstra personas. Corrections applied:
- GPU throughput upgraded from 10^9 to 10^10 AES/sec baseline
- Oracle strengthened: bytes 4+ (type byte, sender name) also predictable
- Dictionary size assumptions made explicit
- Zipf's law caveat added (humans don't choose uniformly)
- base64 short-passphrase key truncation issue documented
Formal analysis of MeshCore's ECB encryption for channel and direct messages.
Reviewed by djb and Dijkstra expert personas through 3 revisions.
Key findings:
- Block 0 has accidental nonce (4-byte timestamp) preventing repetition
- Blocks 1+ are pure deterministic ECB with no nonce — vulnerable to
frequency analysis for repeated message content
- Partial final block attack: zero-padding reduces search space
- HMAC key reuse: AES key is first 16 bytes of HMAC key (same material)
- Recommended fix: switch to AES-128-CTR mode
## Summary
`txToMap()` previously always allocated observation sub-maps for every
packet, even though the `/api/packets` handler immediately stripped them
via `delete(p, "observations")` unless `expand=observations` was
requested. A typical page of 50 packets with ~5 observations each caused
300+ unnecessary map allocations per request.
## Changes
- **`txToMap`**: Add variadic `includeObservations bool` parameter.
Observations are only built when `true` is passed, eliminating
allocations when they'd just be discarded.
- **`PacketQuery`**: Add `ExpandObservations bool` field to thread the
caller's intent through the query pipeline.
- **`routes.go`**: Set `ExpandObservations` based on
`expand=observations` query param. Removed the post-hoc `delete(p,
"observations")` loop — observations are simply never created when not
requested.
- **Single-packet lookups** (`GetPacketByID`, `GetPacketByHash`): Always
pass `true` since detail views need observations.
- **Multi-node/analytics queries**: Default (no flag) = no observations,
matching prior behavior.
## Testing
- Added `TestTxToMapLazyObservations` covering all three cases: no flag,
`false`, and `true`.
- All existing tests pass (`go test ./...`).
## Perf Impact
Eliminates ~250 observation map allocations per /api/packets request (at
default page size of 50 with ~5 observations each). This is a
constant-factor improvement per request — no algorithmic complexity
change.
Fixes#374
Co-authored-by: you <you@example.com>
## Summary
Optimizes `QueryGroupedPackets()` in `store.go` to eliminate two major
inefficiencies on every grouped packet list request:
### Changes
1. **Cache `UniqueObserverCount` on `StoreTx`** — Instead of iterating
all observations to count unique observers on every query
(O(total_observations) per request), we now track unique observers at
ingest time via an `observerSet` map and pre-computed
`UniqueObserverCount` field. This is updated incrementally as
observations arrive.
2. **Defer map construction until after pagination** — Previously,
`map[string]interface{}` was built for ALL 30K+ filtered results before
sorting and paginating. Now the grouped cache stores sorted `[]*StoreTx`
pointers (lightweight), and `groupedTxsToPage()` builds maps only for
the requested page (typically 50 items). This eliminates ~30K map
allocations per cache miss.
3. **Lighter cache footprint** — The grouped cache now stores
`[]*StoreTx` instead of `*PacketResult` with pre-built maps, reducing
memory pressure and GC work.
### Complexity
- Observer counting: O(1) per query (was O(total_observations))
- Map construction: O(page_size) per query (was O(n) where n = all
filtered results)
- Sort remains O(n log n) on cache miss, but the cache (3s TTL) absorbs
repeated requests
### Testing
- `cd cmd/server && go test ./...` — all tests pass
- `cd cmd/ingestor && go build ./...` — builds clean
Fixes#370
---------
Co-authored-by: you <you@example.com>
## Summary
Replace `time.Tick()` with `time.NewTicker()` in the auto-prune
goroutine so it stops cleanly during graceful shutdown.
## Problem
`time.Tick` creates a ticker that can never be garbage collected or
stopped. While the prune goroutine runs for the process lifetime, it
won't stop during graceful shutdown — the goroutine leaks past the
shutdown sequence.
## Fix
- Create a `time.NewTicker` and a done channel
- Use `select` to listen on both the ticker and done channel
- Stop the ticker and close the done channel in the shutdown path (after
`poller.Stop()`)
- Pattern matches the existing `StartEvictionTicker()` approach
## Testing
- `go build ./...` — compiles cleanly
- `go test ./...` — all tests pass
Fixes#377
Co-authored-by: you <you@example.com>
## Summary
Combines the chained `filterTxSlice` calls in `filterPackets()` into a
single pass over the packet slice.
## Problem
When multiple filter parameters are specified (e.g.,
`type=4&route=1&since=...&until=...`), each filter created a new
intermediate `[]*StoreTx` slice. With N filters, this meant N separate
scans and N-1 unnecessary allocations.
## Fix
All filter predicates (type, route, observer, hash, since, until,
region, node) are pre-computed before the loop, then evaluated in a
single `filterTxSlice` call. This eliminates all intermediate
allocations.
**Preserved behavior:**
- Fast-path index lookups for hash-only and observer-only queries remain
unchanged
- Node-only fast-path via `byNode` index preserved
- All existing filter semantics maintained (same comparison operators,
same null checks)
**Complexity:** Single `O(n)` pass regardless of how many filters are
active, vs previous `O(n * k)` where k = number of active filters (each
pass is O(n) but allocates).
## Testing
All existing tests pass (`cd cmd/server && go test ./...`).
Fixes#373
Co-authored-by: you <you@example.com>
## Summary
Sort `snrVals` and `rssiVals` once upfront in `computeAnalyticsRF()` and
read min/max/median directly from the sorted slices, instead of copying
and sorting per stat call.
## Changes
- Sort both slices once before computing stats (2 sorts total instead of
4+ copy+sorts)
- Read `min` from `sorted[0]`, `max` from `sorted[len-1]`, `median` from
`sorted[len/2]`
- Remove the now-unused `sortedF64` and `medianF64` helper closures
## Performance impact
With 100K+ observations, this eliminates multiple O(n log n) copy+sort
operations. Previously each call to `medianF64` did a full copy + sort,
and `minF64`/`maxF64` did O(n) scans on the unsorted array. Now: 2
in-place sorts total, O(1) lookups for min/max/median.
Fixes#366
Co-authored-by: you <you@example.com>
## Summary
`EvictStale()` was doing O(n) linear scans per evicted item to remove
from secondary indexes (`byObserver`, `byPayloadType`, `byNode`).
Evicting 1000 packets from an observer with 50K observations meant 1000
× 50K = 50M comparisons — all under a write lock.
## Fix
Replace per-item removal with batch single-pass filtering:
1. **Collect phase**: Walk evicted packets once, building sets of
evicted tx IDs, observation IDs, and affected index keys
2. **Filter phase**: For each affected index slice, do a single pass
keeping only non-evicted entries
**Before**: O(evicted_count × index_slice_size) per index — quadratic in
practice
**After**: O(evicted_count + index_slice_size) per affected key — linear
## Changes
- `cmd/server/store.go`: Restructured `EvictStale()` eviction loop into
collect + batch-filter pattern
## Testing
- All existing tests pass (`cd cmd/server && go test ./...`)
Fixes#368
Co-authored-by: you <you@example.com>
## Summary
`QueryMultiNodePackets()` was scanning ALL packets with
`strings.Contains` on JSON blobs — O(packets × pubkeys × json_length).
With 30K+ packets and multiple pubkeys, this caused noticeable latency
on `/api/packets?nodes=...`.
## Fix
Replace the full scan with lookups into the existing `byNode` index,
which already maps pubkeys to their transmissions. Merge results with
hash-based deduplication, then apply time filters.
**Before:** O(N × P × J) where N=all packets, P=pubkeys, J=avg JSON
length
**After:** O(M × P) where M=packets per pubkey (typically small), plus
O(R log R) sort for pagination correctness
Results are sorted by `FirstSeen` after merging to maintain the
oldest-first ordering expected by the pagination logic.
Fixes#357
Co-authored-by: you <you@example.com>
## Problem
`GetNodeAnalytics()` in `store.go` scans ALL 30K+ packets doing
`strings.Contains` on every JSON blob when the node has a name, then
filters by time range *after* the full scan. This is `O(packets ×
json_length)` on every `/api/nodes/{pubkey}/analytics` request.
## Fix
Move the `fromISO` time check inside the scan loop so old packets are
skipped **before** the expensive `strings.Contains` matching. For the
non-name path (indexed-only), the time filter is also applied inline,
eliminating the separate `allPkts` intermediate slice.
### Before
1. Scan all packets → collect matches (including old ones) → `allPkts`
2. Filter `allPkts` by time → `packets`
### After
1. Scan packets, skip `tx.FirstSeen <= fromISO` immediately → `packets`
This avoids `strings.Contains` calls on packets outside the requested
time window (typically 7 days out of months of data).
## Complexity
- **Before:** `O(total_packets × avg_json_length)` for name matching
- **After:** `O(recent_packets × avg_json_length)` — only packets within
the time window are string-matched
## Testing
- `cd cmd/server && go test ./...` — all tests pass
Fixes#367
Co-authored-by: you <you@example.com>
## Summary
Consolidates the 4 parallel `/api/analytics/subpaths` calls in the Route
Patterns tab into a single `/api/analytics/subpaths-bulk` endpoint,
eliminating 3 redundant server-side scans of the subpath index on cache
miss.
## Changes
### Backend (`cmd/server/routes.go`, `cmd/server/store.go`)
- New `GET
/api/analytics/subpaths-bulk?groups=2-2:50,3-3:30,4-4:20,5-8:15`
endpoint
- Groups format: `minLen-maxLen:limit` comma-separated
- `GetAnalyticsSubpathsBulk()` iterates `spIndex` once, bucketing
entries into per-group accumulators by hop length
- Hop name resolution is done once per raw hop and shared across groups
- Results are cached per-group for compatibility with existing
single-key cache lookups
- Region-filtered queries fall back to individual
`GetAnalyticsSubpaths()` calls (region filtering requires
per-transmission observer checks)
### Frontend (`public/analytics.js`)
- `renderSubpaths()` now makes 1 API call instead of 4
- Response shape: `{ results: [{ subpaths, totalPaths }, ...] }` —
destructured into the same `[d2, d3, d4, d5]` variables
### Tests (`cmd/server/routes_test.go`)
- `TestAnalyticsSubpathsBulk`: validates 3-group response shape, missing
params error, invalid format error
## Performance
- **Before:** 4 API calls → 4 scans of `spIndex` + 4× hop resolution on
cache miss
- **After:** 1 API call → 1 scan of `spIndex` + 1× hop resolution
(shared cache)
- Cache miss cost reduced by ~75% for this tab
- No change on cache hit (individual group caching still works)
Fixes#398
Co-authored-by: you <you@example.com>
## Summary
Fixes the N+1 API call pattern when changing observation sort mode on
the packets page. Previously, switching sort to Path or Time fired
individual `/api/packets/{hash}` requests for **every**
multi-observation group without cached children — potentially 100+
concurrent requests.
## Changes
### Backend: Batch observations endpoint
- **New endpoint:** `POST /api/packets/observations` accepts `{"hashes":
["h1", "h2", ...]}` and returns all observations keyed by hash in a
single response
- Capped at 200 hashes per request to prevent abuse
- 4 test cases covering empty input, invalid JSON, too-many-hashes, and
valid requests
### Frontend: Use batch endpoint
- `packets.js` sort change handler now collects all hashes needing
observation data and sends a single POST request instead of N individual
GETs
- Same behavior, single round-trip
## Performance
- **Before:** Changing sort with 100 visible groups → 100 concurrent API
requests, browser connection queueing (6 per host), several seconds of
lag
- **After:** Single POST request regardless of group count, response
time proportional to store lookup (sub-millisecond per hash in memory)
Fixes#389
---------
Co-authored-by: you <you@example.com>
## Summary
Coalesce WS-triggered `renderTableRows()` calls using
`requestAnimationFrame` instead of `setTimeout` debouncing.
Fixes#396
## Problem
During high WebSocket throughput, multiple WS batches could each trigger
a `renderTableRows()` call via `setTimeout(..., 200)`. With rapid
batches, this caused the 50K-row table to be fully rebuilt every few
hundred milliseconds, causing UI jank.
## Solution
Replace the `setTimeout`-based debounce with a `requestAnimationFrame`
coalescing pattern:
1. **`scheduleWSRender()`** — sets a dirty flag and schedules a single
rAF callback
2. **Dirty flag** — multiple WS batches within the same frame just set
the flag; only one render fires
3. **Cleanup** — `destroy()` cancels any pending rAF and resets the
dirty flag
This ensures at most **one `renderTableRows()` per animation frame**
(~16ms), regardless of how many WS batches arrive.
## Performance justification
- **Before:** Each WS batch → `setTimeout(renderTableRows, 200)` — N
batches in <200ms = N renders
- **After:** N batches in one frame → 1 render on next rAF (~16ms)
- Worst case goes from O(N) renders per second to O(60) renders per
second (frame-capped)
## Changes
- `public/packets.js`: Add `scheduleWSRender()` with rAF + dirty flag;
replace setTimeout in WS handler; clean up in `destroy()`
- `test-frontend-helpers.js`: Update tests to verify rAF coalescing
pattern instead of setTimeout debounce
## Testing
- All existing tests pass (`npm test` — 0 failures)
- Updated 2 test cases to verify new rAF coalescing behavior
Co-authored-by: you <you@example.com>
## Summary
Compress `public/og-image.png` from **1,159,050 bytes (1.1MB)** to
**234,899 bytes (235KB)** — an **80% reduction**.
## What Changed
- Applied lossy PNG quantization via `pngquant` (quality 45-65, speed 1)
- Image dimensions unchanged: 1200×630px (standard OG image size)
- Visual quality remains suitable for social media previews
## Why
A 1.1MB OpenGraph image is excessive. Typical OG images are 50-200KB.
This reduces deployment size and Git repo bloat without affecting
functionality (browsers don't preload OG images).
## Testing
- Unit tests pass (`npm run test:unit`)
- No code changes — image-only commit
- `index.html` reference unchanged (`<meta property="og:image"
content="/og-image.png">`)
Fixes#397
Co-authored-by: you <you@example.com>
## Summary
Reduces the analytics nodes tab from 3 parallel API calls to 2 by
computing network status (active/degraded/silent counts) client-side
instead of fetching from `/nodes/network-status`.
## What Changed
**`public/analytics.js` — `renderNodesTab()`:**
- Removed the `/nodes/network-status` API call from the `Promise.all`
batch
- Added client-side computation of active/degraded/silent counts using
the shared `getHealthThresholds()` function from `roles.js`
- Uses `nodesResp.total` and `nodesResp.counts` (already returned by
`/nodes` endpoint) for total node count and role breakdown
## Why This Works
The `/nodes` response already includes:
- `total` — count of all matching nodes (server-computed across full DB)
- `counts` — role counts across all nodes (from `GetAllRoleCounts()`)
- Per-node `last_seen`/`last_heard` timestamps
The `getHealthThresholds()` function in `roles.js` provides the same
degraded/silent thresholds used server-side, so client-side status
computation produces equivalent results for the loaded node set.
## Performance
- **Before:** 3 parallel API calls (`/nodes`, `/nodes/bulk-health`,
`/nodes/network-status`)
- **After:** 2 parallel API calls (`/nodes`, `/nodes/bulk-health`)
- Network status computation is O(n) over the 200 loaded nodes —
negligible client-side cost
- The `/nodes/network-status` endpoint scanned ALL nodes in the DB on
every call; this eliminates that server-side work entirely
## Testing
- All frontend helper tests pass (445/445)
- All packet filter tests pass (62/62)
- All aging tests pass (29/29)
- All Go backend tests pass
Fixes#392
---------
Co-authored-by: you <you@example.com>
## Summary
Eliminates visible marker flicker on zoom/resize events in the map page
when displaying 500+ nodes.
## Problem
`renderMarkers()` was called on every `zoomend` and `resize` event,
which did `markerLayer.clearLayers()` followed by a full rebuild of all
markers. With many nodes, this caused a visible flash where all markers
disappeared briefly before being re-added.
## Solution
Instead of rebuilding all markers from scratch on zoom/resize:
1. **Store Leaflet layer references** on marker data objects
(`_leafletMarker`, `_leafletLine`, `_leafletDot`) during the initial
full render
2. **Add `_repositionMarkers()`** — re-runs `deconflictLabels()` at the
new zoom level and updates existing marker positions via
`setLatLng()`/`setLatLngs()` without clearing the layer group
3. **Debounce zoom/resize handlers** (150ms) to coalesce rapid events
during animated zooms
4. **Dynamically manage offset indicators** — adds/removes deconfliction
offset lines and dots as positions change at different zoom levels
Full `renderMarkers()` is still called for filter changes, data updates,
and theme changes — only zoom/resize uses the lightweight repositioning
path.
## Complexity
- `_repositionMarkers()`: O(n) — single pass over stored marker data
- `deconflictLabels()`: O(n × k) where k is max spiral offsets (48) —
unchanged
- No new API calls, no DOM rebuilds
Fixes#393
---------
Co-authored-by: you <you@example.com>
## Summary
`replayRecent()` in `live.js` fetched observation details for 8 packet
groups **sequentially** — each `await fetch()` waited for the previous
to complete before starting the next.
## Change
Replaced the sequential `for` loop with `Promise.all()` to fetch all 8
detail API calls **concurrently**. The mapping from results to live
packets is unchanged.
**Before:** 8 sequential fetches (total time ≈ sum of all request
durations)
**After:** 8 parallel fetches (total time ≈ max of all request
durations)
## Notes
- `replayRecent()` is currently disabled (commented out at line 856), so
this is dormant code — no runtime risk
- No behavioral change: same data mapping, same rendering, same VCR
buffer population
- All existing tests pass
Fixes#394
---------
Co-authored-by: you <you@example.com>
## Summary
Eliminates the N+1 API call storm when toggling off "Group by Hash" in
the packets table.
## Problem
When ungrouped mode was active, `loadPackets()` fired individual
`/api/packets/{hash}` requests for every multi-observation packet. With
200+ multi-obs packets, this created 200+ parallel HTTP requests —
overwhelming both browser connection limits and the server.
## Fix
The server already supports `expand=observations` on the `/api/packets`
endpoint, which returns observations inline. Instead of:
1. Always fetching grouped (`groupByHash=true`)
2. Then N+1 fetching each packet's children individually
We now:
1. Fetch grouped when grouped mode is active (`groupByHash=true`)
2. Fetch with `expand=observations` when ungrouped — **single API call**
3. Flatten observations client-side
**Result: 200+ API calls → 1 API call.**
## Changes
- `public/packets.js`: Replaced N+1 observation fetching loop with
single `expand=observations` query parameter, flatten inline
observations client-side.
## Testing
- All frontend tests pass (packet-filter: 62/62, frontend-helpers:
445/445)
- All Go backend tests pass
Fixes#382
Co-authored-by: you <you@example.com>
## Summary
`handleObservers()` in `routes.go` was calling `GetNodeLocations()`
which fetches ALL nodes from the DB just to match ~10 observer IDs
against node public keys. With 500+ nodes this is wasteful.
## Changes
- **`db.go`**: Added `GetNodeLocationsByKeys(keys []string)` — queries
only the rows matching the given public keys using a parameterized
`WHERE LOWER(public_key) IN (?, ?, ...)` clause.
- **`routes.go`**: `handleObservers` now collects observer IDs and calls
the targeted method instead of the full-table scan.
- **`coverage_test.go`**: Added `TestGetNodeLocationsByKeys` covering
known key, empty keys, and unknown key cases.
## Performance
With ~10 observers and 500+ nodes, the query goes from scanning all 500
rows to fetching only ~10. The original `GetNodeLocations()` is
preserved for any other callers.
Fixes#378
Co-authored-by: you <you@example.com>
## Summary
Skip `updateTimeline()` canvas redraws in `bufferPacket()` when the
browser tab is hidden (`_tabHidden === true`). Instead, batch-update the
timeline once when the tab becomes visible again via the
`visibilitychange` handler.
Fixes#385
## What Changed
**`public/live.js`** — two surgical edits:
1. **`bufferPacket()`**: Removed `updateTimeline()` call from the
`_tabHidden` early-return path. When the tab is backgrounded, packets
are still buffered (for VCR) but no canvas work is done.
2. **`visibilitychange` handler**: Added `updateTimeline()` call when
the tab is restored, so the timeline catches up in a single repaint
instead of N repaints (one per buffered packet).
## Performance Impact
At 5+ packets/sec with a backgrounded tab, this eliminates continuous
canvas redraws (`updateTimeline()` calls `ctx.clearRect` + full canvas
redraw + `updateTimelinePlayhead()`) that are invisible to the user. CPU
usage drops to near-zero for timeline rendering while backgrounded.
## Tests
All existing tests pass:
- `test-packet-filter.js` — 62 passed
- `test-aging.js` — 29 passed
- `test-frontend-helpers.js` — 445 passed
Co-authored-by: you <you@example.com>
## Summary
Replace N+1 per-hop DB queries in `handleResolveHops` with O(1) lookups
against the in-memory prefix map that already exists in the packet
store.
## Problem
Each hop in the `resolve-hops` API triggered a separate `SELECT ... LIKE
?` query against the nodes table. With 10 hops, that's 10 DB round-trips
— unnecessary when `getCachedNodesAndPM()` already maintains an
in-memory prefix map that can resolve hops instantly.
## Changes
- **routes.go**: Replace the per-hop DB query loop with `pm.m[hopLower]`
lookups from the prefix map. Convert `nodeInfo` → `HopCandidate` inline.
Remove unused `rows`/`sql.Scan` code.
- **store.go**: Add `InvalidateNodeCache()` method to force prefix map
rebuild (needed by tests that insert nodes after store initialization).
- **routes_test.go**: Give `TestResolveHopsAmbiguous` a proper store so
hops resolve via the prefix map.
- **resolve_context_test.go**: Call `InvalidateNodeCache()` after
inserting test nodes. Fix confidence assertion — with GPS candidates and
no affinity context, `resolveWithContext` correctly returns
`gps_preference` (previously masked because the prefix map didn't have
the test nodes).
## Complexity
O(1) per hop lookup via hash map vs O(n) DB scan per hop. No hot-path
impact — this endpoint is called on-demand, not in a render loop.
Fixes#369
---------
Co-authored-by: you <you@example.com>
## Summary
Replace full `buildDistanceIndex()` rebuild with incremental
`removeTxFromDistanceIndex`/`addTxToDistanceIndex` for only the
transmissions whose paths actually changed during
`IngestNewObservations`.
## Problem
When any transmission's best path changed during observation ingestion,
the **entire distance index was rebuilt** — iterating all 30K+ packets,
resolving all hops, and computing haversine distances. This
`O(total_packets × avg_hops)` operation ran under a write lock, blocking
all API readers.
A 30-second debounce (`distRebuildInterval`) was added in #557 to
mitigate this, but it only delayed the pain — the full rebuild still
happened, just less frequently.
## Fix
- Added `removeTxFromDistanceIndex(tx)` — filters out all
`distHopRecord` and `distPathRecord` entries for a specific transmission
- Added `addTxToDistanceIndex(tx)` — computes and appends new distance
records for a single transmission
- In `IngestNewObservations`, changed path-change handling to call
remove+add for each affected tx instead of marking dirty and waiting for
a full rebuild
- Removed `distDirty`, `distLast`, and `distRebuildInterval` since
incremental updates are cheap enough to apply immediately
## Complexity
- **Before:** `O(total_packets × avg_hops)` per rebuild (30K+ packets)
- **After:** `O(changed_txs × avg_hops + total_dist_records)` — the
remove is a linear scan of the distance slices, but only for affected
txs; the add is `O(hops)` per changed tx
The remove scan over `distHops`/`distPaths` slices is linear in slice
length, but this is still far cheaper than the full rebuild which also
does JSON parsing, hop resolution, and haversine math for every packet.
## Tests
- Updated `TestDistanceRebuildDebounce` →
`TestDistanceIncrementalUpdate` to verify incremental behavior and check
for duplicate path records
- All existing tests pass (`go test ./...` in both `cmd/server` and
`cmd/ingestor`)
Fixes#365
---------
Co-authored-by: you <you@example.com>
## Summary
Cache `resolveRegionObservers()` results with a 30-second TTL to
eliminate repeated database queries for region→observer ID mappings.
## Problem
`resolveRegionObservers()` queried the database on every call despite
the observers table changing infrequently (~20 rows). It's called from
10+ hot paths including `filterPackets()`, `GetChannels()`, and multiple
analytics compute functions. When analytics caches are cold, parallel
requests each hit the DB independently.
## Solution
- Added a dedicated `regionObsMu` mutex + `regionObsCache` map with 30s
TTL
- Uses a separate mutex (not `s.mu`) to avoid deadlocks — callers
already hold `s.mu.RLock()`
- Cache is lazily populated per-region and fully invalidated after TTL
expires
- Follows the same pattern as `getCachedNodesAndPM()` (30s TTL,
on-demand rebuild)
## Changes
- **`cmd/server/store.go`**: Added `regionObsMu`, `regionObsCache`,
`regionObsCacheTime` fields; rewrote `resolveRegionObservers()` to check
cache first; added `fetchAndCacheRegionObs()` helper
- **`cmd/server/coverage_test.go`**: Added
`TestResolveRegionObserversCaching` — verifies cache population, cache
hits, and nil handling for unknown regions
## Testing
- All existing Go tests pass (`go test ./...`)
- New test verifies caching behavior (population, hits, nil for unknown
regions)
Fixes#362
---------
Co-authored-by: you <you@example.com>
## Summary
`GetStoreStats()` ran 5 sequential DB queries on every call. This
combines them into **2 concurrent queries**:
1. **Node/observer counts** — single query using subqueries: `SELECT
(SELECT COUNT(*) FROM nodes WHERE ...), (SELECT COUNT(*) FROM nodes),
(SELECT COUNT(*) FROM observers)`
2. **Observation counts** — single query using conditional aggregation:
`SUM(CASE WHEN timestamp > ? THEN 1 ELSE 0 END)` scoped to the 24h
window, avoiding a full table scan for the 1h count
Both queries run concurrently via goroutines + `sync.WaitGroup`.
## What changed
- `cmd/server/store.go`: Rewrote `GetStoreStats()` — 5 sequential
`QueryRow` calls → 2 concurrent combined queries
- Error handling now propagates query errors instead of silently
ignoring them
## Performance justification
- **Before:** 5 sequential round-trips to SQLite, with 2 potentially
expensive `COUNT(*)` scans on the `observations` table
- **After:** 2 concurrent round-trips; the observation query scans the
24h window once instead of separately scanning for 1h and 24h
- The 10s cache (`statsTTL`) remains, so this fires at most once per 10s
— but when it does fire, it's ~2.5x fewer round-trips and the
observation scan is halved
## Tests
- `go test ./...` passes for both `cmd/server` and `cmd/ingestor`
Fixes#363
---------
Co-authored-by: you <you@example.com>
## Summary
Extracts a shared `fetchNodeDetail(pubkey)` helper in `nodes.js` that
fetches both `/nodes/{pubkey}` and `/nodes/{pubkey}/health` in parallel.
Both `selectNode()` (side panel) and `loadFullNode()` (full-screen view)
now call this single function instead of duplicating the fetch logic.
## What Changed
- **New:** `fetchNodeDetail(pubkey)` — shared async function that
returns node data with `.healthData` attached
- **Modified:** `loadFullNode()` — uses `fetchNodeDetail()` instead of
inline `Promise.all`
- **Modified:** `selectNode()` — uses `fetchNodeDetail()` instead of
inline `Promise.all`
## Why
The duplicate `api()` calls weren't a major perf issue (TTL caching
mitigates most cases), but the duplicated logic was unnecessary tech
debt. On mobile, `selectNode()` redirects to `loadFullNode()` via hash
change, so the two code paths could fire sequentially with expired
cache.
## Testing
- All frontend helper tests pass (445/445)
- All packet filter tests pass (62/62)
- All aging tests pass (29/29)
- No behavioral change — only code structure improvement
Fixes#391
Co-authored-by: you <you@example.com>
## Summary
`GetSubpathDetail()` iterated ALL packets to find those containing a
specific subpath — `O(packets × hops × subpath_length)`. With 30K+
packets this caused user-visible latency on every subpath detail click.
## Changes
### `cmd/server/store.go`
- Added `spTxIndex map[string][]*StoreTx` alongside existing `spIndex` —
tracks which transmissions contain each subpath key
- Extended `addTxToSubpathIndexFull()` and
`removeTxFromSubpathIndexFull()` to maintain both indexes simultaneously
- Original `addTxToSubpathIndex()`/`removeTxFromSubpathIndex()` wrappers
preserved for backward compatibility
- `buildSubpathIndex()` now populates both `spIndex` and `spTxIndex`
during `Load()`
- All incremental update sites (ingest, path change, eviction) use the
`Full` variants
- `GetSubpathDetail()` rewritten: direct `O(1)` map lookup on
`spTxIndex[key]` instead of scanning all packets
### `cmd/server/coverage_test.go`
- Added `TestSubpathTxIndexPopulated`: verifies `spTxIndex` is
populated, counts match `spIndex`, and `GetSubpathDetail` returns
correct results for both existing and non-existent subpaths
## Complexity
- **Before:** `O(total_packets × avg_hops × subpath_length)` per request
- **After:** `O(matched_txs)` per request (direct map lookup)
## Tests
All tests pass: `cmd/server` (4.6s), `cmd/ingestor` (25.6s)
Fixes#358
---------
Co-authored-by: you <you@example.com>
## Summary
`buildPrefixMap()` was generating map entries for every prefix length
from 2 to `len(pubkey)` (up to 64 chars), creating ~31 entries per node.
With 500 nodes that's ~15K map entries; with 1K+ nodes it balloons to
31K+.
## Changes
**`cmd/server/store.go`:**
- Added `maxPrefixLen = 8` constant — MeshCore path hops use 2–6 char
prefixes, 8 gives headroom
- Capped the prefix generation loop at `maxPrefixLen` instead of
`len(pk)`
- Added full pubkey as a separate map entry when key is longer than
`maxPrefixLen`, ensuring exact-match lookups (used by
`resolveWithContext`) still work
**`cmd/server/coverage_test.go`:**
- Added `TestPrefixMapCap` with subtests for:
- Short prefix resolution still works
- Full pubkey exact-match resolution still works
- Intermediate prefixes beyond the cap correctly return nil
- Short keys (≤8 chars) have all prefix entries
- Map size is bounded
## Impact
- Map entries per node: ~31 → ~8 (one per prefix length 2–8, plus one
full-key entry)
- Total map size for 500 nodes: ~15K entries → ~4K entries (~75%
reduction)
- No behavioral change for path hop resolution (2–6 char prefixes)
- No behavioral change for exact pubkey lookups
## Tests
All existing tests pass:
- `cmd/server`: ✅
- `cmd/ingestor`: ✅Fixes#364
---------
Co-authored-by: you <you@example.com>
## Summary
Index node path lookups in `handleNodePaths()` instead of scanning all
packets on every request.
## Problem
`handleNodePaths()` iterated ALL packets in the store (`O(total_packets
× avg_hops)`) with prefix string matching on every hop. This caused
user-facing latency on every node detail page load with 30K+ packets.
## Fix
Added a `byPathHop` index (`map[string][]*StoreTx`) that maps lowercase
hop prefixes and resolved full pubkeys to their transmissions. The
handler now does direct map lookups instead of a full scan.
### Index lifecycle
- **Built** during `Load()` via `buildPathHopIndex()`
- **Incrementally updated** during `IngestNewFromDB()` (new packets) and
`IngestNewObservations()` (path changes)
- **Cleaned up** during `EvictStale()` (packet removal)
### Query strategy
The handler looks up candidates from the index using:
1. Full pubkey (matches resolved hops from `resolved_path`)
2. 2-char prefix (matches short raw hops)
3. 4-char prefix (matches medium raw hops)
4. Any longer raw hops starting with the 4-char prefix
This reduces complexity from `O(total_packets × avg_hops)` to
`O(matching_txs + unique_hop_keys)`.
## Tests
- `TestNodePathsEndpointUsesIndex` — verifies the endpoint returns
correct results using the index
- `TestPathHopIndexIncrementalUpdate` — verifies add/remove operations
on the index
All existing tests pass.
Fixes#359
Co-authored-by: you <you@example.com>
## Summary
Fixes#566 — The "Inconsistent Hash Sizes" list on the Analytics page
included all node types and had no time window, causing false positives.
## Changes
### 1. Role filter on inconsistent nodes (`cmd/server/store.go`)
Added role filter to the `inconsistentNodes` loop in
`computeHashCollisions()` so only repeaters and room servers are
included. Companions are excluded since they were never affected by the
firmware bug. This matches the existing role filter on collision
bucketing from #441.
```go
// Before:
if cn.HashSizeInconsistent {
// After:
if cn.HashSizeInconsistent && (cn.Role == "repeater" || cn.Role == "room_server") {
```
### 2. 7-day time window on hash size computation
(`cmd/server/store.go`)
Added a 7-day recency cutoff to `computeNodeHashSizeInfo()`. Adverts
older than 7 days are now skipped, preventing legitimate historical
config changes (e.g., testing different byte sizes) from creating
permanent false positives.
### 3. Frontend description text (`public/analytics.js`)
Updated the description to reflect the filtered scope: now says
"Repeaters and room servers" instead of "Nodes", mentions the 7-day
window, and notes that companions are excluded.
## Tests
- `TestInconsistentNodesExcludesCompanions` — verifies companions are
excluded while repeaters and room servers are included
- `TestHashSizeInfoTimeWindow` — verifies adverts older than 7 days are
excluded from hash size computation
- Updated existing hash size tests to use recent timestamps (compatible
with the new time window)
- All existing tests pass: `cmd/server` ✅, `cmd/ingestor` ✅
## Perf justification
The time window filter adds a single string comparison per advert in the
scan loop — O(n) with a tiny constant. No impact on hot paths.
---------
Co-authored-by: you <you@example.com>
## Summary
Replace O(n) map iteration in `MaxTransmissionID()` and
`MaxObservationID()` with O(1) field lookups.
## What Changed
- Added `maxTxID` and `maxObsID` fields to `PacketStore`
- Updated `Load()`, `IngestNewFromDB()`, and `IngestNewObservations()`
to track max IDs incrementally as entries are added
- `MaxTransmissionID()` and `MaxObservationID()` now return the tracked
field directly instead of iterating the entire map
## Performance
Before: O(n) iteration over 30K+ map entries under a read lock
After: O(1) field return
## Tests
- Added `TestMaxTransmissionIDIncremental` verifying the incremental
field matches brute-force iteration over the maps
- All existing tests pass (`cmd/server` and `cmd/ingestor`)
Fixes#356
Co-authored-by: you <you@example.com>
## Summary
Adds a byte-size filter to the map page, allowing users to filter
repeater markers by their hash prefix size (1-byte, 2-byte, or 3-byte).
## What changed
**`public/map.js`** — single file change:
1. **New filter state**: Added `byteSize` to the `filters` object
(default: `'all'`), persisted in `localStorage`
2. **New UI section**: Added a "Byte Size" fieldset with button group
(`All | 1-byte | 2-byte | 3-byte`) in the map controls panel, between
"Node Types" and "Display"
3. **Filter logic**: In `_renderMarkersInner`, when `byteSize !==
'all'`, repeater nodes are filtered by their `hash_size` field.
Non-repeater nodes (companions, rooms, sensors) are unaffected — they
pass through regardless of the byte-size filter setting
4. **Event binding**: Button click handlers update the filter, persist
to localStorage, and re-render markers
## Design decisions
- **Client-side only** — no backend changes needed. The `hash_size`
field is already included in the `/api/nodes` response
- **Repeaters only** — byte size is a repeater configuration concept;
other node roles don't have configurable path prefix sizes
- **Matches existing pattern** — uses the same button-group UI as the
Status filter (All/Active/Stale)
- **`hash_size` defaults to 1** — consistent with how the rest of the
codebase treats missing `hash_size` (`node.hash_size || 1`)
## Performance
No new API calls. Filter is a simple string comparison inside the
existing `nodes.filter()` loop in `_renderMarkersInner` — O(1) per node,
negligible overhead.
Fixes#565
Co-authored-by: you <you@example.com>
## Problem
Closes#563. Addresses the *Packet store estimated memory* item in #559.
`estimatedMemoryMB()` used a hardcoded formula:
```go
return float64(len(s.packets)*5120+s.totalObs*500) / 1048576.0
```
This ignored three data structures that grow continuously with every
ingest cycle:
| Structure | Production size | Heap not counted |
|---|---|---|
| `distHops []distHopRecord` | 1,556,833 records | ~300 MB |
| `distPaths []distPathRecord` | 93,090 records | ~25 MB |
| `spIndex map[string]int` | 4,113,234 entries | ~400 MB |
Result: formula reported ~1.2 GB while actual heap was ~5 GB. With
`maxMemoryMB: 1024`, eviction calculated it only needed to shed ~200 MB,
removed a handful of packets, and stopped. Memory kept growing until the
OOM killer fired.
## Fix
Replace `estimatedMemoryMB()` with `runtime.ReadMemStats` so all data
structures are automatically counted:
```go
func (s *PacketStore) estimatedMemoryMB() float64 {
if s.memoryEstimator != nil {
return s.memoryEstimator()
}
var ms runtime.MemStats
runtime.ReadMemStats(&ms)
return float64(ms.HeapAlloc) / 1048576.0
}
```
Replace the eviction simulation loop (which re-used the same wrong
formula) with a proportional calculation: if heap is N× over budget,
evict enough packets to keep `(1/N) × 0.9` of the current count. The 0.9
factor adds a 10% buffer so the next ingest cycle doesn't immediately
re-trigger. All major data structures (distHops, distPaths, spIndex)
scale with packet count, so removing a fraction of packets frees roughly
the same fraction of total heap.
## Testing
- Updated `TestEvictStale_MemoryBasedEviction` to inject a deterministic
estimator via the new `memoryEstimator` field.
- Added `TestEvictStale_MemoryBasedEviction_UnderestimatedHeap`:
verifies that when actual heap is 5× over limit (the production failure
scenario), eviction correctly removes ~80%+ of packets.
```
=== RUN TestEvictStale_MemoryBasedEviction
[store] Evicted 538 packets (1076 obs)
--- PASS
=== RUN TestEvictStale_MemoryBasedEviction_UnderestimatedHeap
[store] Evicted 820 packets (1640 obs)
--- PASS
```
Full suite: `go test ./...` — ok (10.3s)
## Perf note
`runtime.ReadMemStats` runs once per eviction tick (every 60 s) and once
per `/api/perf/store` call. Cost is negligible.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
Fixes critical performance issue in neighbor graph computation that
consumed 65% of CPU (30+ seconds) on a 325K packet dataset.
## Changes
### Fix 1: Cache strings.ToLower results
- Added cachedToLower() helper that caches lowercased strings in a local
map
- Pubkeys repeat across hundreds of thousands of observations
- Pre-computes fromLower once per transaction instead of once per
observation
- **Impact:** Eliminates ~8.4s (25.3% CPU)
### Fix 2: Cache parsed DecodedJSON via StoreTx.ParsedDecoded()
- Added ParsedDecoded() method on StoreTx using sync.Once for
thread-safe lazy caching
- json.Unmarshal on decoded_json now runs at most once per packet
lifetime
- Result reused by extractFromNode, indexByNode, trackAdvertPubkey
- **Impact:** Eliminates ~8.8s (26.3% CPU)
### Fix 3: Extend neighbor graph TTL from 60s to 5 minutes
- The graph depends on traffic patterns, not individual packets
- Reduces rebuild frequency 5x
- **Impact:** ~80% reduction in sustained CPU from graph rebuilds
## Tests
- 7 new tests added, all 26+ existing neighbor graph tests pass
- BenchmarkBuildFromStore: 727us/op, 237KB/op, 6030 allocs/op
Related: #559
---------
Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: you <you@example.com>