Commit Graph

1244 Commits

Author SHA1 Message Date
Kpa-clawbot cd470dffbe perf: batch observation fetching to eliminate N+1 API calls on sort change (#586)
## Summary

Fixes the N+1 API call pattern when changing observation sort mode on
the packets page. Previously, switching sort to Path or Time fired
individual `/api/packets/{hash}` requests for **every**
multi-observation group without cached children — potentially 100+
concurrent requests.

## Changes

### Backend: Batch observations endpoint
- **New endpoint:** `POST /api/packets/observations` accepts `{"hashes":
["h1", "h2", ...]}` and returns all observations keyed by hash in a
single response
- Capped at 200 hashes per request to prevent abuse
- 4 test cases covering empty input, invalid JSON, too-many-hashes, and
valid requests

### Frontend: Use batch endpoint
- `packets.js` sort change handler now collects all hashes needing
observation data and sends a single POST request instead of N individual
GETs
- Same behavior, single round-trip

## Performance

- **Before:** Changing sort with 100 visible groups → 100 concurrent API
requests, browser connection queueing (6 per host), several seconds of
lag
- **After:** Single POST request regardless of group count, response
time proportional to store lookup (sub-millisecond per hash in memory)

Fixes #389

---------

Co-authored-by: you <you@example.com>
2026-04-04 10:18:40 -07:00
Kpa-clawbot 7ff89d8607 perf(packets): coalesce WS-triggered renders with requestAnimationFrame (#585)
## Summary

Coalesce WS-triggered `renderTableRows()` calls using
`requestAnimationFrame` instead of `setTimeout` debouncing.

Fixes #396

## Problem

During high WebSocket throughput, multiple WS batches could each trigger
a `renderTableRows()` call via `setTimeout(..., 200)`. With rapid
batches, this caused the 50K-row table to be fully rebuilt every few
hundred milliseconds, causing UI jank.

## Solution

Replace the `setTimeout`-based debounce with a `requestAnimationFrame`
coalescing pattern:

1. **`scheduleWSRender()`** — sets a dirty flag and schedules a single
rAF callback
2. **Dirty flag** — multiple WS batches within the same frame just set
the flag; only one render fires
3. **Cleanup** — `destroy()` cancels any pending rAF and resets the
dirty flag

This ensures at most **one `renderTableRows()` per animation frame**
(~16ms), regardless of how many WS batches arrive.

## Performance justification

- **Before:** Each WS batch → `setTimeout(renderTableRows, 200)` — N
batches in <200ms = N renders
- **After:** N batches in one frame → 1 render on next rAF (~16ms)
- Worst case goes from O(N) renders per second to O(60) renders per
second (frame-capped)

## Changes

- `public/packets.js`: Add `scheduleWSRender()` with rAF + dirty flag;
replace setTimeout in WS handler; clean up in `destroy()`
- `test-frontend-helpers.js`: Update tests to verify rAF coalescing
pattern instead of setTimeout debounce

## Testing

- All existing tests pass (`npm test` — 0 failures)
- Updated 2 test cases to verify new rAF coalescing behavior

Co-authored-by: you <you@example.com>
2026-04-04 10:18:09 -07:00
Kpa-clawbot 493849f2e3 perf(frontend): compress og-image.png from 1.1MB to 235KB (#584)
## Summary

Compress `public/og-image.png` from **1,159,050 bytes (1.1MB)** to
**234,899 bytes (235KB)** — an **80% reduction**.

## What Changed

- Applied lossy PNG quantization via `pngquant` (quality 45-65, speed 1)
- Image dimensions unchanged: 1200×630px (standard OG image size)
- Visual quality remains suitable for social media previews

## Why

A 1.1MB OpenGraph image is excessive. Typical OG images are 50-200KB.
This reduces deployment size and Git repo bloat without affecting
functionality (browsers don't preload OG images).

## Testing

- Unit tests pass (`npm run test:unit`)
- No code changes — image-only commit
- `index.html` reference unchanged (`<meta property="og:image"
content="/og-image.png">`)

Fixes #397

Co-authored-by: you <you@example.com>
2026-04-04 10:17:21 -07:00
Kpa-clawbot 87ac61748c perf(analytics): compute network status client-side, eliminate redundant API call (#583)
## Summary

Reduces the analytics nodes tab from 3 parallel API calls to 2 by
computing network status (active/degraded/silent counts) client-side
instead of fetching from `/nodes/network-status`.

## What Changed

**`public/analytics.js` — `renderNodesTab()`:**
- Removed the `/nodes/network-status` API call from the `Promise.all`
batch
- Added client-side computation of active/degraded/silent counts using
the shared `getHealthThresholds()` function from `roles.js`
- Uses `nodesResp.total` and `nodesResp.counts` (already returned by
`/nodes` endpoint) for total node count and role breakdown

## Why This Works

The `/nodes` response already includes:
- `total` — count of all matching nodes (server-computed across full DB)
- `counts` — role counts across all nodes (from `GetAllRoleCounts()`)
- Per-node `last_seen`/`last_heard` timestamps

The `getHealthThresholds()` function in `roles.js` provides the same
degraded/silent thresholds used server-side, so client-side status
computation produces equivalent results for the loaded node set.

## Performance

- **Before:** 3 parallel API calls (`/nodes`, `/nodes/bulk-health`,
`/nodes/network-status`)
- **After:** 2 parallel API calls (`/nodes`, `/nodes/bulk-health`)
- Network status computation is O(n) over the 200 loaded nodes —
negligible client-side cost
- The `/nodes/network-status` endpoint scanned ALL nodes in the DB on
every call; this eliminates that server-side work entirely

## Testing

- All frontend helper tests pass (445/445)
- All packet filter tests pass (62/62)  
- All aging tests pass (29/29)
- All Go backend tests pass

Fixes #392

---------

Co-authored-by: you <you@example.com>
2026-04-04 10:17:05 -07:00
Kpa-clawbot 26de38f4b6 perf(map): reposition markers on zoom/resize instead of full rebuild (#582)
## Summary

Eliminates visible marker flicker on zoom/resize events in the map page
when displaying 500+ nodes.

## Problem

`renderMarkers()` was called on every `zoomend` and `resize` event,
which did `markerLayer.clearLayers()` followed by a full rebuild of all
markers. With many nodes, this caused a visible flash where all markers
disappeared briefly before being re-added.

## Solution

Instead of rebuilding all markers from scratch on zoom/resize:

1. **Store Leaflet layer references** on marker data objects
(`_leafletMarker`, `_leafletLine`, `_leafletDot`) during the initial
full render
2. **Add `_repositionMarkers()`** — re-runs `deconflictLabels()` at the
new zoom level and updates existing marker positions via
`setLatLng()`/`setLatLngs()` without clearing the layer group
3. **Debounce zoom/resize handlers** (150ms) to coalesce rapid events
during animated zooms
4. **Dynamically manage offset indicators** — adds/removes deconfliction
offset lines and dots as positions change at different zoom levels

Full `renderMarkers()` is still called for filter changes, data updates,
and theme changes — only zoom/resize uses the lightweight repositioning
path.

## Complexity

- `_repositionMarkers()`: O(n) — single pass over stored marker data
- `deconflictLabels()`: O(n × k) where k is max spiral offsets (48) —
unchanged
- No new API calls, no DOM rebuilds

Fixes #393

---------

Co-authored-by: you <you@example.com>
2026-04-04 17:16:48 +00:00
Kpa-clawbot d2d4c504e8 perf(live): parallelize replayRecent() observation fetches (#581)
## Summary

`replayRecent()` in `live.js` fetched observation details for 8 packet
groups **sequentially** — each `await fetch()` waited for the previous
to complete before starting the next.

## Change

Replaced the sequential `for` loop with `Promise.all()` to fetch all 8
detail API calls **concurrently**. The mapping from results to live
packets is unchanged.

**Before:** 8 sequential fetches (total time ≈ sum of all request
durations)
**After:** 8 parallel fetches (total time ≈ max of all request
durations)

## Notes

- `replayRecent()` is currently disabled (commented out at line 856), so
this is dormant code — no runtime risk
- No behavioral change: same data mapping, same rendering, same VCR
buffer population
- All existing tests pass

Fixes #394

---------

Co-authored-by: you <you@example.com>
2026-04-04 10:16:08 -07:00
Kpa-clawbot b37e8e2da2 perf(packets): replace N+1 API calls with single expand=observations query (#580)
## Summary

Eliminates the N+1 API call storm when toggling off "Group by Hash" in
the packets table.

## Problem

When ungrouped mode was active, `loadPackets()` fired individual
`/api/packets/{hash}` requests for every multi-observation packet. With
200+ multi-obs packets, this created 200+ parallel HTTP requests —
overwhelming both browser connection limits and the server.

## Fix

The server already supports `expand=observations` on the `/api/packets`
endpoint, which returns observations inline. Instead of:

1. Always fetching grouped (`groupByHash=true`)
2. Then N+1 fetching each packet's children individually

We now:

1. Fetch grouped when grouped mode is active (`groupByHash=true`)
2. Fetch with `expand=observations` when ungrouped — **single API call**
3. Flatten observations client-side

**Result: 200+ API calls → 1 API call.**

## Changes

- `public/packets.js`: Replaced N+1 observation fetching loop with
single `expand=observations` query parameter, flatten inline
observations client-side.

## Testing

- All frontend tests pass (packet-filter: 62/62, frontend-helpers:
445/445)
- All Go backend tests pass

Fixes #382

Co-authored-by: you <you@example.com>
2026-04-04 10:15:14 -07:00
Kpa-clawbot 45d8116880 perf: query only matching node locations in handleObservers (#579)
## Summary

`handleObservers()` in `routes.go` was calling `GetNodeLocations()`
which fetches ALL nodes from the DB just to match ~10 observer IDs
against node public keys. With 500+ nodes this is wasteful.

## Changes

- **`db.go`**: Added `GetNodeLocationsByKeys(keys []string)` — queries
only the rows matching the given public keys using a parameterized
`WHERE LOWER(public_key) IN (?, ?, ...)` clause.
- **`routes.go`**: `handleObservers` now collects observer IDs and calls
the targeted method instead of the full-table scan.
- **`coverage_test.go`**: Added `TestGetNodeLocationsByKeys` covering
known key, empty keys, and unknown key cases.

## Performance

With ~10 observers and 500+ nodes, the query goes from scanning all 500
rows to fetching only ~10. The original `GetNodeLocations()` is
preserved for any other callers.

Fixes #378

Co-authored-by: you <you@example.com>
2026-04-04 10:14:37 -07:00
Kpa-clawbot f68e98c376 perf(live): skip updateTimeline() when tab is hidden (#578)
## Summary

Skip `updateTimeline()` canvas redraws in `bufferPacket()` when the
browser tab is hidden (`_tabHidden === true`). Instead, batch-update the
timeline once when the tab becomes visible again via the
`visibilitychange` handler.

Fixes #385

## What Changed

**`public/live.js`** — two surgical edits:

1. **`bufferPacket()`**: Removed `updateTimeline()` call from the
`_tabHidden` early-return path. When the tab is backgrounded, packets
are still buffered (for VCR) but no canvas work is done.

2. **`visibilitychange` handler**: Added `updateTimeline()` call when
the tab is restored, so the timeline catches up in a single repaint
instead of N repaints (one per buffered packet).

## Performance Impact

At 5+ packets/sec with a backgrounded tab, this eliminates continuous
canvas redraws (`updateTimeline()` calls `ctx.clearRect` + full canvas
redraw + `updateTimelinePlayhead()`) that are invisible to the user. CPU
usage drops to near-zero for timeline rendering while backgrounded.

## Tests

All existing tests pass:
- `test-packet-filter.js` — 62 passed
- `test-aging.js` — 29 passed  
- `test-frontend-helpers.js` — 445 passed

Co-authored-by: you <you@example.com>
2026-04-04 10:14:13 -07:00
Kpa-clawbot f3d5d1e021 perf: resolve hops from in-memory prefix map instead of N+1 DB queries (#577)
## Summary

Replace N+1 per-hop DB queries in `handleResolveHops` with O(1) lookups
against the in-memory prefix map that already exists in the packet
store.

## Problem

Each hop in the `resolve-hops` API triggered a separate `SELECT ... LIKE
?` query against the nodes table. With 10 hops, that's 10 DB round-trips
— unnecessary when `getCachedNodesAndPM()` already maintains an
in-memory prefix map that can resolve hops instantly.

## Changes

- **routes.go**: Replace the per-hop DB query loop with `pm.m[hopLower]`
lookups from the prefix map. Convert `nodeInfo` → `HopCandidate` inline.
Remove unused `rows`/`sql.Scan` code.
- **store.go**: Add `InvalidateNodeCache()` method to force prefix map
rebuild (needed by tests that insert nodes after store initialization).
- **routes_test.go**: Give `TestResolveHopsAmbiguous` a proper store so
hops resolve via the prefix map.
- **resolve_context_test.go**: Call `InvalidateNodeCache()` after
inserting test nodes. Fix confidence assertion — with GPS candidates and
no affinity context, `resolveWithContext` correctly returns
`gps_preference` (previously masked because the prefix map didn't have
the test nodes).

## Complexity

O(1) per hop lookup via hash map vs O(n) DB scan per hop. No hot-path
impact — this endpoint is called on-demand, not in a render loop.

Fixes #369

---------

Co-authored-by: you <you@example.com>
2026-04-04 09:51:07 -07:00
Kpa-clawbot 02004c5912 perf: incremental distance index update on path changes (#576)
## Summary

Replace full `buildDistanceIndex()` rebuild with incremental
`removeTxFromDistanceIndex`/`addTxToDistanceIndex` for only the
transmissions whose paths actually changed during
`IngestNewObservations`.

## Problem

When any transmission's best path changed during observation ingestion,
the **entire distance index was rebuilt** — iterating all 30K+ packets,
resolving all hops, and computing haversine distances. This
`O(total_packets × avg_hops)` operation ran under a write lock, blocking
all API readers.

A 30-second debounce (`distRebuildInterval`) was added in #557 to
mitigate this, but it only delayed the pain — the full rebuild still
happened, just less frequently.

## Fix

- Added `removeTxFromDistanceIndex(tx)` — filters out all
`distHopRecord` and `distPathRecord` entries for a specific transmission
- Added `addTxToDistanceIndex(tx)` — computes and appends new distance
records for a single transmission
- In `IngestNewObservations`, changed path-change handling to call
remove+add for each affected tx instead of marking dirty and waiting for
a full rebuild
- Removed `distDirty`, `distLast`, and `distRebuildInterval` since
incremental updates are cheap enough to apply immediately

## Complexity

- **Before:** `O(total_packets × avg_hops)` per rebuild (30K+ packets)
- **After:** `O(changed_txs × avg_hops + total_dist_records)` — the
remove is a linear scan of the distance slices, but only for affected
txs; the add is `O(hops)` per changed tx

The remove scan over `distHops`/`distPaths` slices is linear in slice
length, but this is still far cheaper than the full rebuild which also
does JSON parsing, hop resolution, and haversine math for every packet.

## Tests

- Updated `TestDistanceRebuildDebounce` →
`TestDistanceIncrementalUpdate` to verify incremental behavior and check
for duplicate path records
- All existing tests pass (`go test ./...` in both `cmd/server` and
`cmd/ingestor`)

Fixes #365

---------

Co-authored-by: you <you@example.com>
2026-04-04 09:50:55 -07:00
Kpa-clawbot ef30031e2e perf: cache resolveRegionObservers with 30s TTL (#575)
## Summary

Cache `resolveRegionObservers()` results with a 30-second TTL to
eliminate repeated database queries for region→observer ID mappings.

## Problem

`resolveRegionObservers()` queried the database on every call despite
the observers table changing infrequently (~20 rows). It's called from
10+ hot paths including `filterPackets()`, `GetChannels()`, and multiple
analytics compute functions. When analytics caches are cold, parallel
requests each hit the DB independently.

## Solution

- Added a dedicated `regionObsMu` mutex + `regionObsCache` map with 30s
TTL
- Uses a separate mutex (not `s.mu`) to avoid deadlocks — callers
already hold `s.mu.RLock()`
- Cache is lazily populated per-region and fully invalidated after TTL
expires
- Follows the same pattern as `getCachedNodesAndPM()` (30s TTL,
on-demand rebuild)

## Changes

- **`cmd/server/store.go`**: Added `regionObsMu`, `regionObsCache`,
`regionObsCacheTime` fields; rewrote `resolveRegionObservers()` to check
cache first; added `fetchAndCacheRegionObs()` helper
- **`cmd/server/coverage_test.go`**: Added
`TestResolveRegionObserversCaching` — verifies cache population, cache
hits, and nil handling for unknown regions

## Testing

- All existing Go tests pass (`go test ./...`)
- New test verifies caching behavior (population, hits, nil for unknown
regions)

Fixes #362

---------

Co-authored-by: you <you@example.com>
2026-04-04 09:50:27 -07:00
Kpa-clawbot 67511ed6a7 perf: combine GetStoreStats into 2 concurrent queries instead of 5 sequential (#574)
## Summary

`GetStoreStats()` ran 5 sequential DB queries on every call. This
combines them into **2 concurrent queries**:

1. **Node/observer counts** — single query using subqueries: `SELECT
(SELECT COUNT(*) FROM nodes WHERE ...), (SELECT COUNT(*) FROM nodes),
(SELECT COUNT(*) FROM observers)`
2. **Observation counts** — single query using conditional aggregation:
`SUM(CASE WHEN timestamp > ? THEN 1 ELSE 0 END)` scoped to the 24h
window, avoiding a full table scan for the 1h count

Both queries run concurrently via goroutines + `sync.WaitGroup`.

## What changed

- `cmd/server/store.go`: Rewrote `GetStoreStats()` — 5 sequential
`QueryRow` calls → 2 concurrent combined queries
- Error handling now propagates query errors instead of silently
ignoring them

## Performance justification

- **Before:** 5 sequential round-trips to SQLite, with 2 potentially
expensive `COUNT(*)` scans on the `observations` table
- **After:** 2 concurrent round-trips; the observation query scans the
24h window once instead of separately scanning for 1h and 24h
- The 10s cache (`statsTTL`) remains, so this fires at most once per 10s
— but when it does fire, it's ~2.5x fewer round-trips and the
observation scan is halved

## Tests

- `go test ./...` passes for both `cmd/server` and `cmd/ingestor`

Fixes #363

---------

Co-authored-by: you <you@example.com>
2026-04-04 09:48:25 -07:00
Kpa-clawbot b35b473508 perf(nodes): extract shared fetchNodeDetail() to deduplicate API calls (#573)
## Summary

Extracts a shared `fetchNodeDetail(pubkey)` helper in `nodes.js` that
fetches both `/nodes/{pubkey}` and `/nodes/{pubkey}/health` in parallel.
Both `selectNode()` (side panel) and `loadFullNode()` (full-screen view)
now call this single function instead of duplicating the fetch logic.

## What Changed

- **New:** `fetchNodeDetail(pubkey)` — shared async function that
returns node data with `.healthData` attached
- **Modified:** `loadFullNode()` — uses `fetchNodeDetail()` instead of
inline `Promise.all`
- **Modified:** `selectNode()` — uses `fetchNodeDetail()` instead of
inline `Promise.all`

## Why

The duplicate `api()` calls weren't a major perf issue (TTL caching
mitigates most cases), but the duplicated logic was unnecessary tech
debt. On mobile, `selectNode()` redirects to `loadFullNode()` via hash
change, so the two code paths could fire sequentially with expired
cache.

## Testing

- All frontend helper tests pass (445/445)
- All packet filter tests pass (62/62)
- All aging tests pass (29/29)
- No behavioral change — only code structure improvement

Fixes #391

Co-authored-by: you <you@example.com>
2026-04-04 09:47:59 -07:00
Kpa-clawbot d4f2c3ac66 perf: index subpath detail lookups instead of scanning all packets (#571)
## Summary

`GetSubpathDetail()` iterated ALL packets to find those containing a
specific subpath — `O(packets × hops × subpath_length)`. With 30K+
packets this caused user-visible latency on every subpath detail click.

## Changes

### `cmd/server/store.go`
- Added `spTxIndex map[string][]*StoreTx` alongside existing `spIndex` —
tracks which transmissions contain each subpath key
- Extended `addTxToSubpathIndexFull()` and
`removeTxFromSubpathIndexFull()` to maintain both indexes simultaneously
- Original `addTxToSubpathIndex()`/`removeTxFromSubpathIndex()` wrappers
preserved for backward compatibility
- `buildSubpathIndex()` now populates both `spIndex` and `spTxIndex`
during `Load()`
- All incremental update sites (ingest, path change, eviction) use the
`Full` variants
- `GetSubpathDetail()` rewritten: direct `O(1)` map lookup on
`spTxIndex[key]` instead of scanning all packets

### `cmd/server/coverage_test.go`
- Added `TestSubpathTxIndexPopulated`: verifies `spTxIndex` is
populated, counts match `spIndex`, and `GetSubpathDetail` returns
correct results for both existing and non-existent subpaths

## Complexity

- **Before:** `O(total_packets × avg_hops × subpath_length)` per request
- **After:** `O(matched_txs)` per request (direct map lookup)

## Tests

All tests pass: `cmd/server` (4.6s), `cmd/ingestor` (25.6s)

Fixes #358

---------

Co-authored-by: you <you@example.com>
2026-04-04 09:35:00 -07:00
Kpa-clawbot 37300bf5c8 fix: cap prefix map at 8 chars to cut memory ~10x (#570)
## Summary

`buildPrefixMap()` was generating map entries for every prefix length
from 2 to `len(pubkey)` (up to 64 chars), creating ~31 entries per node.
With 500 nodes that's ~15K map entries; with 1K+ nodes it balloons to
31K+.

## Changes

**`cmd/server/store.go`:**
- Added `maxPrefixLen = 8` constant — MeshCore path hops use 2–6 char
prefixes, 8 gives headroom
- Capped the prefix generation loop at `maxPrefixLen` instead of
`len(pk)`
- Added full pubkey as a separate map entry when key is longer than
`maxPrefixLen`, ensuring exact-match lookups (used by
`resolveWithContext`) still work

**`cmd/server/coverage_test.go`:**
- Added `TestPrefixMapCap` with subtests for:
  - Short prefix resolution still works
  - Full pubkey exact-match resolution still works
  - Intermediate prefixes beyond the cap correctly return nil
  - Short keys (≤8 chars) have all prefix entries
  - Map size is bounded

## Impact

- Map entries per node: ~31 → ~8 (one per prefix length 2–8, plus one
full-key entry)
- Total map size for 500 nodes: ~15K entries → ~4K entries (~75%
reduction)
- No behavioral change for path hop resolution (2–6 char prefixes)
- No behavioral change for exact pubkey lookups

## Tests

All existing tests pass:
- `cmd/server`: 
- `cmd/ingestor`: 

Fixes #364

---------

Co-authored-by: you <you@example.com>
2026-04-04 09:28:38 -07:00
Kpa-clawbot cb8a2e15c8 perf: index node path lookups instead of scanning all packets (#572)
## Summary

Index node path lookups in `handleNodePaths()` instead of scanning all
packets on every request.

## Problem

`handleNodePaths()` iterated ALL packets in the store (`O(total_packets
× avg_hops)`) with prefix string matching on every hop. This caused
user-facing latency on every node detail page load with 30K+ packets.

## Fix

Added a `byPathHop` index (`map[string][]*StoreTx`) that maps lowercase
hop prefixes and resolved full pubkeys to their transmissions. The
handler now does direct map lookups instead of a full scan.

### Index lifecycle
- **Built** during `Load()` via `buildPathHopIndex()`
- **Incrementally updated** during `IngestNewFromDB()` (new packets) and
`IngestNewObservations()` (path changes)
- **Cleaned up** during `EvictStale()` (packet removal)

### Query strategy
The handler looks up candidates from the index using:
1. Full pubkey (matches resolved hops from `resolved_path`)
2. 2-char prefix (matches short raw hops)
3. 4-char prefix (matches medium raw hops)
4. Any longer raw hops starting with the 4-char prefix

This reduces complexity from `O(total_packets × avg_hops)` to
`O(matching_txs + unique_hop_keys)`.

## Tests

- `TestNodePathsEndpointUsesIndex` — verifies the endpoint returns
correct results using the index
- `TestPathHopIndexIncrementalUpdate` — verifies add/remove operations
on the index

All existing tests pass.

Fixes #359

Co-authored-by: you <you@example.com>
2026-04-04 09:25:18 -07:00
Kpa-clawbot aac038abb9 fix: filter inconsistent hash sizes by role and add 7-day time window (#567)
## Summary

Fixes #566 — The "Inconsistent Hash Sizes" list on the Analytics page
included all node types and had no time window, causing false positives.

## Changes

### 1. Role filter on inconsistent nodes (`cmd/server/store.go`)
Added role filter to the `inconsistentNodes` loop in
`computeHashCollisions()` so only repeaters and room servers are
included. Companions are excluded since they were never affected by the
firmware bug. This matches the existing role filter on collision
bucketing from #441.

```go
// Before:
if cn.HashSizeInconsistent {

// After:
if cn.HashSizeInconsistent && (cn.Role == "repeater" || cn.Role == "room_server") {
```

### 2. 7-day time window on hash size computation
(`cmd/server/store.go`)
Added a 7-day recency cutoff to `computeNodeHashSizeInfo()`. Adverts
older than 7 days are now skipped, preventing legitimate historical
config changes (e.g., testing different byte sizes) from creating
permanent false positives.

### 3. Frontend description text (`public/analytics.js`)
Updated the description to reflect the filtered scope: now says
"Repeaters and room servers" instead of "Nodes", mentions the 7-day
window, and notes that companions are excluded.

## Tests

- `TestInconsistentNodesExcludesCompanions` — verifies companions are
excluded while repeaters and room servers are included
- `TestHashSizeInfoTimeWindow` — verifies adverts older than 7 days are
excluded from hash size computation
- Updated existing hash size tests to use recent timestamps (compatible
with the new time window)
- All existing tests pass: `cmd/server` , `cmd/ingestor` 

## Perf justification
The time window filter adds a single string comparison per advert in the
scan loop — O(n) with a tiny constant. No impact on hot paths.

---------

Co-authored-by: you <you@example.com>
2026-04-04 09:22:12 -07:00
Kpa-clawbot 588fba226d perf: track max transmission/observation IDs incrementally (#569)
## Summary

Replace O(n) map iteration in `MaxTransmissionID()` and
`MaxObservationID()` with O(1) field lookups.

## What Changed

- Added `maxTxID` and `maxObsID` fields to `PacketStore`
- Updated `Load()`, `IngestNewFromDB()`, and `IngestNewObservations()`
to track max IDs incrementally as entries are added
- `MaxTransmissionID()` and `MaxObservationID()` now return the tracked
field directly instead of iterating the entire map

## Performance

Before: O(n) iteration over 30K+ map entries under a read lock
After: O(1) field return

## Tests

- Added `TestMaxTransmissionIDIncremental` verifying the incremental
field matches brute-force iteration over the maps
- All existing tests pass (`cmd/server` and `cmd/ingestor`)

Fixes #356

Co-authored-by: you <you@example.com>
2026-04-04 09:20:17 -07:00
Kpa-clawbot c670742589 feat: add byte-size filter to map page (#565) (#568)
## Summary

Adds a byte-size filter to the map page, allowing users to filter
repeater markers by their hash prefix size (1-byte, 2-byte, or 3-byte).

## What changed

**`public/map.js`** — single file change:

1. **New filter state**: Added `byteSize` to the `filters` object
(default: `'all'`), persisted in `localStorage`
2. **New UI section**: Added a "Byte Size" fieldset with button group
(`All | 1-byte | 2-byte | 3-byte`) in the map controls panel, between
"Node Types" and "Display"
3. **Filter logic**: In `_renderMarkersInner`, when `byteSize !==
'all'`, repeater nodes are filtered by their `hash_size` field.
Non-repeater nodes (companions, rooms, sensors) are unaffected — they
pass through regardless of the byte-size filter setting
4. **Event binding**: Button click handlers update the filter, persist
to localStorage, and re-render markers

## Design decisions

- **Client-side only** — no backend changes needed. The `hash_size`
field is already included in the `/api/nodes` response
- **Repeaters only** — byte size is a repeater configuration concept;
other node roles don't have configurable path prefix sizes
- **Matches existing pattern** — uses the same button-group UI as the
Status filter (All/Active/Stale)
- **`hash_size` defaults to 1** — consistent with how the rest of the
codebase treats missing `hash_size` (`node.hash_size || 1`)

## Performance

No new API calls. Filter is a simple string comparison inside the
existing `nodes.filter()` loop in `_renderMarkersInner` — O(1) per node,
negligible overhead.

Fixes #565

Co-authored-by: you <you@example.com>
2026-04-04 09:14:49 -07:00
efiten f897ce1b26 fix: use runtime heap stats for memory-based eviction (#564)
## Problem

Closes #563. Addresses the *Packet store estimated memory* item in #559.

`estimatedMemoryMB()` used a hardcoded formula:

```go
return float64(len(s.packets)*5120+s.totalObs*500) / 1048576.0
```

This ignored three data structures that grow continuously with every
ingest cycle:

| Structure | Production size | Heap not counted |
|---|---|---|
| `distHops []distHopRecord` | 1,556,833 records | ~300 MB |
| `distPaths []distPathRecord` | 93,090 records | ~25 MB |
| `spIndex map[string]int` | 4,113,234 entries | ~400 MB |

Result: formula reported ~1.2 GB while actual heap was ~5 GB. With
`maxMemoryMB: 1024`, eviction calculated it only needed to shed ~200 MB,
removed a handful of packets, and stopped. Memory kept growing until the
OOM killer fired.

## Fix

Replace `estimatedMemoryMB()` with `runtime.ReadMemStats` so all data
structures are automatically counted:

```go
func (s *PacketStore) estimatedMemoryMB() float64 {
    if s.memoryEstimator != nil {
        return s.memoryEstimator()
    }
    var ms runtime.MemStats
    runtime.ReadMemStats(&ms)
    return float64(ms.HeapAlloc) / 1048576.0
}
```

Replace the eviction simulation loop (which re-used the same wrong
formula) with a proportional calculation: if heap is N× over budget,
evict enough packets to keep `(1/N) × 0.9` of the current count. The 0.9
factor adds a 10% buffer so the next ingest cycle doesn't immediately
re-trigger. All major data structures (distHops, distPaths, spIndex)
scale with packet count, so removing a fraction of packets frees roughly
the same fraction of total heap.

## Testing

- Updated `TestEvictStale_MemoryBasedEviction` to inject a deterministic
estimator via the new `memoryEstimator` field.
- Added `TestEvictStale_MemoryBasedEviction_UnderestimatedHeap`:
verifies that when actual heap is 5× over limit (the production failure
scenario), eviction correctly removes ~80%+ of packets.

```
=== RUN   TestEvictStale_MemoryBasedEviction
[store] Evicted 538 packets (1076 obs)
--- PASS

=== RUN   TestEvictStale_MemoryBasedEviction_UnderestimatedHeap
[store] Evicted 820 packets (1640 obs)
--- PASS
```

Full suite: `go test ./...` — ok (10.3s)

## Perf note

`runtime.ReadMemStats` runs once per eviction tick (every 60 s) and once
per `/api/perf/store` call. Cost is negligible.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 08:41:54 -07:00
Kpa-clawbot cbfce41d7e perf: optimize neighbor graph build (3 fixes for 30s+ CPU) (#562)
## Summary

Fixes critical performance issue in neighbor graph computation that
consumed 65% of CPU (30+ seconds) on a 325K packet dataset.

## Changes

### Fix 1: Cache strings.ToLower results
- Added cachedToLower() helper that caches lowercased strings in a local
map
- Pubkeys repeat across hundreds of thousands of observations
- Pre-computes fromLower once per transaction instead of once per
observation
- **Impact:** Eliminates ~8.4s (25.3% CPU)

### Fix 2: Cache parsed DecodedJSON via StoreTx.ParsedDecoded()
- Added ParsedDecoded() method on StoreTx using sync.Once for
thread-safe lazy caching
- json.Unmarshal on decoded_json now runs at most once per packet
lifetime
- Result reused by extractFromNode, indexByNode, trackAdvertPubkey
- **Impact:** Eliminates ~8.8s (26.3% CPU)

### Fix 3: Extend neighbor graph TTL from 60s to 5 minutes
- The graph depends on traffic patterns, not individual packets
- Reduces rebuild frequency 5x
- **Impact:** ~80% reduction in sustained CPU from graph rebuilds

## Tests

- 7 new tests added, all 26+ existing neighbor graph tests pass
- BenchmarkBuildFromStore: 727us/op, 237KB/op, 6030 allocs/op

Related: #559

---------

Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: you <you@example.com>
v3.4.1
2026-04-04 01:25:51 -07:00
you 1e1c4cb91f fix: include resolved_path in groupByHash packet response
QueryGroupedPackets builds its map manually and was missing
resolved_path. The non-grouped path (txToMap) included it.
2026-04-04 08:01:35 +00:00
you 0c340e1eb6 fix: set hasResolvedPath flag after ensuring column exists
detectSchema() runs at DB open time before ensureResolvedPathColumn()
adds the column during Load(). On first run (or any run where the column
was just added), hasResolvedPath stayed false, causing Load() to skip
reading resolved_path from SQLite. This forced a full backfill of all
observations on every restart, burning CPU for minutes on large DBs.

Fix: set hasResolvedPath = true after ensureResolvedPathColumn succeeds.
2026-04-04 07:46:25 +00:00
Kpa-clawbot ae38cdefb4 feat: server-side hop resolution at ingest — resolved_path (#556)
## Summary

Implements server-side hop prefix resolution at ingest time with a
persisted neighbor graph. Hop prefixes in `path_json` are now resolved
to full 64-char pubkeys at ingest and stored as `resolved_path` on each
observation, eliminating the need for client-side resolution via
`HopResolver`.

Fixes #555

## What changed

### New file: `cmd/server/neighbor_persist.go`
SQLite persistence layer for the neighbor graph and resolved paths:
- `neighbor_edges` table creation and management
- Load/build/persist neighbor edges from/to SQLite
- `resolved_path` column migration on observations
- `resolvePathForObs()` — resolves hop prefixes using
`resolveWithContext` with 4-tier priority (affinity → geo → GPS → first
match)
- Cold startup backfill for observations missing `resolved_path`
- Async persistence of edges and resolved paths during ingest
(non-blocking)

### Modified: `cmd/server/store.go`
- `StoreObs` gains `ResolvedPath []*string` field
- `StoreTx` gains `ResolvedPath []*string` (cached from best
observation)
- `Load()` dynamically includes `resolved_path` in SQL query when column
exists
- `IngestNewFromDB()` resolves paths at ingest time and persists
asynchronously
- `pickBestObservation()` propagates `ResolvedPath` to transmission
- `txToMap()` and `enrichObs()` include `resolved_path` in API responses
- All 7 `pm.resolve()` call sites migrated to `pm.resolveWithContext()`
with the persisted graph
- Broadcast maps include `resolved_path` per observation

### Modified: `cmd/server/db.go`
- `DB` struct gains `hasResolvedPath bool` flag
- `detectSchema()` checks for `resolved_path` column existence
- Graceful degradation when column is absent (test DBs, old schemas)

### Modified: `cmd/server/main.go`
- Startup sequence: ensure tables → load/build graph → backfill resolved
paths → re-pick best observations

### Modified: `cmd/server/routes.go`
- `mapSliceToTransmissions()` and `mapSliceToObservations()` propagate
`resolved_path`
- Node paths handler uses `resolveWithContext` with graph

### Modified: `cmd/server/types.go`
- `TransmissionResp` and `ObservationResp` gain `ResolvedPath []*string`
with `omitempty`

### New file: `cmd/server/neighbor_persist_test.go`
16 tests covering:
- Path resolution (unambiguous, empty, unresolvable prefixes)
- Marshal/unmarshal of resolved_path JSON
- SQLite table creation and column migration (idempotent)
- Edge persistence and loading
- Schema detection
- Full Load() with resolved_path
- API response serialization (present when set, omitted when nil)

## Design decisions

1. **Async persistence** — resolved paths and neighbor edges are written
to SQLite in a goroutine to avoid blocking the ingest loop. The
in-memory state is authoritative.

2. **Schema compatibility** — `DB.hasResolvedPath` flag allows the
server to work with databases that don't yet have the `resolved_path`
column. SQL queries dynamically include/exclude the column.

3. **`pm.resolve()` retained** — Not removed as dead code because
existing tests use it directly. All production call sites now use
`resolveWithContext` with the persisted graph.

4. **Edge persistence is conservative** — Only unambiguous edges (single
candidate) are persisted to `neighbor_edges`. Ambiguous prefixes are
handled by the in-memory `NeighborGraph` via Jaccard disambiguation.

5. **`null` = unresolved** — Ambiguous prefixes store `null` in the
resolved_path array. Frontend falls back to prefix display.

## Performance

- `resolveWithContext` per hop: ~1-5μs (map lookups, no DB queries)
- Typical packet has 0-5 hops → <25μs total resolution overhead per
packet
- Edge/path persistence is async → zero impact on ingest latency
- Backfill is one-time on first startup with the new column

## Test results

```
cd cmd/server && go test ./... -count=1  → ok (4.4s)
cd cmd/ingestor && go test ./... -count=1 → ok (25.5s)
```

---------

Co-authored-by: you <you@example.com>
2026-04-04 00:20:59 -07:00
Kpa-clawbot a97fa52f10 feat: frontend consumers prefer resolved_path (M4, #555) (#561)
## Summary

Implements **M4 (frontend consumers)** from the [resolved-path
spec](https://github.com/Kpa-clawbot/CoreScope/blob/resolved-path-spec/docs/specs/resolved-path.md)
for #555.

The server (PR #556, M1-M3) now returns `resolved_path` on all
packet/observation API responses and WebSocket broadcasts. This PR
updates all frontend consumers to **prefer `resolved_path`** over
client-side HopResolver, with full fallback for old packets.

## What changed

### `hop-resolver.js`
- Added `resolveFromServer(hops, resolvedPath)` — takes the short hex
prefixes and aligned array of full pubkeys from `resolved_path`, looks
up node names from the existing nodesList. Returns the same `{ [hop]: {
name, pubkey, ... } }` format as `resolve()`.

### `packet-helpers.js`
- Added `getResolvedPath(p)` — cached JSON parser for the new
`resolved_path` field (mirrors `getParsedPath`).
- Updated `clearParsedCache()` to also clear `_parsedResolvedPath`.

### `packets.js`
- **Bulk load** (`loadPackets`): calls `cacheResolvedPaths(packets)`
before the existing `resolveHops` fallback.
- **WebSocket updates**: pre-populates `hopNameCache` from
`resolved_path` on incoming packets before falling back to HopResolver
for any remaining unknown hops.
- **Group expansion** (`pktToggleGroup`): caches resolved paths from
child observations.
- **Packet detail** (`selectPacket`): prefers `resolveFromServer` when
`resolved_path` is available.
- **Show Route button**: uses `resolved_path` pubkeys directly instead
of client-side disambiguation.
- **Observation spreading**: carries `resolved_path` field when
constructing observation packets.

### `live.js`
- `resolveHopPositions` accepts optional `resolvedPath` parameter;
prefers server-resolved pubkeys, falls back to HopResolver for null
entries.
- Normalized WS packet objects now carry `resolved_path`.

### Files NOT changed (no resolution changes needed)
- **`analytics.js`** — only uses `HopResolver.haversineKm` (a utility
function). Topology, subpath, and hop distance data comes pre-resolved
from the server API (handled by M2/M3).
- **`nodes.js`** — gets pre-resolved path data from
`/nodes/:pubkey/paths` API; no client-side hop resolution.
- **`map.js`** — `drawPacketRoute` already handles full 64-char pubkeys
via exact match. The updated `packets.js` now passes full pubkeys from
`resolved_path` to the map.

## Fallback pattern

```javascript
// In hop-resolver.js
function resolveFromServer(hops, resolvedPath) {
  // Returns resolved entries for non-null pubkeys
  // Skips null entries (unresolved) — caller falls back to HopResolver
}

// In packets.js — bulk load
await cacheResolvedPaths(packets);  // server-side first
await resolveHops([...allHops]);     // client-side fallback for remaining
```

Old packets without `resolved_path` continue to work exactly as before
via the existing HopResolver. `hop-resolver.js` is NOT removed — it
remains the fallback.

## Tests

- 10 new tests for `resolveFromServer()` and `getResolvedPath()`
- All 445 frontend helper tests pass
- All 62 packet filter tests pass
- All 29 aging tests pass

Closes #555 (M4 milestone)

---------

Co-authored-by: you <you@example.com>
2026-04-04 00:18:46 -07:00
Kpa-clawbot 43673e86f2 fix: perf stats MaxMB reads from config instead of hardcoded 1024 (#558)
Perf stats `GetPerfStoreStats` returned a hardcoded `MaxMB: 1024`
regardless of the configured `packetStore.maxMemoryMB`. Now reads from
`s.maxMemoryMB`.

Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-03 23:25:54 -07:00
Kpa-clawbot 81ef51cc5c fix: debounce distance index rebuild to prevent CPU hot loop (#557)
## Problem

On busy meshes (325K+ transmissions, 50 observers), the distance index
rebuild runs on **every ingest poll** (~1s interval), computing
haversine distances for 1M+ hop records. Each rebuild takes 2-3 seconds
but new observations arrive faster than it can finish, creating a CPU
hot loop that starves the HTTP server.

Discovered on the Cascadia Mesh instance where `corescope-server` was
consuming 15 minutes of CPU time in 10 minutes of uptime, the API was
completely unresponsive, and health checks were timing out.

### Server logs showing the hot loop:
```
[store] Built distance index: 1797778 hop records, 207072 path records
[store] Built distance index: 1797806 hop records, 207075 path records
[store] Built distance index: 1797811 hop records, 207075 path records
[store] Built distance index: 1797820 hop records, 207075 path records
```
Every 2 seconds, nonstop.

## Root Cause

`IngestNewObservations` calls `buildDistanceIndex()` synchronously
whenever `pickBestObservation` selects a longer path. With 50 observers
sending observations every second, paths change on nearly every poll
cycle, triggering a full rebuild each time.

## Fix

- Mark distance index dirty on path changes instead of rebuilding inline
- Rebuild at most every **30 seconds** (configurable via `distLast`
timer)
- Set `distLast` after initial `Load()` to prevent immediate re-rebuild
on first ingest
- Distance data is at most 30s stale — acceptable for an analytics view

## Testing

- `go build`, `go vet`, `go test` all pass
- No behavioral change for the initial load or the analytics API
response shape
- Distance data freshness goes from real-time to 30s max staleness

---------

Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: you <you@example.com>
2026-04-03 23:08:09 -07:00
you ddce26ff2d ci: pin build and deploy jobs to meshcore-vm runner 2026-04-04 04:21:48 +00:00
Kpa-clawbot ee29cc627f perf: parallelize expanded group fetches, use hashIndex Map lookup (#552)
## Summary
Fixes #388 — expanded groups were fetched sequentially with O(n)
`packets.find()` lookups.

## Changes
1. **Parallel fetch**: Replaced sequential `for...of + await` loop in
`loadPackets()` with `Promise.all()` so all expanded group children are
fetched concurrently.
2. **O(1) Map lookup**: Replaced 3 instances of `packets.find(p =>
p.hash === hash)` with `hashIndex.get(hash)`:
   - `loadPackets()` expanded group restore (~line 553)
   - `select-observation` click handler (~line 1015)
   - `pktToggleGroup()` (~line 2012)

## Perf justification
- **Before**: N expanded groups → N sequential API calls + N ×
O(packets.length) array scans
- **After**: N parallel API calls + N × O(1) Map lookups
- Typical N is 1-3 (minor severity as noted in issue), but the fix is
trivial and correct

## Tests
All existing tests pass: `test-packet-filter.js` (62), `test-aging.js`
(29), `test-frontend-helpers.js` (433).

Co-authored-by: you <you@example.com>
2026-04-03 21:09:17 -07:00
Kpa-clawbot f3caf42be4 feat: show transport badge in live packet feed (#551)
## Summary

Show the transport badge ("T") in the live packet feed, matching the
packets table (#337).

## Changes

- Add `transportBadge(pkt.route_type)` to all 4 feed rendering paths in
`live.js`:
  - Grouped feed items (initial history load)
  - `addFeedItemDOM()` (VCR replay)
  - Dedup new feed items (live WebSocket updates)
  - Node detail panel recent packets list
- Uses existing `transportBadge()` from `app.js` and `.badge-transport`
CSS from `style.css`

## Testing

- 2 new source-level assertions in `test-live.js` verifying
`transportBadge()` calls exist
- All existing tests pass (67 passed in test-live.js, no new failures)

Fixes #338

Co-authored-by: you <you@example.com>
2026-04-03 21:09:02 -07:00
Kpa-clawbot c34744247a fix: clean up nodeActivity in pruneStaleNodes to prevent memory leak (#553)
## Summary

`nodeActivity` (an object tracking per-node packet counts for heatmap
intensity) grows without bound — entries are added on every packet flash
but never removed, even when stale nodes are pruned.

## Changes

- **Delete `nodeActivity[key]`** alongside `nodeMarkers[key]` and
`nodeData[key]` when removing stale WS-only nodes in `pruneStaleNodes()`
- **Prune orphaned entries** — after the main prune loop, sweep
`nodeActivity` and delete any key that has no corresponding `nodeData`
entry (catches edge cases where nodes were removed by other code paths)
- Both run every 60s via the existing `pruneStaleNodes` interval timer

## Testing

- Added 2 regression tests in `test-frontend-helpers.js` verifying stale
node cleanup and orphan removal
- All 435 frontend helper tests pass, plus packet-filter (62) and aging
(29)

Fixes #390

---------

Co-authored-by: you <you@example.com>
2026-04-03 16:54:53 -07:00
Kpa-clawbot 10f712f9d7 fix: restructure scroll containers for iOS status bar tap-to-scroll (#330) (#554)
## Summary

Fixes #330 — iOS status bar tap-to-scroll broken because `#app` had
`overflow: hidden`, preventing `<body>` from being the scroll container.

## Approach: Option B from the issue

Instead of a JS polyfill, this restructures scroll containers so
`<body>` is the primary scroll container by default, which iOS Safari
requires for native status-bar tap-to-scroll.

### How it works

**`#app` default (body-scroll mode):** Uses `min-height` instead of
fixed `height`, no `overflow: hidden`. Content pushes beyond the
viewport and body scrolls naturally.

**`#app.app-fixed` (fixed-layout mode):** Restores the original `height:
calc(100dvh - 52px); overflow: hidden` for pages that need constrained
containers. The router in `app.js` toggles this class based on the
current page.

### Fixed-layout pages (`.app-fixed`)
These pages need fixed-height containers and are unchanged in behavior:
- **packets** — virtual scroll requires fixed-height `.panel-left` to
calculate visible rows
- **nodes** — split-panel layout with independently scrollable panels
- **map** — Leaflet requires fixed-dimension container
- **live** — Leaflet map (also has its own `#app:has(.live-page)`
override in live.css)
- **channels** — split-panel chat layout
- **audio-lab** — split-panel layout

### Body-scroll pages (no `.app-fixed`)
These pages now let the body scroll, enabling iOS tap-to-scroll:
- **analytics** — removed `overflow-y: auto; height: 100%`
- **observers** — removed `overflow-y: auto; height: calc(100vh - 56px)`
- **traces** — removed `overflow-y: auto; height: 100%`
- **home** — removed `#app:has(.home-hero)` override (no longer needed)
- **compare** — removed inline `overflow-y:auto; height:calc(100vh -
56px)`
- **perf** — removed inline `height:100%; overflow-y:auto`
- **observer-detail** — removed inline `overflow-y:auto;
height:calc(100vh - 56px)`
- **node-analytics** — removed inline `height:100%; overflow-y:auto`

### Files changed
| File | Change |
|------|--------|
| `public/style.css` | `#app` default → `min-height`; added `.app-fixed`
class |
| `public/app.js` | Router toggles `.app-fixed` based on page |
| `public/home.css` | Removed `#app:has()` workaround |
| `public/compare.js` | Removed inline overflow/height |
| `public/perf.js` | Removed inline overflow/height |
| `public/observer-detail.js` | Removed inline overflow/height |
| `public/node-analytics.js` | Removed inline overflow/height |

### What's preserved
- Sticky nav (`position: sticky; top: 0`) — works with body scroll
- Split-panel resize handles — unchanged, still in fixed containers
- Virtual scroll on packets page — unchanged, `.panel-left` still has
fixed height
- Leaflet maps — unchanged, containers still have fixed dimensions
- Mobile responsive overrides — unchanged

Co-authored-by: you <you@example.com>
2026-04-03 16:54:36 -07:00
Kpa-clawbot 412a8fdb8f feat: live map uses affinity-aware hop resolution (#528) (#550)
## Summary

Augments the shared `HopResolver` with neighbor-graph affinity data so
that when multiple nodes match a hop prefix, the resolver prefers
candidates that are known neighbors of the adjacent hop — instead of
relying solely on geo-distance.

Fixes #528

## Changes

### `public/hop-resolver.js`
- Added `affinityMap` — stores bidirectional neighbor adjacency with
scores
- Added `setAffinity(graph)` — ingests `/api/analytics/neighbor-graph`
edge data into O(1) Map lookups
- Added `getAffinity(pubkeyA, pubkeyB)` — returns affinity score between
two nodes (0 if not neighbors)
- Added `pickByAffinity(candidates, adjacentPubkey, anchor, ...)` —
picks best candidate: affinity-neighbor first (highest score), then
geo-distance fallback
- Modified forward and backward passes in `resolve()` to track the
previously-resolved pubkey and use `pickByAffinity` instead of raw
geo-sort

### `public/live.js`
- Added `fetchAffinityData()` — fetches `/api/analytics/neighbor-graph`
once and calls `HopResolver.setAffinity()`
- Added `startAffinityRefresh()` — refreshes affinity data every 60
seconds
- Both are called from `loadNodes()` after HopResolver is initialized

### `test-hop-resolver-affinity.js` (new)
- Affinity prefers neighbor candidate over geo-closest
- Cold start (no affinity data) falls back to geo-closest
- Null/undefined affinity doesn't crash
- Bidirectional score lookup
- Highest affinity score wins among multiple neighbors
- Unambiguous hops unaffected by affinity

## Performance

- API calls: 1 at load + 1 per 60s (no per-packet calls)
- Per-packet resolve: O(1) Map lookups, <0.5ms
- Memory: ~50KB for 2K-node graph

---------

Co-authored-by: you <you@example.com>
2026-04-03 16:32:53 -07:00
Kpa-clawbot 9a39198d92 fix: only count repeaters in hash collision analysis (#441) (#548)
Fixes #441

## Summary

Hash collision analysis was including ALL node types, inflating
collision counts with irrelevant data. Per MeshCore firmware analysis,
**only repeaters matter for collision analysis** — they're the only role
that forwards packets and appears in routing `path[]` arrays.

## Root Causes Fixed

1. **`hash_size==0` nodes counted in all buckets** — nodes with unknown
hash size were included via `cn.HashSize == bytes || cn.HashSize == 0`,
polluting every bucket
2. **Non-repeater roles included** — companions, rooms, sensors, and
observers were counted even though their hash collisions never cause
routing ambiguity

## Fix

Changed `computeHashCollisions()` filter from:
```go
// Before: include everything except companions
if cn.HashSize == bytes && cn.Role != "companion" {
```
To:
```go
// After: only include repeaters (per firmware analysis)
if cn.HashSize == bytes && cn.Role == "repeater" {
```

## Why only repeaters?

From [MeshCore firmware
analysis](https://github.com/Kpa-clawbot/CoreScope/issues/441#issuecomment-4185218547):
- Only repeaters override `allowPacketForward()` to return `true`
- Only repeaters append their hash to `path[]` during relay
- Companions, rooms, sensors, observers never forward packets
- Cross-role collisions are benign (companion silently drops, real
repeater still forwards)

## Tests
- `TestHashCollisionsOnlyRepeaters` — verifies companions, rooms,
sensors, and hash_size==0 nodes are all excluded

---------

Co-authored-by: you <you@example.com>
2026-04-03 14:23:13 -07:00
Kpa-clawbot 526ea8a1fc perf(live): chunk VCR replay packet processing to avoid UI freezes (#549)
## Summary

VCR replay functions (`vcrReplayFromTs`, `vcrRewind`,
`fetchNextReplayPage`) fetch up to 10K packets and process them all
synchronously on the main thread via `expandToBufferEntries`, causing
multi-second UI freezes — especially on mobile.

## Fix

- Added `expandToBufferEntriesAsync()` — processes packets in chunks of
200, yielding to the event loop via `setTimeout(0)` between chunks
- Updated all three VCR replay callers to use the async variant
- Kept the synchronous `expandToBufferEntries()` for backward
compatibility (tests, small datasets)
- Exposed `_liveExpandToBufferEntriesAsync` on window for test access

## Perf justification

- **Before:** 10K packets × ~2 observations = 20K+ objects created
synchronously, blocking the main thread for 1-3 seconds on mobile
- **After:** Same work split into chunks of 200 packets (~400 entries)
with event loop yields between chunks. Each chunk takes <5ms, keeping
the UI responsive (well under the 16ms frame budget)
- Chunk size of 200 is tunable via `VCR_CHUNK_SIZE`

## Tests

- Added regression test: sync expand correctness at scale (500 packets →
1000 entries)
- Added structural test: verifies `VCR_CHUNK_SIZE` exists and async
function yields via `setTimeout`
- All existing tests pass (`npm test`)

Fixes #395

---------

Co-authored-by: you <you@example.com>
2026-04-03 21:22:05 +00:00
Kpa-clawbot 8e42febc9c fix: virtual scroll height accounts for expanded group rows (#410) (#547)
## Summary

Fixes #410 — virtual scroll height miscalculation for expanded group
rows.

## Root Cause

When WebSocket messages add children to an already-expanded packet
group, `_rowCounts` becomes stale during the 200ms render debounce
window. Scroll events during this window call `renderVisibleRows()` with
stale row counts, causing wrong total height, spacer heights, and
visible range calculations.

## Changes

**public/packets.js:**
- Added `_rowCountsDirty` flag to track when row counts need
recomputation
- Added `_invalidateRowCounts()` — marks row counts as stale and clears
cumulative cache
- Added `_refreshRowCountsIfDirty()` — lazily recomputes `_rowCounts`
from `_displayPackets`
- Called `_invalidateRowCounts()` when WS handler adds children to
expanded groups (line ~402)
- Called `_refreshRowCountsIfDirty()` at top of `renderVisibleRows()`
before using row counts
- Reset `_rowCountsDirty` in all cleanup paths (destroy, empty display)

**test-packets.js:**
- Added 4 regression tests for `_invalidateRowCounts` /
`_refreshRowCountsIfDirty`

## Complexity

O(n) recomputation of `_rowCounts` when dirty (same as existing
`renderTableRows` path). Only triggers when WS modifies expanded group
children, which is infrequent relative to scroll events.

Co-authored-by: you <you@example.com>
2026-04-03 13:55:23 -07:00
Kpa-clawbot 59bff5462c fix: rate-limit cache invalidation to prevent 0% hit rate (#533) (#546)
## Summary

Fixes #533 — server cache hit rate always 0%.

## Root Cause

`invalidateCachesFor()` is called at the end of every
`IngestNewFromDB()` and `IngestNewObservations()` cycle (~2-5s). Since
new data arrives continuously, caches are cleared faster than any
analytics request can hit them, resulting in a permanent 0% cache hit
rate. The cache TTL (15s/60s) is irrelevant because entries are evicted
by invalidation long before they expire.

## Fix

Rate-limit cache invalidation with a 10-second cooldown:

- First call after cooldown goes through immediately
- Subsequent calls during cooldown accumulate dirty flags in
`pendingInv`
- Next call after cooldown merges pending + current flags and applies
them
- Eviction bypasses cooldown (data removal requires immediate clearing)

Analytics data may be at most ~10s stale, which is acceptable for a
dashboard.

## Changes

- **`store.go`**: Added `lastInvalidated`, `pendingInv`, `invCooldown`
fields. Refactored `invalidateCachesFor()` to rate-limit non-eviction
invalidation. Extracted `applyCacheInvalidation()` helper.
- **`cache_invalidation_test.go`**: Added 4 new tests:
- `TestInvalidationRateLimited` — verifies caches survive during
cooldown
  - `TestInvalidationCooldownAccumulatesFlags` — verifies flag merging
- `TestEvictionBypassesCooldown` — verifies eviction always clears
immediately
- `BenchmarkCacheHitDuringIngestion` — confirms 100% hit rate during
rapid ingestion (was 0%)

## Perf Proof

```
BenchmarkCacheHitDuringIngestion-16    3467889    1018 ns/op    100.0 hit%
```

Before: 0% hit rate under continuous ingestion. After: 100% hit rate
during cooldown periods.

Co-authored-by: you <you@example.com>
2026-04-03 13:53:58 -07:00
Kpa-clawbot 8c1cd8a9fe perf: track advert pubkeys incrementally, eliminate per-request JSON parsing (#360) (#544)
## Summary

`GetPerfStoreStats()` and `GetPerfStoreStatsTyped()` iterated **all**
ADVERT packets and called `json.Unmarshal` on each one — under a read
lock — on every `/api/perf` and `/api/health` request. With 5K+ adverts,
each health check triggered thousands of JSON parses.

## Fix

Added a refcounted `advertPubkeys map[string]int` to `PacketStore` that
tracks distinct pubkeys incrementally during `Load()`,
`IngestNewFromDB()`, and eviction. The perf/health handlers now just
read `len(s.advertPubkeys)` — O(1) with zero allocations.

## Benchmark Results (5K adverts, 200 distinct pubkeys)

| Method | ns/op | allocs/op |
|--------|-------|-----------|
| `GetPerfStoreStatsTyped` | **78** | **0** |
| `GetPerfStoreStats` | **2,565** | **9** |

Before this change, both methods performed O(N) JSON unmarshals per
call.

## Tests Added

- `TestAdvertPubkeyTracking` — verifies incremental tracking through
add/evict lifecycle
- `TestAdvertPubkeyPublicKeyField` — covers the `public_key` JSON field
variant
- `TestAdvertPubkeyNonAdvert` — ensures non-ADVERT packets don't affect
count
- `BenchmarkGetPerfStoreStats` — 5K adverts benchmark
- `BenchmarkGetPerfStoreStatsTyped` — 5K adverts benchmark

Fixes #360

---------

Co-authored-by: you <you@example.com>
2026-04-03 13:51:13 -07:00
Kpa-clawbot 29e8e37114 fix: mobile filter dropdown specificity prevents expansion (#534) (#541)
## Summary

Fixes #534 — mobile filter dropdown doesn't expand on packets page.

## Root Cause

CSS specificity battle in the mobile media query. The hide rule uses
`:not()` pseudo-classes which add specificity:

```css
/* Higher specificity due to :not() */
.filter-bar > *:not(.filter-toggle-btn):not(.col-toggle-wrap) { display: none; }

/* Lower specificity — loses even with .filters-expanded */
.filter-bar.filters-expanded > * { display: inline-flex; }
```

The JS toggle correctly adds/removes `.filters-expanded`, but the CSS
expanded rule could never win.

## Fix

Match the `:not()` selectors in the expanded rule so `.filters-expanded`
makes it strictly more specific:

```css
.filter-bar.filters-expanded > *:not(.filter-toggle-btn):not(.col-toggle-wrap) { display: inline-flex; }
```

Added a comment explaining the specificity dependency so future devs
don't repeat this.

## Tests

Added Playwright E2E test: mobile viewport (480×800), navigates to
packets page, clicks filter toggle, verifies filter inputs become
visible.

---------

Co-authored-by: you <you@example.com>
2026-04-03 13:50:10 -07:00
Kpa-clawbot 9b9f396af5 perf: replace O(n²) observation dedup with map-based O(n) (#355) (#543)
## Summary

Fixes #355 — replaces O(n²) observation dedup in `Load()`,
`IngestNewFromDB()`, and `IngestNewObservations()` with an O(1)
map-based lookup.

## Changes

- Added `obsKeys map[string]bool` field to `StoreTx` for O(1) dedup
keyed on `observerID + "|" + pathJSON`
- Replaced all 3 linear-scan dedup sites in `store.go` with map lookups
- Lazy-init `obsKeys` for transmissions created before this change (in
`IngestNewFromDB` and `IngestNewObservations`)
- Added regression test (`TestObsDedupCorrectness`) verifying dedup
correctness
- Added nil-map safety test (`TestObsDedupNilMapSafety`)
- Added benchmark comparing map vs linear scan

## Benchmark Results (ARM64, 16 cores)

| Observations | Map (O(1)) | Linear (O(n)) | Speedup |
|---|---|---|---|
| 10 | 34 ns/op | 41 ns/op | 1.2x |
| 50 | 34 ns/op | 186 ns/op | 5.5x |
| 100 | 34 ns/op | 361 ns/op | 10.6x |
| 500 | 34 ns/op | 4,903 ns/op | **146x** |

Map lookup is constant time regardless of observation count. The linear
scan degrades quadratically — at 500 observations per transmission
(realistic for popular packets seen by many observers), the old code is
146x slower per dedup check.

All existing tests pass.

---------

Co-authored-by: you <you@example.com>
2026-04-03 13:33:26 -07:00
Kpa-clawbot b472c8de30 perf: replace O(n²) selection sort with sort.Slice (#354) (#542)
## Summary

Fixes #354

Replaces the O(n²) selection sort in `sortedCopy()` with Go's built-in
`sort.Float64s()` (O(n log n)).

## Changes

- **`cmd/server/routes.go`**: Replaced manual nested-loop selection sort
with `sort.Float64s(cp)`
- **`cmd/server/helpers_test.go`**: Added regression test with
1000-element random input + benchmark

## Benchmark Results (ARM64)

```
BenchmarkSortedCopy/n=256     ~16μs/op    1 alloc
BenchmarkSortedCopy/n=1000    ~95μs/op    1 alloc
BenchmarkSortedCopy/n=10000   ~1.3ms/op   1 alloc
```

With the old O(n²) sort, n=10000 would take ~50ms+. The new
implementation scales as O(n log n).

## Testing

- All existing `TestSortedCopy` tests pass (unchanged behavior)
- New `TestSortedCopyLarge` validates correctness on 1000 random
elements
- `go test ./...` passes in `cmd/server`

Co-authored-by: you <you@example.com>
2026-04-03 13:11:59 -07:00
Kpa-clawbot 03e384bbc4 fix: null guard on pathHops prevents crash on ADVERT detail (#538) (#540)
## Summary

Fixes #538 — `null is not an object (evaluating 'pathHops.length')`
crash on ADVERT packet detail.

## Root Cause

`getParsedPath` caches its result as `p._parsedPath`. If another code
path (e.g., object spread, API response) sets `_parsedPath = null`, the
cache check (`!== undefined`) passes and returns `null` — causing
`.length` to crash.

Same pattern exists for `getParsedDecoded`.

## Changes

### `public/packet-helpers.js`
- `getParsedPath`: cached return now uses `|| []` to guard against null
cache
- `getParsedDecoded`: cached return now uses `|| {}` to guard against
null cache

### `public/packets.js`
- `renderDetail()` (line ~1440): defensive `|| []` / `|| {}` on
getParsedPath/getParsedDecoded calls
- `buildFlatRowHtml()` (line ~1103): same defensive guards

### `test-frontend-helpers.js`
- Added test: cached `_parsedPath = null` returns `[]`
- Added test: cached `_parsedDecoded = null` returns `{}`

## Testing

All 428 frontend helper tests pass. All 62 packet filter tests pass.

Co-authored-by: you <you@example.com>
2026-04-03 13:03:20 -07:00
Kpa-clawbot bf8c9e72ec fix: observer filter checks all observations in grouped mode (#537) (#539)
Fixes #537

## Problem
Observer filter in grouped mode only checked `p.observer_id` (the
primary observer), ignoring child observations. Grouped packets seen by
multiple observers would be hidden when filtering for a non-primary
observer.

## Fix
Two filter paths updated to also check `p._children`:

1. **Client-side display filter** (line ~1293): removed the
`!groupByHash` guard and added `_children` check so grouped packets are
included when any child observation matches
2. **WS real-time filter** (line ~360): added `_children` fallback check

The grouped row rendering (line ~1042) already correctly uses
`_observerFilterSet` for child filtering — no changes needed there.

## Tests
Added 5 tests in `test-frontend-helpers.js`:
- Grouped packet with matching child observer is shown
- Grouped packet with no matching observers is hidden  
- WS filter passes/rejects grouped packets correctly
- Source code assertions verifying both filter paths check `_children`

Co-authored-by: you <you@example.com>
2026-04-03 13:02:25 -07:00
Kpa-clawbot 48923db3d0 Add deep linking rule to AGENTS.md (#535)
Adds a rule to AGENTS.md requiring all new UI states to be
URL-addressable (deep-linkable). Part of #536.

Co-authored-by: you <you@example.com>
2026-04-03 13:01:31 -07:00
efiten 709e5a4776 fix: observer filter drops groups in grouped packets view (#464) (#531)
## Summary

- When `groupByHash=true`, each group only carries its representative
(best-path) `observer_id`. The client-side filter was checking only that
field, silently dropping groups that were seen by the selected observer
but had a different representative.
- `loadPackets` now passes the `observer` param to the server so
`filterPackets`/`buildGroupedWhere` do the correct "any observation
matches" check.
- Client-side observer filter in `renderTableRows` is skipped for
grouped mode (server already filtered correctly).
- Both `db.go` and `store.go` observer filtering extended to support
comma-separated IDs (multi-select UI).

## Test plan

- [ ] Set an observer filter on the Packets screen with grouping enabled
— all groups that have **any** observation from the selected observer(s)
should appear, not just groups where that observer is the representative
- [ ] Multi-select two observers — groups seen by either should appear
- [ ] Toggle to flat (ungrouped) mode — per-observation filter still
works correctly
- [ ] Existing grouped packets tests pass: `cd cmd/server && go test
./...`

Fixes #464

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: you <you@example.com>
2026-04-03 09:22:37 -07:00
you 9099154514 docs: add v3.4 release notes v3.4.0 2026-04-03 08:26:05 +00:00
Kpa-clawbot 924caaa680 fix: render both steps AND FAQ on home page (#525) (#529)
Fixes #525

The `checklist()` function in `home.js` treated steps and FAQ/checklist
as mutually exclusive — if `homeCfg.checklist` existed, steps were
skipped entirely. Adding a single FAQ via the customizer made all intro
steps disappear.

Now renders steps first, then FAQ below with a ' FAQ' header. Falls
back to Bay Area hardcoded defaults only when neither exists.

---------

Co-authored-by: you <you@example.com>
2026-04-03 01:19:42 -07:00
Kpa-clawbot ca95fc46aa fix: neighbor UI — show neighbors crash, dark mode contrast (#523) (#527)
## Summary

Part of #523 — fixes bugs 5 and 7 (bug 6 was a duplicate of bug 7).

### Bug 5: Show Neighbors button throws `window._mapSelectRefNode is not
a function`

**Root cause:** Map popup HTML used inline `onclick` calling
`window._mapSelectRefNode`, which was deleted on SPA page destroy. If a
popup persisted after navigation, clicks would throw.

**Fix:** Replaced inline `onclick` with event delegation. A
document-level click handler catches all `[data-show-neighbors]` clicks
and calls `selectReferenceNode` directly. The global
`window._mapSelectRefNode` is still exposed for existing Playwright
tests but is no longer relied upon by the UI.

### Bug 7: Blue text on dark blue background (dark mode contrast)

**Root cause:** Neighbor table cells inside `.node-detail-section` /
`.node-full-card` inherited accent/link color instead of using
`var(--text)`, making text unreadable in dark mode.

**Fix:** Added explicit `color: var(--text)` on `.node-detail-section
.data-table td` and `.node-full-card .data-table td`. Only `<a>` tags
within those cells retain `color: var(--accent)`.

### Files changed
- `public/map.js` — event delegation for Show Neighbors
- `public/style.css` — contrast fix for neighbor table cells

---------

Co-authored-by: you <you@example.com>
2026-04-03 00:49:17 -07:00
Kpa-clawbot 54fab0551e fix: add home defaults to server theme config (#525) (#526)
## Summary

Fixes #525 — Customizer v2 home section shows empty fields and adding
FAQ kills steps.

## Root Cause

Server returned `home: null` from `/api/config/theme` when no home
config existed in config.json or theme.json. The customizer had no
built-in defaults, so all home fields appeared empty. When a user added
a single override (e.g. FAQ), `computeEffective` started from `home:
null`, created `home: {}`, and only applied the user's override — wiping
steps and everything else.

## Fix

### Server-side (primary)
In `handleConfigTheme()`, replaced the conditional `home` assignment
with `mergeMap` using built-in defaults matching what `home.js`
hardcodes:
- `heroTitle`: "CoreScope"
- `heroSubtitle`: "Real-time MeshCore LoRa mesh network analyzer"
- `steps`: 4 default getting-started steps
- `footerLinks`: Packets + Network Map links

Config/theme overrides merge on top, so customization still works.

### Client-side (defense-in-depth)
Added `DEFAULT_HOME` constant in `customize-v2.js`. `computeEffective()`
now falls back to these defaults when server returns `home: null`,
ensuring the customizer works even without server defaults.

## Tests
- **Go**: `TestConfigThemeHomeDefaults` — verifies `/api/config/theme`
returns non-null home with heroTitle, steps, footerLinks when no config
is set
- **JS**: Two new tests in `test-frontend-helpers.js` — verifies
`computeEffective` provides defaults when home is null, and that user
overrides merge correctly with defaults

Co-authored-by: you <you@example.com>
2026-04-03 00:31:03 -07:00