meshcore-analyzer

mirror of https://github.com/Kpa-clawbot/meshcore-analyzer.git synced 2026-05-25 19:14:05 +00:00

Author	SHA1	Message	Date
efiten	d0b597ff49	feat: make Network Overview collapsible, collapsed by default Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 01:59:33 +00:00
efiten	e19b0eba85	feat: link keygen button to meshcore-web-keygen with prefix pre-fill Replace placeholder keygen link with https://agessaman.github.io/meshcore-web-keygen/ which supports ?prefix= URL param for pre-filling the generated prefix. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 01:59:33 +00:00
efiten	df75468a8b	feat: add Prefix Tool tab to Analytics page (#347 ) Adds a new "Prefix Tool" tab to the Analytics page with three sections: - Network Overview: per-hash-size collision stats and a size recommendation based on node count - Prefix Checker: accepts a 1/2/3-byte hex prefix or full public key and shows which nodes share that prefix at each tier, with severity badges - Prefix Generator: picks a random collision-free prefix at the chosen hash size, with a link to the MeshCore keygen tool 100% client-side — no new API endpoints. Reuses the existing /nodes list. Supports deep links: ?tab=prefix-tool&prefix=A3F1 and ?generate=2. Adds a "Check a prefix →" link to the Hash Issues tab nav. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 01:59:33 +00:00
you	0a55717283	docs: add PSK brute-force attack with timestamp oracle to security analysis Weak passphrases with no KDF stretching are the #1 practical threat. Timestamp in plaintext block 0 serves as known-plaintext oracle for instant key verification from a single captured packet. Key findings: - decode_base64() output used directly as AES key, no KDF - Short passphrases produce <16 byte keys (reduced key space) - No salt means global precomputed attacks work - 3-word passphrase crackable in ~2 min on commodity GPU Reviewed by djb and Dijkstra personas. Corrections applied: - GPU throughput upgraded from 10^9 to 10^10 AES/sec baseline - Oracle strengthened: bytes 4+ (type byte, sender name) also predictable - Dictionary size assumptions made explicit - Zipf's law caveat added (humans don't choose uniformly) - base64 short-passphrase key truncation issue documented	2026-04-05 00:58:57 +00:00
you	bcab31bf72	docs: AES-128-ECB security analysis — block-level vulnerability assessment Formal analysis of MeshCore's ECB encryption for channel and direct messages. Reviewed by djb and Dijkstra expert personas through 3 revisions. Key findings: - Block 0 has accidental nonce (4-byte timestamp) preventing repetition - Blocks 1+ are pure deterministic ECB with no nonce — vulnerable to frequency analysis for repeated message content - Partial final block attack: zero-padding reduces search space - HMAC key reuse: AES key is first 16 bytes of HMAC key (same material) - Recommended fix: switch to AES-128-CTR mode	2026-04-05 00:44:21 +00:00
Kpa-clawbot	6ae62ce535	perf: make txToMap observations lazy via ExpandObservations flag (#595 ) ## Summary `txToMap()` previously always allocated observation sub-maps for every packet, even though the `/api/packets` handler immediately stripped them via `delete(p, "observations")` unless `expand=observations` was requested. A typical page of 50 packets with ~5 observations each caused 300+ unnecessary map allocations per request. ## Changes - `txToMap`: Add variadic `includeObservations bool` parameter. Observations are only built when `true` is passed, eliminating allocations when they'd just be discarded. - `PacketQuery`: Add `ExpandObservations bool` field to thread the caller's intent through the query pipeline. - `routes.go`: Set `ExpandObservations` based on `expand=observations` query param. Removed the post-hoc `delete(p, "observations")` loop — observations are simply never created when not requested. - Single-packet lookups (`GetPacketByID`, `GetPacketByHash`): Always pass `true` since detail views need observations. - Multi-node/analytics queries: Default (no flag) = no observations, matching prior behavior. ## Testing - Added `TestTxToMapLazyObservations` covering all three cases: no flag, `false`, and `true`. - All existing tests pass (`go test ./...`). ## Perf Impact Eliminates ~250 observation map allocations per /api/packets request (at default page size of 50 with ~5 observations each). This is a constant-factor improvement per request — no algorithmic complexity change. Fixes #374 Co-authored-by: you <you@example.com>	2026-04-04 10:39:30 -07:00
Kpa-clawbot	6e2f79c0ad	perf: optimize QueryGroupedPackets — cache observer count, defer map construction (#594) ## Summary Optimizes `QueryGroupedPackets()` in `store.go` to eliminate two major inefficiencies on every grouped packet list request: ### Changes 1. Cache `UniqueObserverCount` on `StoreTx` — Instead of iterating all observations to count unique observers on every query (O(total_observations) per request), we now track unique observers at ingest time via an `observerSet` map and pre-computed `UniqueObserverCount` field. This is updated incrementally as observations arrive. 2. Defer map construction until after pagination — Previously, `map[string]interface{}` was built for ALL 30K+ filtered results before sorting and paginating. Now the grouped cache stores sorted `[]StoreTx` pointers (lightweight), and `groupedTxsToPage()` builds maps only for the requested page (typically 50 items). This eliminates ~30K map allocations per cache miss. 3. Lighter cache footprint — The grouped cache now stores `[]StoreTx` instead of `PacketResult` with pre-built maps, reducing memory pressure and GC work. ### Complexity - Observer counting: O(1) per query (was O(total_observations)) - Map construction: O(page_size) per query (was O(n) where n = all filtered results) - Sort remains O(n log n) on cache miss, but the cache (3s TTL) absorbs repeated requests ### Testing - `cd cmd/server && go test ./...` — all tests pass - `cd cmd/ingestor && go build ./...` — builds clean Fixes #370 --------- Co-authored-by: you <you@example.com>	2026-04-04 10:39:04 -07:00
Kpa-clawbot	b0862f7a41	fix: replace time.Tick with NewTicker in prune goroutine for graceful shutdown (#593 ) ## Summary Replace `time.Tick()` with `time.NewTicker()` in the auto-prune goroutine so it stops cleanly during graceful shutdown. ## Problem `time.Tick` creates a ticker that can never be garbage collected or stopped. While the prune goroutine runs for the process lifetime, it won't stop during graceful shutdown — the goroutine leaks past the shutdown sequence. ## Fix - Create a `time.NewTicker` and a done channel - Use `select` to listen on both the ticker and done channel - Stop the ticker and close the done channel in the shutdown path (after `poller.Stop()`) - Pattern matches the existing `StartEvictionTicker()` approach ## Testing - `go build ./...` — compiles cleanly - `go test ./...` — all tests pass Fixes #377 Co-authored-by: you <you@example.com>	2026-04-04 10:38:37 -07:00
Kpa-clawbot	45991eca09	perf: combine chained filterPackets passes into single scan (#592) ## Summary Combines the chained `filterTxSlice` calls in `filterPackets()` into a single pass over the packet slice. ## Problem When multiple filter parameters are specified (e.g., `type=4&route=1&since=...&until=...`), each filter created a new intermediate `[]StoreTx` slice. With N filters, this meant N separate scans and N-1 unnecessary allocations. ## Fix All filter predicates (type, route, observer, hash, since, until, region, node) are pre-computed before the loop, then evaluated in a single `filterTxSlice` call. This eliminates all intermediate allocations. Preserved behavior: - Fast-path index lookups for hash-only and observer-only queries remain unchanged - Node-only fast-path via `byNode` index preserved - All existing filter semantics maintained (same comparison operators, same null checks) Complexity: Single `O(n)` pass regardless of how many filters are active, vs previous `O(n * k)` where k = number of active filters (each pass is O(n) but allocates). ## Testing All existing tests pass (`cd cmd/server && go test ./...`). Fixes #373 Co-authored-by: you <you@example.com>	2026-04-04 10:38:10 -07:00
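A single-pass filter of the kind this commit describes could be sketched as follows. The `tx` and `filter` types, their field names, and the exclusive time bounds are assumptions made for illustration; the real `StoreTx` and `filterTxSlice` are not shown in the commit message.

```go
package main

import "fmt"

// tx is a hypothetical stand-in for StoreTx with only the fields used here.
type tx struct {
	Type      int
	Route     int
	FirstSeen string // ISO-8601 strings in the same zone compare chronologically
}

// filter lists optional criteria; a nil pointer or empty string is inactive.
type filter struct {
	Type  *int
	Route *int
	Since string
	Until string
}

// filterOnce checks every active predicate per element in a single pass,
// so no intermediate slice is built per filter.
func filterOnce(txs []tx, f filter) []tx {
	out := make([]tx, 0, len(txs))
	for _, t := range txs {
		if f.Type != nil && t.Type != *f.Type {
			continue
		}
		if f.Route != nil && t.Route != *f.Route {
			continue
		}
		if f.Since != "" && t.FirstSeen <= f.Since {
			continue
		}
		if f.Until != "" && t.FirstSeen >= f.Until {
			continue
		}
		out = append(out, t)
	}
	return out
}

func main() {
	four := 4
	txs := []tx{
		{Type: 4, Route: 1, FirstSeen: "2026-04-01T00:00:00Z"},
		{Type: 4, Route: 2, FirstSeen: "2026-04-02T00:00:00Z"},
		{Type: 5, Route: 1, FirstSeen: "2026-04-03T00:00:00Z"},
	}
	got := filterOnce(txs, filter{Type: &four, Since: "2026-04-01T12:00:00Z"})
	fmt.Println(len(got)) // prints 1
}
```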
Kpa-clawbot	76c42556a2	perf: sort snrVals/rssiVals once in computeAnalyticsRF (#591 ) ## Summary Sort `snrVals` and `rssiVals` once upfront in `computeAnalyticsRF()` and read min/max/median directly from the sorted slices, instead of copying and sorting per stat call. ## Changes - Sort both slices once before computing stats (2 sorts total instead of 4+ copy+sorts) - Read `min` from `sorted[0]`, `max` from `sorted[len-1]`, `median` from `sorted[len/2]` - Remove the now-unused `sortedF64` and `medianF64` helper closures ## Performance impact With 100K+ observations, this eliminates multiple O(n log n) copy+sort operations. Previously each call to `medianF64` did a full copy + sort, and `minF64`/`maxF64` did O(n) scans on the unsorted array. Now: 2 in-place sorts total, O(1) lookups for min/max/median. Fixes #366 Co-authored-by: you <you@example.com>	2026-04-04 10:37:42 -07:00
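The sort-once approach can be sketched as below. `summarize` and the `stats` type are hypothetical names; the median is read from `sorted[len/2]` as the commit message states, which for even-length input picks the upper middle element rather than averaging the two middle values.

```go
package main

import (
	"fmt"
	"sort"
)

// stats holds the three values read from one sorted slice.
type stats struct{ Min, Max, Median float64 }

// summarize (hypothetical) sorts vals in place once, then reads min, max and
// median by index. It reports false for empty input.
func summarize(vals []float64) (stats, bool) {
	if len(vals) == 0 {
		return stats{}, false
	}
	sort.Float64s(vals) // the only O(n log n) step
	return stats{
		Min:    vals[0],
		Max:    vals[len(vals)-1],
		Median: vals[len(vals)/2],
	}, true
}

func main() {
	snr := []float64{-3.5, 7, 1.25, -10, 4} // made-up SNR samples
	s, ok := summarize(snr)
	fmt.Println(s.Min, s.Max, s.Median, ok) // prints -10 7 1.25 true
}
```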
Kpa-clawbot	6f8378a31c	perf: batch-remove from secondary indexes in EvictStale (#590 ) ## Summary `EvictStale()` was doing O(n) linear scans per evicted item to remove from secondary indexes (`byObserver`, `byPayloadType`, `byNode`). Evicting 1000 packets from an observer with 50K observations meant 1000 × 50K = 50M comparisons — all under a write lock. ## Fix Replace per-item removal with batch single-pass filtering: 1. Collect phase: Walk evicted packets once, building sets of evicted tx IDs, observation IDs, and affected index keys 2. Filter phase: For each affected index slice, do a single pass keeping only non-evicted entries Before: O(evicted_count × index_slice_size) per index — quadratic in practice After: O(evicted_count + index_slice_size) per affected key — linear ## Changes - `cmd/server/store.go`: Restructured `EvictStale()` eviction loop into collect + batch-filter pattern ## Testing - All existing tests pass (`cd cmd/server && go test ./...`) Fixes #368 Co-authored-by: you <you@example.com>	2026-04-04 10:37:27 -07:00
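The filter phase of the collect-then-filter pattern might be sketched like this. `removeEvicted` and the use of plain `int` IDs are assumptions for illustration; the real index slices hold store-specific types.

```go
package main

import "fmt"

// removeEvicted (hypothetical) keeps only the ids that are not in the evicted
// set. It reuses the backing array, so the cost is one pass over ids
// regardless of how many entries were evicted.
func removeEvicted(ids []int, evicted map[int]struct{}) []int {
	kept := ids[:0]
	for _, id := range ids {
		if _, gone := evicted[id]; !gone {
			kept = append(kept, id)
		}
	}
	return kept
}

func main() {
	// Collect phase: build the set of evicted IDs once.
	evicted := map[int]struct{}{2: {}, 4: {}}
	// Filter phase: one pass per affected index slice.
	index := []int{1, 2, 3, 4, 5}
	fmt.Println(removeEvicted(index, evicted)) // prints [1 3 5]
}
```

Set membership is O(1) on average, which is what turns the per-item linear scan into a single linear pass.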
Kpa-clawbot	56115ee0a4	perf: use byNode index in QueryMultiNodePackets instead of full scan (#589 ) ## Summary `QueryMultiNodePackets()` was scanning ALL packets with `strings.Contains` on JSON blobs — O(packets × pubkeys × json_length). With 30K+ packets and multiple pubkeys, this caused noticeable latency on `/api/packets?nodes=...`. ## Fix Replace the full scan with lookups into the existing `byNode` index, which already maps pubkeys to their transmissions. Merge results with hash-based deduplication, then apply time filters. Before: O(N × P × J) where N=all packets, P=pubkeys, J=avg JSON length After: O(M × P) where M=packets per pubkey (typically small), plus O(R log R) sort for pagination correctness Results are sorted by `FirstSeen` after merging to maintain the oldest-first ordering expected by the pagination logic. Fixes #357 Co-authored-by: you <you@example.com>	2026-04-04 10:36:59 -07:00
Kpa-clawbot	321d1cf913	perf: apply time filter early in GetNodeAnalytics to avoid full packet scan (#588 ) ## Problem `GetNodeAnalytics()` in `store.go` scans ALL 30K+ packets doing `strings.Contains` on every JSON blob when the node has a name, then filters by time range after the full scan. This is `O(packets × json_length)` on every `/api/nodes/{pubkey}/analytics` request. ## Fix Move the `fromISO` time check inside the scan loop so old packets are skipped before the expensive `strings.Contains` matching. For the non-name path (indexed-only), the time filter is also applied inline, eliminating the separate `allPkts` intermediate slice. ### Before 1. Scan all packets → collect matches (including old ones) → `allPkts` 2. Filter `allPkts` by time → `packets` ### After 1. Scan packets, skip `tx.FirstSeen <= fromISO` immediately → `packets` This avoids `strings.Contains` calls on packets outside the requested time window (typically 7 days out of months of data). ## Complexity - Before: `O(total_packets × avg_json_length)` for name matching - After: `O(recent_packets × avg_json_length)` — only packets within the time window are string-matched ## Testing - `cd cmd/server && go test ./...` — all tests pass Fixes #367 Co-authored-by: you <you@example.com>	2026-04-04 10:36:49 -07:00
Kpa-clawbot	790a713ba9	perf: combine 4 subpath API calls into single bulk endpoint (#587 ) ## Summary Consolidates the 4 parallel `/api/analytics/subpaths` calls in the Route Patterns tab into a single `/api/analytics/subpaths-bulk` endpoint, eliminating 3 redundant server-side scans of the subpath index on cache miss. ## Changes ### Backend (`cmd/server/routes.go`, `cmd/server/store.go`) - New `GET /api/analytics/subpaths-bulk?groups=2-2:50,3-3:30,4-4:20,5-8:15` endpoint - Groups format: `minLen-maxLen:limit` comma-separated - `GetAnalyticsSubpathsBulk()` iterates `spIndex` once, bucketing entries into per-group accumulators by hop length - Hop name resolution is done once per raw hop and shared across groups - Results are cached per-group for compatibility with existing single-key cache lookups - Region-filtered queries fall back to individual `GetAnalyticsSubpaths()` calls (region filtering requires per-transmission observer checks) ### Frontend (`public/analytics.js`) - `renderSubpaths()` now makes 1 API call instead of 4 - Response shape: `{ results: [{ subpaths, totalPaths }, ...] }` — destructured into the same `[d2, d3, d4, d5]` variables ### Tests (`cmd/server/routes_test.go`) - `TestAnalyticsSubpathsBulk`: validates 3-group response shape, missing params error, invalid format error ## Performance - Before: 4 API calls → 4 scans of `spIndex` + 4× hop resolution on cache miss - After: 1 API call → 1 scan of `spIndex` + 1× hop resolution (shared cache) - Cache miss cost reduced by ~75% for this tab - No change on cache hit (individual group caching still works) Fixes #398 Co-authored-by: you <you@example.com>	2026-04-04 10:19:18 -07:00
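Parsing the `groups` parameter in the `minLen-maxLen:limit` format could look like the sketch below. `parseGroups`, the `group` type, and the specific validation rules (positive limit, `minLen <= maxLen`) are assumptions; the commit message only states that missing parameters and invalid formats produce errors.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// group is one "minLen-maxLen:limit" entry.
type group struct{ MinLen, MaxLen, Limit int }

// parseGroups (hypothetical) parses a comma-separated list of groups.
func parseGroups(s string) ([]group, error) {
	if s == "" {
		return nil, fmt.Errorf("missing groups parameter")
	}
	var out []group
	for _, part := range strings.Split(s, ",") {
		rangePart, limitPart, ok := strings.Cut(part, ":")
		if !ok {
			return nil, fmt.Errorf("invalid group %q: want minLen-maxLen:limit", part)
		}
		minPart, maxPart, ok := strings.Cut(rangePart, "-")
		if !ok {
			return nil, fmt.Errorf("invalid range in group %q", part)
		}
		minLen, err1 := strconv.Atoi(minPart)
		maxLen, err2 := strconv.Atoi(maxPart)
		limit, err3 := strconv.Atoi(limitPart)
		if err1 != nil || err2 != nil || err3 != nil {
			return nil, fmt.Errorf("non-numeric value in group %q", part)
		}
		// Assumed sanity checks, not confirmed by the source.
		if minLen > maxLen || limit <= 0 {
			return nil, fmt.Errorf("out-of-range value in group %q", part)
		}
		out = append(out, group{MinLen: minLen, MaxLen: maxLen, Limit: limit})
	}
	return out, nil
}

func main() {
	groups, err := parseGroups("2-2:50,3-3:30,4-4:20,5-8:15")
	fmt.Println(groups, err) // prints [{2 2 50} {3 3 30} {4 4 20} {5 8 15}] <nil>
}
```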
Kpa-clawbot	cd470dffbe	perf: batch observation fetching to eliminate N+1 API calls on sort change (#586 ) ## Summary Fixes the N+1 API call pattern when changing observation sort mode on the packets page. Previously, switching sort to Path or Time fired individual `/api/packets/{hash}` requests for every multi-observation group without cached children — potentially 100+ concurrent requests. ## Changes ### Backend: Batch observations endpoint - New endpoint: `POST /api/packets/observations` accepts `{"hashes": ["h1", "h2", ...]}` and returns all observations keyed by hash in a single response - Capped at 200 hashes per request to prevent abuse - 4 test cases covering empty input, invalid JSON, too-many-hashes, and valid requests ### Frontend: Use batch endpoint - `packets.js` sort change handler now collects all hashes needing observation data and sends a single POST request instead of N individual GETs - Same behavior, single round-trip ## Performance - Before: Changing sort with 100 visible groups → 100 concurrent API requests, browser connection queueing (6 per host), several seconds of lag - After: Single POST request regardless of group count, response time proportional to store lookup (sub-millisecond per hash in memory) Fixes #389 --------- Co-authored-by: you <you@example.com>	2026-04-04 10:18:40 -07:00
Kpa-clawbot	7ff89d8607	perf(packets): coalesce WS-triggered renders with requestAnimationFrame (#585 ) ## Summary Coalesce WS-triggered `renderTableRows()` calls using `requestAnimationFrame` instead of `setTimeout` debouncing. Fixes #396 ## Problem During high WebSocket throughput, multiple WS batches could each trigger a `renderTableRows()` call via `setTimeout(..., 200)`. With rapid batches, this caused the 50K-row table to be fully rebuilt every few hundred milliseconds, causing UI jank. ## Solution Replace the `setTimeout`-based debounce with a `requestAnimationFrame` coalescing pattern: 1. `scheduleWSRender()` — sets a dirty flag and schedules a single rAF callback 2. Dirty flag — multiple WS batches within the same frame just set the flag; only one render fires 3. Cleanup — `destroy()` cancels any pending rAF and resets the dirty flag This ensures at most one `renderTableRows()` per animation frame (~16ms), regardless of how many WS batches arrive. ## Performance justification - Before: Each WS batch → `setTimeout(renderTableRows, 200)` — N batches in <200ms = N renders - After: N batches in one frame → 1 render on next rAF (~16ms) - Worst case goes from O(N) renders per second to O(60) renders per second (frame-capped) ## Changes - `public/packets.js`: Add `scheduleWSRender()` with rAF + dirty flag; replace setTimeout in WS handler; clean up in `destroy()` - `test-frontend-helpers.js`: Update tests to verify rAF coalescing pattern instead of setTimeout debounce ## Testing - All existing tests pass (`npm test` — 0 failures) - Updated 2 test cases to verify new rAF coalescing behavior Co-authored-by: you <you@example.com>	2026-04-04 10:18:09 -07:00
Kpa-clawbot	493849f2e3	perf(frontend): compress og-image.png from 1.1MB to 235KB (#584 ) ## Summary Compress `public/og-image.png` from 1,159,050 bytes (1.1MB) to 234,899 bytes (235KB) — an 80% reduction. ## What Changed - Applied lossy PNG quantization via `pngquant` (quality 45-65, speed 1) - Image dimensions unchanged: 1200×630px (standard OG image size) - Visual quality remains suitable for social media previews ## Why A 1.1MB OpenGraph image is excessive. Typical OG images are 50-200KB. This reduces deployment size and Git repo bloat without affecting functionality (browsers don't preload OG images). ## Testing - Unit tests pass (`npm run test:unit`) - No code changes — image-only commit - `index.html` reference unchanged (`<meta property="og:image" content="/og-image.png">`) Fixes #397 Co-authored-by: you <you@example.com>	2026-04-04 10:17:21 -07:00
Kpa-clawbot	87ac61748c	perf(analytics): compute network status client-side, eliminate redundant API call (#583 ) ## Summary Reduces the analytics nodes tab from 3 parallel API calls to 2 by computing network status (active/degraded/silent counts) client-side instead of fetching from `/nodes/network-status`. ## What Changed `public/analytics.js` — `renderNodesTab()`: - Removed the `/nodes/network-status` API call from the `Promise.all` batch - Added client-side computation of active/degraded/silent counts using the shared `getHealthThresholds()` function from `roles.js` - Uses `nodesResp.total` and `nodesResp.counts` (already returned by `/nodes` endpoint) for total node count and role breakdown ## Why This Works The `/nodes` response already includes: - `total` — count of all matching nodes (server-computed across full DB) - `counts` — role counts across all nodes (from `GetAllRoleCounts()`) - Per-node `last_seen`/`last_heard` timestamps The `getHealthThresholds()` function in `roles.js` provides the same degraded/silent thresholds used server-side, so client-side status computation produces equivalent results for the loaded node set. ## Performance - Before: 3 parallel API calls (`/nodes`, `/nodes/bulk-health`, `/nodes/network-status`) - After: 2 parallel API calls (`/nodes`, `/nodes/bulk-health`) - Network status computation is O(n) over the 200 loaded nodes — negligible client-side cost - The `/nodes/network-status` endpoint scanned ALL nodes in the DB on every call; this eliminates that server-side work entirely ## Testing - All frontend helper tests pass (445/445) - All packet filter tests pass (62/62) - All aging tests pass (29/29) - All Go backend tests pass Fixes #392 --------- Co-authored-by: you <you@example.com>	2026-04-04 10:17:05 -07:00
Kpa-clawbot	26de38f4b6	perf(map): reposition markers on zoom/resize instead of full rebuild (#582 ) ## Summary Eliminates visible marker flicker on zoom/resize events in the map page when displaying 500+ nodes. ## Problem `renderMarkers()` was called on every `zoomend` and `resize` event, which did `markerLayer.clearLayers()` followed by a full rebuild of all markers. With many nodes, this caused a visible flash where all markers disappeared briefly before being re-added. ## Solution Instead of rebuilding all markers from scratch on zoom/resize: 1. Store Leaflet layer references on marker data objects (`_leafletMarker`, `_leafletLine`, `_leafletDot`) during the initial full render 2. Add `_repositionMarkers()` — re-runs `deconflictLabels()` at the new zoom level and updates existing marker positions via `setLatLng()`/`setLatLngs()` without clearing the layer group 3. Debounce zoom/resize handlers (150ms) to coalesce rapid events during animated zooms 4. Dynamically manage offset indicators — adds/removes deconfliction offset lines and dots as positions change at different zoom levels Full `renderMarkers()` is still called for filter changes, data updates, and theme changes — only zoom/resize uses the lightweight repositioning path. ## Complexity - `_repositionMarkers()`: O(n) — single pass over stored marker data - `deconflictLabels()`: O(n × k) where k is max spiral offsets (48) — unchanged - No new API calls, no DOM rebuilds Fixes #393 --------- Co-authored-by: you <you@example.com>	2026-04-04 17:16:48 +00:00
Kpa-clawbot	d2d4c504e8	perf(live): parallelize replayRecent() observation fetches (#581 ) ## Summary `replayRecent()` in `live.js` fetched observation details for 8 packet groups sequentially — each `await fetch()` waited for the previous to complete before starting the next. ## Change Replaced the sequential `for` loop with `Promise.all()` to fetch all 8 detail API calls concurrently. The mapping from results to live packets is unchanged. Before: 8 sequential fetches (total time ≈ sum of all request durations) After: 8 parallel fetches (total time ≈ max of all request durations) ## Notes - `replayRecent()` is currently disabled (commented out at line 856), so this is dormant code — no runtime risk - No behavioral change: same data mapping, same rendering, same VCR buffer population - All existing tests pass Fixes #394 --------- Co-authored-by: you <you@example.com>	2026-04-04 10:16:08 -07:00
Kpa-clawbot	b37e8e2da2	perf(packets): replace N+1 API calls with single expand=observations query (#580 ) ## Summary Eliminates the N+1 API call storm when toggling off "Group by Hash" in the packets table. ## Problem When ungrouped mode was active, `loadPackets()` fired individual `/api/packets/{hash}` requests for every multi-observation packet. With 200+ multi-obs packets, this created 200+ parallel HTTP requests — overwhelming both browser connection limits and the server. ## Fix The server already supports `expand=observations` on the `/api/packets` endpoint, which returns observations inline. Instead of: 1. Always fetching grouped (`groupByHash=true`) 2. Then N+1 fetching each packet's children individually We now: 1. Fetch grouped when grouped mode is active (`groupByHash=true`) 2. Fetch with `expand=observations` when ungrouped — single API call 3. Flatten observations client-side Result: 200+ API calls → 1 API call. ## Changes - `public/packets.js`: Replaced N+1 observation fetching loop with single `expand=observations` query parameter, flatten inline observations client-side. ## Testing - All frontend tests pass (packet-filter: 62/62, frontend-helpers: 445/445) - All Go backend tests pass Fixes #382 Co-authored-by: you <you@example.com>	2026-04-04 10:15:14 -07:00
Kpa-clawbot	45d8116880	perf: query only matching node locations in handleObservers (#579 ) ## Summary `handleObservers()` in `routes.go` was calling `GetNodeLocations()` which fetches ALL nodes from the DB just to match ~10 observer IDs against node public keys. With 500+ nodes this is wasteful. ## Changes - `db.go`: Added `GetNodeLocationsByKeys(keys []string)` — queries only the rows matching the given public keys using a parameterized `WHERE LOWER(public_key) IN (?, ?, ...)` clause. - `routes.go`: `handleObservers` now collects observer IDs and calls the targeted method instead of the full-table scan. - `coverage_test.go`: Added `TestGetNodeLocationsByKeys` covering known key, empty keys, and unknown key cases. ## Performance With ~10 observers and 500+ nodes, the query goes from scanning all 500 rows to fetching only ~10. The original `GetNodeLocations()` is preserved for any other callers. Fixes #378 Co-authored-by: you <you@example.com>	2026-04-04 10:14:37 -07:00
Kpa-clawbot	f68e98c376	perf(live): skip updateTimeline() when tab is hidden (#578 ) ## Summary Skip `updateTimeline()` canvas redraws in `bufferPacket()` when the browser tab is hidden (`_tabHidden === true`). Instead, batch-update the timeline once when the tab becomes visible again via the `visibilitychange` handler. Fixes #385 ## What Changed `public/live.js` — two surgical edits: 1. `bufferPacket()`: Removed `updateTimeline()` call from the `_tabHidden` early-return path. When the tab is backgrounded, packets are still buffered (for VCR) but no canvas work is done. 2. `visibilitychange` handler: Added `updateTimeline()` call when the tab is restored, so the timeline catches up in a single repaint instead of N repaints (one per buffered packet). ## Performance Impact At 5+ packets/sec with a backgrounded tab, this eliminates continuous canvas redraws (`updateTimeline()` calls `ctx.clearRect` + full canvas redraw + `updateTimelinePlayhead()`) that are invisible to the user. CPU usage drops to near-zero for timeline rendering while backgrounded. ## Tests All existing tests pass: - `test-packet-filter.js` — 62 passed - `test-aging.js` — 29 passed - `test-frontend-helpers.js` — 445 passed Co-authored-by: you <you@example.com>	2026-04-04 10:14:13 -07:00
Kpa-clawbot	f3d5d1e021	perf: resolve hops from in-memory prefix map instead of N+1 DB queries (#577 ) ## Summary Replace N+1 per-hop DB queries in `handleResolveHops` with O(1) lookups against the in-memory prefix map that already exists in the packet store. ## Problem Each hop in the `resolve-hops` API triggered a separate `SELECT ... LIKE ?` query against the nodes table. With 10 hops, that's 10 DB round-trips — unnecessary when `getCachedNodesAndPM()` already maintains an in-memory prefix map that can resolve hops instantly. ## Changes - routes.go: Replace the per-hop DB query loop with `pm.m[hopLower]` lookups from the prefix map. Convert `nodeInfo` → `HopCandidate` inline. Remove unused `rows`/`sql.Scan` code. - store.go: Add `InvalidateNodeCache()` method to force prefix map rebuild (needed by tests that insert nodes after store initialization). - routes_test.go: Give `TestResolveHopsAmbiguous` a proper store so hops resolve via the prefix map. - resolve_context_test.go: Call `InvalidateNodeCache()` after inserting test nodes. Fix confidence assertion — with GPS candidates and no affinity context, `resolveWithContext` correctly returns `gps_preference` (previously masked because the prefix map didn't have the test nodes). ## Complexity O(1) per hop lookup via hash map vs O(n) DB scan per hop. No hot-path impact — this endpoint is called on-demand, not in a render loop. Fixes #369 --------- Co-authored-by: you <you@example.com>	2026-04-04 09:51:07 -07:00
Kpa-clawbot	02004c5912	perf: incremental distance index update on path changes (#576) ## Summary Replace full `buildDistanceIndex()` rebuild with incremental `removeTxFromDistanceIndex`/`addTxToDistanceIndex` for only the transmissions whose paths actually changed during `IngestNewObservations`. ## Problem When any transmission's best path changed during observation ingestion, the entire distance index was rebuilt — iterating all 30K+ packets, resolving all hops, and computing haversine distances. This `O(total_packets × avg_hops)` operation ran under a write lock, blocking all API readers. A 30-second debounce (`distRebuildInterval`) was added in #557 to mitigate this, but it only delayed the pain — the full rebuild still happened, just less frequently. ## Fix - Added `removeTxFromDistanceIndex(tx)` — filters out all `distHopRecord` and `distPathRecord` entries for a specific transmission - Added `addTxToDistanceIndex(tx)` — computes and appends new distance records for a single transmission - In `IngestNewObservations`, changed path-change handling to call remove+add for each affected tx instead of marking dirty and waiting for a full rebuild - Removed `distDirty`, `distLast`, and `distRebuildInterval` since incremental updates are cheap enough to apply immediately ## Complexity - Before: `O(total_packets × avg_hops)` per rebuild (30K+ packets) - After: `O(changed_txs × avg_hops + total_dist_records)` — the remove is a linear scan of the distance slices, but only for affected txs; the add is `O(hops)` per changed tx The remove scan over `distHops`/`distPaths` slices is linear in slice length, but this is still far cheaper than the full rebuild which also does JSON parsing, hop resolution, and haversine math for every packet. ## Tests - Updated `TestDistanceRebuildDebounce` → `TestDistanceIncrementalUpdate` to verify incremental behavior and check for duplicate path records - All existing tests pass (`go test ./...` in both `cmd/server` and `cmd/ingestor`) Fixes #365 --------- Co-authored-by: you <you@example.com>	2026-04-04 09:50:55 -07:00
Kpa-clawbot	ef30031e2e	perf: cache resolveRegionObservers with 30s TTL (#575 ) ## Summary Cache `resolveRegionObservers()` results with a 30-second TTL to eliminate repeated database queries for region→observer ID mappings. ## Problem `resolveRegionObservers()` queried the database on every call despite the observers table changing infrequently (~20 rows). It's called from 10+ hot paths including `filterPackets()`, `GetChannels()`, and multiple analytics compute functions. When analytics caches are cold, parallel requests each hit the DB independently. ## Solution - Added a dedicated `regionObsMu` mutex + `regionObsCache` map with 30s TTL - Uses a separate mutex (not `s.mu`) to avoid deadlocks — callers already hold `s.mu.RLock()` - Cache is lazily populated per-region and fully invalidated after TTL expires - Follows the same pattern as `getCachedNodesAndPM()` (30s TTL, on-demand rebuild) ## Changes - `cmd/server/store.go`: Added `regionObsMu`, `regionObsCache`, `regionObsCacheTime` fields; rewrote `resolveRegionObservers()` to check cache first; added `fetchAndCacheRegionObs()` helper - `cmd/server/coverage_test.go`: Added `TestResolveRegionObserversCaching` — verifies cache population, cache hits, and nil handling for unknown regions ## Testing - All existing Go tests pass (`go test ./...`) - New test verifies caching behavior (population, hits, nil for unknown regions) Fixes #362 --------- Co-authored-by: you <you@example.com>	2026-04-04 09:50:27 -07:00
Kpa-clawbot	67511ed6a7	perf: combine GetStoreStats into 2 concurrent queries instead of 5 sequential (#574) ## Summary `GetStoreStats()` ran 5 sequential DB queries on every call. This combines them into 2 concurrent queries: 1. Node/observer counts — single query using subqueries: `SELECT (SELECT COUNT() FROM nodes WHERE ...), (SELECT COUNT() FROM nodes), (SELECT COUNT() FROM observers)` 2. Observation counts — single query using conditional aggregation: `SUM(CASE WHEN timestamp > ? THEN 1 ELSE 0 END)` scoped to the 24h window, avoiding a full table scan for the 1h count Both queries run concurrently via goroutines + `sync.WaitGroup`. ## What changed - `cmd/server/store.go`: Rewrote `GetStoreStats()` — 5 sequential `QueryRow` calls → 2 concurrent combined queries - Error handling now propagates query errors instead of silently ignoring them ## Performance justification - Before: 5 sequential round-trips to SQLite, with 2 potentially expensive `COUNT()` scans on the `observations` table - After: 2 concurrent round-trips; the observation query scans the 24h window once instead of separately scanning for 1h and 24h - The 10s cache (`statsTTL`) remains, so this fires at most once per 10s — but when it does fire, it's ~2.5x fewer round-trips and the observation scan is halved ## Tests - `go test ./...` passes for both `cmd/server` and `cmd/ingestor` Fixes #363 --------- Co-authored-by: you <you@example.com>	2026-04-04 09:48:25 -07:00
Kpa-clawbot	b35b473508	perf(nodes): extract shared fetchNodeDetail() to deduplicate API calls (#573 ) ## Summary Extracts a shared `fetchNodeDetail(pubkey)` helper in `nodes.js` that fetches both `/nodes/{pubkey}` and `/nodes/{pubkey}/health` in parallel. Both `selectNode()` (side panel) and `loadFullNode()` (full-screen view) now call this single function instead of duplicating the fetch logic. ## What Changed - New: `fetchNodeDetail(pubkey)` — shared async function that returns node data with `.healthData` attached - Modified: `loadFullNode()` — uses `fetchNodeDetail()` instead of inline `Promise.all` - Modified: `selectNode()` — uses `fetchNodeDetail()` instead of inline `Promise.all` ## Why The duplicate `api()` calls weren't a major perf issue (TTL caching mitigates most cases), but the duplicated logic was unnecessary tech debt. On mobile, `selectNode()` redirects to `loadFullNode()` via hash change, so the two code paths could fire sequentially with expired cache. ## Testing - All frontend helper tests pass (445/445) - All packet filter tests pass (62/62) - All aging tests pass (29/29) - No behavioral change — only code structure improvement Fixes #391 Co-authored-by: you <you@example.com>	2026-04-04 09:47:59 -07:00
Kpa-clawbot	d4f2c3ac66	perf: index subpath detail lookups instead of scanning all packets (#571 ) ## Summary `GetSubpathDetail()` iterated ALL packets to find those containing a specific subpath — `O(packets × hops × subpath_length)`. With 30K+ packets this caused user-visible latency on every subpath detail click. ## Changes ### `cmd/server/store.go` - Added `spTxIndex map[string][]StoreTx` alongside existing `spIndex` — tracks which transmissions contain each subpath key - Extended `addTxToSubpathIndexFull()` and `removeTxFromSubpathIndexFull()` to maintain both indexes simultaneously - Original `addTxToSubpathIndex()`/`removeTxFromSubpathIndex()` wrappers preserved for backward compatibility - `buildSubpathIndex()` now populates both `spIndex` and `spTxIndex` during `Load()` - All incremental update sites (ingest, path change, eviction) use the `Full` variants - `GetSubpathDetail()` rewritten: direct `O(1)` map lookup on `spTxIndex[key]` instead of scanning all packets ### `cmd/server/coverage_test.go` - Added `TestSubpathTxIndexPopulated`: verifies `spTxIndex` is populated, counts match `spIndex`, and `GetSubpathDetail` returns correct results for both existing and non-existent subpaths ## Complexity - Before: `O(total_packets × avg_hops × subpath_length)` per request - After: `O(matched_txs)` per request (direct map lookup) ## Tests All tests pass: `cmd/server` (4.6s), `cmd/ingestor` (25.6s) Fixes #358 --------- Co-authored-by: you <you@example.com>	2026-04-04 09:35:00 -07:00
Kpa-clawbot	37300bf5c8	fix: cap prefix map at 8 chars to cut memory ~10x (#570 ) ## Summary `buildPrefixMap()` was generating map entries for every prefix length from 2 to `len(pubkey)` (up to 64 chars), creating ~31 entries per node. With 500 nodes that's ~15K map entries; with 1K+ nodes it balloons to 31K+. ## Changes `cmd/server/store.go`: - Added `maxPrefixLen = 8` constant — MeshCore path hops use 2–6 char prefixes, 8 gives headroom - Capped the prefix generation loop at `maxPrefixLen` instead of `len(pk)` - Added full pubkey as a separate map entry when key is longer than `maxPrefixLen`, ensuring exact-match lookups (used by `resolveWithContext`) still work `cmd/server/coverage_test.go`: - Added `TestPrefixMapCap` with subtests for: - Short prefix resolution still works - Full pubkey exact-match resolution still works - Intermediate prefixes beyond the cap correctly return nil - Short keys (≤8 chars) have all prefix entries - Map size is bounded ## Impact - Map entries per node: ~31 → ~8 (one per prefix length 2–8, plus one full-key entry) - Total map size for 500 nodes: ~15K entries → ~4K entries (~75% reduction) - No behavioral change for path hop resolution (2–6 char prefixes) - No behavioral change for exact pubkey lookups ## Tests All existing tests pass: - `cmd/server`: ✅ - `cmd/ingestor`: ✅ Fixes #364 --------- Co-authored-by: you <you@example.com>	2026-04-04 09:28:38 -07:00
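A rough sketch of the capped prefix map from 37300bf5c8, under stated assumptions: the signature below (a slice of pubkey strings in, a map of prefix to matching pubkeys out) is invented for illustration, and the loop steps one character at a time, which the commit message does not confirm. Only the `maxPrefixLen = 8` constant and the extra full-key entry come from the description.

```go
package main

import (
	"fmt"
	"strings"
)

// maxPrefixLen is the cap named in the commit message.
const maxPrefixLen = 8

// buildPrefixMap maps every lowercase prefix of length 2..maxPrefixLen to the
// pubkeys that start with it, plus one full-key entry for exact-match lookups.
// Hypothetical signature; the real function operates on the server's node list.
func buildPrefixMap(pubkeys []string) map[string][]string {
	m := make(map[string][]string)
	for _, pk := range pubkeys {
		pk = strings.ToLower(pk)
		limit := len(pk)
		if limit > maxPrefixLen {
			limit = maxPrefixLen
		}
		for n := 2; n <= limit; n++ {
			m[pk[:n]] = append(m[pk[:n]], pk)
		}
		// Keys longer than the cap still need an exact-match entry.
		if len(pk) > maxPrefixLen {
			m[pk] = append(m[pk], pk)
		}
	}
	return m
}

func main() {
	m := buildPrefixMap([]string{"A1B2C3D4E5F6"})
	fmt.Println("entries:", len(m))
	fmt.Println("9-char prefix present:", m["a1b2c3d4e"] != nil)
}
```

Prefixes between the cap and the full key length deliberately resolve to nothing, matching the "intermediate prefixes beyond the cap correctly return nil" subtest listed in the commit.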
Kpa-clawbot	cb8a2e15c8	perf: index node path lookups instead of scanning all packets (#572 ) ## Summary Index node path lookups in `handleNodePaths()` instead of scanning all packets on every request. ## Problem `handleNodePaths()` iterated ALL packets in the store (`O(total_packets × avg_hops)`) with prefix string matching on every hop. This caused user-facing latency on every node detail page load with 30K+ packets. ## Fix Added a `byPathHop` index (`map[string][]StoreTx`) that maps lowercase hop prefixes and resolved full pubkeys to their transmissions. The handler now does direct map lookups instead of a full scan. ### Index lifecycle - Built during `Load()` via `buildPathHopIndex()` - Incrementally updated during `IngestNewFromDB()` (new packets) and `IngestNewObservations()` (path changes) - Cleaned up during `EvictStale()` (packet removal) ### Query strategy The handler looks up candidates from the index using: 1. Full pubkey (matches resolved hops from `resolved_path`) 2. 2-char prefix (matches short raw hops) 3. 4-char prefix (matches medium raw hops) 4. Any longer raw hops starting with the 4-char prefix This reduces complexity from `O(total_packets × avg_hops)` to `O(matching_txs + unique_hop_keys)`. ## Tests - `TestNodePathsEndpointUsesIndex` — verifies the endpoint returns correct results using the index - `TestPathHopIndexIncrementalUpdate` — verifies add/remove operations on the index All existing tests pass. Fixes #359 Co-authored-by: you <you@example.com>	2026-04-04 09:25:18 -07:00
Kpa-clawbot	aac038abb9	fix: filter inconsistent hash sizes by role and add 7-day time window (#567 ) ## Summary Fixes #566 — The "Inconsistent Hash Sizes" list on the Analytics page included all node types and had no time window, causing false positives. ## Changes ### 1. Role filter on inconsistent nodes (`cmd/server/store.go`) Added role filter to the `inconsistentNodes` loop in `computeHashCollisions()` so only repeaters and room servers are included. Companions are excluded since they were never affected by the firmware bug. This matches the existing role filter on collision bucketing from #441. ```go // Before: if cn.HashSizeInconsistent { // After: if cn.HashSizeInconsistent && (cn.Role == "repeater" \|\| cn.Role == "room_server") { ``` ### 2. 7-day time window on hash size computation (`cmd/server/store.go`) Added a 7-day recency cutoff to `computeNodeHashSizeInfo()`. Adverts older than 7 days are now skipped, preventing legitimate historical config changes (e.g., testing different byte sizes) from creating permanent false positives. ### 3. Frontend description text (`public/analytics.js`) Updated the description to reflect the filtered scope: now says "Repeaters and room servers" instead of "Nodes", mentions the 7-day window, and notes that companions are excluded. ## Tests - `TestInconsistentNodesExcludesCompanions` — verifies companions are excluded while repeaters and room servers are included - `TestHashSizeInfoTimeWindow` — verifies adverts older than 7 days are excluded from hash size computation - Updated existing hash size tests to use recent timestamps (compatible with the new time window) - All existing tests pass: `cmd/server` ✅, `cmd/ingestor` ✅ ## Perf justification The time window filter adds a single string comparison per advert in the scan loop — O(n) with a tiny constant. No impact on hot paths. --------- Co-authored-by: you <you@example.com>	2026-04-04 09:22:12 -07:00
Kpa-clawbot	588fba226d	perf: track max transmission/observation IDs incrementally (#569 ) ## Summary Replace O(n) map iteration in `MaxTransmissionID()` and `MaxObservationID()` with O(1) field lookups. ## What Changed - Added `maxTxID` and `maxObsID` fields to `PacketStore` - Updated `Load()`, `IngestNewFromDB()`, and `IngestNewObservations()` to track max IDs incrementally as entries are added - `MaxTransmissionID()` and `MaxObservationID()` now return the tracked field directly instead of iterating the entire map ## Performance Before: O(n) iteration over 30K+ map entries under a read lock After: O(1) field return ## Tests - Added `TestMaxTransmissionIDIncremental` verifying the incremental field matches brute-force iteration over the maps - All existing tests pass (`cmd/server` and `cmd/ingestor`) Fixes #356 Co-authored-by: you <you@example.com>	2026-04-04 09:20:17 -07:00
Kpa-clawbot	c670742589	feat: add byte-size filter to map page (#565 ) (#568 ) ## Summary Adds a byte-size filter to the map page, allowing users to filter repeater markers by their hash prefix size (1-byte, 2-byte, or 3-byte). ## What changed `public/map.js` — single file change: 1. New filter state: Added `byteSize` to the `filters` object (default: `'all'`), persisted in `localStorage` 2. New UI section: Added a "Byte Size" fieldset with button group (`All \| 1-byte \| 2-byte \| 3-byte`) in the map controls panel, between "Node Types" and "Display" 3. Filter logic: In `_renderMarkersInner`, when `byteSize !== 'all'`, repeater nodes are filtered by their `hash_size` field. Non-repeater nodes (companions, rooms, sensors) are unaffected — they pass through regardless of the byte-size filter setting 4. Event binding: Button click handlers update the filter, persist to localStorage, and re-render markers ## Design decisions - Client-side only — no backend changes needed. The `hash_size` field is already included in the `/api/nodes` response - Repeaters only — byte size is a repeater configuration concept; other node roles don't have configurable path prefix sizes - Matches existing pattern — uses the same button-group UI as the Status filter (All/Active/Stale) - `hash_size` defaults to 1 — consistent with how the rest of the codebase treats missing `hash_size` (`node.hash_size \|\| 1`) ## Performance No new API calls. Filter is a simple string comparison inside the existing `nodes.filter()` loop in `_renderMarkersInner` — O(1) per node, negligible overhead. Fixes #565 Co-authored-by: you <you@example.com>	2026-04-04 09:14:49 -07:00
efiten	f897ce1b26	fix: use runtime heap stats for memory-based eviction (#564 ) ## Problem Closes #563. Addresses the Packet store estimated memory item in #559. `estimatedMemoryMB()` used a hardcoded formula: ```go return float64(len(s.packets)*5120+s.totalObs*500) / 1048576.0 ``` This ignored three data structures that grow continuously with every ingest cycle: \| Structure \| Production size \| Heap not counted \| \|---\|---\|---\| \| `distHops []distHopRecord` \| 1,556,833 records \| ~300 MB \| \| `distPaths []distPathRecord` \| 93,090 records \| ~25 MB \| \| `spIndex map[string]int` \| 4,113,234 entries \| ~400 MB \| Result: formula reported ~1.2 GB while actual heap was ~5 GB. With `maxMemoryMB: 1024`, eviction calculated it only needed to shed ~200 MB, removed a handful of packets, and stopped. Memory kept growing until the OOM killer fired. ## Fix Replace `estimatedMemoryMB()` with `runtime.ReadMemStats` so all data structures are automatically counted: ```go func (s *PacketStore) estimatedMemoryMB() float64 { if s.memoryEstimator != nil { return s.memoryEstimator() } var ms runtime.MemStats runtime.ReadMemStats(&ms) return float64(ms.HeapAlloc) / 1048576.0 } ``` Replace the eviction simulation loop (which re-used the same wrong formula) with a proportional calculation: if heap is N× over budget, evict enough packets to keep `(1/N) × 0.9` of the current count. The 0.9 factor adds a 10% buffer so the next ingest cycle doesn't immediately re-trigger. All major data structures (distHops, distPaths, spIndex) scale with packet count, so removing a fraction of packets frees roughly the same fraction of total heap. ## Testing - Updated `TestEvictStale_MemoryBasedEviction` to inject a deterministic estimator via the new `memoryEstimator` field. - Added `TestEvictStale_MemoryBasedEviction_UnderestimatedHeap`: verifies that when actual heap is 5× over limit (the production failure scenario), eviction correctly removes ~80%+ of packets.
``` === RUN TestEvictStale_MemoryBasedEviction [store] Evicted 538 packets (1076 obs) --- PASS === RUN TestEvictStale_MemoryBasedEviction_UnderestimatedHeap [store] Evicted 820 packets (1640 obs) --- PASS ``` Full suite: `go test ./...` — ok (10.3s) ## Perf note `runtime.ReadMemStats` runs once per eviction tick (every 60 s) and once per `/api/perf/store` call. Cost is negligible. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 08:41:54 -07:00
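The proportional calculation in f897ce1b26 (keep `(1/N) × 0.9` of the current packet count when the heap is N× over budget) might look something like the following. `targetPacketCount` is a hypothetical helper written for this illustration; the commit message does not show how the real eviction code is structured.

```go
package main

import "fmt"

// targetPacketCount returns how many packets to keep so that the heap lands
// roughly 10% under budget. Hypothetical helper, not the function in store.go.
func targetPacketCount(current int, heapMB, maxMB float64) int {
	if maxMB <= 0 || heapMB <= maxMB {
		return current // under budget (or no limit set): nothing to evict
	}
	overshoot := heapMB / maxMB     // the N in "N× over budget"
	keep := (1.0 / overshoot) * 0.9 // 0.9 leaves a 10% buffer
	return int(float64(current) * keep)
}

func main() {
	// Production failure scenario from the commit: ~5 GB heap, 1024 MB limit.
	kept := targetPacketCount(1000, 5120, 1024)
	fmt.Println("keep:", kept, "evict:", 1000-kept)
}
```

With a heap five times over the limit this keeps about 18% of packets, which lines up with the roughly 820 of 1000 evicted in the test log above.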
Kpa-clawbot	cbfce41d7e	perf: optimize neighbor graph build (3 fixes for 30s+ CPU) (#562 ) ## Summary Fixes critical performance issue in neighbor graph computation that consumed 65% of CPU (30+ seconds) on a 325K packet dataset. ## Changes ### Fix 1: Cache strings.ToLower results - Added cachedToLower() helper that caches lowercased strings in a local map - Pubkeys repeat across hundreds of thousands of observations - Pre-computes fromLower once per transaction instead of once per observation - Impact: Eliminates ~8.4s (25.3% CPU) ### Fix 2: Cache parsed DecodedJSON via StoreTx.ParsedDecoded() - Added ParsedDecoded() method on StoreTx using sync.Once for thread-safe lazy caching - json.Unmarshal on decoded_json now runs at most once per packet lifetime - Result reused by extractFromNode, indexByNode, trackAdvertPubkey - Impact: Eliminates ~8.8s (26.3% CPU) ### Fix 3: Extend neighbor graph TTL from 60s to 5 minutes - The graph depends on traffic patterns, not individual packets - Reduces rebuild frequency 5x - Impact: ~80% reduction in sustained CPU from graph rebuilds ## Tests - 7 new tests added, all 26+ existing neighbor graph tests pass - BenchmarkBuildFromStore: 727us/op, 237KB/op, 6030 allocs/op Related: #559 --------- Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: you <you@example.com> v3.4.1	2026-04-04 01:25:51 -07:00
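Fix 2 in cbfce41d7e describes lazy, thread-safe caching of the parsed `decoded_json` via `sync.Once`. A minimal sketch of that idea, assuming a simplified `tx` struct in place of the real `StoreTx` (the `parses` counter exists only to make the single parse observable):

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// tx is a simplified stand-in for StoreTx; field names are assumptions.
type tx struct {
	DecodedJSON string

	once   sync.Once
	parsed map[string]any
	parses int // demonstration only: how many times Unmarshal actually ran
}

// ParsedDecoded unmarshals DecodedJSON on first use and returns the cached
// result on every later call. It returns nil if the JSON is invalid.
func (p *tx) ParsedDecoded() map[string]any {
	p.once.Do(func() {
		p.parses++
		var m map[string]any
		if err := json.Unmarshal([]byte(p.DecodedJSON), &m); err == nil {
			p.parsed = m
		}
	})
	return p.parsed
}

func main() {
	p := &tx{DecodedJSON: `{"pubKey":"ab12"}`}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = p.ParsedDecoded() // concurrent callers share one parse
		}()
	}
	wg.Wait()
	fmt.Println("parses:", p.parses, "pubKey:", p.ParsedDecoded()["pubKey"])
}
```

One consequence of `sync.Once` worth keeping in mind: a failed parse is also cached, so a packet with malformed JSON is not retried.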
you	1e1c4cb91f	fix: include resolved_path in groupByHash packet response QueryGroupedPackets builds its map manually and was missing resolved_path. The non-grouped path (txToMap) included it.	2026-04-04 08:01:35 +00:00
you	0c340e1eb6	fix: set hasResolvedPath flag after ensuring column exists detectSchema() runs at DB open time before ensureResolvedPathColumn() adds the column during Load(). On first run (or any run where the column was just added), hasResolvedPath stayed false, causing Load() to skip reading resolved_path from SQLite. This forced a full backfill of all observations on every restart, burning CPU for minutes on large DBs. Fix: set hasResolvedPath = true after ensureResolvedPathColumn succeeds.	2026-04-04 07:46:25 +00:00
Kpa-clawbot	ae38cdefb4	feat: server-side hop resolution at ingest — resolved_path (#556 ) ## Summary Implements server-side hop prefix resolution at ingest time with a persisted neighbor graph. Hop prefixes in `path_json` are now resolved to full 64-char pubkeys at ingest and stored as `resolved_path` on each observation, eliminating the need for client-side resolution via `HopResolver`. Fixes #555 ## What changed ### New file: `cmd/server/neighbor_persist.go` SQLite persistence layer for the neighbor graph and resolved paths: - `neighbor_edges` table creation and management - Load/build/persist neighbor edges from/to SQLite - `resolved_path` column migration on observations - `resolvePathForObs()` — resolves hop prefixes using `resolveWithContext` with 4-tier priority (affinity → geo → GPS → first match) - Cold startup backfill for observations missing `resolved_path` - Async persistence of edges and resolved paths during ingest (non-blocking) ### Modified: `cmd/server/store.go` - `StoreObs` gains `ResolvedPath []string` field - `StoreTx` gains `ResolvedPath []string` (cached from best observation) - `Load()` dynamically includes `resolved_path` in SQL query when column exists - `IngestNewFromDB()` resolves paths at ingest time and persists asynchronously - `pickBestObservation()` propagates `ResolvedPath` to transmission - `txToMap()` and `enrichObs()` include `resolved_path` in API responses - All 7 `pm.resolve()` call sites migrated to `pm.resolveWithContext()` with the persisted graph - Broadcast maps include `resolved_path` per observation ### Modified: `cmd/server/db.go` - `DB` struct gains `hasResolvedPath bool` flag - `detectSchema()` checks for `resolved_path` column existence - Graceful degradation when column is absent (test DBs, old schemas) ### Modified: `cmd/server/main.go` - Startup sequence: ensure tables → load/build graph → backfill resolved paths → re-pick best observations ### Modified: `cmd/server/routes.go` - `mapSliceToTransmissions()` and 
`mapSliceToObservations()` propagate `resolved_path` - Node paths handler uses `resolveWithContext` with graph ### Modified: `cmd/server/types.go` - `TransmissionResp` and `ObservationResp` gain `ResolvedPath []string` with `omitempty` ### New file: `cmd/server/neighbor_persist_test.go` 16 tests covering: - Path resolution (unambiguous, empty, unresolvable prefixes) - Marshal/unmarshal of resolved_path JSON - SQLite table creation and column migration (idempotent) - Edge persistence and loading - Schema detection - Full Load() with resolved_path - API response serialization (present when set, omitted when nil) ## Design decisions 1. Async persistence — resolved paths and neighbor edges are written to SQLite in a goroutine to avoid blocking the ingest loop. The in-memory state is authoritative. 2. Schema compatibility — `DB.hasResolvedPath` flag allows the server to work with databases that don't yet have the `resolved_path` column. SQL queries dynamically include/exclude the column. 3. `pm.resolve()` retained — Not removed as dead code because existing tests use it directly. All production call sites now use `resolveWithContext` with the persisted graph. 4. Edge persistence is conservative — Only unambiguous edges (single candidate) are persisted to `neighbor_edges`. Ambiguous prefixes are handled by the in-memory `NeighborGraph` via Jaccard disambiguation. 5. `null` = unresolved — Ambiguous prefixes store `null` in the resolved_path array. Frontend falls back to prefix display. ## Performance - `resolveWithContext` per hop: ~1-5μs (map lookups, no DB queries) - Typical packet has 0-5 hops → <25μs total resolution overhead per packet - Edge/path persistence is async → zero impact on ingest latency - Backfill is one-time on first startup with the new column ## Test results ``` cd cmd/server && go test ./... -count=1 → ok (4.4s) cd cmd/ingestor && go test ./... -count=1 → ok (25.5s) ``` --------- Co-authored-by: you <you@example.com>	2026-04-04 00:20:59 -07:00
Kpa-clawbot	a97fa52f10	feat: frontend consumers prefer resolved_path (M4, #555 ) (#561 ) ## Summary Implements M4 (frontend consumers) from the [resolved-path spec](https://github.com/Kpa-clawbot/CoreScope/blob/resolved-path-spec/docs/specs/resolved-path.md) for #555. The server (PR #556, M1-M3) now returns `resolved_path` on all packet/observation API responses and WebSocket broadcasts. This PR updates all frontend consumers to prefer `resolved_path` over client-side HopResolver, with full fallback for old packets. ## What changed ### `hop-resolver.js` - Added `resolveFromServer(hops, resolvedPath)` — takes the short hex prefixes and aligned array of full pubkeys from `resolved_path`, looks up node names from the existing nodesList. Returns the same `{ [hop]: { name, pubkey, ... } }` format as `resolve()`. ### `packet-helpers.js` - Added `getResolvedPath(p)` — cached JSON parser for the new `resolved_path` field (mirrors `getParsedPath`). - Updated `clearParsedCache()` to also clear `_parsedResolvedPath`. ### `packets.js` - Bulk load (`loadPackets`): calls `cacheResolvedPaths(packets)` before the existing `resolveHops` fallback. - WebSocket updates: pre-populates `hopNameCache` from `resolved_path` on incoming packets before falling back to HopResolver for any remaining unknown hops. - Group expansion (`pktToggleGroup`): caches resolved paths from child observations. - Packet detail (`selectPacket`): prefers `resolveFromServer` when `resolved_path` is available. - Show Route button: uses `resolved_path` pubkeys directly instead of client-side disambiguation. - Observation spreading: carries `resolved_path` field when constructing observation packets. ### `live.js` - `resolveHopPositions` accepts optional `resolvedPath` parameter; prefers server-resolved pubkeys, falls back to HopResolver for null entries. - Normalized WS packet objects now carry `resolved_path`. 
### Files NOT changed (no resolution changes needed) - `analytics.js` — only uses `HopResolver.haversineKm` (a utility function). Topology, subpath, and hop distance data comes pre-resolved from the server API (handled by M2/M3). - `nodes.js` — gets pre-resolved path data from `/nodes/:pubkey/paths` API; no client-side hop resolution. - `map.js` — `drawPacketRoute` already handles full 64-char pubkeys via exact match. The updated `packets.js` now passes full pubkeys from `resolved_path` to the map. ## Fallback pattern ```javascript // In hop-resolver.js function resolveFromServer(hops, resolvedPath) { // Returns resolved entries for non-null pubkeys // Skips null entries (unresolved) — caller falls back to HopResolver } // In packets.js — bulk load await cacheResolvedPaths(packets); // server-side first await resolveHops([...allHops]); // client-side fallback for remaining ``` Old packets without `resolved_path` continue to work exactly as before via the existing HopResolver. `hop-resolver.js` is NOT removed — it remains the fallback. ## Tests - 10 new tests for `resolveFromServer()` and `getResolvedPath()` - All 445 frontend helper tests pass - All 62 packet filter tests pass - All 29 aging tests pass Closes #555 (M4 milestone) --------- Co-authored-by: you <you@example.com>	2026-04-04 00:18:46 -07:00
Kpa-clawbot	43673e86f2	fix: perf stats MaxMB reads from config instead of hardcoded 1024 (#558 ) Perf stats `GetPerfStoreStats` returned a hardcoded `MaxMB: 1024` regardless of the configured `packetStore.maxMemoryMB`. Now reads from `s.maxMemoryMB`. Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-03 23:25:54 -07:00
Kpa-clawbot	81ef51cc5c	fix: debounce distance index rebuild to prevent CPU hot loop (#557 ) ## Problem On busy meshes (325K+ transmissions, 50 observers), the distance index rebuild runs on every ingest poll (~1s interval), computing haversine distances for 1M+ hop records. Each rebuild takes 2-3 seconds but new observations arrive faster than it can finish, creating a CPU hot loop that starves the HTTP server. Discovered on the Cascadia Mesh instance where `corescope-server` was consuming 15 minutes of CPU time in 10 minutes of uptime, the API was completely unresponsive, and health checks were timing out. ### Server logs showing the hot loop: ``` [store] Built distance index: 1797778 hop records, 207072 path records [store] Built distance index: 1797806 hop records, 207075 path records [store] Built distance index: 1797811 hop records, 207075 path records [store] Built distance index: 1797820 hop records, 207075 path records ``` Every 2 seconds, nonstop. ## Root Cause `IngestNewObservations` calls `buildDistanceIndex()` synchronously whenever `pickBestObservation` selects a longer path. With 50 observers sending observations every second, paths change on nearly every poll cycle, triggering a full rebuild each time. ## Fix - Mark distance index dirty on path changes instead of rebuilding inline - Rebuild at most every 30 seconds (configurable via `distLast` timer) - Set `distLast` after initial `Load()` to prevent immediate re-rebuild on first ingest - Distance data is at most 30s stale — acceptable for an analytics view ## Testing - `go build`, `go vet`, `go test` all pass - No behavioral change for the initial load or the analytics API response shape - Distance data freshness goes from real-time to 30s max staleness --------- Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: you <you@example.com>	2026-04-03 23:08:09 -07:00
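The dirty-flag debounce described in 81ef51cc5c can be illustrated as below. This is a sketch under assumptions: the `distIndex` type, `markDirty`, `maybeRebuild` and the `simulate` driver are invented for the example, and the rebuild itself is replaced by a counter. Only the 30-second minimum interval and the idea of setting the timestamp after the initial load come from the commit message.

```go
package main

import (
	"fmt"
	"time"
)

// distIndex is a hypothetical stand-in for the store's distance-index state.
type distIndex struct {
	dirty       bool
	last        time.Time // time of the last rebuild (set after initial load)
	minInterval time.Duration
	rebuilds    int
}

// markDirty records that paths changed without doing any work inline.
func (d *distIndex) markDirty() { d.dirty = true }

// maybeRebuild runs the expensive rebuild only if the index is dirty and at
// least minInterval has passed since the previous rebuild.
func (d *distIndex) maybeRebuild(now time.Time) bool {
	if !d.dirty || now.Sub(d.last) < d.minInterval {
		return false
	}
	d.rebuilds++ // stand-in for the full distance index rebuild
	d.dirty = false
	d.last = now
	return true
}

// simulate runs one ingest poll per second, each of which changes a path,
// and reports how many rebuilds actually happened.
func simulate(polls int) int {
	start := time.Now()
	d := &distIndex{last: start, minInterval: 30 * time.Second}
	for i := 1; i <= polls; i++ {
		d.markDirty()
		d.maybeRebuild(start.Add(time.Duration(i) * time.Second))
	}
	return d.rebuilds
}

func main() {
	fmt.Println("rebuilds in 60 polls:", simulate(60))
}
```

Passing the clock value in as a parameter keeps the debounce logic testable without sleeping.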
you	ddce26ff2d	ci: pin build and deploy jobs to meshcore-vm runner	2026-04-04 04:21:48 +00:00
Kpa-clawbot	ee29cc627f	perf: parallelize expanded group fetches, use hashIndex Map lookup (#552 ) ## Summary Fixes #388 — expanded groups were fetched sequentially with O(n) `packets.find()` lookups. ## Changes 1. Parallel fetch: Replaced sequential `for...of + await` loop in `loadPackets()` with `Promise.all()` so all expanded group children are fetched concurrently. 2. O(1) Map lookup: Replaced 3 instances of `packets.find(p => p.hash === hash)` with `hashIndex.get(hash)`: - `loadPackets()` expanded group restore (~line 553) - `select-observation` click handler (~line 1015) - `pktToggleGroup()` (~line 2012) ## Perf justification - Before: N expanded groups → N sequential API calls + N × O(packets.length) array scans - After: N parallel API calls + N × O(1) Map lookups - Typical N is 1-3 (minor severity as noted in issue), but the fix is trivial and correct ## Tests All existing tests pass: `test-packet-filter.js` (62), `test-aging.js` (29), `test-frontend-helpers.js` (433). Co-authored-by: you <you@example.com>	2026-04-03 21:09:17 -07:00
Kpa-clawbot	f3caf42be4	feat: show transport badge in live packet feed (#551 ) ## Summary Show the transport badge ("T") in the live packet feed, matching the packets table (#337). ## Changes - Add `transportBadge(pkt.route_type)` to all 4 feed rendering paths in `live.js`: - Grouped feed items (initial history load) - `addFeedItemDOM()` (VCR replay) - Dedup new feed items (live WebSocket updates) - Node detail panel recent packets list - Uses existing `transportBadge()` from `app.js` and `.badge-transport` CSS from `style.css` ## Testing - 2 new source-level assertions in `test-live.js` verifying `transportBadge()` calls exist - All existing tests pass (67 passed in test-live.js, no new failures) Fixes #338 Co-authored-by: you <you@example.com>	2026-04-03 21:09:02 -07:00
Kpa-clawbot	c34744247a	fix: clean up nodeActivity in pruneStaleNodes to prevent memory leak (#553 ) ## Summary `nodeActivity` (an object tracking per-node packet counts for heatmap intensity) grows without bound — entries are added on every packet flash but never removed, even when stale nodes are pruned. ## Changes - Delete `nodeActivity[key]` alongside `nodeMarkers[key]` and `nodeData[key]` when removing stale WS-only nodes in `pruneStaleNodes()` - Prune orphaned entries — after the main prune loop, sweep `nodeActivity` and delete any key that has no corresponding `nodeData` entry (catches edge cases where nodes were removed by other code paths) - Both run every 60s via the existing `pruneStaleNodes` interval timer ## Testing - Added 2 regression tests in `test-frontend-helpers.js` verifying stale node cleanup and orphan removal - All 435 frontend helper tests pass, plus packet-filter (62) and aging (29) Fixes #390 --------- Co-authored-by: you <you@example.com>	2026-04-03 16:54:53 -07:00
Kpa-clawbot	10f712f9d7	fix: restructure scroll containers for iOS status bar tap-to-scroll (#330 ) (#554 ) ## Summary Fixes #330 — iOS status bar tap-to-scroll broken because `#app` had `overflow: hidden`, preventing `<body>` from being the scroll container. ## Approach: Option B from the issue Instead of a JS polyfill, this restructures scroll containers so `<body>` is the primary scroll container by default, which iOS Safari requires for native status-bar tap-to-scroll. ### How it works `#app` default (body-scroll mode): Uses `min-height` instead of fixed `height`, no `overflow: hidden`. Content pushes beyond the viewport and body scrolls naturally. `#app.app-fixed` (fixed-layout mode): Restores the original `height: calc(100dvh - 52px); overflow: hidden` for pages that need constrained containers. The router in `app.js` toggles this class based on the current page. ### Fixed-layout pages (`.app-fixed`) These pages need fixed-height containers and are unchanged in behavior: - packets — virtual scroll requires fixed-height `.panel-left` to calculate visible rows - nodes — split-panel layout with independently scrollable panels - map — Leaflet requires fixed-dimension container - live — Leaflet map (also has its own `#app:has(.live-page)` override in live.css) - channels — split-panel chat layout - audio-lab — split-panel layout ### Body-scroll pages (no `.app-fixed`) These pages now let the body scroll, enabling iOS tap-to-scroll: - analytics — removed `overflow-y: auto; height: 100%` - observers — removed `overflow-y: auto; height: calc(100vh - 56px)` - traces — removed `overflow-y: auto; height: 100%` - home — removed `#app:has(.home-hero)` override (no longer needed) - compare — removed inline `overflow-y:auto; height:calc(100vh - 56px)` - perf — removed inline `height:100%; overflow-y:auto` - observer-detail — removed inline `overflow-y:auto; height:calc(100vh - 56px)` - node-analytics — removed inline `height:100%; overflow-y:auto` ### Files changed \| File 
\| Change \| \|------\|--------\| \| `public/style.css` \| `#app` default → `min-height`; added `.app-fixed` class \| \| `public/app.js` \| Router toggles `.app-fixed` based on page \| \| `public/home.css` \| Removed `#app:has()` workaround \| \| `public/compare.js` \| Removed inline overflow/height \| \| `public/perf.js` \| Removed inline overflow/height \| \| `public/observer-detail.js` \| Removed inline overflow/height \| \| `public/node-analytics.js` \| Removed inline overflow/height \| ### What's preserved - Sticky nav (`position: sticky; top: 0`) — works with body scroll - Split-panel resize handles — unchanged, still in fixed containers - Virtual scroll on packets page — unchanged, `.panel-left` still has fixed height - Leaflet maps — unchanged, containers still have fixed dimensions - Mobile responsive overrides — unchanged Co-authored-by: you <you@example.com>	2026-04-03 16:54:36 -07:00
Kpa-clawbot	412a8fdb8f	feat: live map uses affinity-aware hop resolution (#528 ) (#550 ) ## Summary Augments the shared `HopResolver` with neighbor-graph affinity data so that when multiple nodes match a hop prefix, the resolver prefers candidates that are known neighbors of the adjacent hop — instead of relying solely on geo-distance. Fixes #528 ## Changes ### `public/hop-resolver.js` - Added `affinityMap` — stores bidirectional neighbor adjacency with scores - Added `setAffinity(graph)` — ingests `/api/analytics/neighbor-graph` edge data into O(1) Map lookups - Added `getAffinity(pubkeyA, pubkeyB)` — returns affinity score between two nodes (0 if not neighbors) - Added `pickByAffinity(candidates, adjacentPubkey, anchor, ...)` — picks best candidate: affinity-neighbor first (highest score), then geo-distance fallback - Modified forward and backward passes in `resolve()` to track the previously-resolved pubkey and use `pickByAffinity` instead of raw geo-sort ### `public/live.js` - Added `fetchAffinityData()` — fetches `/api/analytics/neighbor-graph` once and calls `HopResolver.setAffinity()` - Added `startAffinityRefresh()` — refreshes affinity data every 60 seconds - Both are called from `loadNodes()` after HopResolver is initialized ### `test-hop-resolver-affinity.js` (new) - Affinity prefers neighbor candidate over geo-closest - Cold start (no affinity data) falls back to geo-closest - Null/undefined affinity doesn't crash - Bidirectional score lookup - Highest affinity score wins among multiple neighbors - Unambiguous hops unaffected by affinity ## Performance - API calls: 1 at load + 1 per 60s (no per-packet calls) - Per-packet resolve: O(1) Map lookups, <0.5ms - Memory: ~50KB for 2K-node graph --------- Co-authored-by: you <you@example.com>	2026-04-03 16:32:53 -07:00
Kpa-clawbot	9a39198d92	fix: only count repeaters in hash collision analysis (#441) (#548)

Fixes #441

## Summary

Hash collision analysis was including ALL node types, inflating collision counts with irrelevant data. Per MeshCore firmware analysis, only repeaters matter for collision analysis; they're the only role that forwards packets and appears in routing `path[]` arrays.

## Root Causes Fixed

1. `hash_size==0` nodes counted in all buckets: nodes with unknown hash size were included via `cn.HashSize == bytes || cn.HashSize == 0`, polluting every bucket
2. Non-repeater roles included: companions, rooms, sensors, and observers were counted even though their hash collisions never cause routing ambiguity

## Fix

Changed `computeHashCollisions()` filter from:

```go
// Before: include everything except companions
if cn.HashSize == bytes && cn.Role != "companion" {
```

To:

```go
// After: only include repeaters (per firmware analysis)
if cn.HashSize == bytes && cn.Role == "repeater" {
```

## Why only repeaters?

From [MeshCore firmware analysis](https://github.com/Kpa-clawbot/CoreScope/issues/441#issuecomment-4185218547):

- Only repeaters override `allowPacketForward()` to return `true`
- Only repeaters append their hash to `path[]` during relay
- Companions, rooms, sensors, observers never forward packets
- Cross-role collisions are benign (companion silently drops, real repeater still forwards)

## Tests

- `TestHashCollisionsOnlyRepeaters`: verifies companions, rooms, sensors, and hash_size==0 nodes are all excluded

---------

Co-authored-by: you <you@example.com>	2026-04-03 14:23:13 -07:00
Kpa-clawbot	526ea8a1fc	perf(live): chunk VCR replay packet processing to avoid UI freezes (#549)

## Summary

VCR replay functions (`vcrReplayFromTs`, `vcrRewind`, `fetchNextReplayPage`) fetch up to 10K packets and process them all synchronously on the main thread via `expandToBufferEntries`, causing multi-second UI freezes, especially on mobile.

## Fix

- Added `expandToBufferEntriesAsync()`: processes packets in chunks of 200, yielding to the event loop via `setTimeout(0)` between chunks
- Updated all three VCR replay callers to use the async variant
- Kept the synchronous `expandToBufferEntries()` for backward compatibility (tests, small datasets)
- Exposed `_liveExpandToBufferEntriesAsync` on window for test access

## Perf justification

- Before: 10K packets × ~2 observations = 20K+ objects created synchronously, blocking the main thread for 1-3 seconds on mobile
- After: Same work split into chunks of 200 packets (~400 entries) with event loop yields between chunks. Each chunk takes <5ms, keeping the UI responsive (well under the 16ms frame budget)
- Chunk size of 200 is tunable via `VCR_CHUNK_SIZE`

## Tests

- Added regression test: sync expand correctness at scale (500 packets → 1000 entries)
- Added structural test: verifies `VCR_CHUNK_SIZE` exists and async function yields via `setTimeout`
- All existing tests pass (`npm test`)

Fixes #395

---------

Co-authored-by: you <you@example.com>	2026-04-03 21:22:05 +00:00
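The chunk-and-yield pattern this commit describes might look roughly like the sketch below. It is an illustration under stated assumptions rather than the code in `public/live.js`: `expandPacket` and `yieldToEventLoop` are hypothetical helpers, and the packet shape (an optional `observations` array) is assumed for the example. Only `expandToBufferEntriesAsync`, `VCR_CHUNK_SIZE`, the chunk size of 200, and the `setTimeout(0)` yield come from the commit message.

```javascript
// Chunk size named in the commit message.
const VCR_CHUNK_SIZE = 200;

// Hypothetical stand-in for the real per-packet expansion.
// ASSUMPTION: each packet yields one buffer entry per observation.
function expandPacket(pkt) {
  const observations = pkt.observations || [pkt];
  return observations.map((obs) => ({ pkt, obs }));
}

// Hypothetical helper: resolves on a later macrotask so rendering and
// input handling can run between chunks.
function yieldToEventLoop() {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

async function expandToBufferEntriesAsync(packets, chunkSize = VCR_CHUNK_SIZE) {
  const entries = [];
  for (let start = 0; start < packets.length; start += chunkSize) {
    const end = Math.min(start + chunkSize, packets.length);
    for (let i = start; i < end; i++) {
      for (const entry of expandPacket(packets[i])) entries.push(entry);
    }
    // Yield only when more chunks remain, to avoid a needless final delay.
    if (end < packets.length) await yieldToEventLoop();
  }
  return entries;
}
```

A caller would `await expandToBufferEntriesAsync(packets)` in place of the synchronous call. Output order is preserved because chunks are processed sequentially; the trade-off is slightly higher total latency in exchange for a responsive main thread.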


1258 Commits