meshcore-analyzer

mirror of https://github.com/Kpa-clawbot/meshcore-analyzer.git synced 2026-05-26 13:44:09 +00:00

Author	SHA1	Message	Date
Kpa-clawbot	e42477b810	feat: collapsible panels + medium breakpoint on live map (#606 ) ## Summary Adds collapsible/minimizable UI panels on the live map page so overlay panels don't block map content on medium-sized screens. Fixes #279 ## Changes ### Collapsible Legend Panel (all screen sizes) - The legend toggle button (🎨/✕) is now visible at all screen sizes, not just mobile - Clicking it smoothly collapses/expands the legend with a CSS transition - Collapsed state persists in `localStorage` (`live-legend-hidden`) - Feed panel already had hide/show with localStorage — no changes needed there ### Medium Breakpoint (768px) New `@media (max-width: 768px)` rules for tablet/small laptop screens: - Feed panel: 360px → 280px wide, max-height 340px → 200px - Node detail panel: 320px → 260px wide - Legend: smaller font (10px) and tighter padding - Header: reduced gap and padding - Stats/toggles: smaller font sizes ### What's NOT changed - Mobile (≤640px): existing behavior preserved (feed/legend hidden entirely) - Desktop (>768px): no changes — panels render at full size as before ## Testing - `test-packet-filter.js`: 62 passed - `test-aging.js`: 29 passed - `test-frontend-helpers.js`: 445 passed --------- Co-authored-by: you <you@example.com>	2026-04-04 23:56:07 -07:00
you	cbc3e3ce13	docs: movable UI panels spec — draggable panel positioning (#279)	2026-04-05 06:54:45 +00:00
you	1796493ec0	docs: channel color highlighting spec (#271) Custom color assignment for hash channels in Live tab. Reviewed by Tufte, Torvalds, and Doshi personas.	2026-04-05 06:45:53 +00:00
you	168866ecb6	fix: View Route on Map button works on packet detail page The button click handler used document.getElementById() which fails on /packet/[ID] pages because renderDetail() runs before the container is appended to the DOM. Changed to panel.querySelector() which searches within the detached element tree. Fixes #601	2026-04-05 06:43:59 +00:00
you	be9257cd26	chore: switch license to GPL v3 Copyleft ensures all derivative works remain open source.	2026-04-05 06:36:03 +00:00
you	b5b6faf90a	chore: switch license from MIT to Apache 2.0 Adds patent protection for contributors while maintaining the same permissive usage rights.	2026-04-05 06:35:38 +00:00
you	592061ec7e	chore: add MIT license	2026-04-05 06:32:28 +00:00
you	596ccf2322	fix(rf-health): offset TX/RX airtime labels when overlapping When TX and RX values are within 12px, TX label shifts up and RX shifts down to avoid rendering on top of each other.	2026-04-05 06:31:02 +00:00
Kpa-clawbot	232770a858	feat(rf-health): M2 — airtime, error rate, battery charts with delta computation (#605) ## M2: Airtime + Channel Quality + Battery Charts Implements M2 of #600 — server-side delta computation and three new charts in the RF Health detail view. ### Backend Changes Delta computation for cumulative counters (`tx_air_secs`, `rx_air_secs`, `recv_errors`): - Computes per-interval deltas between consecutive samples - Reboot handling: detects counter reset (current < previous), skips that delta, records reboot timestamp - Gap handling: if time between samples > 2× interval, inserts null (no interpolation) - Returns `tx_airtime_pct` and `rx_airtime_pct` as percentages (delta_secs / interval_secs × 100) - Returns `recv_error_rate` as delta_errors / (delta_recv + delta_errors) × 100 `resolution` query param on `/api/observers/{id}/metrics`: - `5m` (default) — raw samples - `1h` — hourly aggregates (GROUP BY hour with AVG/MAX) - `1d` — daily aggregates Schema additions: - `packets_sent` and `packets_recv` columns added to `observer_metrics` (migration) - Ingestor parses these fields from MQTT stats messages API response now includes: - `tx_airtime_pct`, `rx_airtime_pct`, `recv_error_rate` (computed deltas) - `reboots` array with timestamps of detected reboots - `is_reboot_sample` flag on affected samples ### Frontend Changes Three new charts in the RF Health detail view, stacked vertically below noise floor: 1. Airtime chart — TX (red) + RX (blue) as separate SVG lines, Y-axis 0-100%, direct labels at endpoints 2. Error Rate chart — `recv_error_rate` line, shown only when data exists 3. Battery chart — voltage line with 3.3V low reference, shown only when battery_mv > 0 All charts: - Share X-axis and time range (aligned vertically) - Reboot markers as vertical hairlines spanning all charts - Direct labels on data (no legends) - Resolution auto-selected: `1h` for 7d/30d ranges - Charts hidden when no data exists ### Tests - `TestComputeDeltas`: normal deltas, reboot detection, gap detection - `TestGetObserverMetricsResolution`: 5m/1h/1d downsampling verification - Updated `TestGetObserverMetrics` for new API signature --------- Co-authored-by: you <you@example.com>	2026-04-04 23:17:17 -07:00
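The delta rules in 232770a858 (counter reset means reboot, a long gap means null, otherwise a percentage of the interval) can be sketched as below. This is a minimal illustration, not the code from the commit: the `Sample` and `Delta` types, the function name, and the choice to divide by the nominal interval rather than the measured gap are all assumptions.

```go
package main

import "fmt"

// Sample is a hypothetical row shape: a timestamp plus one cumulative counter.
type Sample struct {
	TS        int64 // unix seconds
	TxAirSecs int64 // cumulative TX airtime, resets to a low value on reboot
}

// Delta is one computed interval. Pct stays nil when no value can be derived.
type Delta struct {
	TS       int64
	Pct      *float64
	IsReboot bool
}

// computeDeltas applies the three rules from the commit message to consecutive samples.
// Assumption: the percentage uses the nominal interval length, not the measured gap.
func computeDeltas(samples []Sample, intervalSecs int64) []Delta {
	out := make([]Delta, 0, len(samples))
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1], samples[i]
		d := Delta{TS: cur.TS}
		switch {
		case cur.TxAirSecs < prev.TxAirSecs:
			// Counter went backwards: treat as a reboot and skip this delta.
			d.IsReboot = true
		case cur.TS-prev.TS > 2*intervalSecs:
			// Gap longer than 2x the interval: leave the value null, no interpolation.
		default:
			pct := float64(cur.TxAirSecs-prev.TxAirSecs) / float64(intervalSecs) * 100
			d.Pct = &pct
		}
		out = append(out, d)
	}
	return out
}

func main() {
	samples := []Sample{{0, 0}, {300, 150}, {600, 10}, {2000, 50}, {2300, 200}}
	for _, d := range computeDeltas(samples, 300) {
		if d.Pct == nil {
			fmt.Printf("ts=%d pct=null reboot=%v\n", d.TS, d.IsReboot)
			continue
		}
		fmt.Printf("ts=%d pct=%.1f\n", d.TS, *d.Pct)
	}
}
```

A pointer is used for `Pct` so that a JSON encoder would emit `null` for reboot and gap samples instead of a misleading zero.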
you	747aea37b7	fix(rf-health): add region filter support to metrics summary Frontend passes RegionFilter query string to summary API. Backend filters results by observer IATA region. Added iata field to MetricsSummaryRow.	2026-04-05 06:00:42 +00:00
you	968c104e14	feat(rf-health): show observer detail in side panel instead of page bottom - Change RF Health detail view from bottom-of-page to a right-sliding side panel - Grid stays visible and stable when detail is open (no layout shift) - Click another observer updates panel in place; close button (×) dismisses - On mobile (<640px): panel stacks below grid at full width - Filter out observers with insufficient data (<2 sparkline points) from grid entirely - Follows the same split-layout pattern used by the nodes page	2026-04-05 05:53:42 +00:00
Kpa-clawbot	6f35d4d417	feat: RF Health Dashboard M1 — observer metrics + small multiples grid (#604) ## RF Health Dashboard — M1: Observer Metrics Storage, API & Small Multiples Grid Implements M1 of #600. ### What this does Adds a complete RF health monitoring pipeline: MQTT stats ingestion → SQLite storage → REST API → interactive dashboard with small multiples grid. ### Backend Changes Ingestor (`cmd/ingestor/`) - New `observer_metrics` table via migration system (`_migrations` pattern) - Parse `tx_air_secs`, `rx_air_secs`, `recv_errors` from MQTT status messages (same pattern as existing `noise_floor` and `battery_mv`) - `INSERT OR REPLACE` with timestamps rounded to nearest 5-min interval boundary (using ingestor wall clock, not observer timestamps) - Missing fields stored as NULLs — partial data is always better than no data - Configurable retention pruning: `retention.metricsDays` (default 30), runs on startup + every 24h Server (`cmd/server/`) - `GET /api/observers/{id}/metrics?since=...&until=...` — per-observer time-series data - `GET /api/observers/metrics/summary?window=24h` — fleet summary with current NF, avg/max NF, sample count - `parseWindowDuration()` supports `1h`, `24h`, `3d`, `7d`, `30d` etc. - Server-side metrics retention pruning (same config, staggered 2min after packet prune) ### Frontend Changes RF Health tab (`public/analytics.js`, `public/style.css`) - Small multiples grid showing all observers simultaneously — anomalies pop out visually - Per-observer cell: name, current NF value, battery voltage, sparkline, avg/max stats - NF status coloring: warning (amber) at ≥-100 dBm, critical (red) at ≥-85 dBm — text color only, no background fills - Click any cell → expanded detail view with full noise floor line chart - Reference lines with direct text labels (`-100 warning`, `-85 critical`) — not color bands - Min/max points labeled directly on the chart - Time range selector: preset buttons (1h/3h/6h/12h/24h/3d/7d/30d) + custom from/to datetime picker - Deep linking: `#/analytics?tab=rf-health&observer=...&range=...` - All charts use SVG, matching existing analytics.js patterns - Responsive: 3-4 columns on desktop, 1 on mobile ### Design Decisions (from spec) - Labels directly on data, not in legends - Reference lines with text labels, not color bands - Small multiples grid, not card+accordion (Tufte: instant visual fleet comparison) - Ingestor wall clock for all timestamps (observer clocks may drift) ### Tests Added Ingestor tests: - `TestRoundToInterval` — 5 cases for rounding to 5-min boundaries - `TestInsertMetrics` — basic insertion with all fields - `TestInsertMetricsIdempotent` — INSERT OR REPLACE deduplication - `TestInsertMetricsNullFields` — partial data with NULLs - `TestPruneOldMetrics` — retention pruning - `TestExtractObserverMetaNewFields` — parsing tx_air_secs, rx_air_secs, recv_errors Server tests: - `TestGetObserverMetrics` — time-series query with since/until filters, NULL handling - `TestGetMetricsSummary` — fleet summary aggregation - `TestObserverMetricsAPIEndpoints` — DB query verification - `TestMetricsAPIEndpoints` — HTTP endpoint response shape - `TestParseWindowDuration` — duration parsing for h/d formats ### Test Results ``` cd cmd/ingestor && go test ./... → PASS (26s) cd cmd/server && go test ./... → PASS (5s) ``` ### What's NOT in this PR (deferred to M2+) - Server-side delta computation for cumulative counters - Airtime charts (TX/RX percentage lines) - Channel quality chart (recv_error_rate) - Battery voltage chart - Reboot detection and chart annotations - Resolution downsampling (1h, 1d aggregates) - Pattern detection / automated diagnosis --------- Co-authored-by: you <you@example.com>	2026-04-04 22:21:35 -07:00
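Two helpers named in 6f35d4d417, rounding to a 5-minute boundary and parsing windows such as `24h` or `7d`, might look roughly like this. Both functions are sketches under assumptions: the names `roundToInterval` and `parseWindow`, their signatures, and the error messages are illustrative and are not taken from the repository.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// roundToInterval rounds non-negative unix seconds to the nearest multiple of intervalSecs.
// Because every sample in the same 5-minute slot gets the same key, an
// INSERT OR REPLACE on (observer, timestamp) deduplicates naturally.
func roundToInterval(unixSecs, intervalSecs int64) int64 {
	return (unixSecs + intervalSecs/2) / intervalSecs * intervalSecs
}

// parseWindow accepts "<n>h" or "<n>d" with n > 0, for example "24h" or "7d".
// time.ParseDuration has no day unit, which is why "d" is handled by hand here.
func parseWindow(s string) (time.Duration, error) {
	if len(s) < 2 {
		return 0, fmt.Errorf("invalid window %q", s)
	}
	n, err := strconv.Atoi(s[:len(s)-1])
	if err != nil || n <= 0 {
		return 0, fmt.Errorf("invalid window %q", s)
	}
	switch s[len(s)-1] {
	case 'h':
		return time.Duration(n) * time.Hour, nil
	case 'd':
		return time.Duration(n) * 24 * time.Hour, nil
	}
	return 0, fmt.Errorf("invalid window unit in %q", s)
}

func main() {
	fmt.Println(roundToInterval(1000, 300))
	d, err := parseWindow("3d")
	fmt.Println(d, err)
}
```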
you	aaf00d0616	docs: add M5 Prometheus/Grafana metrics export to RF Health spec	2026-04-05 05:02:36 +00:00
you	41c046c974	docs: RF Health Dashboard spec — observer radio metrics Per-observer time-series charts for noise floor, TX/RX airtime, CRC errors, and battery. Small multiples grid design. MVP-first milestones. Reviewed by Carmack (perf), Munger (failure modes), radio expert (hardware), Tufte (visualization), and Doshi (product strategy).	2026-04-05 04:42:32 +00:00
efiten	1fbdd1c3d3	feat: Prefix Tool tab on Analytics page (#347 ) (#599 ) ## Summary - Adds a new Prefix Tool tab to the Analytics page (alongside Hash Stats / Hash Issues) - Network Overview: per-tier collision stats (1/2/3-byte) and a network-size-based recommendation — collapsible, folded by default - Prefix Checker: accepts a 1/2/3-byte hex prefix or full public key; shows colliding nodes at each tier with severity badges (✅ / ⚠️ / 🔴); clicking a node navigates to its detail page - Prefix Generator: picks a random collision-free prefix at the chosen hash size; links to [meshcore-web-keygen](https://agessaman.github.io/meshcore-web-keygen/) with the prefix pre-filled - Hash Issues tab: adds a "🔎 Check a prefix →" shortcut in the nav - Deep-link support: `#/analytics?tab=prefix-tool&prefix=A3F1` pre-fills and runs the checker; `?generate=2` pre-selects and runs the generator - No new API endpoints — 100% client-side using the existing `/nodes` list ## Verification Live on staging: https://staging.on8ar.eu/#/analytics?tab=prefix-tool ## Test plan - [x] Network Overview card is collapsed by default; expands on click; stats are correct - [x] Prefix Checker: 2-char input shows 1-byte results; 4-char shows 2-byte; 6-char shows 3-byte; 64-char pubkey shows all three tiers - [x] Prefix Checker: invalid hex shows error; odd-length input shows error - [x] Prefix Generator: Generate picks an unused prefix; "Try another" cycles; keygen link opens with prefix pre-filled - [x] Deep link `?prefix=A3F1` pre-fills checker and scrolls to it - [x] Deep link `?generate=2` pre-selects 2-byte and runs generator - [x] Hash Issues tab shows "🔎 Check a prefix →" in the nav - [x] FAQ link at bottom of generator opens correct MeshCore docs anchor 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 20:18:32 -07:00
efiten	d34320fa6c	fix: use _getColCount() in error-state row to match spacers (#406) (#597) ## Summary The error-state `<tbody>` row (shown when packet loading fails) hardcoded `colspan="10"`, while the virtual scroll spacers and the empty-state row both use `_getColCount()` (which reads from the actual `<thead>` and falls back to 11). One-line fix: replace the hardcoded value with `_getColCount()`. Fixes #406 ## Test plan - [x] Trigger the error state (e.g. kill the backend mid-load) — error row should span all columns with no gap on the right - [x] `node test-packets.js` — 72 passed, 0 failed 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 19:41:55 -07:00
efiten	77b7c33d0f	perf: incremental DOM diff in renderVisibleRows (#414) (#596) ## Summary - Replace full `tbody` teardown+rebuild on every scroll frame with a range-diff that only adds/removes the delta rows at the edges of the visible window - `buildFlatRowHtml` / `buildGroupRowHtml` now accept an `entryIdx` parameter and emit `data-entry-idx` on every `<tr>` so the diff can target rows precisely (including expanded group children) - Full rebuild is retained for initial render and large scroll jumps past the buffer (no range overlap) - Also loads `packet-helpers.js` in the test sandbox, fixing 7 pre-existing test failures for the builder functions; adds 4 new tests covering `data-entry-idx` output Fixes #414 ## Test plan - [x] Open packets page with 500+ packets, scroll rapidly — DOM inspector should show incremental `<tr>` adds/removes rather than full `tbody` teardown - [x] Expand a grouped packet, scroll away and back — expanded children re-render correctly - [x] Large scroll jump (jump to bottom via scrollbar) — full rebuild fires, no visual glitch - [x] `node test-packets.js` — 72 passed, 0 failed 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: you <you@example.com>	2026-04-04 19:41:33 -07:00
you	0a55717283	docs: add PSK brute-force attack with timestamp oracle to security analysis Weak passphrases with no KDF stretching are the #1 practical threat. Timestamp in plaintext block 0 serves as known-plaintext oracle for instant key verification from a single captured packet. Key findings: - decode_base64() output used directly as AES key, no KDF - Short passphrases produce <16 byte keys (reduced key space) - No salt means global precomputed attacks work - 3-word passphrase crackable in ~2 min on commodity GPU Reviewed by djb and Dijkstra personas. Corrections applied: - GPU throughput upgraded from 10^9 to 10^10 AES/sec baseline - Oracle strengthened: bytes 4+ (type byte, sender name) also predictable - Dictionary size assumptions made explicit - Zipf's law caveat added (humans don't choose uniformly) - base64 short-passphrase key truncation issue documented	2026-04-05 00:58:57 +00:00
you	bcab31bf72	docs: AES-128-ECB security analysis — block-level vulnerability assessment Formal analysis of MeshCore's ECB encryption for channel and direct messages. Reviewed by djb and Dijkstra expert personas through 3 revisions. Key findings: - Block 0 has accidental nonce (4-byte timestamp) preventing repetition - Blocks 1+ are pure deterministic ECB with no nonce — vulnerable to frequency analysis for repeated message content - Partial final block attack: zero-padding reduces search space - HMAC key reuse: AES key is first 16 bytes of HMAC key (same material) - Recommended fix: switch to AES-128-CTR mode	2026-04-05 00:44:21 +00:00
Kpa-clawbot	6ae62ce535	perf: make txToMap observations lazy via ExpandObservations flag (#595 ) ## Summary `txToMap()` previously always allocated observation sub-maps for every packet, even though the `/api/packets` handler immediately stripped them via `delete(p, "observations")` unless `expand=observations` was requested. A typical page of 50 packets with ~5 observations each caused 300+ unnecessary map allocations per request. ## Changes - `txToMap`: Add variadic `includeObservations bool` parameter. Observations are only built when `true` is passed, eliminating allocations when they'd just be discarded. - `PacketQuery`: Add `ExpandObservations bool` field to thread the caller's intent through the query pipeline. - `routes.go`: Set `ExpandObservations` based on `expand=observations` query param. Removed the post-hoc `delete(p, "observations")` loop — observations are simply never created when not requested. - Single-packet lookups (`GetPacketByID`, `GetPacketByHash`): Always pass `true` since detail views need observations. - Multi-node/analytics queries: Default (no flag) = no observations, matching prior behavior. ## Testing - Added `TestTxToMapLazyObservations` covering all three cases: no flag, `false`, and `true`. - All existing tests pass (`go test ./...`). ## Perf Impact Eliminates ~250 observation map allocations per /api/packets request (at default page size of 50 with ~5 observations each). This is a constant-factor improvement per request — no algorithmic complexity change. Fixes #374 Co-authored-by: you <you@example.com>	2026-04-04 10:39:30 -07:00
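The optional-flag approach described in 6ae62ce535 can be illustrated with a variadic `bool`. This is a sketch only: the `Tx` type, the map keys, and the name `txToMapSketch` are hypothetical stand-ins for the real `txToMap`.

```go
package main

import "fmt"

// Tx is a hypothetical, trimmed-down transmission record.
type Tx struct {
	Hash         string
	Observations []string // observer IDs, standing in for full observation records
}

// txToMapSketch builds the observations sub-maps only when the caller passes true.
// Omitting the argument behaves like false, so existing call sites keep compiling.
func txToMapSketch(tx Tx, includeObservations ...bool) map[string]interface{} {
	m := map[string]interface{}{"hash": tx.Hash}
	if len(includeObservations) > 0 && includeObservations[0] {
		obs := make([]map[string]interface{}, 0, len(tx.Observations))
		for _, o := range tx.Observations {
			obs = append(obs, map[string]interface{}{"observer": o})
		}
		m["observations"] = obs
	}
	return m
}

func main() {
	tx := Tx{Hash: "abc123", Observations: []string{"obs1", "obs2"}}
	fmt.Println(len(txToMapSketch(tx)), len(txToMapSketch(tx, true)))
}
```

The variadic form trades a little type clarity for backward compatibility; an explicit options struct would be the stricter alternative.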
Kpa-clawbot	6e2f79c0ad	perf: optimize QueryGroupedPackets — cache observer count, defer map construction (#594) ## Summary Optimizes `QueryGroupedPackets()` in `store.go` to eliminate two major inefficiencies on every grouped packet list request: ### Changes 1. Cache `UniqueObserverCount` on `StoreTx` — Instead of iterating all observations to count unique observers on every query (O(total_observations) per request), we now track unique observers at ingest time via an `observerSet` map and pre-computed `UniqueObserverCount` field. This is updated incrementally as observations arrive. 2. Defer map construction until after pagination — Previously, `map[string]interface{}` was built for ALL 30K+ filtered results before sorting and paginating. Now the grouped cache stores sorted `[]StoreTx` pointers (lightweight), and `groupedTxsToPage()` builds maps only for the requested page (typically 50 items). This eliminates ~30K map allocations per cache miss. 3. Lighter cache footprint — The grouped cache now stores `[]StoreTx` instead of `PacketResult` with pre-built maps, reducing memory pressure and GC work. ### Complexity - Observer counting: O(1) per query (was O(total_observations)) - Map construction: O(page_size) per query (was O(n) where n = all filtered results) - Sort remains O(n log n) on cache miss, but the cache (3s TTL) absorbs repeated requests ### Testing - `cd cmd/server && go test ./...` — all tests pass - `cd cmd/ingestor && go build ./...` — builds clean Fixes #370 --------- Co-authored-by: you <you@example.com>	2026-04-04 10:39:04 -07:00
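Both ideas in 6e2f79c0ad, counting unique observers at ingest time and building response maps only for the requested page, can be sketched as follows. The field names `observerSet` and `UniqueObserverCount` come from the commit message; the method `AddObservation`, the helper `pageToMaps`, and the map keys are assumptions made for illustration.

```go
package main

import "fmt"

// StoreTx is a hypothetical, trimmed-down version of the stored transmission.
type StoreTx struct {
	Hash                string
	observerSet         map[string]struct{}
	UniqueObserverCount int
}

// AddObservation updates the unique-observer count incrementally,
// so queries can read it in O(1) instead of rescanning observations.
func (tx *StoreTx) AddObservation(observerID string) {
	if tx.observerSet == nil {
		tx.observerSet = make(map[string]struct{})
	}
	if _, seen := tx.observerSet[observerID]; seen {
		return
	}
	tx.observerSet[observerID] = struct{}{}
	tx.UniqueObserverCount++
}

// pageToMaps allocates maps only for the requested slice of an already sorted list.
func pageToMaps(sorted []*StoreTx, offset, limit int) []map[string]interface{} {
	if offset < 0 {
		offset = 0
	}
	if offset > len(sorted) {
		offset = len(sorted)
	}
	end := offset + limit
	if limit < 0 || end > len(sorted) {
		end = len(sorted)
	}
	page := make([]map[string]interface{}, 0, end-offset)
	for _, tx := range sorted[offset:end] {
		page = append(page, map[string]interface{}{
			"hash":           tx.Hash,
			"observer_count": tx.UniqueObserverCount,
		})
	}
	return page
}

func main() {
	tx := &StoreTx{Hash: "h1"}
	for _, id := range []string{"a", "b", "a"} {
		tx.AddObservation(id)
	}
	fmt.Println(tx.UniqueObserverCount, pageToMaps([]*StoreTx{tx}, 0, 50))
}
```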
Kpa-clawbot	b0862f7a41	fix: replace time.Tick with NewTicker in prune goroutine for graceful shutdown (#593 ) ## Summary Replace `time.Tick()` with `time.NewTicker()` in the auto-prune goroutine so it stops cleanly during graceful shutdown. ## Problem `time.Tick` creates a ticker that can never be garbage collected or stopped. While the prune goroutine runs for the process lifetime, it won't stop during graceful shutdown — the goroutine leaks past the shutdown sequence. ## Fix - Create a `time.NewTicker` and a done channel - Use `select` to listen on both the ticker and done channel - Stop the ticker and close the done channel in the shutdown path (after `poller.Stop()`) - Pattern matches the existing `StartEvictionTicker()` approach ## Testing - `go build ./...` — compiles cleanly - `go test ./...` — all tests pass Fixes #377 Co-authored-by: you <you@example.com>	2026-04-04 10:38:37 -07:00
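The ticker-plus-done-channel pattern from b0862f7a41 looks roughly like this in isolation. It is a generic sketch of the technique, not the repository's prune goroutine: `startPruner` and its returned `stop` function are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// startPruner calls prune on every tick until the returned stop function is called.
// stop must be called at most once; it blocks until the goroutine has exited.
func startPruner(interval time.Duration, prune func()) (stop func()) {
	ticker := time.NewTicker(interval)
	done := make(chan struct{})
	finished := make(chan struct{})
	go func() {
		defer close(finished)
		for {
			select {
			case <-ticker.C:
				prune()
			case <-done:
				return
			}
		}
	}()
	return func() {
		ticker.Stop() // release the ticker's resources
		close(done)   // tell the goroutine to return
		<-finished    // wait so shutdown does not race with a running prune
	}
}

func main() {
	ticks := make(chan struct{}, 1)
	stop := startPruner(10*time.Millisecond, func() {
		select {
		case ticks <- struct{}{}:
		default: // drop the signal if nobody has consumed the previous one
		}
	})
	<-ticks
	stop()
	fmt.Println("pruner stopped after first tick")
}
```

Waiting on `finished` is an addition beyond what the commit message describes; it makes the shutdown order deterministic at the cost of blocking while a prune is in flight.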
Kpa-clawbot	45991eca09	perf: combine chained filterPackets passes into single scan (#592) ## Summary Combines the chained `filterTxSlice` calls in `filterPackets()` into a single pass over the packet slice. ## Problem When multiple filter parameters are specified (e.g., `type=4&route=1&since=...&until=...`), each filter created a new intermediate `[]StoreTx` slice. With N filters, this meant N separate scans and N-1 unnecessary allocations. ## Fix All filter predicates (type, route, observer, hash, since, until, region, node) are pre-computed before the loop, then evaluated in a single `filterTxSlice` call. This eliminates all intermediate allocations. Preserved behavior: - Fast-path index lookups for hash-only and observer-only queries remain unchanged - Node-only fast-path via `byNode` index preserved - All existing filter semantics maintained (same comparison operators, same null checks) Complexity: Single `O(n)` pass regardless of how many filters are active, vs previous `O(n * k)` where k = number of active filters (each pass is O(n) but allocates). ## Testing All existing tests pass (`cd cmd/server && go test ./...`). Fixes #373 Co-authored-by: you <you@example.com>	2026-04-04 10:38:10 -07:00
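A single-pass filter of the kind 45991eca09 describes can be sketched with a query struct whose unset fields are skipped. The `Tx` and `Query` types, the inclusive `Since`/`Until` comparisons, and the name `filterSinglePass` are assumptions; the real code covers more predicates (observer, hash, region, node).

```go
package main

import "fmt"

// Tx is a hypothetical packet record with ISO-8601 timestamps, which sort lexically.
type Tx struct {
	PayloadType int
	RouteType   int
	FirstSeen   string
}

// Query holds optional filters; nil pointers and empty strings mean "not set".
type Query struct {
	Type  *int
	Route *int
	Since string
	Until string
}

// filterSinglePass evaluates every active predicate per packet in one loop,
// allocating exactly one result slice regardless of how many filters are set.
func filterSinglePass(txs []Tx, q Query) []Tx {
	out := make([]Tx, 0, len(txs))
	for _, tx := range txs {
		if q.Type != nil && tx.PayloadType != *q.Type {
			continue
		}
		if q.Route != nil && tx.RouteType != *q.Route {
			continue
		}
		if q.Since != "" && tx.FirstSeen < q.Since {
			continue
		}
		if q.Until != "" && tx.FirstSeen > q.Until {
			continue
		}
		out = append(out, tx)
	}
	return out
}

func main() {
	payload := 4
	txs := []Tx{{4, 1, "2026-04-01"}, {4, 1, "2026-04-03"}, {5, 1, "2026-04-03"}}
	fmt.Println(filterSinglePass(txs, Query{Type: &payload, Since: "2026-04-02"}))
}
```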
Kpa-clawbot	76c42556a2	perf: sort snrVals/rssiVals once in computeAnalyticsRF (#591 ) ## Summary Sort `snrVals` and `rssiVals` once upfront in `computeAnalyticsRF()` and read min/max/median directly from the sorted slices, instead of copying and sorting per stat call. ## Changes - Sort both slices once before computing stats (2 sorts total instead of 4+ copy+sorts) - Read `min` from `sorted[0]`, `max` from `sorted[len-1]`, `median` from `sorted[len/2]` - Remove the now-unused `sortedF64` and `medianF64` helper closures ## Performance impact With 100K+ observations, this eliminates multiple O(n log n) copy+sort operations. Previously each call to `medianF64` did a full copy + sort, and `minF64`/`maxF64` did O(n) scans on the unsorted array. Now: 2 in-place sorts total, O(1) lookups for min/max/median. Fixes #366 Co-authored-by: you <you@example.com>	2026-04-04 10:37:42 -07:00
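The sort-once approach in 76c42556a2 reduces to a few index reads after a single in-place sort. The function below is a sketch with a hypothetical name; like the commit message, it takes the median from `sorted[len/2]`, which is the upper middle element for even-length input.

```go
package main

import (
	"fmt"
	"sort"
)

// summarize sorts vals in place once and reads min, max, and median by index.
// ok is false for empty input. Callers that need the original order must pass a copy.
func summarize(vals []float64) (lo, hi, median float64, ok bool) {
	if len(vals) == 0 {
		return 0, 0, 0, false
	}
	sort.Float64s(vals)
	return vals[0], vals[len(vals)-1], vals[len(vals)/2], true
}

func main() {
	snr := []float64{7.5, -3, 11, 2}
	lo, hi, median, ok := summarize(snr)
	fmt.Println(lo, hi, median, ok)
}
```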
Kpa-clawbot	6f8378a31c	perf: batch-remove from secondary indexes in EvictStale (#590 ) ## Summary `EvictStale()` was doing O(n) linear scans per evicted item to remove from secondary indexes (`byObserver`, `byPayloadType`, `byNode`). Evicting 1000 packets from an observer with 50K observations meant 1000 × 50K = 50M comparisons — all under a write lock. ## Fix Replace per-item removal with batch single-pass filtering: 1. Collect phase: Walk evicted packets once, building sets of evicted tx IDs, observation IDs, and affected index keys 2. Filter phase: For each affected index slice, do a single pass keeping only non-evicted entries Before: O(evicted_count × index_slice_size) per index — quadratic in practice After: O(evicted_count + index_slice_size) per affected key — linear ## Changes - `cmd/server/store.go`: Restructured `EvictStale()` eviction loop into collect + batch-filter pattern ## Testing - All existing tests pass (`cd cmd/server && go test ./...`) Fixes #368 Co-authored-by: you <you@example.com>	2026-04-04 10:37:27 -07:00
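The collect-then-filter eviction in 6f8378a31c can be illustrated on a single index. This is a simplified sketch: the index is modeled as `map[string][]int` and the helper name `removeEvicted` is hypothetical, whereas the real store keeps several indexes of richer types.

```go
package main

import "fmt"

// removeEvicted filters each affected index slice once, keeping non-evicted IDs.
// Cost per key is O(len(slice)) with O(1) set lookups, instead of one scan per evicted item.
func removeEvicted(index map[string][]int, evicted map[int]struct{}, affected []string) {
	for _, key := range affected {
		kept := index[key][:0] // reuse the backing array, no extra allocation
		for _, id := range index[key] {
			if _, gone := evicted[id]; !gone {
				kept = append(kept, id)
			}
		}
		if len(kept) == 0 {
			delete(index, key) // drop empty keys so the index does not grow unbounded
		} else {
			index[key] = kept
		}
	}
}

func main() {
	byObserver := map[string][]int{"obsA": {1, 2, 3}, "obsB": {2}}
	removeEvicted(byObserver, map[int]struct{}{2: {}, 3: {}}, []string{"obsA", "obsB"})
	fmt.Println(byObserver)
}
```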
Kpa-clawbot	56115ee0a4	perf: use byNode index in QueryMultiNodePackets instead of full scan (#589 ) ## Summary `QueryMultiNodePackets()` was scanning ALL packets with `strings.Contains` on JSON blobs — O(packets × pubkeys × json_length). With 30K+ packets and multiple pubkeys, this caused noticeable latency on `/api/packets?nodes=...`. ## Fix Replace the full scan with lookups into the existing `byNode` index, which already maps pubkeys to their transmissions. Merge results with hash-based deduplication, then apply time filters. Before: O(N × P × J) where N=all packets, P=pubkeys, J=avg JSON length After: O(M × P) where M=packets per pubkey (typically small), plus O(R log R) sort for pagination correctness Results are sorted by `FirstSeen` after merging to maintain the oldest-first ordering expected by the pagination logic. Fixes #357 Co-authored-by: you <you@example.com>	2026-04-04 10:36:59 -07:00
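The index-based lookup in 56115ee0a4 (merge per-pubkey results, deduplicate by hash, sort by `FirstSeen`) might be sketched like this. The `Tx` type and the function name `queryByNodes` are assumptions, and time filtering is omitted for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// Tx is a hypothetical transmission with an ISO-8601 FirstSeen timestamp.
type Tx struct {
	Hash      string
	FirstSeen string
}

// queryByNodes merges index hits for several pubkeys, dropping duplicates by hash,
// then sorts oldest-first so pagination sees a stable order.
func queryByNodes(byNode map[string][]*Tx, pubkeys []string) []*Tx {
	seen := make(map[string]struct{})
	var out []*Tx
	for _, pk := range pubkeys {
		for _, tx := range byNode[pk] { // a missing key yields a nil slice, so no check is needed
			if _, dup := seen[tx.Hash]; dup {
				continue
			}
			seen[tx.Hash] = struct{}{}
			out = append(out, tx)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].FirstSeen < out[j].FirstSeen })
	return out
}

func main() {
	a := &Tx{"h1", "2026-04-02T00:00:00Z"}
	b := &Tx{"h2", "2026-04-01T00:00:00Z"}
	index := map[string][]*Tx{"k1": {a, b}, "k2": {a}}
	for _, tx := range queryByNodes(index, []string{"k1", "k2"}) {
		fmt.Println(tx.Hash, tx.FirstSeen)
	}
}
```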
Kpa-clawbot	321d1cf913	perf: apply time filter early in GetNodeAnalytics to avoid full packet scan (#588 ) ## Problem `GetNodeAnalytics()` in `store.go` scans ALL 30K+ packets doing `strings.Contains` on every JSON blob when the node has a name, then filters by time range after the full scan. This is `O(packets × json_length)` on every `/api/nodes/{pubkey}/analytics` request. ## Fix Move the `fromISO` time check inside the scan loop so old packets are skipped before the expensive `strings.Contains` matching. For the non-name path (indexed-only), the time filter is also applied inline, eliminating the separate `allPkts` intermediate slice. ### Before 1. Scan all packets → collect matches (including old ones) → `allPkts` 2. Filter `allPkts` by time → `packets` ### After 1. Scan packets, skip `tx.FirstSeen <= fromISO` immediately → `packets` This avoids `strings.Contains` calls on packets outside the requested time window (typically 7 days out of months of data). ## Complexity - Before: `O(total_packets × avg_json_length)` for name matching - After: `O(recent_packets × avg_json_length)` — only packets within the time window are string-matched ## Testing - `cd cmd/server && go test ./...` — all tests pass Fixes #367 Co-authored-by: you <you@example.com>	2026-04-04 10:36:49 -07:00
Kpa-clawbot	790a713ba9	perf: combine 4 subpath API calls into single bulk endpoint (#587 ) ## Summary Consolidates the 4 parallel `/api/analytics/subpaths` calls in the Route Patterns tab into a single `/api/analytics/subpaths-bulk` endpoint, eliminating 3 redundant server-side scans of the subpath index on cache miss. ## Changes ### Backend (`cmd/server/routes.go`, `cmd/server/store.go`) - New `GET /api/analytics/subpaths-bulk?groups=2-2:50,3-3:30,4-4:20,5-8:15` endpoint - Groups format: `minLen-maxLen:limit` comma-separated - `GetAnalyticsSubpathsBulk()` iterates `spIndex` once, bucketing entries into per-group accumulators by hop length - Hop name resolution is done once per raw hop and shared across groups - Results are cached per-group for compatibility with existing single-key cache lookups - Region-filtered queries fall back to individual `GetAnalyticsSubpaths()` calls (region filtering requires per-transmission observer checks) ### Frontend (`public/analytics.js`) - `renderSubpaths()` now makes 1 API call instead of 4 - Response shape: `{ results: [{ subpaths, totalPaths }, ...] }` — destructured into the same `[d2, d3, d4, d5]` variables ### Tests (`cmd/server/routes_test.go`) - `TestAnalyticsSubpathsBulk`: validates 3-group response shape, missing params error, invalid format error ## Performance - Before: 4 API calls → 4 scans of `spIndex` + 4× hop resolution on cache miss - After: 1 API call → 1 scan of `spIndex` + 1× hop resolution (shared cache) - Cache miss cost reduced by ~75% for this tab - No change on cache hit (individual group caching still works) Fixes #398 Co-authored-by: you <you@example.com>	2026-04-04 10:19:18 -07:00
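Parsing the `groups` parameter format from 790a713ba9 (`minLen-maxLen:limit`, comma-separated) could look like the sketch below. The `Group` type, the function name `parseGroups`, and the validation rules (lengths of at least 1, `maxLen >= minLen`, positive limit) are assumptions, not the endpoint's actual behavior.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// Group is one requested bucket: hop lengths minLen..maxLen, capped at Limit results.
type Group struct {
	MinLen, MaxLen, Limit int
}

// parseGroups parses a spec such as "2-2:50,3-3:30,4-4:20,5-8:15".
func parseGroups(spec string) ([]Group, error) {
	if spec == "" {
		return nil, errors.New("groups parameter is required")
	}
	var groups []Group
	for _, part := range strings.Split(spec, ",") {
		lens, limitStr, ok := strings.Cut(part, ":")
		if !ok {
			return nil, fmt.Errorf("group %q: want minLen-maxLen:limit", part)
		}
		minStr, maxStr, ok := strings.Cut(lens, "-")
		if !ok {
			return nil, fmt.Errorf("group %q: want minLen-maxLen:limit", part)
		}
		minLen, err1 := strconv.Atoi(minStr)
		maxLen, err2 := strconv.Atoi(maxStr)
		limit, err3 := strconv.Atoi(limitStr)
		if err1 != nil || err2 != nil || err3 != nil || minLen < 1 || maxLen < minLen || limit < 1 {
			return nil, fmt.Errorf("group %q: invalid numbers", part)
		}
		groups = append(groups, Group{MinLen: minLen, MaxLen: maxLen, Limit: limit})
	}
	return groups, nil
}

func main() {
	groups, err := parseGroups("2-2:50,3-3:30,4-4:20,5-8:15")
	fmt.Println(groups, err)
}
```

With the groups parsed up front, a single scan of the subpath index can bucket each entry by hop length into the matching accumulator.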
Kpa-clawbot	cd470dffbe	perf: batch observation fetching to eliminate N+1 API calls on sort change (#586 ) ## Summary Fixes the N+1 API call pattern when changing observation sort mode on the packets page. Previously, switching sort to Path or Time fired individual `/api/packets/{hash}` requests for every multi-observation group without cached children — potentially 100+ concurrent requests. ## Changes ### Backend: Batch observations endpoint - New endpoint: `POST /api/packets/observations` accepts `{"hashes": ["h1", "h2", ...]}` and returns all observations keyed by hash in a single response - Capped at 200 hashes per request to prevent abuse - 4 test cases covering empty input, invalid JSON, too-many-hashes, and valid requests ### Frontend: Use batch endpoint - `packets.js` sort change handler now collects all hashes needing observation data and sends a single POST request instead of N individual GETs - Same behavior, single round-trip ## Performance - Before: Changing sort with 100 visible groups → 100 concurrent API requests, browser connection queueing (6 per host), several seconds of lag - After: Single POST request regardless of group count, response time proportional to store lookup (sub-millisecond per hash in memory) Fixes #389 --------- Co-authored-by: you <you@example.com>	2026-04-04 10:18:40 -07:00
Kpa-clawbot	7ff89d8607	perf(packets): coalesce WS-triggered renders with requestAnimationFrame (#585 ) ## Summary Coalesce WS-triggered `renderTableRows()` calls using `requestAnimationFrame` instead of `setTimeout` debouncing. Fixes #396 ## Problem During high WebSocket throughput, multiple WS batches could each trigger a `renderTableRows()` call via `setTimeout(..., 200)`. With rapid batches, this caused the 50K-row table to be fully rebuilt every few hundred milliseconds, causing UI jank. ## Solution Replace the `setTimeout`-based debounce with a `requestAnimationFrame` coalescing pattern: 1. `scheduleWSRender()` — sets a dirty flag and schedules a single rAF callback 2. Dirty flag — multiple WS batches within the same frame just set the flag; only one render fires 3. Cleanup — `destroy()` cancels any pending rAF and resets the dirty flag This ensures at most one `renderTableRows()` per animation frame (~16ms), regardless of how many WS batches arrive. ## Performance justification - Before: Each WS batch → `setTimeout(renderTableRows, 200)` — N batches in <200ms = N renders - After: N batches in one frame → 1 render on next rAF (~16ms) - Worst case goes from O(N) renders per second to O(60) renders per second (frame-capped) ## Changes - `public/packets.js`: Add `scheduleWSRender()` with rAF + dirty flag; replace setTimeout in WS handler; clean up in `destroy()` - `test-frontend-helpers.js`: Update tests to verify rAF coalescing pattern instead of setTimeout debounce ## Testing - All existing tests pass (`npm test` — 0 failures) - Updated 2 test cases to verify new rAF coalescing behavior Co-authored-by: you <you@example.com>	2026-04-04 10:18:09 -07:00
Kpa-clawbot	493849f2e3	perf(frontend): compress og-image.png from 1.1MB to 235KB (#584 ) ## Summary Compress `public/og-image.png` from 1,159,050 bytes (1.1MB) to 234,899 bytes (235KB) — an 80% reduction. ## What Changed - Applied lossy PNG quantization via `pngquant` (quality 45-65, speed 1) - Image dimensions unchanged: 1200×630px (standard OG image size) - Visual quality remains suitable for social media previews ## Why A 1.1MB OpenGraph image is excessive. Typical OG images are 50-200KB. This reduces deployment size and Git repo bloat without affecting functionality (browsers don't preload OG images). ## Testing - Unit tests pass (`npm run test:unit`) - No code changes — image-only commit - `index.html` reference unchanged (`<meta property="og:image" content="/og-image.png">`) Fixes #397 Co-authored-by: you <you@example.com>	2026-04-04 10:17:21 -07:00
Kpa-clawbot	87ac61748c	perf(analytics): compute network status client-side, eliminate redundant API call (#583 ) ## Summary Reduces the analytics nodes tab from 3 parallel API calls to 2 by computing network status (active/degraded/silent counts) client-side instead of fetching from `/nodes/network-status`. ## What Changed `public/analytics.js` — `renderNodesTab()`: - Removed the `/nodes/network-status` API call from the `Promise.all` batch - Added client-side computation of active/degraded/silent counts using the shared `getHealthThresholds()` function from `roles.js` - Uses `nodesResp.total` and `nodesResp.counts` (already returned by `/nodes` endpoint) for total node count and role breakdown ## Why This Works The `/nodes` response already includes: - `total` — count of all matching nodes (server-computed across full DB) - `counts` — role counts across all nodes (from `GetAllRoleCounts()`) - Per-node `last_seen`/`last_heard` timestamps The `getHealthThresholds()` function in `roles.js` provides the same degraded/silent thresholds used server-side, so client-side status computation produces equivalent results for the loaded node set. ## Performance - Before: 3 parallel API calls (`/nodes`, `/nodes/bulk-health`, `/nodes/network-status`) - After: 2 parallel API calls (`/nodes`, `/nodes/bulk-health`) - Network status computation is O(n) over the 200 loaded nodes — negligible client-side cost - The `/nodes/network-status` endpoint scanned ALL nodes in the DB on every call; this eliminates that server-side work entirely ## Testing - All frontend helper tests pass (445/445) - All packet filter tests pass (62/62) - All aging tests pass (29/29) - All Go backend tests pass Fixes #392 --------- Co-authored-by: you <you@example.com>	2026-04-04 10:17:05 -07:00
Kpa-clawbot	26de38f4b6	perf(map): reposition markers on zoom/resize instead of full rebuild (#582 ) ## Summary Eliminates visible marker flicker on zoom/resize events in the map page when displaying 500+ nodes. ## Problem `renderMarkers()` was called on every `zoomend` and `resize` event, which did `markerLayer.clearLayers()` followed by a full rebuild of all markers. With many nodes, this caused a visible flash where all markers disappeared briefly before being re-added. ## Solution Instead of rebuilding all markers from scratch on zoom/resize: 1. Store Leaflet layer references on marker data objects (`_leafletMarker`, `_leafletLine`, `_leafletDot`) during the initial full render 2. Add `_repositionMarkers()` — re-runs `deconflictLabels()` at the new zoom level and updates existing marker positions via `setLatLng()`/`setLatLngs()` without clearing the layer group 3. Debounce zoom/resize handlers (150ms) to coalesce rapid events during animated zooms 4. Dynamically manage offset indicators — adds/removes deconfliction offset lines and dots as positions change at different zoom levels Full `renderMarkers()` is still called for filter changes, data updates, and theme changes — only zoom/resize uses the lightweight repositioning path. ## Complexity - `_repositionMarkers()`: O(n) — single pass over stored marker data - `deconflictLabels()`: O(n × k) where k is max spiral offsets (48) — unchanged - No new API calls, no DOM rebuilds Fixes #393 --------- Co-authored-by: you <you@example.com>	2026-04-04 17:16:48 +00:00
Kpa-clawbot	d2d4c504e8	perf(live): parallelize replayRecent() observation fetches (#581 ) ## Summary `replayRecent()` in `live.js` fetched observation details for 8 packet groups sequentially — each `await fetch()` waited for the previous to complete before starting the next. ## Change Replaced the sequential `for` loop with `Promise.all()` to fetch all 8 detail API calls concurrently. The mapping from results to live packets is unchanged. Before: 8 sequential fetches (total time ≈ sum of all request durations) After: 8 parallel fetches (total time ≈ max of all request durations) ## Notes - `replayRecent()` is currently disabled (commented out at line 856), so this is dormant code — no runtime risk - No behavioral change: same data mapping, same rendering, same VCR buffer population - All existing tests pass Fixes #394 --------- Co-authored-by: you <you@example.com>	2026-04-04 10:16:08 -07:00
Kpa-clawbot	b37e8e2da2	perf(packets): replace N+1 API calls with single expand=observations query (#580 ) ## Summary Eliminates the N+1 API call storm when toggling off "Group by Hash" in the packets table. ## Problem When ungrouped mode was active, `loadPackets()` fired individual `/api/packets/{hash}` requests for every multi-observation packet. With 200+ multi-obs packets, this created 200+ parallel HTTP requests — overwhelming both browser connection limits and the server. ## Fix The server already supports `expand=observations` on the `/api/packets` endpoint, which returns observations inline. Instead of: 1. Always fetching grouped (`groupByHash=true`) 2. Then N+1 fetching each packet's children individually We now: 1. Fetch grouped when grouped mode is active (`groupByHash=true`) 2. Fetch with `expand=observations` when ungrouped — single API call 3. Flatten observations client-side Result: 200+ API calls → 1 API call. ## Changes - `public/packets.js`: Replaced N+1 observation fetching loop with single `expand=observations` query parameter, flatten inline observations client-side. ## Testing - All frontend tests pass (packet-filter: 62/62, frontend-helpers: 445/445) - All Go backend tests pass Fixes #382 Co-authored-by: you <you@example.com>	2026-04-04 10:15:14 -07:00
Kpa-clawbot	45d8116880	perf: query only matching node locations in handleObservers (#579 ) ## Summary `handleObservers()` in `routes.go` was calling `GetNodeLocations()` which fetches ALL nodes from the DB just to match ~10 observer IDs against node public keys. With 500+ nodes this is wasteful. ## Changes - `db.go`: Added `GetNodeLocationsByKeys(keys []string)` — queries only the rows matching the given public keys using a parameterized `WHERE LOWER(public_key) IN (?, ?, ...)` clause. - `routes.go`: `handleObservers` now collects observer IDs and calls the targeted method instead of the full-table scan. - `coverage_test.go`: Added `TestGetNodeLocationsByKeys` covering known key, empty keys, and unknown key cases. ## Performance With ~10 observers and 500+ nodes, the query goes from scanning all 500 rows to fetching only ~10. The original `GetNodeLocations()` is preserved for any other callers. Fixes #378 Co-authored-by: you <you@example.com>	2026-04-04 10:14:37 -07:00
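The parameterized `WHERE LOWER(public_key) IN (?, ?, ...)` clause described for `GetNodeLocationsByKeys` in #579 could be assembled roughly as follows. This is a minimal sketch, not the repository's code: the helper name `buildLocationQuery`, the `nodes` table, and the `lat`/`lon` columns are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// buildLocationQuery is a hypothetical helper. It returns a query with one
// "?" placeholder per key, plus the lowercased keys as bind arguments.
// The table name and the lat/lon columns are assumptions.
func buildLocationQuery(keys []string) (string, []any) {
	if len(keys) == 0 {
		// Nothing to look up; the caller can skip the DB round-trip.
		return "", nil
	}
	placeholders := make([]string, len(keys))
	args := make([]any, len(keys))
	for i, k := range keys {
		placeholders[i] = "?"
		args[i] = strings.ToLower(k) // match the LOWER(public_key) comparison
	}
	query := "SELECT public_key, lat, lon FROM nodes WHERE LOWER(public_key) IN (" +
		strings.Join(placeholders, ", ") + ")"
	return query, args
}

func main() {
	q, args := buildLocationQuery([]string{"AB12", "cd34"})
	fmt.Println(q)
	fmt.Println(args...)
}
```

Keeping the keys as bind arguments rather than splicing them into the SQL string is what makes the clause safe to build dynamically.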
Kpa-clawbot	f68e98c376	perf(live): skip updateTimeline() when tab is hidden (#578 ) ## Summary Skip `updateTimeline()` canvas redraws in `bufferPacket()` when the browser tab is hidden (`_tabHidden === true`). Instead, batch-update the timeline once when the tab becomes visible again via the `visibilitychange` handler. Fixes #385 ## What Changed `public/live.js` — two surgical edits: 1. `bufferPacket()`: Removed `updateTimeline()` call from the `_tabHidden` early-return path. When the tab is backgrounded, packets are still buffered (for VCR) but no canvas work is done. 2. `visibilitychange` handler: Added `updateTimeline()` call when the tab is restored, so the timeline catches up in a single repaint instead of N repaints (one per buffered packet). ## Performance Impact At 5+ packets/sec with a backgrounded tab, this eliminates continuous canvas redraws (`updateTimeline()` calls `ctx.clearRect` + full canvas redraw + `updateTimelinePlayhead()`) that are invisible to the user. CPU usage drops to near-zero for timeline rendering while backgrounded. ## Tests All existing tests pass: - `test-packet-filter.js` — 62 passed - `test-aging.js` — 29 passed - `test-frontend-helpers.js` — 445 passed Co-authored-by: you <you@example.com>	2026-04-04 10:14:13 -07:00
Kpa-clawbot	f3d5d1e021	perf: resolve hops from in-memory prefix map instead of N+1 DB queries (#577 ) ## Summary Replace N+1 per-hop DB queries in `handleResolveHops` with O(1) lookups against the in-memory prefix map that already exists in the packet store. ## Problem Each hop in the `resolve-hops` API triggered a separate `SELECT ... LIKE ?` query against the nodes table. With 10 hops, that's 10 DB round-trips — unnecessary when `getCachedNodesAndPM()` already maintains an in-memory prefix map that can resolve hops instantly. ## Changes - routes.go: Replace the per-hop DB query loop with `pm.m[hopLower]` lookups from the prefix map. Convert `nodeInfo` → `HopCandidate` inline. Remove unused `rows`/`sql.Scan` code. - store.go: Add `InvalidateNodeCache()` method to force prefix map rebuild (needed by tests that insert nodes after store initialization). - routes_test.go: Give `TestResolveHopsAmbiguous` a proper store so hops resolve via the prefix map. - resolve_context_test.go: Call `InvalidateNodeCache()` after inserting test nodes. Fix confidence assertion — with GPS candidates and no affinity context, `resolveWithContext` correctly returns `gps_preference` (previously masked because the prefix map didn't have the test nodes). ## Complexity O(1) per hop lookup via hash map vs O(n) DB scan per hop. No hot-path impact — this endpoint is called on-demand, not in a render loop. Fixes #369 --------- Co-authored-by: you <you@example.com>	2026-04-04 09:51:07 -07:00
Kpa-clawbot	02004c5912	perf: incremental distance index update on path changes (#576 ) ## Summary Replace full `buildDistanceIndex()` rebuild with incremental `removeTxFromDistanceIndex`/`addTxToDistanceIndex` for only the transmissions whose paths actually changed during `IngestNewObservations`. ## Problem When any transmission's best path changed during observation ingestion, the entire distance index was rebuilt — iterating all 30K+ packets, resolving all hops, and computing haversine distances. This `O(total_packets × avg_hops)` operation ran under a write lock, blocking all API readers. A 30-second debounce (`distRebuildInterval`) was added in #557 to mitigate this, but it only delayed the pain — the full rebuild still happened, just less frequently. ## Fix - Added `removeTxFromDistanceIndex(tx)` — filters out all `distHopRecord` and `distPathRecord` entries for a specific transmission - Added `addTxToDistanceIndex(tx)` — computes and appends new distance records for a single transmission - In `IngestNewObservations`, changed path-change handling to call remove+add for each affected tx instead of marking dirty and waiting for a full rebuild - Removed `distDirty`, `distLast`, and `distRebuildInterval` since incremental updates are cheap enough to apply immediately ## Complexity - Before: `O(total_packets × avg_hops)` per rebuild (30K+ packets) - After: `O(changed_txs × avg_hops + total_dist_records)` — the remove is a linear scan of the distance slices, but only for affected txs; the add is `O(hops)` per changed tx The remove scan over `distHops`/`distPaths` slices is linear in slice length, but this is still far cheaper than the full rebuild which also does JSON parsing, hop resolution, and haversine math for every packet. 
## Tests - Updated `TestDistanceRebuildDebounce` → `TestDistanceIncrementalUpdate` to verify incremental behavior and check for duplicate path records - All existing tests pass (`go test ./...` in both `cmd/server` and `cmd/ingestor`) Fixes #365 --------- Co-authored-by: you <you@example.com>	2026-04-04 09:50:55 -07:00
Kpa-clawbot	ef30031e2e	perf: cache resolveRegionObservers with 30s TTL (#575 ) ## Summary Cache `resolveRegionObservers()` results with a 30-second TTL to eliminate repeated database queries for region→observer ID mappings. ## Problem `resolveRegionObservers()` queried the database on every call despite the observers table changing infrequently (~20 rows). It's called from 10+ hot paths including `filterPackets()`, `GetChannels()`, and multiple analytics compute functions. When analytics caches are cold, parallel requests each hit the DB independently. ## Solution - Added a dedicated `regionObsMu` mutex + `regionObsCache` map with 30s TTL - Uses a separate mutex (not `s.mu`) to avoid deadlocks — callers already hold `s.mu.RLock()` - Cache is lazily populated per-region and fully invalidated after TTL expires - Follows the same pattern as `getCachedNodesAndPM()` (30s TTL, on-demand rebuild) ## Changes - `cmd/server/store.go`: Added `regionObsMu`, `regionObsCache`, `regionObsCacheTime` fields; rewrote `resolveRegionObservers()` to check cache first; added `fetchAndCacheRegionObs()` helper - `cmd/server/coverage_test.go`: Added `TestResolveRegionObserversCaching` — verifies cache population, cache hits, and nil handling for unknown regions ## Testing - All existing Go tests pass (`go test ./...`) - New test verifies caching behavior (population, hits, nil for unknown regions) Fixes #362 --------- Co-authored-by: you <you@example.com>	2026-04-04 09:50:27 -07:00
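The caching pattern described in #575 (a dedicated mutex, lazy per-region population, full invalidation once the TTL expires) might look something like this. It is a sketch under stated assumptions: the type `regionObsCache`, its fields, and the injectable `fetch` and `now` functions are hypothetical and do not mirror the actual `store.go` fields.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// regionObsCache is a hypothetical stand-in for the cache described above.
type regionObsCache struct {
	mu         sync.Mutex // dedicated lock, separate from the store's main lock
	ttlSeconds int64
	builtAt    int64 // unix seconds when the current map was created
	entries    map[string][]string
	fetch      func(region string) []string // stands in for the DB query
	now        func() int64                 // injectable clock, handy in tests
}

// resolve returns cached observer IDs for a region, fetching on a miss.
func (c *regionObsCache) resolve(region string) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	// Drop the whole map once the TTL has elapsed.
	if c.entries == nil || c.now()-c.builtAt >= c.ttlSeconds {
		c.entries = make(map[string][]string)
		c.builtAt = c.now()
	}
	if ids, ok := c.entries[region]; ok {
		return ids // cache hit; a cached nil covers unknown regions
	}
	ids := c.fetch(region)
	c.entries[region] = ids
	return ids
}

func main() {
	queries := 0
	c := &regionObsCache{
		ttlSeconds: 30,
		fetch: func(region string) []string {
			queries++
			return []string{region + "-observer-1"}
		},
		now: func() int64 { return time.Now().Unix() },
	}
	c.resolve("SJC")
	c.resolve("SJC")
	fmt.Println("fetches:", queries)
}
```

Because the cache never touches the store's main lock, callers that already hold a read lock on the store can call `resolve` without risking a deadlock.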
Kpa-clawbot	67511ed6a7	perf: combine GetStoreStats into 2 concurrent queries instead of 5 sequential (#574 ) ## Summary `GetStoreStats()` ran 5 sequential DB queries on every call. This combines them into 2 concurrent queries: 1. Node/observer counts — single query using subqueries: `SELECT (SELECT COUNT(*) FROM nodes WHERE ...), (SELECT COUNT(*) FROM nodes), (SELECT COUNT(*) FROM observers)` 2. Observation counts — single query using conditional aggregation: `SUM(CASE WHEN timestamp > ? THEN 1 ELSE 0 END)` scoped to the 24h window, avoiding a full table scan for the 1h count Both queries run concurrently via goroutines + `sync.WaitGroup`. ## What changed - `cmd/server/store.go`: Rewrote `GetStoreStats()` — 5 sequential `QueryRow` calls → 2 concurrent combined queries - Error handling now propagates query errors instead of silently ignoring them ## Performance justification - Before: 5 sequential round-trips to SQLite, with 2 potentially expensive `COUNT(*)` scans on the `observations` table - After: 2 concurrent round-trips; the observation query scans the 24h window once instead of separately scanning for 1h and 24h - The 10s cache (`statsTTL`) remains, so this fires at most once per 10s — but when it does fire, it's ~2.5x fewer round-trips and the observation scan is halved ## Tests - `go test ./...` passes for both `cmd/server` and `cmd/ingestor` Fixes #363 --------- Co-authored-by: you <you@example.com>	2026-04-04 09:48:25 -07:00
Kpa-clawbot	b35b473508	perf(nodes): extract shared fetchNodeDetail() to deduplicate API calls (#573 ) ## Summary Extracts a shared `fetchNodeDetail(pubkey)` helper in `nodes.js` that fetches both `/nodes/{pubkey}` and `/nodes/{pubkey}/health` in parallel. Both `selectNode()` (side panel) and `loadFullNode()` (full-screen view) now call this single function instead of duplicating the fetch logic. ## What Changed - New: `fetchNodeDetail(pubkey)` — shared async function that returns node data with `.healthData` attached - Modified: `loadFullNode()` — uses `fetchNodeDetail()` instead of inline `Promise.all` - Modified: `selectNode()` — uses `fetchNodeDetail()` instead of inline `Promise.all` ## Why The duplicate `api()` calls weren't a major perf issue (TTL caching mitigates most cases), but the duplicated logic was unnecessary tech debt. On mobile, `selectNode()` redirects to `loadFullNode()` via hash change, so the two code paths could fire sequentially with expired cache. ## Testing - All frontend helper tests pass (445/445) - All packet filter tests pass (62/62) - All aging tests pass (29/29) - No behavioral change — only code structure improvement Fixes #391 Co-authored-by: you <you@example.com>	2026-04-04 09:47:59 -07:00
Kpa-clawbot	d4f2c3ac66	perf: index subpath detail lookups instead of scanning all packets (#571 ) ## Summary `GetSubpathDetail()` iterated ALL packets to find those containing a specific subpath — `O(packets × hops × subpath_length)`. With 30K+ packets this caused user-visible latency on every subpath detail click. ## Changes ### `cmd/server/store.go` - Added `spTxIndex map[string][]StoreTx` alongside existing `spIndex` — tracks which transmissions contain each subpath key - Extended `addTxToSubpathIndexFull()` and `removeTxFromSubpathIndexFull()` to maintain both indexes simultaneously - Original `addTxToSubpathIndex()`/`removeTxFromSubpathIndex()` wrappers preserved for backward compatibility - `buildSubpathIndex()` now populates both `spIndex` and `spTxIndex` during `Load()` - All incremental update sites (ingest, path change, eviction) use the `Full` variants - `GetSubpathDetail()` rewritten: direct `O(1)` map lookup on `spTxIndex[key]` instead of scanning all packets ### `cmd/server/coverage_test.go` - Added `TestSubpathTxIndexPopulated`: verifies `spTxIndex` is populated, counts match `spIndex`, and `GetSubpathDetail` returns correct results for both existing and non-existent subpaths ## Complexity - Before: `O(total_packets × avg_hops × subpath_length)` per request - After: `O(matched_txs)` per request (direct map lookup) ## Tests All tests pass: `cmd/server` (4.6s), `cmd/ingestor` (25.6s) Fixes #358 --------- Co-authored-by: you <you@example.com>	2026-04-04 09:35:00 -07:00
Kpa-clawbot	37300bf5c8	fix: cap prefix map at 8 chars to cut memory ~10x (#570 ) ## Summary `buildPrefixMap()` was generating map entries for every prefix length from 2 to `len(pubkey)` (up to 64 chars), creating ~31 entries per node. With 500 nodes that's ~15K map entries; with 1K+ nodes it balloons to 31K+. ## Changes `cmd/server/store.go`: - Added `maxPrefixLen = 8` constant — MeshCore path hops use 2–6 char prefixes, 8 gives headroom - Capped the prefix generation loop at `maxPrefixLen` instead of `len(pk)` - Added full pubkey as a separate map entry when key is longer than `maxPrefixLen`, ensuring exact-match lookups (used by `resolveWithContext`) still work `cmd/server/coverage_test.go`: - Added `TestPrefixMapCap` with subtests for: - Short prefix resolution still works - Full pubkey exact-match resolution still works - Intermediate prefixes beyond the cap correctly return nil - Short keys (≤8 chars) have all prefix entries - Map size is bounded ## Impact - Map entries per node: ~31 → ~8 (one per prefix length 2–8, plus one full-key entry) - Total map size for 500 nodes: ~15K entries → ~4K entries (~75% reduction) - No behavioral change for path hop resolution (2–6 char prefixes) - No behavioral change for exact pubkey lookups ## Tests All existing tests pass: - `cmd/server`: ✅ - `cmd/ingestor`: ✅ Fixes #364 --------- Co-authored-by: you <you@example.com>	2026-04-04 09:28:38 -07:00
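The capped prefix generation described in #570 can be illustrated with a small sketch. Only `maxPrefixLen = 8` and the "prefixes from length 2, plus one full-key entry" rule come from the commit message; the function signature and the `map[string][]string` value type here are assumptions, and the real `buildPrefixMap()` in `store.go` likely stores richer node records.

```go
package main

import (
	"fmt"
	"strings"
)

// maxPrefixLen caps generated prefixes, as described in the commit.
const maxPrefixLen = 8

// buildPrefixMap is a simplified, hypothetical version: it maps each
// prefix (length 2..maxPrefixLen) to the pubkeys that start with it.
func buildPrefixMap(pubkeys []string) map[string][]string {
	m := make(map[string][]string)
	for _, pk := range pubkeys {
		key := strings.ToLower(pk)
		limit := len(key)
		if limit > maxPrefixLen {
			limit = maxPrefixLen
		}
		for n := 2; n <= limit; n++ {
			m[key[:n]] = append(m[key[:n]], key)
		}
		// Keep exact-match lookups working for keys longer than the cap.
		if len(key) > maxPrefixLen {
			m[key] = append(m[key], key)
		}
	}
	return m
}

func main() {
	m := buildPrefixMap([]string{"A1B2C3D4E5F6"})
	// Short prefixes and the full key resolve; a 10-char prefix does not.
	fmt.Println(len(m), m["a1b2"], m["a1b2c3d4e5"] == nil)
}
```

Intermediate prefixes longer than the cap deliberately miss, which matches the "correctly return nil" subtest mentioned in the commit.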
Kpa-clawbot	cb8a2e15c8	perf: index node path lookups instead of scanning all packets (#572 ) ## Summary Index node path lookups in `handleNodePaths()` instead of scanning all packets on every request. ## Problem `handleNodePaths()` iterated ALL packets in the store (`O(total_packets × avg_hops)`) with prefix string matching on every hop. This caused user-facing latency on every node detail page load with 30K+ packets. ## Fix Added a `byPathHop` index (`map[string][]StoreTx`) that maps lowercase hop prefixes and resolved full pubkeys to their transmissions. The handler now does direct map lookups instead of a full scan. ### Index lifecycle - Built during `Load()` via `buildPathHopIndex()` - Incrementally updated during `IngestNewFromDB()` (new packets) and `IngestNewObservations()` (path changes) - Cleaned up during `EvictStale()` (packet removal) ### Query strategy The handler looks up candidates from the index using: 1. Full pubkey (matches resolved hops from `resolved_path`) 2. 2-char prefix (matches short raw hops) 3. 4-char prefix (matches medium raw hops) 4. Any longer raw hops starting with the 4-char prefix This reduces complexity from `O(total_packets × avg_hops)` to `O(matching_txs + unique_hop_keys)`. ## Tests - `TestNodePathsEndpointUsesIndex` — verifies the endpoint returns correct results using the index - `TestPathHopIndexIncrementalUpdate` — verifies add/remove operations on the index All existing tests pass. Fixes #359 Co-authored-by: you <you@example.com>	2026-04-04 09:25:18 -07:00
Kpa-clawbot	aac038abb9	fix: filter inconsistent hash sizes by role and add 7-day time window (#567 ) ## Summary Fixes #566 — The "Inconsistent Hash Sizes" list on the Analytics page included all node types and had no time window, causing false positives. ## Changes ### 1. Role filter on inconsistent nodes (`cmd/server/store.go`) Added role filter to the `inconsistentNodes` loop in `computeHashCollisions()` so only repeaters and room servers are included. Companions are excluded since they were never affected by the firmware bug. This matches the existing role filter on collision bucketing from #441. ```go // Before: if cn.HashSizeInconsistent { // After: if cn.HashSizeInconsistent && (cn.Role == "repeater" || cn.Role == "room_server") { ``` ### 2. 7-day time window on hash size computation (`cmd/server/store.go`) Added a 7-day recency cutoff to `computeNodeHashSizeInfo()`. Adverts older than 7 days are now skipped, preventing legitimate historical config changes (e.g., testing different byte sizes) from creating permanent false positives. ### 3. Frontend description text (`public/analytics.js`) Updated the description to reflect the filtered scope: now says "Repeaters and room servers" instead of "Nodes", mentions the 7-day window, and notes that companions are excluded. ## Tests - `TestInconsistentNodesExcludesCompanions` — verifies companions are excluded while repeaters and room servers are included - `TestHashSizeInfoTimeWindow` — verifies adverts older than 7 days are excluded from hash size computation - Updated existing hash size tests to use recent timestamps (compatible with the new time window) - All existing tests pass: `cmd/server` ✅, `cmd/ingestor` ✅ ## Perf justification The time window filter adds a single string comparison per advert in the scan loop — O(n) with a tiny constant. No impact on hot paths. --------- Co-authored-by: you <you@example.com>	2026-04-04 09:22:12 -07:00
Kpa-clawbot	588fba226d	perf: track max transmission/observation IDs incrementally (#569 ) ## Summary Replace O(n) map iteration in `MaxTransmissionID()` and `MaxObservationID()` with O(1) field lookups. ## What Changed - Added `maxTxID` and `maxObsID` fields to `PacketStore` - Updated `Load()`, `IngestNewFromDB()`, and `IngestNewObservations()` to track max IDs incrementally as entries are added - `MaxTransmissionID()` and `MaxObservationID()` now return the tracked field directly instead of iterating the entire map ## Performance Before: O(n) iteration over 30K+ map entries under a read lock After: O(1) field return ## Tests - Added `TestMaxTransmissionIDIncremental` verifying the incremental field matches brute-force iteration over the maps - All existing tests pass (`cmd/server` and `cmd/ingestor`) Fixes #356 Co-authored-by: you <you@example.com>	2026-04-04 09:20:17 -07:00
Kpa-clawbot	c670742589	feat: add byte-size filter to map page (#565 ) (#568 ) ## Summary Adds a byte-size filter to the map page, allowing users to filter repeater markers by their hash prefix size (1-byte, 2-byte, or 3-byte). ## What changed `public/map.js` — single file change: 1. New filter state: Added `byteSize` to the `filters` object (default: `'all'`), persisted in `localStorage` 2. New UI section: Added a "Byte Size" fieldset with button group (`All | 1-byte | 2-byte | 3-byte`) in the map controls panel, between "Node Types" and "Display" 3. Filter logic: In `_renderMarkersInner`, when `byteSize !== 'all'`, repeater nodes are filtered by their `hash_size` field. Non-repeater nodes (companions, rooms, sensors) are unaffected — they pass through regardless of the byte-size filter setting 4. Event binding: Button click handlers update the filter, persist to localStorage, and re-render markers ## Design decisions - Client-side only — no backend changes needed. The `hash_size` field is already included in the `/api/nodes` response - Repeaters only — byte size is a repeater configuration concept; other node roles don't have configurable path prefix sizes - Matches existing pattern — uses the same button-group UI as the Status filter (All/Active/Stale) - `hash_size` defaults to 1 — consistent with how the rest of the codebase treats missing `hash_size` (`node.hash_size || 1`) ## Performance No new API calls. Filter is a simple string comparison inside the existing `nodes.filter()` loop in `_renderMarkersInner` — O(1) per node, negligible overhead. Fixes #565 Co-authored-by: you <you@example.com>	2026-04-04 09:14:49 -07:00
efiten	f897ce1b26	fix: use runtime heap stats for memory-based eviction (#564 ) ## Problem Closes #563. Addresses the Packet store estimated memory item in #559. `estimatedMemoryMB()` used a hardcoded formula: ```go return float64(len(s.packets)*5120+s.totalObs*500) / 1048576.0 ``` This ignored three data structures that grow continuously with every ingest cycle:
| Structure | Production size | Heap not counted |
|---|---|---|
| `distHops []distHopRecord` | 1,556,833 records | ~300 MB |
| `distPaths []distPathRecord` | 93,090 records | ~25 MB |
| `spIndex map[string]int` | 4,113,234 entries | ~400 MB |
Result: formula reported ~1.2 GB while actual heap was ~5 GB. With `maxMemoryMB: 1024`, eviction calculated it only needed to shed ~200 MB, removed a handful of packets, and stopped. Memory kept growing until the OOM killer fired. ## Fix Replace `estimatedMemoryMB()` with `runtime.ReadMemStats` so all data structures are automatically counted: ```go func (s *PacketStore) estimatedMemoryMB() float64 { if s.memoryEstimator != nil { return s.memoryEstimator() } var ms runtime.MemStats runtime.ReadMemStats(&ms) return float64(ms.HeapAlloc) / 1048576.0 } ``` Replace the eviction simulation loop (which re-used the same wrong formula) with a proportional calculation: if heap is N× over budget, evict enough packets to keep `(1/N) × 0.9` of the current count. The 0.9 factor adds a 10% buffer so the next ingest cycle doesn't immediately re-trigger. All major data structures (distHops, distPaths, spIndex) scale with packet count, so removing a fraction of packets frees roughly the same fraction of total heap. ## Testing - Updated `TestEvictStale_MemoryBasedEviction` to inject a deterministic estimator via the new `memoryEstimator` field. - Added `TestEvictStale_MemoryBasedEviction_UnderestimatedHeap`: verifies that when actual heap is 5× over limit (the production failure scenario), eviction correctly removes ~80%+ of packets. 
``` === RUN TestEvictStale_MemoryBasedEviction [store] Evicted 538 packets (1076 obs) --- PASS === RUN TestEvictStale_MemoryBasedEviction_UnderestimatedHeap [store] Evicted 820 packets (1640 obs) --- PASS ``` Full suite: `go test ./...` — ok (10.3s) ## Perf note `runtime.ReadMemStats` runs once per eviction tick (every 60 s) and once per `/api/perf/store` call. Cost is negligible. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 08:41:54 -07:00
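The proportional eviction rule from #564 (keep `(1/N) × 0.9` of the current packets when the heap is N× over budget) works out as in the sketch below. The function name `packetsToEvict` and its parameters are hypothetical; the actual logic lives inside `EvictStale` and may differ in rounding and edge cases.

```go
package main

import "fmt"

// packetsToEvict is a hypothetical helper that applies the proportional
// rule: when the heap is N times over budget, keep (1/N) * 0.9 of the
// current packet count and evict the rest.
func packetsToEvict(packetCount int, heapMB, maxMemoryMB float64) int {
	if packetCount <= 0 || maxMemoryMB <= 0 || heapMB <= maxMemoryMB {
		return 0 // within budget, nothing to evict
	}
	overBudget := heapMB / maxMemoryMB     // N in the description
	keepFraction := (1 / overBudget) * 0.9 // 0.9 leaves a 10% buffer
	keep := int(float64(packetCount) * keepFraction)
	return packetCount - keep
}

func main() {
	// A heap 5x over a 1024 MB limit, with 1000 packets in the store.
	fmt.Println(packetsToEvict(1000, 5120, 1024))
}
```

At 5× over budget the keep fraction is 0.18, so roughly 82% of packets are evicted, which is consistent with the "~80%+" figure in the test description.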
Kpa-clawbot	cbfce41d7e	perf: optimize neighbor graph build (3 fixes for 30s+ CPU) (#562 ) ## Summary Fixes critical performance issue in neighbor graph computation that consumed 65% of CPU (30+ seconds) on a 325K packet dataset. ## Changes ### Fix 1: Cache strings.ToLower results - Added cachedToLower() helper that caches lowercased strings in a local map - Pubkeys repeat across hundreds of thousands of observations - Pre-computes fromLower once per transaction instead of once per observation - Impact: Eliminates ~8.4s (25.3% CPU) ### Fix 2: Cache parsed DecodedJSON via StoreTx.ParsedDecoded() - Added ParsedDecoded() method on StoreTx using sync.Once for thread-safe lazy caching - json.Unmarshal on decoded_json now runs at most once per packet lifetime - Result reused by extractFromNode, indexByNode, trackAdvertPubkey - Impact: Eliminates ~8.8s (26.3% CPU) ### Fix 3: Extend neighbor graph TTL from 60s to 5 minutes - The graph depends on traffic patterns, not individual packets - Reduces rebuild frequency 5x - Impact: ~80% reduction in sustained CPU from graph rebuilds ## Tests - 7 new tests added, all 26+ existing neighbor graph tests pass - BenchmarkBuildFromStore: 727us/op, 237KB/op, 6030 allocs/op Related: #559 --------- Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: you <you@example.com> v3.4.1	2026-04-04 01:25:51 -07:00
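Fix 2 in #562 describes caching the parsed `decoded_json` behind `sync.Once`. A minimal sketch of that lazy-caching pattern follows; the `storeTx` struct, its field names, and the `parseCount` counter are assumptions added for illustration and are not the repository's `StoreTx` definition.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// storeTx is a hypothetical, trimmed-down transmission record.
type storeTx struct {
	DecodedJSON string

	parseOnce  sync.Once
	parsed     map[string]any
	parseCount int // counts json.Unmarshal calls, for illustration only
}

// ParsedDecoded parses DecodedJSON at most once and reuses the result.
// It returns nil if the JSON is invalid.
func (tx *storeTx) ParsedDecoded() map[string]any {
	tx.parseOnce.Do(func() {
		tx.parseCount++
		var m map[string]any
		if err := json.Unmarshal([]byte(tx.DecodedJSON), &m); err == nil {
			tx.parsed = m
		}
	})
	return tx.parsed
}

func main() {
	tx := &storeTx{DecodedJSON: `{"pubKey":"ab12","name":"node-1"}`}
	for i := 0; i < 3; i++ {
		_ = tx.ParsedDecoded()
	}
	fmt.Println("unmarshal calls:", tx.parseCount)
}
```

`sync.Once` guarantees the closure runs a single time even when several goroutines call `ParsedDecoded()` concurrently, so no extra lock is needed around the cached map.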

1272 Commits