mirror of https://github.com/Kpa-clawbot/meshcore-analyzer.git
synced 2026-04-27 20:25:12 +00:00 at c48ef5c613e516865e2ee6437dd91dee0ec8286f
163 Commits

c48ef5c613 | fix: hide undecryptable channel messages by default, add toggle

- Backend: channels API filters out undecrypted messages by default
- Backend: `?includeEncrypted=true` param to include them
- Frontend: 'Show encrypted' toggle in channels sidebar
- Frontend: unknown channels grayed out with '(no key)' label
- Toggle persists in localStorage

Fixes #727
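A rough Go sketch of the backend gate described above; `Message`, its `Decrypted` field, and `filterMessages` are illustrative stand-ins, not the repository's actual identifiers.

```go
package channels

import "net/http"

// Message stands in for a stored channel message; Decrypted would be
// false when no key is known for the channel (hypothetical field).
type Message struct {
	Text      string
	Decrypted bool
}

// filterMessages drops undecrypted messages unless the caller opted in
// via ?includeEncrypted=true, mirroring the default described above.
func filterMessages(r *http.Request, msgs []Message) []Message {
	if r.URL.Query().Get("includeEncrypted") == "true" {
		return msgs
	}
	var out []Message
	for _, m := range msgs {
		if m.Decrypted {
			out = append(out, m)
		}
	}
	return out
}
```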

71be54f085 | feat: DB-backed channel messages for full history (#725 M1) (#726)

## Summary
Switches channel API endpoints to query SQLite instead of the in-memory packet store, giving users access to the full message history. Implements #725 (M1 only — DB-backed channel messages). Does NOT close #725 — M2-M5 (custom channels, PSK, persistence, retroactive decryption) remain.

## Problem
Channel endpoints (`/api/channels`, `/api/channels/{hash}/messages`) preferred the in-memory packet store when available. The store is bounded by `packetStore.maxMemoryMB` — typically showing only recent messages. The SQLite database has the complete history (weeks/months of channel messages) but was only used as a fallback when the store was nil (never in production).

## Fix
Reversed the preference order: DB first, in-memory store fallback. Region filtering added to the DB path.

Co-authored-by: you <you@example.com>

65482ff6f6 | fix: cache invalidation tuning — 7% → 50-80% hit rate (#721)

## Cache Invalidation Tuning — 7% → 50-80% Hit Rate
Fixes #720

### Problem
Server-side cache hit rate was 7% (48 hits / 631 misses over 4.7 days). Root causes from the [cache audit report](https://github.com/Kpa-clawbot/CoreScope/issues/720):
1. **`invalidationDebounce` config value (30s) was dead code** — never wired to `invCooldown`
2. **`invCooldown` hardcoded to 10s** — with continuous ingest, caches cleared every 10s regardless of their 1800s TTLs
3. **`collisionCache` cleared on every `hasNewTransmissions`** — hash collisions are structural (depend on node count), not per-packet

### Changes
| Change | File | Impact |
|--------|------|--------|
| Wire `invalidationDebounce` from config → `invCooldown` | `store.go` | Config actually works now |
| Default `invCooldown` 10s → 300s (5 min) | `store.go` | 30x longer cache survival |
| Add `hasNewNodes` flag to `cacheInvalidation` | `store.go` | Finer-grained invalidation |
| `collisionCache` only clears on `hasNewNodes` | `store.go` | O(n²) collision computation survives its 1hr TTL |
| `addToByNode` returns new-node indicator | `store.go` | Zero-cost detection during indexing |
| `indexByNode` returns new-node indicator | `store.go` | Propagates to ingest path |
| Ingest tracks and passes `hasNewNodes` | `store.go` | End-to-end wiring |

### Tests Added
| Test | What it verifies |
|------|------------------|
| `TestInvCooldownFromConfig` | Config value wired to `invCooldown`; default is 300s |
| `TestCollisionCacheNotClearedByTransmissions` | `hasNewTransmissions` alone does NOT clear `collisionCache` |
| `TestCollisionCacheClearedByNewNodes` | `hasNewNodes` DOES clear `collisionCache` |
| `TestCacheSurvivesMultipleIngestCyclesWithinCooldown` | 5 rapid ingest cycles don't clear any caches during cooldown |
| `TestNewNodesAccumulatedDuringCooldown` | `hasNewNodes` accumulated in `pendingInv` and applied after cooldown |
| `BenchmarkAnalyticsLatencyCacheHitVsMiss` | 100% hit rate with rate-limited invalidation |

All 200+ existing tests pass. Both benchmarks show 100% hit rate.

### Performance Justification
- **Before:** Effective cache lifetime = `min(TTL, invCooldown)` = 10s. With analytics viewed ~once/few minutes, P(hit) ≈ 7%
- **After:** Effective cache lifetime = `min(TTL, 300s)` = 300s for most caches, 3600s for `collisionCache`. Expected hit rate 50-80%
- **Complexity:** All changes are O(1) — `addToByNode` already checked `nodeHashes[pubkey] == nil`, we just return the result
- **Benchmark proof:** `BenchmarkAnalyticsLatencyCacheHitVsMiss` → 100% hit rate, 269ns/op

Co-authored-by: you <you@example.com>
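The cooldown and accumulation behavior the tests pin down can be pictured with a small sketch. `invCooldown`, `pendingInv`, `hasNewNodes`, and `hasNewTransmissions` are names taken from the PR text; the struct around them is invented for illustration, assuming a mutex-guarded cache group.

```go
package cache

import (
	"sync"
	"time"
)

// pendingInv accumulates invalidation flags that arrive during the
// cooldown window; flags are OR-ed together and applied once the
// cooldown has expired, never dropped.
type pendingInv struct {
	hasNewTransmissions bool
	hasNewNodes         bool
}

type cacheGroup struct {
	mu          sync.Mutex
	lastInv     time.Time
	invCooldown time.Duration // wired from config invalidationDebounce
	pending     pendingInv
}

// invalidate applies or defers an invalidation. Only hasNewNodes clears
// the expensive collision cache; new transmissions alone do not.
func (c *cacheGroup) invalidate(p pendingInv, clearFast, clearCollision func()) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pending.hasNewTransmissions = c.pending.hasNewTransmissions || p.hasNewTransmissions
	c.pending.hasNewNodes = c.pending.hasNewNodes || p.hasNewNodes
	if time.Since(c.lastInv) < c.invCooldown {
		return // within cooldown: accumulate instead of clearing
	}
	if c.pending.hasNewTransmissions {
		clearFast()
	}
	if c.pending.hasNewNodes {
		clearCollision()
	}
	c.pending = pendingInv{}
	c.lastInv = time.Now()
}
```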

7af91f7ef6 | fix: perf page shows tracked memory instead of heap allocation (#718)

## Summary
The perf page "Memory Used" tile displayed `estimatedMB` (Go `runtime.HeapAlloc`), which includes all Go runtime allocations — not just packet store data. This made the displayed value misleading: it showed ~2.4GB heap when only ~833MB was actual tracked packet data.

## Changes
### Frontend (`public/perf.js`)
- Primary tile now shows `trackedMB` as **"Tracked Memory"** — the self-accounted packet store memory
- Added separate **"Heap (debug)"** tile showing `estimatedMB` for runtime visibility

### Backend
- **`types.go`**: Added `TrackedMB` field to `HealthPacketStoreStats` struct
- **`routes.go`**: Populate `TrackedMB` in `/health` endpoint response from `GetPerfStoreStatsTyped()`
- **`routes_test.go`**: Assert `trackedMB` exists in health endpoint's `packetStore`
- **`testdata/golden/shapes.json`**: Updated shape fixture with new field

### What was already correct
- `/api/perf/stats` already exposed both `estimatedMB` and `trackedMB`
- `trackedMemoryMB()` method already existed in store.go
- Eviction logic already used `trackedBytes` (not HeapAlloc)

## Testing
- All Go tests pass (`go test ./... -count=1`)
- No frontend logic changes beyond template string field swap

Fixes #717

Co-authored-by: you <you@example.com>

f95aa49804 | fix: exclude TRACE packets from multi-byte capability suspected detection (#715)

## Summary
Exclude TRACE packets (payload_type 8) from the "suspected" multi-byte capability inference logic. TRACE packets carry hash size in their own flags — forwarding repeaters read it from the TRACE header, not their compile-time `PATH_HASH_SIZE`. Pre-1.14 repeaters can forward multi-byte TRACEs without actually supporting multi-byte hashes, creating false positives.

Fixes #714

## Changes
### `cmd/server/store.go`
- In `computeMultiByteCapability()`, skip packets with `payload_type == 8` (TRACE) when scanning `byPathHop` for suspected multi-byte nodes
- "Confirmed" detection (from adverts) is unaffected

### `cmd/server/multibyte_capability_test.go`
- `TestMultiByteCapability_TraceExcluded`: TRACE packet with 2-byte path does NOT mark repeater as suspected
- `TestMultiByteCapability_NonTraceStillSuspected`: Non-TRACE packet with 2-byte path still marks as suspected
- `TestMultiByteCapability_ConfirmedUnaffectedByTraceExclusion`: Confirmed status from advert unaffected by TRACE exclusion

## Testing
All 7 multi-byte capability tests pass. Full `cmd/server` and `cmd/ingestor` test suites pass.

Co-authored-by: you <you@example.com>

4a7e20a8cb | fix: redesign memory eviction — self-accounting trackedBytes, watermarks, safety cap (#711)

## Problem
`HeapAlloc`-based eviction cascades on large databases — it evicts down to near-zero packets because Go runtime overhead exceeds `maxMemoryMB` even with an empty packet store.

## Fix (per Carmack spec on #710)
1. **Self-accounting `trackedBytes`** — running counter maintained on insert/evict, computed from actual struct sizes. No `runtime.ReadMemStats`.
2. **High/low watermark hysteresis** (100%/85%) — evict to 85% of budget, don't re-trigger until 100% is crossed again.
3. **25% per-pass safety cap** — never evict more than a quarter of packets in one cycle.
4. **Oldest-first** — evict from the sorted head, O(1) candidate selection.

`maxMemoryMB` now means packet store budget, not total process heap.

Fixes #710

Co-authored-by: you <you@example.com>
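A minimal sketch of the watermark policy, assuming a packet slice kept sorted oldest-first; all names here are illustrative rather than the store's real fields.

```go
package store

type packet struct{ size int }

type store struct {
	packets      []packet // kept sorted oldest-first
	trackedBytes int      // self-accounted, updated on insert/evict
	budgetBytes  int      // maxMemoryMB converted to bytes
}

// evictToWatermark triggers only when trackedBytes crosses the high
// watermark (100% of budget), evicts oldest-first down to the low
// watermark (85%), and never removes more than 25% of packets per pass.
func (s *store) evictToWatermark() {
	if s.trackedBytes < s.budgetBytes { // high watermark not crossed
		return
	}
	low := s.budgetBytes * 85 / 100
	maxEvict := len(s.packets) / 4 // 25% per-pass safety cap
	evicted := 0
	for s.trackedBytes > low && evicted < maxEvict {
		head := s.packets[0]
		s.packets = s.packets[1:] // oldest-first: O(1) candidate selection
		s.trackedBytes -= head.size
		evicted++
	}
}
```

The hysteresis gap between the 100% trigger and the 85% target is what prevents the evict/re-trigger thrash a single threshold would cause.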

e893a1b3c4 | fix: index relay hops in byNode for liveness tracking (#708)

## Problem
Nodes that only appear as relay hops in packet paths (via `resolved_path`) were never indexed in `byNode`, so `last_heard` was never computed for them. This made relay-only nodes show as dead/stale even when actively forwarding traffic.

Fixes #660

## Root Cause
`indexByNode()` only indexed pubkeys from decoded JSON fields (`pubKey`, `destPubKey`, `srcPubKey`). Relay nodes appearing in `resolved_path` were ignored entirely.

## Fix
`indexByNode()` now also iterates:
1. `ResolvedPath` entries from each observation
2. `tx.ResolvedPath` (best observation's resolved path, used for DB-loaded packets)

A per-call `indexed` set prevents double-indexing when the same pubkey appears in both decoded JSON and resolved path. Extracted `addToByNode()` helper to deduplicate the nodeHashes/byNode append logic.

## Scope
**Phase 1 only** — server-side in-memory indexing. No DB changes, no ingestor changes. This makes `last_heard` reflect relay activity with zero risk to persistence.

## Tests
5 new test cases in `TestIndexByNodeResolvedPath`:
- Resolved path pubkeys from observations get indexed
- Null entries in resolved path are skipped
- Relay-only nodes (no decoded JSON match) appear in `byNode`
- Dedup between decoded JSON and resolved path
- `tx.ResolvedPath` indexed when observations are empty

All existing tests pass unchanged.

## Complexity
O(observations × path_length) per packet — typically 1-3 observations × 1-3 hops. No hot-path regression.

---------

Co-authored-by: you <you@example.com>

fcba2a9f3d | fix: set PRAGMA busy_timeout on all RW SQLite connections (#707)

## Problem
`SQLITE_BUSY` contention between the ingestor and the server's async persistence goroutine drops `resolved_path` and `neighbor_edges` updates. The DSN parameter `_busy_timeout=10000` may not be honored by the modernc/sqlite driver.

## Fix
- **`openRW()` now sets `PRAGMA busy_timeout = 5000`** after opening the connection, guaranteeing SQLite retries for up to 5 seconds before returning `SQLITE_BUSY`
- **Refactored `PruneOldPackets` and `PruneOldMetrics`** to use `openRW()` instead of duplicating connection setup — all RW connections now get consistent busy_timeout handling
- Added test verifying the pragma is set correctly

## Changes
| File | Change |
|------|--------|
| `cmd/server/neighbor_persist.go` | `openRW()` sets `PRAGMA busy_timeout = 5000` after open |
| `cmd/server/db.go` | `PruneOldPackets` and `PruneOldMetrics` use `openRW()` instead of inline `sql.Open` |
| `cmd/server/neighbor_persist_test.go` | `TestOpenRW_BusyTimeout` verifies pragma is set |

## Performance
No performance impact — `PRAGMA busy_timeout` is a connection-level setting with zero overhead on uncontended writes. Under contention, it converts immediate `SQLITE_BUSY` failures into brief retries (up to 5s), which is strictly better than dropping data.

Fixes #705

---------

Co-authored-by: you <you@example.com>
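A sketch of the ordering point the PR makes: set the pragma explicitly after open rather than trusting the DSN. The real `openRW()` presumably does more than this; only the `PRAGMA busy_timeout` call and the modernc driver are taken from the PR text.

```go
package db

import (
	"database/sql"

	_ "modernc.org/sqlite" // registers driver name "sqlite"
)

// openRW opens a read-write connection and applies busy_timeout as an
// explicit PRAGMA, since the DSN form may not be honored by the driver.
func openRW(path string) (*sql.DB, error) {
	conn, err := sql.Open("sqlite", path)
	if err != nil {
		return nil, err
	}
	// Retry for up to 5s on SQLITE_BUSY instead of failing immediately.
	if _, err := conn.Exec("PRAGMA busy_timeout = 5000"); err != nil {
		conn.Close()
		return nil, err
	}
	return conn, nil
}
```

One caveat worth noting: `database/sql` pools connections, so a production version has to ensure every pooled connection gets the pragma, for example by keeping the pool at a single connection.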

ef8bce5002 | feat: repeater multi-byte capability inference table (#706)

## Summary
Adds a new "Repeater Multi-Byte Capability" section to the Hash Stats analytics tab that classifies each repeater's ability to handle multi-byte hash prefixes (firmware >= v1.14).

Fixes #689

## What Changed
### Backend (`cmd/server/store.go`)
- New `computeMultiByteCapability()` method that infers capability for each repeater using two evidence sources:
  - **Confirmed** (100% reliable): node has advertised with `hash_size >= 2`, leveraging existing `computeNodeHashSizeInfo()` data
  - **Suspected** (<100%): node's prefix appears as a hop in packets with multi-byte path headers, using the `byPathHop` index. Prefix collisions mean this isn't definitive.
  - **Unknown**: no multi-byte evidence — could be pre-1.14 or 1.14+ with default settings
- Extended `/api/analytics/hash-sizes` response with `multiByteCapability` array

### Frontend (`public/analytics.js`)
- New `renderMultiByteCapability()` function on the Hash Stats tab
- Color-coded table: green confirmed, yellow suspected, gray unknown
- Filter buttons to show all/confirmed/suspected/unknown
- Column sorting by name, role, status, evidence, max hash size, last seen
- Clickable rows link to node detail pages

### Tests (`cmd/server/multibyte_capability_test.go`)
- `TestMultiByteCapability_Confirmed`: advert with hash_size=2 → confirmed
- `TestMultiByteCapability_Suspected`: path appearance only → suspected
- `TestMultiByteCapability_Unknown`: 1-byte advert only → unknown
- `TestMultiByteCapability_PrefixCollision`: two nodes sharing a prefix, one confirmed via advert, the other correctly marked suspected (not confirmed)

## Performance
- `computeMultiByteCapability()` runs once per cache cycle (15s TTL via hash-sizes cache)
- Leverages existing `GetNodeHashSizeInfo()` cache (also 15s TTL) — no redundant advert scanning
- Path hop scan is O(repeaters × prefix lengths) lookups in the `byPathHop` map, with early break on first match per prefix
- Only computed for global (non-regional) requests to avoid unnecessary work

---------

Co-authored-by: you <you@example.com>

922ebe54e7 | BYOP Advert signature validation (#686)

For BYOP mode in the packet analyzer, perform signature validation on advert packets and display whether it succeeded. This was added because we observed many corrupted advert packets that would be easily detectable as such if signature validation were performed.

At present this MR only adds the status in BYOP mode, so there is minimal impact on the application and no performance penalty from running these checks on all packets. Moving forward it probably makes sense to validate all advert packets so that corrupt packets can be ignored in several contexts (like node lists, for example). Let me know what you think and I can adjust as needed.

---------

Co-authored-by: you <you@example.com>

9917d50622 | fix: resolve neighbor graph duplicate entries from different prefix lengths (#699)

## Problem
The neighbor graph creates separate entries for the same physical node when observed with different prefix lengths. For example, a 1-byte prefix `B0` (ambiguous, unresolved) and a 2-byte prefix `B05B` (resolved to Busbee) appear as two separate neighbors of the same node.

Fixes #698

## Solution
### Part 1: Post-build resolution pass (Phase 1.5)
New function `resolveAmbiguousEdges(pm, graph)` in `neighbor_graph.go`:
- Called after `BuildFromStore()` completes the full graph, before any API use
- Iterates all ambiguous edges and attempts resolution via `resolveWithContext` with full graph context
- Only accepts high-confidence resolutions (`neighbor_affinity`, `geo_proximity`, `unique_prefix`) — rejects `first_match`/`gps_preference` fallbacks to avoid false positives
- Merges with existing resolved edges (count accumulation, max LastSeen) or updates in-place
- Phase 1 edge collection loop is **unchanged**

### Part 2: API-layer dedup (defense-in-depth)
New function `dedupPrefixEntries()` in `neighbor_api.go`:
- Scans the neighbor response for unresolved prefix entries matching resolved pubkey entries
- Merges counts, timestamps, and observers; removes the unresolved entry
- O(n²) on ~50 neighbors per node — negligible cost

### Performance
Phase 1.5 runs O(ambiguous_edges × candidates). Per Carmack's analysis: ~50ms at 2K nodes on the 5-min rebuild cycle. Hot ingest path untouched.

## Tests
9 new tests in `neighbor_dedup_test.go`:
1. **Geo proximity resolution** — ambiguous edge resolved when candidate has GPS near context node
2. **Merge with existing** — ambiguous edge merged into existing resolved edge (count accumulation)
3. **No-match preservation** — ambiguous edge left as-is when prefix has no candidates
4. **API dedup** — unresolved prefix merged with resolved pubkey in response
5. **Integration** — node with both 1-byte and 2-byte prefix observations shows single neighbor entry
6. **Phase 1 regression** — non-ambiguous edge collection unchanged
7. **LastSeen preservation** — merge keeps higher LastSeen timestamp
8. **No-match dedup** — API dedup doesn't merge non-matching prefixes
9. **Benchmark** — Phase 1.5 with 500+ edges

All existing tests pass (server + ingestor).

---------

Co-authored-by: you <you@example.com>

2e1a4a2e0d | fix: handle companion nodes without adverts in My Mesh health cards (#696)

## Summary
Fixes #665 — companion nodes claimed in "My Mesh" showed "Could not load data" because they never sent an advert, so they had no `nodes` table entry, causing the health API to return 404.

## Three-Layer Fix
### 1. API Resilience (`cmd/server/store.go`)
`GetNodeHealth()` now falls back to building a partial response from the in-memory packet store when `GetNodeByPubkey()` returns nil. Returns a synthetic node stub (`role: "unknown"`, `name: "Unknown"`) with whatever stats exist from packets, instead of returning nil → 404.

### 2. Ingestor Cleanup (`cmd/ingestor/main.go`)
Removed phantom sender node creation that used `"sender-" + name` as the pubkey. Channel messages don't carry the sender's real pubkey, so these synthetic entries were unreachable from the claiming/health flow — they just polluted the nodes table with unmatchable keys.

### 3. Frontend UX (`public/home.js`)
The catch block in `loadMyNodes()` now distinguishes 404 (node not in DB yet) from other errors:
- **404**: Shows 📡 "Waiting for first advert — this node has been seen in channel messages but hasn't advertised yet"
- **Other errors**: Shows ❓ "Could not load data" (unchanged)

## Tests
- Added `TestNodeHealthPartialFromPackets` — verifies a node with packets but no DB entry returns 200 with synthetic node stub and stats
- Updated `TestHandleMessageChannelMessage` — verifies channel messages no longer create phantom sender nodes
- All existing tests pass (`cmd/server`, `cmd/ingestor`)

Co-authored-by: you <you@example.com>

fcad49594b | fix: include path.hopsCompleted in TRACE WebSocket broadcasts (#695)

## Summary
Fixes #683 — TRACE packets on the live map were showing the full path instead of distinguishing completed vs remaining hops.

## Root Cause
Both WebSocket broadcast builders in `store.go` constructed the `decoded` map with only `header` and `payload` keys — `path` was never included. The frontend reads `decoded.path.hopsCompleted` to split trace routes into solid (completed) and dashed (remaining) segments, but that field was always `undefined`.

## Fix
For TRACE packets (payload type 9), call `DecodePacket()` on the raw hex during broadcast and include the resulting `Path` struct in `decoded["path"]`. This populates `hopsCompleted`, which the frontend already knows how to consume.

Both broadcast builders are patched:
- `IngestNewFromDB()` — new transmissions path (~line 1419)
- `IngestNewObservations()` — new observations path (~line 1680)

TRACE packets are infrequent, so the per-packet decode overhead is negligible.

## Testing
- Added `TestIngestTraceBroadcastIncludesPath` — verifies that TRACE broadcast maps include `decoded.path` with correct `hopsCompleted` value
- All existing tests pass (`cmd/server` + `cmd/ingestor`)

Co-authored-by: you <you@example.com>

22bf33700e | Fix: filter path-hop candidates by resolved_path to prevent prefix collisions (#658)

## Problem
The "Paths Through This Node" API endpoint (`/api/nodes/{pubkey}/paths`)
returns unrelated packets when two nodes share a hex prefix. For
example, querying paths for "Kpa Roof Solar" (`c0dedad4...`) returns 316
packets that actually belong to "C0ffee SF" (`C0FFEEC7...`) because both
share the `c0` prefix in the `byPathHop` index.
Fixes #655
## Root Cause
`handleNodePaths()` in `routes.go` collects candidates from the
`byPathHop` index using 2-char and 4-char hex prefixes for speed, but
never verifies that the target node actually appears in each candidate's
resolved path. The broad index lookup is intentional, but the
**post-filter was missing**.
## Fix
Added `nodeInResolvedPath()` helper in `store.go` that checks whether a
transmission's `resolved_path` (from the neighbor affinity graph via
`resolveWithContext`) contains the target node's full pubkey. The
filter:
- **Includes** packets where `resolved_path` contains the target node's
full pubkey
- **Excludes** packets where `resolved_path` resolved to a different
node (prefix collision)
- **Excludes** packets where `resolved_path` is nil/empty (ambiguous —
avoids false positives)
The check examines both the best observation's resolved_path
(`tx.ResolvedPath`) and all individual observations, so packets are
included if *any* observation resolved the target.
## Tests
- `TestNodeInResolvedPath` — unit test for the helper with 5 cases
(match, different node, nil, all-nil elements, match in observation
only)
- `TestNodePathsPrefixCollisionFilter` — integration test: two nodes
sharing `aa` prefix, verifies the collision packet is excluded from one
and included for the other
- Updated test DB schema to include `resolved_path` column and seed data
with resolved pubkeys
- All existing tests pass (165 additions, 8 modifications)
## Performance
No impact on hot paths. The filter runs once per API call on the
already-collected candidate set (typically small). `nodeInResolvedPath`
is O(observations × hops) per candidate — negligible since observations
per transmission are typically 1–5.
---------
Co-authored-by: you <you@example.com>
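The post-filter's shape is roughly the following; the observation type is a simplified stand-in for the store's actual transmission/observation structs.

```go
package store

import "strings"

// observation is a reduced stand-in: ResolvedPath holds one entry per
// hop, nil where the hop could not be resolved to a full pubkey.
type observation struct {
	ResolvedPath []*string
}

// nodeInResolvedPath keeps a candidate only if some observation's
// resolved path contains the target node's full pubkey. Nil or empty
// resolved paths are excluded as ambiguous, avoiding false positives.
func nodeInResolvedPath(obs []observation, targetPubkey string) bool {
	target := strings.ToLower(targetPubkey)
	for _, o := range obs {
		for _, hop := range o.ResolvedPath {
			if hop != nil && strings.ToLower(*hop) == target {
				return true
			}
		}
	}
	return false
}
```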

7d71dc857b | feat: expose hopsCompleted for TRACE packets, show real path on live map (#656)

## Summary
TRACE packets on the live map previously animated the **full intended route** regardless of how far the trace actually reached. This made it impossible to distinguish a completed route from a failed one — undermining the primary diagnostic purpose of trace packets.

## Changes
### Backend — `cmd/server/decoder.go`
- Added `HopsCompleted *int` field to the `Path` struct
- For TRACE packets, the header path contains SNR bytes (one per hop that actually forwarded). Before overwriting `path.Hops` with the full intended route from the payload, we now capture the header path's `HashCount` as `hopsCompleted`
- This field is included in API responses and WebSocket broadcasts via the existing JSON serialization

### Frontend — `public/live.js`
- For TRACE packets with `hopsCompleted < totalHops`:
  - Animate only the **completed** portion (solid line + pulse)
  - Draw the **unreached** remainder as a dashed/ghosted line (25% opacity, `6,8` dash pattern) with ghost markers
  - Dashed lines and ghost markers auto-remove after 10 seconds
- When `hopsCompleted` is absent or equals total hops, behavior is unchanged

### Tests — `cmd/server/decoder_test.go`
- `TestDecodePacket_TraceHopsCompleted` — partial completion (2 of 4 hops)
- `TestDecodePacket_TraceNoSNR` — zero completion (trace not forwarded yet)
- `TestDecodePacket_TraceFullyCompleted` — all hops completed

## How it works
The MeshCore firmware appends an SNR byte to `pkt->path[]` at each hop that forwards a TRACE packet. The count of these SNR bytes (`path_len`) indicates how far the trace actually got. CoreScope's decoder already parsed the header path, but the TRACE-specific code overwrote it with the payload hops (full intended route) without preserving the progress information. Now we save that count first.

Fixes #651

---------

Co-authored-by: you <you@example.com>
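In sketch form, the decoder change amounts to capturing the header path length before it is overwritten; `Path` here is a reduced version of the decoder's struct.

```go
package decoder

// Path is a reduced version of the decoder's struct. For TRACE packets
// the header path holds one SNR byte per hop that actually forwarded
// the packet, so its length is the completed-hop count.
type Path struct {
	Hops          []string
	HopsCompleted *int // nil for non-TRACE packets
}

// applyTracePayload captures the header path's count as HopsCompleted
// before replacing Hops with the full intended route from the payload,
// which is the ordering the fix restores.
func applyTracePayload(p *Path, intendedRoute []string) {
	completed := len(p.Hops)
	p.HopsCompleted = &completed
	p.Hops = intendedRoute
}
```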

088b4381c3 | Fix: Hash Stats 'By Repeaters' includes non-repeater nodes (#654)

## Summary
The "By Repeaters" section on the Hash Stats analytics page was counting **all** node types (companions, room servers, sensors, etc.) instead of only repeaters. This made the "By Repeaters" distribution identical to "Multi-Byte Hash Adopters", defeating the purpose of the breakdown.

Fixes #652

## Root Cause
`computeAnalyticsHashSizes()` in `cmd/server/store.go` built its `byNode` map from advert packet data without cross-referencing node roles from the node store. Both `distributionByRepeaters` and `multiByteNodes` consumed this unfiltered map.

## Changes
### `cmd/server/store.go`
- Build a `nodeRoleByPK` lookup map from `getCachedNodesAndPM()` at the start of the function
- Store `role` in each `byNode` entry when processing advert packets
- **`distributionByRepeaters`**: filter to only count nodes whose role contains "repeater"
- **`multiByteNodes`**: include `role` field in output so the frontend can filter/group by node type

### `cmd/server/coverage_test.go`
- Add `TestHashSizesDistributionByRepeatersFiltersRole`: verifies that companion nodes are excluded from `distributionByRepeaters` but included in `multiByteNodes` with correct role

### `cmd/server/routes_test.go`
- Fix `TestHashAnalyticsZeroHopAdvert`: invalidate node cache after DB insert so role lookup works
- Fix `TestAnalyticsHashSizeSameNameDifferentPubkey`: insert node records as repeaters + invalidate cache

## Testing
All `cmd/server` tests pass (68 insertions, 3 deletions across 3 files).

Co-authored-by: you <you@example.com>

144e98bcdf | fix: hide hash size for zero-hop direct adverts (#649) (#653)

## Fix: Zero-hop DIRECT packets report bogus hash_size
Closes #649

### Problem
When a DIRECT packet has zero hops (pathByte lower 6 bits = 0), the generic `hash_size = (pathByte >> 6) + 1` formula produces a bogus value (1-4) instead of 0/unknown. This causes incorrect hash size displays and analytics for zero-hop direct adverts.

### Solution
**Frontend (JS):**
- `packets.js` and `nodes.js` now check `(pathByte & 0x3F) === 0` to detect zero-hop packets and suppress the bogus hash_size display.

**Backend (Go):**
- Both `cmd/server/decoder.go` and `cmd/ingestor/decoder.go` reset `HashSize=0` for DIRECT packets where `pathByte & 0x3F == 0` (hash_count is zero).
- TRACE packets are excluded since they use hashSize to parse hop data from the payload.
- The condition uses `pathByte & 0x3F == 0` (not `pathByte == 0x00`) to correctly handle the case where hash_size bits are non-zero but hash_count is zero — matching the JS frontend approach.

### Testing
**Backend:**
- Added 4 tests each in `cmd/server/decoder_test.go` and `cmd/ingestor/decoder_test.go`:
  - DIRECT + pathByte 0x00 → HashSize=0 ✅
  - DIRECT + pathByte 0x40 (hash_size bits set, hash_count=0) → HashSize=0 ✅
  - Non-DIRECT + pathByte 0x00 → HashSize=1 (unchanged) ✅
  - DIRECT + pathByte 0x01 (1 hop) → HashSize=1 (unchanged) ✅
- All existing tests pass (`go test ./...` in both cmd/server and cmd/ingestor)

**Frontend:**
- Verified hash size display is suppressed for zero-hop direct adverts

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: you <you@example.com>
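The guard itself is a few lines. This sketch folds the route-type and TRACE checks into boolean parameters for clarity, which is a simplification of the decoders' real control flow.

```go
package decoder

// normalizeHashSize sketches the guard: the generic formula
// (pathByte >> 6) + 1 carries no information when the hash count
// (lower 6 bits) is zero on a DIRECT packet, so report 0 (unknown).
// TRACE packets keep the raw value because they parse hop data with it.
func normalizeHashSize(isDirect, isTrace bool, pathByte byte) int {
	if isDirect && !isTrace && pathByte&0x3F == 0 {
		return 0 // zero-hop direct packet: hash_size bits are meaningless
	}
	return int(pathByte>>6) + 1
}
```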

0f5e2db5cf | feat: auto-generated OpenAPI 3.0 spec endpoint + Swagger UI (#530) (#632)

## Summary
Auto-generated OpenAPI 3.0.3 spec endpoint (`/api/spec`) and Swagger UI
(`/api/docs`) for the CoreScope API.
## What
- **`cmd/server/openapi.go`** — Route metadata map
(`routeDescriptions()`) + spec builder that walks the mux router to
generate a complete OpenAPI 3.0.3 spec at runtime. Includes:
- All 47 API endpoints grouped by tag (admin, analytics, channels,
config, nodes, observers, packets)
- Query parameter documentation for key endpoints (packets, nodes,
search, resolve-hops)
- Path parameter extraction from mux `{name}` patterns
- `ApiKeyAuth` security scheme for API-key-protected endpoints
- Swagger UI served as a self-contained HTML page using unpkg CDN
- **`cmd/server/openapi_test.go`** — Tests for spec endpoint (validates
JSON structure, required fields, path count, security schemes,
self-exclusion of `/api/spec` and `/api/docs`), Swagger UI endpoint, and
`extractPathParams` helper.
- **`cmd/server/routes.go`** — Stores router reference on `Server`
struct for spec generation; registers `/api/spec` and `/api/docs`
routes.
## Design Decisions
- **Runtime spec generation** vs static YAML: The spec walks the actual
router, so it can never drift from registered routes. Route metadata
(summaries, descriptions, tags, auth flags) is maintained in a parallel
map — the test enforces minimum path count to catch drift.
- **No external dependencies**: Uses only stdlib + existing gorilla/mux.
Swagger UI loaded from unpkg CDN (no vendored assets).
- **Security tagging**: Auth-protected endpoints (those behind
`requireAPIKey` middleware) are tagged with `security: [{ApiKeyAuth:
[]}]` in the spec, matching the actual middleware configuration.
## Testing
- `go test -run TestOpenAPI` — validates spec structure, field presence,
path count ≥ 20, security schemes
- `go test -run TestSwagger` — validates HTML response with swagger-ui
references
- `go test -run TestExtractPathParams` — unit tests for path parameter
extraction
---------
Co-authored-by: you <you@example.com>

dc5b5ce9a0 | fix: reject weak/default API keys + startup warning (#532) (#628)

## Summary
Hardens API key security for write endpoints (fixes #532):
1. **Constant-time comparison** — uses `crypto/subtle.ConstantTimeCompare` to prevent timing attacks on API key validation
2. **Weak key blocklist** — rejects known default/example keys (`test`, `password`, `change-me`, `your-secret-api-key-here`, etc.)
3. **Minimum length enforcement** — keys shorter than 16 characters are rejected
4. **Startup warning** — logs a clear warning if the configured key is weak or a known default
5. **Generic error messages** — HTTP 403 response uses an opaque "forbidden" message to prevent information leakage about why a key was rejected

### Security Model
- **Empty key** → all write endpoints disabled (403)
- **Weak/default key** → all write endpoints disabled (403), startup warning logged
- **Wrong key** → 401 unauthorized
- **Strong correct key** → request proceeds

### Files Changed
- `cmd/server/config.go` — `IsWeakAPIKey()` function + blocklist
- `cmd/server/routes.go` — constant-time comparison via `constantTimeEqual()`, weak key rejection
- `cmd/server/main.go` — startup warning for weak keys
- `cmd/server/apikey_security_test.go` — comprehensive test coverage
- `cmd/server/routes_test.go` — existing tests updated to use strong keys

### Reviews
- ✅ Self-review: all security properties verified
- ✅ djb Final Review: timing fix correct, blocklist pragmatic, error messages opaque, tests comprehensive. **Verdict: Ship it.**

### Test Results
All existing + new tests pass. Coverage includes: weak key detection (blocklist + length + case-insensitive), empty key handling, strong key acceptance, wrong key rejection, and constant-time comparison.

---------

Co-authored-by: you <you@example.com>
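A compact sketch of the two key pieces, assuming the blocklist entries quoted above; the repository's actual `IsWeakAPIKey()` and `constantTimeEqual()` may differ in detail.

```go
package auth

import (
	"crypto/subtle"
	"strings"
)

// weakKeys mirrors the blocklist idea: known defaults for which write
// endpoints stay disabled entirely.
var weakKeys = map[string]bool{
	"test":                     true,
	"password":                 true,
	"change-me":                true,
	"your-secret-api-key-here": true,
}

// isWeakAPIKey enforces the blocklist plus the 16-character minimum,
// case-insensitively.
func isWeakAPIKey(key string) bool {
	return len(key) < 16 || weakKeys[strings.ToLower(key)]
}

// constantTimeEqual compares a presented key against the configured one
// in constant time, so the mismatch position cannot be inferred from
// response timing.
func constantTimeEqual(a, b string) bool {
	return subtle.ConstantTimeCompare([]byte(a), []byte(b)) == 1
}
```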

30e7e9ae3c | docs: document lock ordering for cacheMu and channelsCacheMu (#624)

## Summary
Documents the lock ordering for all five mutexes in `PacketStore` (`store.go`) to prevent future deadlocks.

## What changed
Added a comment block above the `PacketStore` struct documenting:
- All 5 mutexes (`mu`, `cacheMu`, `channelsCacheMu`, `groupedCacheMu`, `regionObsMu`)
- What each mutex guards
- The required acquisition order (numbered 1–5)
- The nesting relationships that exist today (`cacheMu → channelsCacheMu` in `invalidateCachesFor` and `rebuildAnalyticsCaches`)
- Confirmation that no reverse ordering exists (no deadlock risk)

## Verification
- Grepped all lock acquisition sites to confirm no reverse nesting exists
- `go build ./...` passes — documentation-only change

Fixes #413

---------

Co-authored-by: you <you@example.com>

05fbcb09dd | fix: wire cacheTTL.analyticsHashSizes config to collision cache (#420) (#622)

## Summary
Fixes #420 — wires `cacheTTL` config values to server-side cache durations that were previously hardcoded.

## Problem
`collisionCacheTTL` was hardcoded at 60s in `store.go`. The config has `cacheTTL.analyticsHashSizes: 3600` (1 hour) but it was never read — the `/api/config/cache` endpoint just passed the raw map to the client without applying values server-side.

## Changes
- **`store.go`**: Add `cacheTTLSec()` helper to safely extract duration values from the `cacheTTL` config map. `NewPacketStore` now accepts an optional `cacheTTL` map (variadic, backward-compatible) and wires:
  - `cacheTTL.analyticsHashSizes` → `collisionCacheTTL`
  - `cacheTTL.analyticsRF` → `rfCacheTTL`
- **Default changed**: `collisionCacheTTL` default raised from 60s → 3600s (1 hour). Hash collision computation is expensive and data changes rarely — 60s was causing unnecessary recomputation.
- **`main.go`**: Pass `cfg.CacheTTL` to `NewPacketStore`.
- **Tests**: Added `TestCacheTTLFromConfig` and `TestCacheTTLDefaults` in eviction_test.go. Updated existing `TestHashCollisionsCacheTTL` for the new default.

## Audit of other cacheTTL values
The remaining `cacheTTL` keys (`stats`, `nodeDetail`, `nodeHealth`, `nodeList`, `bulkHealth`, `networkStatus`, `observers`, `channels`, `channelMessages`, `analyticsTopology`, `analyticsChannels`, `analyticsSubpaths`, `analyticsSubpathDetail`, `nodeAnalytics`, `nodeSearch`, `invalidationDebounce`) are **client-side only** — served via `/api/config/cache` and consumed by the frontend. They don't have corresponding server-side caches to wire to. The only server-side caches (`rfCache`, `topoCache`, `hashCache`, `chanCache`, `distCache`, `subpathCache`, `collisionCache`) all use either `rfCacheTTL` or `collisionCacheTTL`, both now configurable.

## Complexity
O(1) config lookup at store init time. No hot-path impact.

Co-authored-by: you <you@example.com>

b587f20d1c | feat: add distance column to neighbor table in node details (#617)

Closes #616

## What
Adds a **Distance** column to the neighbor table on the node detail page. When both the viewed node and a neighbor have GPS coordinates recorded, the table shows the haversine distance between them (e.g. `3.2 km`). When either node lacks GPS, the cell shows `—`.

## Changes
**Backend** (`cmd/server/neighbor_api.go`):
- Added `distance_km *float64` (omitempty) to `NeighborEntry`
- In `handleNodeNeighbors`: look up source node coords from `nodeMap`, then for each resolved (non-ambiguous) neighbor with GPS, compute `haversineKm` and set the field

**Frontend** (`public/nodes.js`):
- Added `Distance` column header between Last Seen and Conf
- Cell renders `X.X km` or `—` (muted) when unavailable

**Tests** (`cmd/server/neighbor_api_test.go`):
- `TestNeighborAPI_DistanceKm_WithGPS`: two nodes with real coords → `distance_km` is positive
- `TestNeighborAPI_DistanceKm_NoGPS`: two nodes at 0,0 → `distance_km` is nil

## Verification
Test at **https://staging.on8ar.eu** — navigate to any node detail page and scroll to the Neighbors section. Nodes with GPS coordinates show a distance; those without show `—`.
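For reference, `haversineKm` is the standard great-circle formula; this sketch assumes inputs in degrees and the usual 6371 km mean Earth radius.

```go
package geo

import "math"

// haversineKm returns the great-circle distance in kilometres between
// two lat/lon points given in degrees.
func haversineKm(lat1, lon1, lat2, lon2 float64) float64 {
	const earthRadiusKm = 6371.0
	toRad := func(deg float64) float64 { return deg * math.Pi / 180 }
	dLat := toRad(lat2 - lat1)
	dLon := toRad(lon2 - lon1)
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(toRad(lat1))*math.Cos(toRad(lat2))*
			math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * earthRadiusKm * math.Asin(math.Sqrt(a))
}
```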

767c8a5a3e | perf: async chunked backfill — HTTP serves within 2 minutes (#612) (#614)

## Summary
Adds two config knobs for controlling backfill scope and neighbor graph data retention, plus removes the dead synchronous backfill function.

## Changes
### Config knobs
#### `resolvedPath.backfillHours` (default: 24)
Controls how far back (in hours) the async backfill scans for observations with NULL `resolved_path`. Transmissions with `first_seen` older than this window are skipped, reducing startup time for instances with large historical datasets.

#### `neighborGraph.maxAgeDays` (default: 30)
Controls the maximum age of `neighbor_edges` entries. Edges with `last_seen` older than this are pruned from both SQLite and the in-memory graph. Pruning runs on startup (after a 4-minute stagger) and every 24 hours thereafter.

### Dead code removal
- Removed the synchronous `backfillResolvedPaths` function that was replaced by the async version.

### Implementation details
- `backfillResolvedPathsAsync` now accepts a `backfillHours` parameter and filters by `tx.FirstSeen`
- `NeighborGraph.PruneOlderThan(cutoff)` removes stale edges from the in-memory graph
- `PruneNeighborEdges(conn, graph, maxAgeDays)` prunes both DB and in-memory graph
- Periodic pruning ticker follows the same pattern as metrics pruning (24h interval, staggered start)
- Graceful shutdown stops the edge prune ticker

### Config example
Both knobs added to `config.example.json` with `_comment` fields.

## Tests
- Config default/override tests for both knobs
- `TestGraphPruneOlderThan` — in-memory edge pruning
- `TestPruneNeighborEdgesDB` — SQLite + in-memory pruning together
- `TestBackfillRespectsHourWindow` — verifies old transmissions are excluded by backfill window

---------

Co-authored-by: you <you@example.com>

232770a858 | feat(rf-health): M2 — airtime, error rate, battery charts with delta computation (#605)

## M2: Airtime + Channel Quality + Battery Charts
Implements M2 of #600 — server-side delta computation and three new charts in the RF Health detail view.

### Backend Changes
**Delta computation** for cumulative counters (`tx_air_secs`, `rx_air_secs`, `recv_errors`):
- Computes per-interval deltas between consecutive samples
- **Reboot handling:** detects counter reset (current < previous), skips that delta, records reboot timestamp
- **Gap handling:** if time between samples > 2× interval, inserts null (no interpolation)
- Returns `tx_airtime_pct` and `rx_airtime_pct` as percentages (delta_secs / interval_secs × 100)
- Returns `recv_error_rate` as delta_errors / (delta_recv + delta_errors) × 100

**`resolution` query param** on `/api/observers/{id}/metrics`:
- `5m` (default) — raw samples
- `1h` — hourly aggregates (GROUP BY hour with AVG/MAX)
- `1d` — daily aggregates

**Schema additions:**
- `packets_sent` and `packets_recv` columns added to `observer_metrics` (migration)
- Ingestor parses these fields from MQTT stats messages

**API response** now includes:
- `tx_airtime_pct`, `rx_airtime_pct`, `recv_error_rate` (computed deltas)
- `reboots` array with timestamps of detected reboots
- `is_reboot_sample` flag on affected samples

### Frontend Changes
Three new charts in the RF Health detail view, stacked vertically below noise floor:
1. **Airtime chart** — TX (red) + RX (blue) as separate SVG lines, Y-axis 0-100%, direct labels at endpoints
2. **Error Rate chart** — `recv_error_rate` line, shown only when data exists
3. **Battery chart** — voltage line with 3.3V low reference, shown only when battery_mv > 0

All charts:
- Share X-axis and time range (aligned vertically)
- Reboot markers as vertical hairlines spanning all charts
- Direct labels on data (no legends)
- Resolution auto-selected: `1h` for 7d/30d ranges
- Charts hidden when no data exists

### Tests
- `TestComputeDeltas`: normal deltas, reboot detection, gap detection
- `TestGetObserverMetricsResolution`: 5m/1h/1d downsampling verification
- Updated `TestGetObserverMetrics` for new API signature

---------

Co-authored-by: you <you@example.com>
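The delta rules translate into a short loop. `Sample` is a reduced stand-in for an `observer_metrics` row, and the signature is an assumption; only the reset/gap rules come from the PR text.

```go
package metrics

// Sample is a reduced observer_metrics row: a unix timestamp and one
// cumulative counter (e.g. tx_air_secs).
type Sample struct {
	TS      int64
	Counter float64
}

// computeDeltas emits one entry per consecutive sample pair. A counter
// decrease means the node rebooted: skip that delta and record the
// reboot timestamp. A gap of more than 2x the expected interval yields
// a nil delta rather than interpolating across it.
func computeDeltas(samples []Sample, intervalSecs int64) (deltas []*float64, reboots []int64) {
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1], samples[i]
		if cur.TS-prev.TS > 2*intervalSecs {
			deltas = append(deltas, nil) // gap: no interpolation
			continue
		}
		if cur.Counter < prev.Counter {
			reboots = append(reboots, cur.TS) // counter reset = reboot
			deltas = append(deltas, nil)
			continue
		}
		d := cur.Counter - prev.Counter
		deltas = append(deltas, &d)
	}
	return deltas, reboots
}
```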

747aea37b7 | fix(rf-health): add region filter support to metrics summary

The frontend passes the `RegionFilter` query string to the summary API; the backend filters results by observer IATA region. Added an `iata` field to `MetricsSummaryRow`.

6f35d4d417 | feat: RF Health Dashboard M1 — observer metrics + small multiples grid (#604)

## RF Health Dashboard — M1: Observer Metrics Storage, API & Small Multiples Grid
Implements M1 of #600.

### What this does
Adds a complete RF health monitoring pipeline: MQTT stats ingestion → SQLite storage → REST API → interactive dashboard with small multiples grid.

### Backend Changes
**Ingestor (`cmd/ingestor/`)**
- New `observer_metrics` table via migration system (`_migrations` pattern)
- Parse `tx_air_secs`, `rx_air_secs`, `recv_errors` from MQTT status messages (same pattern as existing `noise_floor` and `battery_mv`)
- `INSERT OR REPLACE` with timestamps rounded to the nearest 5-min interval boundary (using ingestor wall clock, not observer timestamps)
- Missing fields stored as NULLs — partial data is always better than no data
- Configurable retention pruning: `retention.metricsDays` (default 30), runs on startup + every 24h

**Server (`cmd/server/`)**
- `GET /api/observers/{id}/metrics?since=...&until=...` — per-observer time-series data
- `GET /api/observers/metrics/summary?window=24h` — fleet summary with current NF, avg/max NF, sample count
- `parseWindowDuration()` supports `1h`, `24h`, `3d`, `7d`, `30d` etc.
- Server-side metrics retention pruning (same config, staggered 2min after packet prune)

### Frontend Changes
**RF Health tab (`public/analytics.js`, `public/style.css`)**
- Small multiples grid showing all observers simultaneously — anomalies pop out visually
- Per-observer cell: name, current NF value, battery voltage, sparkline, avg/max stats
- NF status coloring: warning (amber) at ≥-100 dBm, critical (red) at ≥-85 dBm — text color only, no background fills
- Click any cell → expanded detail view with full noise floor line chart
- Reference lines with direct text labels (`-100 warning`, `-85 critical`) — not color bands
- Min/max points labeled directly on the chart
- Time range selector: preset buttons (1h/3h/6h/12h/24h/3d/7d/30d) + custom from/to datetime picker
- Deep linking: `#/analytics?tab=rf-health&observer=...&range=...`
- All charts use SVG, matching existing analytics.js patterns
- Responsive: 3-4 columns on desktop, 1 on mobile

### Design Decisions (from spec)
- Labels directly on data, not in legends
- Reference lines with text labels, not color bands
- Small multiples grid, not card+accordion (Tufte: instant visual fleet comparison)
- Ingestor wall clock for all timestamps (observer clocks may drift)

### Tests Added
**Ingestor tests:**
- `TestRoundToInterval` — 5 cases for rounding to 5-min boundaries
- `TestInsertMetrics` — basic insertion with all fields
- `TestInsertMetricsIdempotent` — INSERT OR REPLACE deduplication
- `TestInsertMetricsNullFields` — partial data with NULLs
- `TestPruneOldMetrics` — retention pruning
- `TestExtractObserverMetaNewFields` — parsing tx_air_secs, rx_air_secs, recv_errors

**Server tests:**
- `TestGetObserverMetrics` — time-series query with since/until filters, NULL handling
- `TestGetMetricsSummary` — fleet summary aggregation
- `TestObserverMetricsAPIEndpoints` — DB query verification
- `TestMetricsAPIEndpoints` — HTTP endpoint response shape
- `TestParseWindowDuration` — duration parsing for h/d formats

### Test Results
```
cd cmd/ingestor && go test ./...  → PASS (26s)
cd cmd/server && go test ./...    → PASS (5s)
```

### What's NOT in this PR (deferred to M2+)
- Server-side delta computation for cumulative counters
- Airtime charts (TX/RX percentage lines)
- Channel quality chart (recv_error_rate)
- Battery voltage chart
- Reboot detection and chart annotations
- Resolution downsampling (1h, 1d aggregates)
- Pattern detection / automated diagnosis

---------

Co-authored-by: you <you@example.com>
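The interval rounding that `TestRoundToInterval` exercises can be expressed directly with `time.Round`; the exact signature in the repository is an assumption.

```go
package metrics

import "time"

// roundToInterval rounds a wall-clock timestamp to the nearest interval
// boundary (5 minutes in the ingestor), so INSERT OR REPLACE collapses
// samples that land in the same bucket.
func roundToInterval(t time.Time, interval time.Duration) time.Time {
	// e.g. 12:02:29 → 12:00:00 and 12:02:31 → 12:05:00 for a 5m interval
	return t.Round(interval)
}
```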

6ae62ce535 | perf: make txToMap observations lazy via ExpandObservations flag (#595)

## Summary
`txToMap()` previously always allocated observation sub-maps for every packet, even though the `/api/packets` handler immediately stripped them via `delete(p, "observations")` unless `expand=observations` was requested. A typical page of 50 packets with ~5 observations each caused 300+ unnecessary map allocations per request.

## Changes
- **`txToMap`**: Add variadic `includeObservations bool` parameter. Observations are only built when `true` is passed, eliminating allocations when they'd just be discarded.
- **`PacketQuery`**: Add `ExpandObservations bool` field to thread the caller's intent through the query pipeline.
- **`routes.go`**: Set `ExpandObservations` based on the `expand=observations` query param. Removed the post-hoc `delete(p, "observations")` loop — observations are simply never created when not requested.
- **Single-packet lookups** (`GetPacketByID`, `GetPacketByHash`): Always pass `true` since detail views need observations.
- **Multi-node/analytics queries**: Default (no flag) = no observations, matching prior behavior.

## Testing
- Added `TestTxToMapLazyObservations` covering all three cases: no flag, `false`, and `true`.
- All existing tests pass (`go test ./...`).

## Perf Impact
Eliminates ~250 observation map allocations per /api/packets request (at default page size of 50 with ~5 observations each). This is a constant-factor improvement per request — no algorithmic complexity change.

Fixes #374

Co-authored-by: you <you@example.com>

6e2f79c0ad | perf: optimize QueryGroupedPackets — cache observer count, defer map construction (#594)

## Summary
Optimizes `QueryGroupedPackets()` in `store.go` to eliminate two major
inefficiencies on every grouped packet list request:
### Changes
1. **Cache `UniqueObserverCount` on `StoreTx`** — Instead of iterating
all observations to count unique observers on every query
(O(total_observations) per request), we now track unique observers at
ingest time via an `observerSet` map and pre-computed
`UniqueObserverCount` field. This is updated incrementally as
observations arrive.
2. **Defer map construction until after pagination** — Previously,
`map[string]interface{}` was built for ALL 30K+ filtered results before
sorting and paginating. Now the grouped cache stores sorted `[]*StoreTx`
pointers (lightweight), and `groupedTxsToPage()` builds maps only for
the requested page (typically 50 items). This eliminates ~30K map
allocations per cache miss.
3. **Lighter cache footprint** — The grouped cache now stores
`[]*StoreTx` instead of `*PacketResult` with pre-built maps, reducing
memory pressure and GC work.
### Complexity
- Observer counting: O(1) per query (was O(total_observations))
- Map construction: O(page_size) per query (was O(n) where n = all
filtered results)
- Sort remains O(n log n) on cache miss, but the cache (3s TTL) absorbs
repeated requests
### Testing
- `cd cmd/server && go test ./...` — all tests pass
- `cd cmd/ingestor && go build ./...` — builds clean
Fixes #370
---------
Co-authored-by: you <you@example.com>

b0862f7a41 | fix: replace time.Tick with NewTicker in prune goroutine for graceful shutdown (#593)

## Summary
Replace `time.Tick()` with `time.NewTicker()` in the auto-prune goroutine so it stops cleanly during graceful shutdown.

## Problem
`time.Tick` creates a ticker that can never be garbage collected or stopped. While the prune goroutine runs for the process lifetime, it won't stop during graceful shutdown — the goroutine leaks past the shutdown sequence.

## Fix
- Create a `time.NewTicker` and a done channel
- Use `select` to listen on both the ticker and done channel
- Stop the ticker and close the done channel in the shutdown path (after `poller.Stop()`)
- Pattern matches the existing `StartEvictionTicker()` approach

## Testing
- `go build ./...` — compiles cleanly
- `go test ./...` — all tests pass

Fixes #377

Co-authored-by: you <you@example.com>
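The pattern in sketch form, assuming the returned stop function is wired into the shutdown path; names are illustrative.

```go
package prune

import "time"

// startPruneTicker runs prune on every tick until stop is called.
// Unlike time.Tick, the ticker can be stopped, so the goroutine exits
// during graceful shutdown instead of leaking past it.
func startPruneTicker(interval time.Duration, prune func()) (stop func()) {
	ticker := time.NewTicker(interval)
	done := make(chan struct{})
	go func() {
		for {
			select {
			case <-ticker.C:
				prune()
			case <-done:
				return
			}
		}
	}()
	return func() {
		ticker.Stop()
		close(done)
	}
}
```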

45991eca09 | perf: combine chained filterPackets passes into single scan (#592)

## Summary
Combines the chained `filterTxSlice` calls in `filterPackets()` into a single pass over the packet slice.

## Problem
When multiple filter parameters are specified (e.g., `type=4&route=1&since=...&until=...`), each filter created a new intermediate `[]*StoreTx` slice. With N filters, this meant N separate scans and N-1 unnecessary allocations.

## Fix
All filter predicates (type, route, observer, hash, since, until, region, node) are pre-computed before the loop, then evaluated in a single `filterTxSlice` call. This eliminates all intermediate allocations.

**Preserved behavior:**
- Fast-path index lookups for hash-only and observer-only queries remain unchanged
- Node-only fast-path via `byNode` index preserved
- All existing filter semantics maintained (same comparison operators, same null checks)

**Complexity:** Single O(n) pass regardless of how many filters are active, vs the previous O(n × k) where k = number of active filters (each pass is O(n) but allocates).

## Testing
All existing tests pass (`cd cmd/server && go test ./...`).

Fixes #373

Co-authored-by: you <you@example.com>
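In sketch form, the single-pass structure looks like this, with `Tx` as a stand-in for `StoreTx` and the predicate slice standing in for the pre-computed filters.

```go
package store

type Tx struct{ PayloadType, Route int }

// filterSinglePass tests every packet against all active predicates in
// one O(n) loop, with no intermediate slices between filters.
func filterSinglePass(txs []*Tx, preds []func(*Tx) bool) []*Tx {
	out := make([]*Tx, 0, len(txs))
outer:
	for _, tx := range txs {
		for _, p := range preds {
			if !p(tx) {
				continue outer // one failed predicate rejects the packet
			}
		}
		out = append(out, tx)
	}
	return out
}
```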

76c42556a2 | perf: sort snrVals/rssiVals once in computeAnalyticsRF (#591)

## Summary
Sort `snrVals` and `rssiVals` once upfront in `computeAnalyticsRF()` and read min/max/median directly from the sorted slices, instead of copying and sorting per stat call.

## Changes
- Sort both slices once before computing stats (2 sorts total instead of 4+ copy+sorts)
- Read `min` from `sorted[0]`, `max` from `sorted[len-1]`, `median` from `sorted[len/2]`
- Remove the now-unused `sortedF64` and `medianF64` helper closures

## Performance impact
With 100K+ observations, this eliminates multiple O(n log n) copy+sort operations. Previously each call to `medianF64` did a full copy + sort, and `minF64`/`maxF64` did O(n) scans on the unsorted array. Now: 2 in-place sorts total, O(1) lookups for min/max/median.

Fixes #366

Co-authored-by: you <you@example.com>
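A sketch of the sort-once pattern; note it sorts the caller's slice in place, which is exactly the trade the PR makes in exchange for O(1) stat reads.

```go
package stats

import "sort"

// rfStats does one in-place sort, after which min, max, and median are
// O(1) index reads instead of per-stat copy+sort calls. Callers must
// not rely on the slice's original order afterwards.
func rfStats(vals []float64) (min, max, median float64) {
	if len(vals) == 0 {
		return 0, 0, 0
	}
	sort.Float64s(vals)
	return vals[0], vals[len(vals)-1], vals[len(vals)/2]
}
```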

6f8378a31c | perf: batch-remove from secondary indexes in EvictStale (#590)

## Summary
`EvictStale()` was doing O(n) linear scans per evicted item to remove from secondary indexes (`byObserver`, `byPayloadType`, `byNode`). Evicting 1000 packets from an observer with 50K observations meant 1000 × 50K = 50M comparisons — all under a write lock.

## Fix
Replace per-item removal with batch single-pass filtering:
1. **Collect phase**: Walk evicted packets once, building sets of evicted tx IDs, observation IDs, and affected index keys
2. **Filter phase**: For each affected index slice, do a single pass keeping only non-evicted entries

**Before**: O(evicted_count × index_slice_size) per index — quadratic in practice
**After**: O(evicted_count + index_slice_size) per affected key — linear

## Changes
- `cmd/server/store.go`: Restructured `EvictStale()` eviction loop into collect + batch-filter pattern

## Testing
- All existing tests pass (`cd cmd/server && go test ./...`)

Fixes #368

Co-authored-by: you <you@example.com>

56115ee0a4 | perf: use byNode index in QueryMultiNodePackets instead of full scan (#589)

## Summary
`QueryMultiNodePackets()` was scanning ALL packets with `strings.Contains` on JSON blobs — O(packets × pubkeys × json_length). With 30K+ packets and multiple pubkeys, this caused noticeable latency on `/api/packets?nodes=...`.

## Fix
Replace the full scan with lookups into the existing `byNode` index, which already maps pubkeys to their transmissions. Merge results with hash-based deduplication, then apply time filters.

**Before:** O(N × P × J) where N = all packets, P = pubkeys, J = avg JSON length
**After:** O(M × P) where M = packets per pubkey (typically small), plus O(R log R) sort for pagination correctness

Results are sorted by `FirstSeen` after merging to maintain the oldest-first ordering expected by the pagination logic.

Fixes #357

Co-authored-by: you <you@example.com>

321d1cf913 | perf: apply time filter early in GetNodeAnalytics to avoid full packet scan (#588)

## Problem
`GetNodeAnalytics()` in `store.go` scans ALL 30K+ packets doing
`strings.Contains` on every JSON blob when the node has a name, then
filters by time range *after* the full scan. This is `O(packets ×
json_length)` on every `/api/nodes/{pubkey}/analytics` request.
## Fix
Move the `fromISO` time check inside the scan loop so old packets are
skipped **before** the expensive `strings.Contains` matching. For the
non-name path (indexed-only), the time filter is also applied inline,
eliminating the separate `allPkts` intermediate slice.
### Before
1. Scan all packets → collect matches (including old ones) → `allPkts`
2. Filter `allPkts` by time → `packets`
### After
1. Scan packets, skip `tx.FirstSeen <= fromISO` immediately → `packets`
This avoids `strings.Contains` calls on packets outside the requested
time window (typically 7 days out of months of data).
## Complexity
- **Before:** `O(total_packets × avg_json_length)` for name matching
- **After:** `O(recent_packets × avg_json_length)` — only packets within
the time window are string-matched
## Testing
- `cd cmd/server && go test ./...` — all tests pass
Fixes #367
Co-authored-by: you <you@example.com>

790a713ba9 | perf: combine 4 subpath API calls into single bulk endpoint (#587)

## Summary
Consolidates the 4 parallel `/api/analytics/subpaths` calls in the Route
Patterns tab into a single `/api/analytics/subpaths-bulk` endpoint,
eliminating 3 redundant server-side scans of the subpath index on cache
miss.
## Changes
### Backend (`cmd/server/routes.go`, `cmd/server/store.go`)
- New `GET
/api/analytics/subpaths-bulk?groups=2-2:50,3-3:30,4-4:20,5-8:15`
endpoint
- Groups format: `minLen-maxLen:limit` comma-separated
- `GetAnalyticsSubpathsBulk()` iterates `spIndex` once, bucketing
entries into per-group accumulators by hop length
- Hop name resolution is done once per raw hop and shared across groups
- Results are cached per-group for compatibility with existing
single-key cache lookups
- Region-filtered queries fall back to individual
`GetAnalyticsSubpaths()` calls (region filtering requires
per-transmission observer checks)
### Frontend (`public/analytics.js`)
- `renderSubpaths()` now makes 1 API call instead of 4
- Response shape: `{ results: [{ subpaths, totalPaths }, ...] }` —
destructured into the same `[d2, d3, d4, d5]` variables
### Tests (`cmd/server/routes_test.go`)
- `TestAnalyticsSubpathsBulk`: validates 3-group response shape, missing
params error, invalid format error
## Performance
- **Before:** 4 API calls → 4 scans of `spIndex` + 4× hop resolution on
cache miss
- **After:** 1 API call → 1 scan of `spIndex` + 1× hop resolution
(shared cache)
- Cache miss cost reduced by ~75% for this tab
- No change on cache hit (individual group caching still works)
Fixes #398
Co-authored-by: you <you@example.com>

cd470dffbe | perf: batch observation fetching to eliminate N+1 API calls on sort change (#586)

## Summary
Fixes the N+1 API call pattern when changing observation sort mode on
the packets page. Previously, switching sort to Path or Time fired
individual `/api/packets/{hash}` requests for **every**
multi-observation group without cached children — potentially 100+
concurrent requests.
## Changes
### Backend: Batch observations endpoint
- **New endpoint:** `POST /api/packets/observations` accepts `{"hashes":
["h1", "h2", ...]}` and returns all observations keyed by hash in a
single response
- Capped at 200 hashes per request to prevent abuse
- 4 test cases covering empty input, invalid JSON, too-many-hashes, and
valid requests
### Frontend: Use batch endpoint
- `packets.js` sort change handler now collects all hashes needing
observation data and sends a single POST request instead of N individual
GETs
- Same behavior, single round-trip
## Performance
- **Before:** Changing sort with 100 visible groups → 100 concurrent API
requests, browser connection queueing (6 per host), several seconds of
lag
- **After:** Single POST request regardless of group count, response
time proportional to store lookup (sub-millisecond per hash in memory)
Fixes #389
---------
Co-authored-by: you <you@example.com>

45d8116880 | perf: query only matching node locations in handleObservers (#579)

## Summary
`handleObservers()` in `routes.go` was calling `GetNodeLocations()`, which fetches ALL nodes from the DB just to match ~10 observer IDs against node public keys. With 500+ nodes this is wasteful.

## Changes
- **`db.go`**: Added `GetNodeLocationsByKeys(keys []string)` — queries only the rows matching the given public keys using a parameterized `WHERE LOWER(public_key) IN (?, ?, ...)` clause.
- **`routes.go`**: `handleObservers` now collects observer IDs and calls the targeted method instead of the full-table scan.
- **`coverage_test.go`**: Added `TestGetNodeLocationsByKeys` covering known key, empty keys, and unknown key cases.

## Performance
With ~10 observers and 500+ nodes, the query goes from scanning all 500 rows to fetching only ~10. The original `GetNodeLocations()` is preserved for any other callers.

Fixes #378

Co-authored-by: you <you@example.com>
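A sketch of the targeted query's shape, with assumed column names; the point is sizing the `IN (?, ?, ...)` placeholder list to the key slice and binding lowercased keys as arguments.

```go
package db

import (
	"database/sql"
	"strings"
)

// NodeLoc is a minimal stand-in for whatever row type GetNodeLocations
// returns; the column names below are assumptions.
type NodeLoc struct {
	PublicKey string
	Lat, Lon  float64
}

// getNodeLocationsByKeys fetches only the rows whose public key appears
// in keys, instead of scanning the whole nodes table.
func getNodeLocationsByKeys(conn *sql.DB, keys []string) ([]NodeLoc, error) {
	if len(keys) == 0 {
		return nil, nil // nothing to fetch
	}
	placeholders := strings.TrimRight(strings.Repeat("?,", len(keys)), ",")
	args := make([]any, len(keys))
	for i, k := range keys {
		args[i] = strings.ToLower(k)
	}
	rows, err := conn.Query(
		"SELECT public_key, lat, lon FROM nodes WHERE LOWER(public_key) IN ("+placeholders+")",
		args...)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var out []NodeLoc
	for rows.Next() {
		var n NodeLoc
		if err := rows.Scan(&n.PublicKey, &n.Lat, &n.Lon); err != nil {
			return nil, err
		}
		out = append(out, n)
	}
	return out, rows.Err()
}
```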

f3d5d1e021 | perf: resolve hops from in-memory prefix map instead of N+1 DB queries (#577)

## Summary
Replace N+1 per-hop DB queries in `handleResolveHops` with O(1) lookups against the in-memory prefix map that already exists in the packet store.

## Problem
Each hop in the `resolve-hops` API triggered a separate `SELECT ... LIKE ?` query against the nodes table. With 10 hops, that's 10 DB round-trips — unnecessary when `getCachedNodesAndPM()` already maintains an in-memory prefix map that can resolve hops instantly.

## Changes
- **routes.go**: Replace the per-hop DB query loop with `pm.m[hopLower]` lookups from the prefix map. Convert `nodeInfo` → `HopCandidate` inline. Remove unused `rows`/`sql.Scan` code.
- **store.go**: Add `InvalidateNodeCache()` method to force a prefix map rebuild (needed by tests that insert nodes after store initialization).
- **routes_test.go**: Give `TestResolveHopsAmbiguous` a proper store so hops resolve via the prefix map.
- **resolve_context_test.go**: Call `InvalidateNodeCache()` after inserting test nodes. Fix confidence assertion — with GPS candidates and no affinity context, `resolveWithContext` correctly returns `gps_preference` (previously masked because the prefix map didn't have the test nodes).

## Complexity
O(1) per hop lookup via hash map vs O(n) DB scan per hop. No hot-path impact — this endpoint is called on-demand, not in a render loop.

Fixes #369

---------

Co-authored-by: you <you@example.com>
||
|
|
02004c5912 |
perf: incremental distance index update on path changes (#576)
## Summary
Replace full `buildDistanceIndex()` rebuild with incremental `removeTxFromDistanceIndex`/`addTxToDistanceIndex` for only the transmissions whose paths actually changed during `IngestNewObservations`.
## Problem
When any transmission's best path changed during observation ingestion, the **entire distance index was rebuilt** — iterating all 30K+ packets, resolving all hops, and computing haversine distances. This `O(total_packets × avg_hops)` operation ran under a write lock, blocking all API readers.
A 30-second debounce (`distRebuildInterval`) was added in #557 to mitigate this, but it only delayed the pain — the full rebuild still happened, just less frequently.
## Fix
- Added `removeTxFromDistanceIndex(tx)` — filters out all `distHopRecord` and `distPathRecord` entries for a specific transmission
- Added `addTxToDistanceIndex(tx)` — computes and appends new distance records for a single transmission
- In `IngestNewObservations`, changed path-change handling to call remove+add for each affected tx instead of marking dirty and waiting for a full rebuild
- Removed `distDirty`, `distLast`, and `distRebuildInterval` since incremental updates are cheap enough to apply immediately
## Complexity
- **Before:** `O(total_packets × avg_hops)` per rebuild (30K+ packets)
- **After:** `O(changed_txs × avg_hops + total_dist_records)` — the remove is a linear scan of the distance slices, but only for affected txs; the add is `O(hops)` per changed tx
The remove scan over `distHops`/`distPaths` slices is linear in slice length, but this is still far cheaper than the full rebuild which also does JSON parsing, hop resolution, and haversine math for every packet.
## Tests
- Updated `TestDistanceRebuildDebounce` → `TestDistanceIncrementalUpdate` to verify incremental behavior and check for duplicate path records
- All existing tests pass (`go test ./...` in both `cmd/server` and `cmd/ingestor`)
Fixes #365
---------
Co-authored-by: you <you@example.com>
|
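A simplified sketch of the remove/add pair, with record fields reduced to a hash and a distance; the real code also maintains `distPaths` and computes the records from resolved hops:
```go
package store

// StoreTx and distHopRecord are pared-down stand-ins.
type StoreTx struct{ Hash string }
type distHopRecord struct {
	TxHash string
	Km     float64
}

type PacketStore struct{ distHops []distHopRecord }

// removeTxFromDistanceIndex filters out records for one transmission:
// a linear scan of the slice, but only run for txs whose path changed.
func (s *PacketStore) removeTxFromDistanceIndex(tx *StoreTx) {
	kept := s.distHops[:0] // in-place filter, no reallocation
	for _, r := range s.distHops {
		if r.TxHash != tx.Hash {
			kept = append(kept, r)
		}
	}
	s.distHops = kept
}

// addTxToDistanceIndex appends fresh records for one transmission —
// O(hops) instead of rebuilding the whole index.
func (s *PacketStore) addTxToDistanceIndex(tx *StoreTx, recs []distHopRecord) {
	s.distHops = append(s.distHops, recs...)
}
```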
||
|
|
ef30031e2e |
perf: cache resolveRegionObservers with 30s TTL (#575)
## Summary
Cache `resolveRegionObservers()` results with a 30-second TTL to eliminate repeated database queries for region→observer ID mappings.
## Problem
`resolveRegionObservers()` queried the database on every call despite the observers table changing infrequently (~20 rows). It's called from 10+ hot paths including `filterPackets()`, `GetChannels()`, and multiple analytics compute functions. When analytics caches are cold, parallel requests each hit the DB independently.
## Solution
- Added a dedicated `regionObsMu` mutex + `regionObsCache` map with 30s TTL
- Uses a separate mutex (not `s.mu`) to avoid deadlocks — callers already hold `s.mu.RLock()`
- Cache is lazily populated per-region and fully invalidated after TTL expires
- Follows the same pattern as `getCachedNodesAndPM()` (30s TTL, on-demand rebuild)
## Changes
- **`cmd/server/store.go`**: Added `regionObsMu`, `regionObsCache`, `regionObsCacheTime` fields; rewrote `resolveRegionObservers()` to check cache first; added `fetchAndCacheRegionObs()` helper
- **`cmd/server/coverage_test.go`**: Added `TestResolveRegionObserversCaching` — verifies cache population, cache hits, and nil handling for unknown regions
## Testing
- All existing Go tests pass (`go test ./...`)
- New test verifies caching behavior (population, hits, nil for unknown regions)
Fixes #362
---------
Co-authored-by: you <you@example.com>
|
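The cache shape might look roughly like this; the `fetchRegionObs` callback stands in for the actual DB query, and the field layout follows the description rather than the real struct:
```go
package store

import (
	"sync"
	"time"
)

const regionObsTTL = 30 * time.Second

type PacketStore struct {
	regionObsMu        sync.Mutex
	regionObsCache     map[string][]string // region -> observer IDs
	regionObsCacheTime time.Time
	fetchRegionObs     func(region string) []string // stand-in for the DB query
}

// resolveRegionObservers uses a dedicated mutex so callers that
// already hold s.mu.RLock() cannot deadlock against it.
func (s *PacketStore) resolveRegionObservers(region string) []string {
	s.regionObsMu.Lock()
	defer s.regionObsMu.Unlock()
	if time.Since(s.regionObsCacheTime) > regionObsTTL {
		s.regionObsCache = nil // full invalidation after TTL
	}
	if s.regionObsCache == nil {
		s.regionObsCache = make(map[string][]string)
		s.regionObsCacheTime = time.Now()
	}
	if ids, ok := s.regionObsCache[region]; ok {
		return ids // cache hit
	}
	ids := s.fetchRegionObs(region) // lazy per-region population
	s.regionObsCache[region] = ids
	return ids
}
```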
||
|
|
67511ed6a7 |
perf: combine GetStoreStats into 2 concurrent queries instead of 5 sequential (#574)
## Summary
`GetStoreStats()` ran 5 sequential DB queries on every call. This combines them into **2 concurrent queries**:
1. **Node/observer counts** — single query using subqueries: `SELECT (SELECT COUNT(*) FROM nodes WHERE ...), (SELECT COUNT(*) FROM nodes), (SELECT COUNT(*) FROM observers)`
2. **Observation counts** — single query using conditional aggregation: `SUM(CASE WHEN timestamp > ? THEN 1 ELSE 0 END)` scoped to the 24h window, avoiding a full table scan for the 1h count
Both queries run concurrently via goroutines + `sync.WaitGroup`.
## What changed
- `cmd/server/store.go`: Rewrote `GetStoreStats()` — 5 sequential `QueryRow` calls → 2 concurrent combined queries
- Error handling now propagates query errors instead of silently ignoring them
## Performance justification
- **Before:** 5 sequential round-trips to SQLite, with 2 potentially expensive `COUNT(*)` scans on the `observations` table
- **After:** 2 concurrent round-trips; the observation query scans the 24h window once instead of separately scanning for 1h and 24h
- The 10s cache (`statsTTL`) remains, so this fires at most once per 10s — but when it does fire, it's ~2.5x fewer round-trips and the observation scan is halved
## Tests
- `go test ./...` passes for both `cmd/server` and `cmd/ingestor`
Fixes #363
---------
Co-authored-by: you <you@example.com>
|
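A sketch of the two-query shape, assuming integer Unix timestamps; the message elides one filtered node count (`WHERE ...`), which this sketch leaves out, and `COALESCE` guards the conditional `SUM` against a NULL on an empty window:
```go
package store

import (
	"database/sql"
	"sync"
	"time"
)

// getStoreStats runs the two combined queries concurrently instead of
// five sequential QueryRow calls.
func getStoreStats(db *sql.DB) (nodes, observers, obs24h, obs1h int, err error) {
	cutoff24h := time.Now().Add(-24 * time.Hour).Unix()
	cutoff1h := time.Now().Add(-time.Hour).Unix()

	var wg sync.WaitGroup
	var errNodes, errObs error
	wg.Add(2)
	go func() {
		defer wg.Done()
		// Query 1: node/observer counts via subqueries, one round-trip.
		errNodes = db.QueryRow(
			`SELECT (SELECT COUNT(*) FROM nodes), (SELECT COUNT(*) FROM observers)`,
		).Scan(&nodes, &observers)
	}()
	go func() {
		defer wg.Done()
		// Query 2: conditional aggregation, scanning only the 24h window;
		// the 1h count falls out of the same scan.
		errObs = db.QueryRow(
			`SELECT COUNT(*),
			        COALESCE(SUM(CASE WHEN timestamp > ? THEN 1 ELSE 0 END), 0)
			 FROM observations WHERE timestamp > ?`,
			cutoff1h, cutoff24h,
		).Scan(&obs24h, &obs1h)
	}()
	wg.Wait()
	if errNodes != nil {
		err = errNodes
	} else if errObs != nil {
		err = errObs
	}
	return
}
```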
||
|
|
d4f2c3ac66 |
perf: index subpath detail lookups instead of scanning all packets (#571)
## Summary
`GetSubpathDetail()` iterated ALL packets to find those containing a specific subpath — `O(packets × hops × subpath_length)`. With 30K+ packets this caused user-visible latency on every subpath detail click.
## Changes
### `cmd/server/store.go`
- Added `spTxIndex map[string][]*StoreTx` alongside existing `spIndex` — tracks which transmissions contain each subpath key
- Extended `addTxToSubpathIndexFull()` and `removeTxFromSubpathIndexFull()` to maintain both indexes simultaneously
- Original `addTxToSubpathIndex()`/`removeTxFromSubpathIndex()` wrappers preserved for backward compatibility
- `buildSubpathIndex()` now populates both `spIndex` and `spTxIndex` during `Load()`
- All incremental update sites (ingest, path change, eviction) use the `Full` variants
- `GetSubpathDetail()` rewritten: direct `O(1)` map lookup on `spTxIndex[key]` instead of scanning all packets
### `cmd/server/coverage_test.go`
- Added `TestSubpathTxIndexPopulated`: verifies `spTxIndex` is populated, counts match `spIndex`, and `GetSubpathDetail` returns correct results for both existing and non-existent subpaths
## Complexity
- **Before:** `O(total_packets × avg_hops × subpath_length)` per request
- **After:** `O(matched_txs)` per request (direct map lookup)
## Tests
All tests pass: `cmd/server` (4.6s), `cmd/ingestor` (25.6s)
Fixes #358
---------
Co-authored-by: you <you@example.com>
|
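In outline, maintaining both maps in one pass looks something like this (types pared down to the essentials; the remove path is symmetric and omitted):
```go
package store

type StoreTx struct{ Hash string }

// Both indexes are keyed by the subpath key; spTxIndex adds the
// reverse mapping that GetSubpathDetail reads directly.
type PacketStore struct {
	spIndex   map[string]int        // subpath key -> occurrence count
	spTxIndex map[string][]*StoreTx // subpath key -> transmissions containing it
}

// addTxToSubpathIndexFull maintains both maps together, so the detail
// lookup becomes a direct spTxIndex[key] read instead of a full scan.
func (s *PacketStore) addTxToSubpathIndexFull(tx *StoreTx, keys []string) {
	for _, k := range keys {
		s.spIndex[k]++
		s.spTxIndex[k] = append(s.spTxIndex[k], tx)
	}
}
```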
||
|
|
37300bf5c8 |
fix: cap prefix map at 8 chars to cut memory ~10x (#570)
## Summary
`buildPrefixMap()` was generating map entries for every prefix length from 2 to `len(pubkey)` (up to 64 chars), creating ~31 entries per node. With 500 nodes that's ~15K map entries; with 1K+ nodes it balloons to 31K+.
## Changes
**`cmd/server/store.go`:**
- Added `maxPrefixLen = 8` constant — MeshCore path hops use 2–6 char prefixes, 8 gives headroom
- Capped the prefix generation loop at `maxPrefixLen` instead of `len(pk)`
- Added full pubkey as a separate map entry when key is longer than `maxPrefixLen`, ensuring exact-match lookups (used by `resolveWithContext`) still work
**`cmd/server/coverage_test.go`:**
- Added `TestPrefixMapCap` with subtests for:
  - Short prefix resolution still works
  - Full pubkey exact-match resolution still works
  - Intermediate prefixes beyond the cap correctly return nil
  - Short keys (≤8 chars) have all prefix entries
  - Map size is bounded
## Impact
- Map entries per node: ~31 → ~8 (one per prefix length 2–8, plus one full-key entry)
- Total map size for 500 nodes: ~15K entries → ~4K entries (~75% reduction)
- No behavioral change for path hop resolution (2–6 char prefixes)
- No behavioral change for exact pubkey lookups
## Tests
All existing tests pass:
- `cmd/server`: ✅
- `cmd/ingestor`: ✅
Fixes #364
---------
Co-authored-by: you <you@example.com>
|
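The capped generation loop, sketched — the `nodeInfo` type and the map-of-slices shape are assumptions:
```go
package store

import "strings"

const maxPrefixLen = 8 // hops use 2-6 char prefixes; 8 gives headroom

type nodeInfo struct{ PubKey string }

// buildPrefixMap generates prefixes of length 2..8 per node, plus one
// full-key entry so exact-match lookups on long pubkeys still work.
func buildPrefixMap(nodes []*nodeInfo) map[string][]*nodeInfo {
	m := make(map[string][]*nodeInfo)
	for _, n := range nodes {
		pk := strings.ToLower(n.PubKey)
		max := len(pk)
		if max > maxPrefixLen {
			max = maxPrefixLen // cap instead of going to len(pk)
		}
		for l := 2; l <= max; l++ {
			m[pk[:l]] = append(m[pk[:l]], n)
		}
		if len(pk) > maxPrefixLen {
			m[pk] = append(m[pk], n) // preserve exact-match resolution
		}
	}
	return m
}
```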
||
|
|
cb8a2e15c8 |
perf: index node path lookups instead of scanning all packets (#572)
## Summary
Index node path lookups in `handleNodePaths()` instead of scanning all packets on every request.
## Problem
`handleNodePaths()` iterated ALL packets in the store (`O(total_packets × avg_hops)`) with prefix string matching on every hop. This caused user-facing latency on every node detail page load with 30K+ packets.
## Fix
Added a `byPathHop` index (`map[string][]*StoreTx`) that maps lowercase hop prefixes and resolved full pubkeys to their transmissions. The handler now does direct map lookups instead of a full scan.
### Index lifecycle
- **Built** during `Load()` via `buildPathHopIndex()`
- **Incrementally updated** during `IngestNewFromDB()` (new packets) and `IngestNewObservations()` (path changes)
- **Cleaned up** during `EvictStale()` (packet removal)
### Query strategy
The handler looks up candidates from the index using:
1. Full pubkey (matches resolved hops from `resolved_path`)
2. 2-char prefix (matches short raw hops)
3. 4-char prefix (matches medium raw hops)
4. Any longer raw hops starting with the 4-char prefix
This reduces complexity from `O(total_packets × avg_hops)` to `O(matching_txs + unique_hop_keys)`.
## Tests
- `TestNodePathsEndpointUsesIndex` — verifies the endpoint returns correct results using the index
- `TestPathHopIndexIncrementalUpdate` — verifies add/remove operations on the index
All existing tests pass.
Fixes #359
Co-authored-by: you <you@example.com>
|
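A sketch of the first three steps of that layered lookup, assuming full-length (64-char) pubkeys so the prefix slices are safe; the longer-raw-hop scan (step 4) is omitted, and dedup keeps a transmission that matched several keys from appearing twice:
```go
package store

import "strings"

type StoreTx struct{ Hash string }

// lookupNodePaths queries the byPathHop index with the exact pubkey,
// then the 2- and 4-char prefixes, deduplicating by transmission hash.
func lookupNodePaths(byPathHop map[string][]*StoreTx, pubkey string) []*StoreTx {
	pk := strings.ToLower(pubkey) // assumed >= 4 chars (real keys are 64)
	seen := make(map[string]bool)
	var out []*StoreTx
	for _, key := range []string{pk, pk[:2], pk[:4]} {
		for _, tx := range byPathHop[key] {
			if !seen[tx.Hash] {
				seen[tx.Hash] = true
				out = append(out, tx)
			}
		}
	}
	return out
}
```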
||
|
|
aac038abb9 |
fix: filter inconsistent hash sizes by role and add 7-day time window (#567)
## Summary
Fixes #566 — The "Inconsistent Hash Sizes" list on the Analytics page included all node types and had no time window, causing false positives.
## Changes
### 1. Role filter on inconsistent nodes (`cmd/server/store.go`)
Added role filter to the `inconsistentNodes` loop in `computeHashCollisions()` so only repeaters and room servers are included. Companions are excluded since they were never affected by the firmware bug. This matches the existing role filter on collision bucketing from #441.
```go
// Before:
if cn.HashSizeInconsistent {
// After:
if cn.HashSizeInconsistent && (cn.Role == "repeater" || cn.Role == "room_server") {
```
### 2. 7-day time window on hash size computation (`cmd/server/store.go`)
Added a 7-day recency cutoff to `computeNodeHashSizeInfo()`. Adverts older than 7 days are now skipped, preventing legitimate historical config changes (e.g., testing different byte sizes) from creating permanent false positives.
### 3. Frontend description text (`public/analytics.js`)
Updated the description to reflect the filtered scope: now says "Repeaters and room servers" instead of "Nodes", mentions the 7-day window, and notes that companions are excluded.
## Tests
- `TestInconsistentNodesExcludesCompanions` — verifies companions are excluded while repeaters and room servers are included
- `TestHashSizeInfoTimeWindow` — verifies adverts older than 7 days are excluded from hash size computation
- Updated existing hash size tests to use recent timestamps (compatible with the new time window)
- All existing tests pass: `cmd/server` ✅, `cmd/ingestor` ✅
## Perf justification
The time window filter adds a single string comparison per advert in the scan loop — O(n) with a tiny constant. No impact on hot paths.
---------
Co-authored-by: you <you@example.com>
|
||
|
|
588fba226d |
perf: track max transmission/observation IDs incrementally (#569)
## Summary
Replace O(n) map iteration in `MaxTransmissionID()` and `MaxObservationID()` with O(1) field lookups.
## What Changed
- Added `maxTxID` and `maxObsID` fields to `PacketStore`
- Updated `Load()`, `IngestNewFromDB()`, and `IngestNewObservations()` to track max IDs incrementally as entries are added
- `MaxTransmissionID()` and `MaxObservationID()` now return the tracked field directly instead of iterating the entire map
## Performance
- Before: O(n) iteration over 30K+ map entries under a read lock
- After: O(1) field return
## Tests
- Added `TestMaxTransmissionIDIncremental` verifying the incremental field matches brute-force iteration over the maps
- All existing tests pass (`cmd/server` and `cmd/ingestor`)
Fixes #356
Co-authored-by: you <you@example.com>
|
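The incremental tracking is one comparison per inserted row. A sketch — the `trackIDs` helper is illustrative; the real code updates the fields inline in the ingest paths:
```go
package store

type PacketStore struct{ maxTxID, maxObsID int64 }

// trackIDs keeps the running maxima current as entries are added,
// so the getters are O(1) field reads instead of map iterations.
func (s *PacketStore) trackIDs(txID, obsID int64) {
	if txID > s.maxTxID {
		s.maxTxID = txID
	}
	if obsID > s.maxObsID {
		s.maxObsID = obsID
	}
}

func (s *PacketStore) MaxTransmissionID() int64 { return s.maxTxID }
func (s *PacketStore) MaxObservationID() int64  { return s.maxObsID }
```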
||
|
|
f897ce1b26 |
fix: use runtime heap stats for memory-based eviction (#564)
## Problem
Closes #563. Addresses the *Packet store estimated memory* item in #559.
`estimatedMemoryMB()` used a hardcoded formula:
```go
return float64(len(s.packets)*5120+s.totalObs*500) / 1048576.0
```
This ignored three data structures that grow continuously with every ingest cycle:
| Structure | Production size | Heap not counted |
|---|---|---|
| `distHops []distHopRecord` | 1,556,833 records | ~300 MB |
| `distPaths []distPathRecord` | 93,090 records | ~25 MB |
| `spIndex map[string]int` | 4,113,234 entries | ~400 MB |
Result: formula reported ~1.2 GB while actual heap was ~5 GB. With `maxMemoryMB: 1024`, eviction calculated it only needed to shed ~200 MB, removed a handful of packets, and stopped. Memory kept growing until the OOM killer fired.
## Fix
Replace `estimatedMemoryMB()` with `runtime.ReadMemStats` so all data structures are automatically counted:
```go
func (s *PacketStore) estimatedMemoryMB() float64 {
	if s.memoryEstimator != nil {
		return s.memoryEstimator()
	}
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return float64(ms.HeapAlloc) / 1048576.0
}
```
Replace the eviction simulation loop (which re-used the same wrong formula) with a proportional calculation: if heap is N× over budget, evict enough packets to keep `(1/N) × 0.9` of the current count. The 0.9 factor adds a 10% buffer so the next ingest cycle doesn't immediately re-trigger. All major data structures (distHops, distPaths, spIndex) scale with packet count, so removing a fraction of packets frees roughly the same fraction of total heap.
## Testing
- Updated `TestEvictStale_MemoryBasedEviction` to inject a deterministic estimator via the new `memoryEstimator` field.
- Added `TestEvictStale_MemoryBasedEviction_UnderestimatedHeap`: verifies that when actual heap is 5× over limit (the production failure scenario), eviction correctly removes ~80%+ of packets.
```
=== RUN   TestEvictStale_MemoryBasedEviction
[store] Evicted 538 packets (1076 obs)
--- PASS
=== RUN   TestEvictStale_MemoryBasedEviction_UnderestimatedHeap
[store] Evicted 820 packets (1640 obs)
--- PASS
```
Full suite: `go test ./...` — ok (10.3s)
## Perf note
`runtime.ReadMemStats` runs once per eviction tick (every 60 s) and once per `/api/perf/store` call. Cost is negligible.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
|
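The proportional calculation, as a standalone sketch (`evictTarget` is an illustrative name, not the real function):
```go
package store

// evictTarget implements the proportional rule described above: if heap
// is N× over budget, keep (1/N) × 0.9 of the current packets. The 0.9
// factor is the 10% buffer that stops the next ingest cycle from
// immediately re-triggering eviction.
func evictTarget(heapMB, budgetMB float64, packetCount int) int {
	if heapMB <= budgetMB {
		return 0 // under budget, nothing to evict
	}
	keep := int(float64(packetCount) * (budgetMB / heapMB) * 0.9)
	return packetCount - keep
}
```
For the 5× scenario from the test above: with a 1024 MB budget, a 5120 MB heap, and 10,000 packets, this keeps 1,800 and evicts 8,200 — the ~80%+ the test expects.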
||
|
|
cbfce41d7e |
perf: optimize neighbor graph build (3 fixes for 30s+ CPU) (#562)
## Summary
Fixes a critical performance issue in neighbor graph computation that consumed 65% of CPU (30+ seconds) on a 325K packet dataset.
## Changes
### Fix 1: Cache strings.ToLower results
- Added cachedToLower() helper that caches lowercased strings in a local map
- Pubkeys repeat across hundreds of thousands of observations
- Pre-computes fromLower once per transmission instead of once per observation
- **Impact:** Eliminates ~8.4s (25.3% CPU)
### Fix 2: Cache parsed DecodedJSON via StoreTx.ParsedDecoded()
- Added ParsedDecoded() method on StoreTx using sync.Once for thread-safe lazy caching
- json.Unmarshal on decoded_json now runs at most once per packet lifetime
- Result reused by extractFromNode, indexByNode, trackAdvertPubkey
- **Impact:** Eliminates ~8.8s (26.3% CPU)
### Fix 3: Extend neighbor graph TTL from 60s to 5 minutes
- The graph depends on traffic patterns, not individual packets
- Reduces rebuild frequency 5x
- **Impact:** ~80% reduction in sustained CPU from graph rebuilds
## Tests
- 7 new tests added, all 26+ existing neighbor graph tests pass
- BenchmarkBuildFromStore: 727us/op, 237KB/op, 6030 allocs/op
Related: #559
---------
Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: you <you@example.com>
|
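Sketches of Fixes 1 and 2, with types reduced to the relevant fields; the `map[string]any` return is an assumption about the parsed shape, not the real one:
```go
package store

import (
	"encoding/json"
	"strings"
	"sync"
)

// StoreTx sketch: DecodedJSON is parsed at most once per packet
// lifetime, no matter how many callers ask for it.
type StoreTx struct {
	DecodedJSON string
	parseOnce   sync.Once
	parsed      map[string]any
}

// ParsedDecoded lazily parses decoded_json under sync.Once, so
// concurrent callers share one thread-safe Unmarshal.
func (tx *StoreTx) ParsedDecoded() map[string]any {
	tx.parseOnce.Do(func() {
		_ = json.Unmarshal([]byte(tx.DecodedJSON), &tx.parsed)
	})
	return tx.parsed
}

// cachedToLower memoizes lowercasing within one graph build, where the
// same pubkeys repeat across hundreds of thousands of observations.
func cachedToLower(cache map[string]string, s string) string {
	if v, ok := cache[s]; ok {
		return v
	}
	v := strings.ToLower(s)
	cache[s] = v
	return v
}
```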
||
|
|
1e1c4cb91f |
fix: include resolved_path in groupByHash packet response
QueryGroupedPackets builds its map manually and was missing resolved_path. The non-grouped path (txToMap) included it. |
||
|
|
0c340e1eb6 |
fix: set hasResolvedPath flag after ensuring column exists
detectSchema() runs at DB open time before ensureResolvedPathColumn() adds the column during Load(). On first run (or any run where the column was just added), hasResolvedPath stayed false, causing Load() to skip reading resolved_path from SQLite. This forced a full backfill of all observations on every restart, burning CPU for minutes on large DBs.
Fix: set hasResolvedPath = true after ensureResolvedPathColumn succeeds.
|