Mirror of https://github.com/Kpa-clawbot/meshcore-analyzer.git
Synced 2026-05-11 13:24:44 +00:00
cd470dffbefe6deacb23f5d9ef93eeeaecd8c11e
153 Commits

cd470dffbe
perf: batch observation fetching to eliminate N+1 API calls on sort change (#586)
## Summary
Fixes the N+1 API call pattern when changing observation sort mode on
the packets page. Previously, switching sort to Path or Time fired
individual `/api/packets/{hash}` requests for **every**
multi-observation group without cached children — potentially 100+
concurrent requests.
## Changes
### Backend: Batch observations endpoint
- **New endpoint:** `POST /api/packets/observations` accepts `{"hashes":
["h1", "h2", ...]}` and returns all observations keyed by hash in a
single response
- Capped at 200 hashes per request to prevent abuse
- 4 test cases covering empty input, invalid JSON, too-many-hashes, and
valid requests
### Frontend: Use batch endpoint
- `packets.js` sort change handler now collects all hashes needing
observation data and sends a single POST request instead of N individual
GETs
- Same behavior, single round-trip
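As a rough illustration, a batch handler along these lines could serve the endpoint described above (the handler name, the `lookupObs` store callback, and the error strings are assumptions, not the repo's actual code):
```go
package sketch

import (
	"encoding/json"
	"net/http"
)

const maxBatchHashes = 200 // request cap, as described above

// handleBatchObservations sketches POST /api/packets/observations.
// lookupObs stands in for the store's per-hash observation lookup.
func handleBatchObservations(lookupObs func(hash string) []map[string]any) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var req struct {
			Hashes []string `json:"hashes"`
		}
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "invalid JSON", http.StatusBadRequest)
			return
		}
		if len(req.Hashes) == 0 || len(req.Hashes) > maxBatchHashes {
			http.Error(w, "expected 1-200 hashes", http.StatusBadRequest)
			return
		}
		out := make(map[string][]map[string]any, len(req.Hashes))
		for _, h := range req.Hashes {
			out[h] = lookupObs(h) // in-memory lookup, keyed by hash
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(out)
	}
}
```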
## Performance
- **Before:** Changing sort with 100 visible groups → 100 concurrent API
requests, browser connection queueing (6 per host), several seconds of
lag
- **After:** Single POST request regardless of group count, response
time proportional to store lookup (sub-millisecond per hash in memory)
Fixes #389
---------
Co-authored-by: you <you@example.com>

45d8116880
perf: query only matching node locations in handleObservers (#579)
## Summary
`handleObservers()` in `routes.go` was calling `GetNodeLocations()`, which fetches ALL nodes from the DB just to match ~10 observer IDs against node public keys. With 500+ nodes this is wasteful.
## Changes
- **`db.go`**: Added `GetNodeLocationsByKeys(keys []string)` — queries only the rows matching the given public keys using a parameterized `WHERE LOWER(public_key) IN (?, ?, ...)` clause.
- **`routes.go`**: `handleObservers` now collects observer IDs and calls the targeted method instead of the full-table scan.
- **`coverage_test.go`**: Added `TestGetNodeLocationsByKeys` covering known-key, empty-keys, and unknown-key cases.
## Performance
With ~10 observers and 500+ nodes, the query goes from scanning all 500 rows to fetching only ~10. The original `GetNodeLocations()` is preserved for any other callers.
Fixes #378
Co-authored-by: you <you@example.com>
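A minimal sketch of how such a keyed lookup might be assembled (the `DB` wrapper, the column names, and the return shape are assumptions):
```go
package sketch

import (
	"database/sql"
	"strings"
)

type DB struct{ db *sql.DB }

// GetNodeLocationsByKeys fetches locations for a specific key set
// instead of scanning the whole nodes table. Placeholders are
// generated to match len(keys); keys are lowercased to match LOWER().
func (d *DB) GetNodeLocationsByKeys(keys []string) (map[string][2]float64, error) {
	if len(keys) == 0 {
		return map[string][2]float64{}, nil
	}
	placeholders := strings.TrimSuffix(strings.Repeat("?,", len(keys)), ",")
	args := make([]any, len(keys))
	for i, k := range keys {
		args[i] = strings.ToLower(k)
	}
	rows, err := d.db.Query(
		"SELECT LOWER(public_key), lat, lon FROM nodes WHERE LOWER(public_key) IN ("+placeholders+")",
		args...)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	out := make(map[string][2]float64, len(keys))
	for rows.Next() {
		var pk string
		var lat, lon float64
		if err := rows.Scan(&pk, &lat, &lon); err != nil {
			return nil, err
		}
		out[pk] = [2]float64{lat, lon}
	}
	return out, rows.Err()
}
```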

f3d5d1e021
perf: resolve hops from in-memory prefix map instead of N+1 DB queries (#577)
## Summary
Replace N+1 per-hop DB queries in `handleResolveHops` with O(1) lookups against the in-memory prefix map that already exists in the packet store.
## Problem
Each hop in the `resolve-hops` API triggered a separate `SELECT ... LIKE ?` query against the nodes table. With 10 hops, that's 10 DB round-trips — unnecessary when `getCachedNodesAndPM()` already maintains an in-memory prefix map that can resolve hops instantly.
## Changes
- **routes.go**: Replace the per-hop DB query loop with `pm.m[hopLower]` lookups from the prefix map. Convert `nodeInfo` → `HopCandidate` inline. Remove unused `rows`/`sql.Scan` code.
- **store.go**: Add an `InvalidateNodeCache()` method to force a prefix map rebuild (needed by tests that insert nodes after store initialization).
- **routes_test.go**: Give `TestResolveHopsAmbiguous` a proper store so hops resolve via the prefix map.
- **resolve_context_test.go**: Call `InvalidateNodeCache()` after inserting test nodes. Fix the confidence assertion — with GPS candidates and no affinity context, `resolveWithContext` correctly returns `gps_preference` (previously masked because the prefix map didn't have the test nodes).
## Complexity
O(1) per-hop lookup via hash map vs O(n) DB scan per hop. No hot-path impact — this endpoint is called on demand, not in a render loop.
Fixes #369
---------
Co-authored-by: you <you@example.com>
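A minimal sketch of the lookup pattern, with illustrative types standing in for the store's real `prefixMap` and `nodeInfo`:
```go
package sketch

import "strings"

// Illustrative types; the real store's prefixMap differs in detail.
type nodeInfo struct {
	PubKey string
	Name   string
}

type prefixMap struct {
	m map[string][]*nodeInfo // lowercase prefix -> candidate nodes
}

// resolveHop returns all candidates for a hop prefix in O(1),
// replacing a per-hop `SELECT ... LIKE ?` round-trip.
func (pm *prefixMap) resolveHop(hop string) []*nodeInfo {
	return pm.m[strings.ToLower(hop)]
}
```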

02004c5912
perf: incremental distance index update on path changes (#576)
## Summary
Replace the full `buildDistanceIndex()` rebuild with incremental `removeTxFromDistanceIndex`/`addTxToDistanceIndex` calls for only the transmissions whose paths actually changed during `IngestNewObservations`.
## Problem
When any transmission's best path changed during observation ingestion, the **entire distance index was rebuilt** — iterating all 30K+ packets, resolving all hops, and computing haversine distances. This `O(total_packets × avg_hops)` operation ran under a write lock, blocking all API readers. A 30-second debounce (`distRebuildInterval`) was added in #557 to mitigate this, but it only delayed the pain — the full rebuild still happened, just less frequently.
## Fix
- Added `removeTxFromDistanceIndex(tx)` — filters out all `distHopRecord` and `distPathRecord` entries for a specific transmission
- Added `addTxToDistanceIndex(tx)` — computes and appends new distance records for a single transmission
- In `IngestNewObservations`, changed path-change handling to call remove+add for each affected tx instead of marking dirty and waiting for a full rebuild
- Removed `distDirty`, `distLast`, and `distRebuildInterval`, since incremental updates are cheap enough to apply immediately
## Complexity
- **Before:** `O(total_packets × avg_hops)` per rebuild (30K+ packets)
- **After:** `O(changed_txs × avg_hops + total_dist_records)` — the remove is a linear scan of the distance slices, but only for affected txs; the add is `O(hops)` per changed tx
The remove scan over the `distHops`/`distPaths` slices is linear in slice length, but still far cheaper than the full rebuild, which also does JSON parsing, hop resolution, and haversine math for every packet.
## Tests
- Updated `TestDistanceRebuildDebounce` → `TestDistanceIncrementalUpdate` to verify incremental behavior and check for duplicate path records
- All existing tests pass (`go test ./...` in both `cmd/server` and `cmd/ingestor`)
Fixes #365
---------
Co-authored-by: you <you@example.com>
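A minimal sketch of the remove half of the incremental update, assuming an illustrative record shape (the real `distHopRecord` carries more fields):
```go
package sketch

// Illustrative record and store shapes; field names are assumptions.
type distHopRecord struct {
	TxID   int64
	Meters float64
}

type PacketStore struct {
	distHops []distHopRecord
}

// removeTxFromDistanceIndex drops all records belonging to one
// transmission by filtering the slice in place (no reallocation).
func (s *PacketStore) removeTxFromDistanceIndex(txID int64) {
	kept := s.distHops[:0]
	for _, r := range s.distHops {
		if r.TxID != txID {
			kept = append(kept, r)
		}
	}
	s.distHops = kept
}
```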

ef30031e2e
perf: cache resolveRegionObservers with 30s TTL (#575)
## Summary
Cache `resolveRegionObservers()` results with a 30-second TTL to eliminate repeated database queries for region→observer ID mappings.
## Problem
`resolveRegionObservers()` queried the database on every call despite the observers table changing infrequently (~20 rows). It's called from 10+ hot paths, including `filterPackets()`, `GetChannels()`, and multiple analytics compute functions. When analytics caches are cold, parallel requests each hit the DB independently.
## Solution
- Added a dedicated `regionObsMu` mutex + `regionObsCache` map with a 30s TTL
- Uses a separate mutex (not `s.mu`) to avoid deadlocks — callers already hold `s.mu.RLock()`
- Cache is lazily populated per region and fully invalidated after the TTL expires
- Follows the same pattern as `getCachedNodesAndPM()` (30s TTL, on-demand rebuild)
## Changes
- **`cmd/server/store.go`**: Added `regionObsMu`, `regionObsCache`, `regionObsCacheTime` fields; rewrote `resolveRegionObservers()` to check the cache first; added a `fetchAndCacheRegionObs()` helper
- **`cmd/server/coverage_test.go`**: Added `TestResolveRegionObserversCaching` — verifies cache population, cache hits, and nil handling for unknown regions
## Testing
- All existing Go tests pass (`go test ./...`)
- The new test verifies caching behavior (population, hits, nil for unknown regions)
Fixes #362
---------
Co-authored-by: you <you@example.com>
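A minimal sketch of a TTL cache with its own mutex, mirroring the pattern described above (names and shapes are assumptions):
```go
package sketch

import (
	"sync"
	"time"
)

// regionCache is an illustrative TTL cache guarded by a dedicated
// mutex, so callers holding the store's main lock can't deadlock.
type regionCache struct {
	mu   sync.Mutex
	data map[string][]string // region -> observer IDs
	at   time.Time
	ttl  time.Duration
}

func (c *regionCache) get(region string, fetch func(string) []string) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if time.Since(c.at) > c.ttl {
		c.data = make(map[string][]string) // full invalidation after TTL
		c.at = time.Now()
	}
	if ids, ok := c.data[region]; ok {
		return ids // cache hit (nil is cached for unknown regions too)
	}
	ids := fetch(region) // lazy per-region population from the DB
	c.data[region] = ids
	return ids
}
```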

67511ed6a7
perf: combine GetStoreStats into 2 concurrent queries instead of 5 sequential (#574)
## Summary
`GetStoreStats()` ran 5 sequential DB queries on every call. This combines them into **2 concurrent queries**:
1. **Node/observer counts** — a single query using subqueries: `SELECT (SELECT COUNT(*) FROM nodes WHERE ...), (SELECT COUNT(*) FROM nodes), (SELECT COUNT(*) FROM observers)`
2. **Observation counts** — a single query using conditional aggregation: `SUM(CASE WHEN timestamp > ? THEN 1 ELSE 0 END)` scoped to the 24h window, avoiding a full table scan for the 1h count
Both queries run concurrently via goroutines + `sync.WaitGroup`.
## What changed
- `cmd/server/store.go`: Rewrote `GetStoreStats()` — 5 sequential `QueryRow` calls → 2 concurrent combined queries
- Error handling now propagates query errors instead of silently ignoring them
## Performance justification
- **Before:** 5 sequential round-trips to SQLite, with 2 potentially expensive `COUNT(*)` scans on the `observations` table
- **After:** 2 concurrent round-trips; the observation query scans the 24h window once instead of separately scanning for 1h and 24h
- The 10s cache (`statsTTL`) remains, so this fires at most once per 10s — but when it does fire, it's ~2.5x fewer round-trips and the observation scan is halved
## Tests
- `go test ./...` passes for both `cmd/server` and `cmd/ingestor`
Fixes #363
---------
Co-authored-by: you <you@example.com>
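A rough sketch of the two-goroutine pattern, under stated assumptions about table and column names (the real query set includes more counters):
```go
package sketch

import (
	"database/sql"
	"sync"
	"time"
)

// getStoreStats sketches combining five sequential stat queries
// into two that run concurrently.
func getStoreStats(db *sql.DB) (nodes, observers, obs24h, obs1h int, err error) {
	cutoff1h := time.Now().Add(-1 * time.Hour).Unix()
	cutoff24h := time.Now().Add(-24 * time.Hour).Unix()

	var wg sync.WaitGroup
	var errA, errB error
	wg.Add(2)
	go func() { // counts combined into one round-trip via subqueries
		defer wg.Done()
		errA = db.QueryRow(
			`SELECT (SELECT COUNT(*) FROM nodes), (SELECT COUNT(*) FROM observers)`,
		).Scan(&nodes, &observers)
	}()
	go func() { // conditional aggregation: one scan of the 24h window
		defer wg.Done()
		errB = db.QueryRow(
			`SELECT COUNT(*), COALESCE(SUM(CASE WHEN timestamp > ? THEN 1 ELSE 0 END), 0)
			 FROM observations WHERE timestamp > ?`,
			cutoff1h, cutoff24h,
		).Scan(&obs24h, &obs1h)
	}()
	wg.Wait()
	if errA != nil {
		err = errA // propagate instead of silently ignoring
	} else if errB != nil {
		err = errB
	}
	return
}
```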

d4f2c3ac66
perf: index subpath detail lookups instead of scanning all packets (#571)
## Summary
`GetSubpathDetail()` iterated ALL packets to find those containing a specific subpath — `O(packets × hops × subpath_length)`. With 30K+ packets this caused user-visible latency on every subpath detail click.
## Changes
### `cmd/server/store.go`
- Added `spTxIndex map[string][]*StoreTx` alongside the existing `spIndex` — tracks which transmissions contain each subpath key
- Extended `addTxToSubpathIndexFull()` and `removeTxFromSubpathIndexFull()` to maintain both indexes simultaneously
- The original `addTxToSubpathIndex()`/`removeTxFromSubpathIndex()` wrappers are preserved for backward compatibility
- `buildSubpathIndex()` now populates both `spIndex` and `spTxIndex` during `Load()`
- All incremental update sites (ingest, path change, eviction) use the `Full` variants
- `GetSubpathDetail()` rewritten: a direct `O(1)` map lookup on `spTxIndex[key]` instead of scanning all packets
### `cmd/server/coverage_test.go`
- Added `TestSubpathTxIndexPopulated`: verifies `spTxIndex` is populated, counts match `spIndex`, and `GetSubpathDetail` returns correct results for both existing and non-existent subpaths
## Complexity
- **Before:** `O(total_packets × avg_hops × subpath_length)` per request
- **After:** `O(matched_txs)` per request (direct map lookup)
## Tests
All tests pass: `cmd/server` (4.6s), `cmd/ingestor` (25.6s)
Fixes #358
---------
Co-authored-by: you <you@example.com>

37300bf5c8
fix: cap prefix map at 8 chars to cut memory ~10x (#570)
## Summary
`buildPrefixMap()` was generating map entries for every prefix length from 2 to `len(pubkey)` (up to 64 chars), creating ~31 entries per node. With 500 nodes that's ~15K map entries; with 1K+ nodes it balloons to 31K+.
## Changes
**`cmd/server/store.go`:**
- Added a `maxPrefixLen = 8` constant — MeshCore path hops use 2–6 char prefixes; 8 gives headroom
- Capped the prefix generation loop at `maxPrefixLen` instead of `len(pk)`
- Added the full pubkey as a separate map entry when the key is longer than `maxPrefixLen`, ensuring exact-match lookups (used by `resolveWithContext`) still work
**`cmd/server/coverage_test.go`:**
- Added `TestPrefixMapCap` with subtests for:
  - Short prefix resolution still works
  - Full pubkey exact-match resolution still works
  - Intermediate prefixes beyond the cap correctly return nil
  - Short keys (≤8 chars) have all prefix entries
  - Map size is bounded
## Impact
- Map entries per node: ~31 → ~8 (one per prefix length 2–8, plus one full-key entry)
- Total map size for 500 nodes: ~15K entries → ~4K entries (~75% reduction)
- No behavioral change for path hop resolution (2–6 char prefixes)
- No behavioral change for exact pubkey lookups
## Tests
All existing tests pass:
- `cmd/server`: ✅
- `cmd/ingestor`: ✅
Fixes #364
---------
Co-authored-by: you <you@example.com>
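A minimal sketch of the capped prefix generation (the real map stores node structs rather than plain strings):
```go
package sketch

import "strings"

const maxPrefixLen = 8 // path hops use 2-6 char prefixes; 8 gives headroom

// addToPrefixMap registers a pubkey under each prefix length from 2
// up to the cap, plus one full-key entry for exact-match lookups.
func addToPrefixMap(m map[string][]string, pubkey string) {
	pk := strings.ToLower(pubkey)
	limit := len(pk)
	if limit > maxPrefixLen {
		limit = maxPrefixLen
	}
	for l := 2; l <= limit; l++ {
		m[pk[:l]] = append(m[pk[:l]], pk)
	}
	if len(pk) > maxPrefixLen {
		m[pk] = append(m[pk], pk) // keep exact-match lookups working
	}
}
```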

cb8a2e15c8
perf: index node path lookups instead of scanning all packets (#572)
## Summary
Index node path lookups in `handleNodePaths()` instead of scanning all packets on every request.
## Problem
`handleNodePaths()` iterated ALL packets in the store (`O(total_packets × avg_hops)`) with prefix string matching on every hop. This caused user-facing latency on every node detail page load with 30K+ packets.
## Fix
Added a `byPathHop` index (`map[string][]*StoreTx`) that maps lowercase hop prefixes and resolved full pubkeys to their transmissions. The handler now does direct map lookups instead of a full scan.
### Index lifecycle
- **Built** during `Load()` via `buildPathHopIndex()`
- **Incrementally updated** during `IngestNewFromDB()` (new packets) and `IngestNewObservations()` (path changes)
- **Cleaned up** during `EvictStale()` (packet removal)
### Query strategy
The handler looks up candidates from the index using:
1. Full pubkey (matches resolved hops from `resolved_path`)
2. 2-char prefix (matches short raw hops)
3. 4-char prefix (matches medium raw hops)
4. Any longer raw hops starting with the 4-char prefix
This reduces complexity from `O(total_packets × avg_hops)` to `O(matching_txs + unique_hop_keys)`.
## Tests
- `TestNodePathsEndpointUsesIndex` — verifies the endpoint returns correct results using the index
- `TestPathHopIndexIncrementalUpdate` — verifies add/remove operations on the index
All existing tests pass.
Fixes #359
Co-authored-by: you <you@example.com>

aac038abb9
fix: filter inconsistent hash sizes by role and add 7-day time window (#567)
## Summary
Fixes #566 — the "Inconsistent Hash Sizes" list on the Analytics page included all node types and had no time window, causing false positives.
## Changes
### 1. Role filter on inconsistent nodes (`cmd/server/store.go`)
Added a role filter to the `inconsistentNodes` loop in `computeHashCollisions()` so only repeaters and room servers are included. Companions are excluded since they were never affected by the firmware bug. This matches the existing role filter on collision bucketing from #441.
```go
// Before:
if cn.HashSizeInconsistent {
// After:
if cn.HashSizeInconsistent && (cn.Role == "repeater" || cn.Role == "room_server") {
```
### 2. 7-day time window on hash size computation (`cmd/server/store.go`)
Added a 7-day recency cutoff to `computeNodeHashSizeInfo()`. Adverts older than 7 days are now skipped, preventing legitimate historical config changes (e.g., testing different byte sizes) from creating permanent false positives.
### 3. Frontend description text (`public/analytics.js`)
Updated the description to reflect the filtered scope: it now says "Repeaters and room servers" instead of "Nodes", mentions the 7-day window, and notes that companions are excluded.
## Tests
- `TestInconsistentNodesExcludesCompanions` — verifies companions are excluded while repeaters and room servers are included
- `TestHashSizeInfoTimeWindow` — verifies adverts older than 7 days are excluded from hash size computation
- Updated existing hash size tests to use recent timestamps (compatible with the new time window)
- All existing tests pass: `cmd/server` ✅, `cmd/ingestor` ✅
## Perf justification
The time window filter adds a single string comparison per advert in the scan loop — O(n) with a tiny constant. No impact on hot paths.
---------
Co-authored-by: you <you@example.com>

588fba226d
perf: track max transmission/observation IDs incrementally (#569)
## Summary
Replace O(n) map iteration in `MaxTransmissionID()` and `MaxObservationID()` with O(1) field lookups.
## What Changed
- Added `maxTxID` and `maxObsID` fields to `PacketStore`
- Updated `Load()`, `IngestNewFromDB()`, and `IngestNewObservations()` to track max IDs incrementally as entries are added
- `MaxTransmissionID()` and `MaxObservationID()` now return the tracked field directly instead of iterating the entire map
## Performance
- **Before:** O(n) iteration over 30K+ map entries under a read lock
- **After:** O(1) field return
## Tests
- Added `TestMaxTransmissionIDIncremental` verifying the incremental field matches brute-force iteration over the maps
- All existing tests pass (`cmd/server` and `cmd/ingestor`)
Fixes #356
Co-authored-by: you <you@example.com>
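A minimal sketch of incremental max-ID tracking, using illustrative field names from the description above:
```go
package sketch

// PacketStore sketches the incremental tracking: the max is updated
// on every add, so reads never iterate the map under a lock.
type PacketStore struct {
	packets map[int64]struct{}
	maxTxID int64
}

func (s *PacketStore) add(id int64) {
	s.packets[id] = struct{}{}
	if id > s.maxTxID {
		s.maxTxID = id
	}
}

// MaxTransmissionID returns the tracked field directly: O(1).
func (s *PacketStore) MaxTransmissionID() int64 { return s.maxTxID }
```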

f897ce1b26
fix: use runtime heap stats for memory-based eviction (#564)
## Problem
Closes #563. Addresses the *Packet store estimated memory* item in #559.
`estimatedMemoryMB()` used a hardcoded formula:
```go
return float64(len(s.packets)*5120+s.totalObs*500) / 1048576.0
```
This ignored three data structures that grow continuously with every ingest cycle:
| Structure | Production size | Heap (not counted) |
|---|---|---|
| `distHops []distHopRecord` | 1,556,833 records | ~300 MB |
| `distPaths []distPathRecord` | 93,090 records | ~25 MB |
| `spIndex map[string]int` | 4,113,234 entries | ~400 MB |
Result: the formula reported ~1.2 GB while the actual heap was ~5 GB. With `maxMemoryMB: 1024`, eviction calculated it only needed to shed ~200 MB, removed a handful of packets, and stopped. Memory kept growing until the OOM killer fired.
## Fix
Replace `estimatedMemoryMB()` with `runtime.ReadMemStats` so all data structures are automatically counted:
```go
func (s *PacketStore) estimatedMemoryMB() float64 {
	if s.memoryEstimator != nil {
		return s.memoryEstimator()
	}
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return float64(ms.HeapAlloc) / 1048576.0
}
```
Replace the eviction simulation loop (which reused the same wrong formula) with a proportional calculation: if the heap is N× over budget, evict enough packets to keep `(1/N) × 0.9` of the current count. The 0.9 factor adds a 10% buffer so the next ingest cycle doesn't immediately re-trigger. All major data structures (distHops, distPaths, spIndex) scale with packet count, so removing a fraction of packets frees roughly the same fraction of total heap.
## Testing
- Updated `TestEvictStale_MemoryBasedEviction` to inject a deterministic estimator via the new `memoryEstimator` field.
- Added `TestEvictStale_MemoryBasedEviction_UnderestimatedHeap`: verifies that when the actual heap is 5× over the limit (the production failure scenario), eviction correctly removes ~80%+ of packets.
```
=== RUN  TestEvictStale_MemoryBasedEviction
[store] Evicted 538 packets (1076 obs)
--- PASS
=== RUN  TestEvictStale_MemoryBasedEviction_UnderestimatedHeap
[store] Evicted 820 packets (1640 obs)
--- PASS
```
Full suite: `go test ./...` — ok (10.3s)
## Perf note
`runtime.ReadMemStats` runs once per eviction tick (every 60 s) and once per `/api/perf/store` call. The cost is negligible.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

cbfce41d7e
perf: optimize neighbor graph build (3 fixes for 30s+ CPU) (#562)
## Summary
Fixes a critical performance issue in neighbor graph computation that consumed 65% of CPU (30+ seconds) on a 325K-packet dataset.
## Changes
### Fix 1: Cache strings.ToLower results
- Added a `cachedToLower()` helper that caches lowercased strings in a local map
- Pubkeys repeat across hundreds of thousands of observations
- Pre-computes `fromLower` once per transmission instead of once per observation
- **Impact:** Eliminates ~8.4s (25.3% CPU)
### Fix 2: Cache parsed DecodedJSON via StoreTx.ParsedDecoded()
- Added a `ParsedDecoded()` method on `StoreTx` using `sync.Once` for thread-safe lazy caching
- `json.Unmarshal` on `decoded_json` now runs at most once per packet lifetime
- The result is reused by `extractFromNode`, `indexByNode`, and `trackAdvertPubkey`
- **Impact:** Eliminates ~8.8s (26.3% CPU)
### Fix 3: Extend neighbor graph TTL from 60s to 5 minutes
- The graph depends on traffic patterns, not individual packets
- Reduces rebuild frequency 5x
- **Impact:** ~80% reduction in sustained CPU from graph rebuilds
## Tests
- 7 new tests added; all 26+ existing neighbor graph tests pass
- BenchmarkBuildFromStore: 727µs/op, 237KB/op, 6030 allocs/op
Related: #559
---------
Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: you <you@example.com>
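A minimal sketch of the `sync.Once` lazy-parse pattern named in Fix 2 (field names are assumptions):
```go
package sketch

import (
	"encoding/json"
	"sync"
)

// StoreTx sketches lazily parsing decoded_json at most once per
// packet lifetime; sync.Once makes this safe under concurrent readers.
type StoreTx struct {
	DecodedJSON string

	parseOnce sync.Once
	parsed    map[string]any
}

// ParsedDecoded unmarshals DecodedJSON on first call and caches
// the result for all later callers.
func (tx *StoreTx) ParsedDecoded() map[string]any {
	tx.parseOnce.Do(func() {
		_ = json.Unmarshal([]byte(tx.DecodedJSON), &tx.parsed)
	})
	return tx.parsed
}
```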

1e1c4cb91f
fix: include resolved_path in groupByHash packet response
`QueryGroupedPackets` builds its response map manually and was missing `resolved_path`. The non-grouped path (`txToMap`) already included it.

0c340e1eb6
fix: set hasResolvedPath flag after ensuring column exists
`detectSchema()` runs at DB open time, before `ensureResolvedPathColumn()` adds the column during `Load()`. On first run (or any run where the column was just added), `hasResolvedPath` stayed false, causing `Load()` to skip reading `resolved_path` from SQLite. This forced a full backfill of all observations on every restart, burning CPU for minutes on large DBs.
Fix: set `hasResolvedPath = true` after `ensureResolvedPathColumn` succeeds.

ae38cdefb4
feat: server-side hop resolution at ingest — resolved_path (#556)
## Summary
Implements server-side hop prefix resolution at ingest time with a persisted neighbor graph. Hop prefixes in `path_json` are now resolved to full 64-char pubkeys at ingest and stored as `resolved_path` on each observation, eliminating the need for client-side resolution via `HopResolver`.
Fixes #555
## What changed
### New file: `cmd/server/neighbor_persist.go`
SQLite persistence layer for the neighbor graph and resolved paths:
- `neighbor_edges` table creation and management
- Load/build/persist neighbor edges from/to SQLite
- `resolved_path` column migration on observations
- `resolvePathForObs()` — resolves hop prefixes using `resolveWithContext` with 4-tier priority (affinity → geo → GPS → first match)
- Cold-startup backfill for observations missing `resolved_path`
- Async persistence of edges and resolved paths during ingest (non-blocking)
### Modified: `cmd/server/store.go`
- `StoreObs` gains a `ResolvedPath []*string` field
- `StoreTx` gains `ResolvedPath []*string` (cached from the best observation)
- `Load()` dynamically includes `resolved_path` in the SQL query when the column exists
- `IngestNewFromDB()` resolves paths at ingest time and persists asynchronously
- `pickBestObservation()` propagates `ResolvedPath` to the transmission
- `txToMap()` and `enrichObs()` include `resolved_path` in API responses
- All 7 `pm.resolve()` call sites migrated to `pm.resolveWithContext()` with the persisted graph
- Broadcast maps include `resolved_path` per observation
### Modified: `cmd/server/db.go`
- `DB` struct gains a `hasResolvedPath bool` flag
- `detectSchema()` checks for `resolved_path` column existence
- Graceful degradation when the column is absent (test DBs, old schemas)
### Modified: `cmd/server/main.go`
- Startup sequence: ensure tables → load/build graph → backfill resolved paths → re-pick best observations
### Modified: `cmd/server/routes.go`
- `mapSliceToTransmissions()` and `mapSliceToObservations()` propagate `resolved_path`
- The node paths handler uses `resolveWithContext` with the graph
### Modified: `cmd/server/types.go`
- `TransmissionResp` and `ObservationResp` gain `ResolvedPath []*string` with `omitempty`
### New file: `cmd/server/neighbor_persist_test.go`
16 tests covering:
- Path resolution (unambiguous, empty, unresolvable prefixes)
- Marshal/unmarshal of resolved_path JSON
- SQLite table creation and column migration (idempotent)
- Edge persistence and loading
- Schema detection
- Full Load() with resolved_path
- API response serialization (present when set, omitted when nil)
## Design decisions
1. **Async persistence** — resolved paths and neighbor edges are written to SQLite in a goroutine to avoid blocking the ingest loop. The in-memory state is authoritative.
2. **Schema compatibility** — the `DB.hasResolvedPath` flag allows the server to work with databases that don't yet have the `resolved_path` column. SQL queries dynamically include/exclude the column.
3. **`pm.resolve()` retained** — not removed as dead code because existing tests use it directly. All production call sites now use `resolveWithContext` with the persisted graph.
4. **Edge persistence is conservative** — only unambiguous edges (single candidate) are persisted to `neighbor_edges`. Ambiguous prefixes are handled by the in-memory `NeighborGraph` via Jaccard disambiguation.
5. **`null` = unresolved** — ambiguous prefixes store `null` in the resolved_path array. The frontend falls back to prefix display.
## Performance
- `resolveWithContext` per hop: ~1–5µs (map lookups, no DB queries)
- A typical packet has 0–5 hops → <25µs total resolution overhead per packet
- Edge/path persistence is async → zero impact on ingest latency
- Backfill is one-time, on first startup with the new column
## Test results
```
cd cmd/server && go test ./... -count=1 → ok (4.4s)
cd cmd/ingestor && go test ./... -count=1 → ok (25.5s)
```
---------
Co-authored-by: you <you@example.com>

43673e86f2
fix: perf stats MaxMB reads from config instead of hardcoded 1024 (#558)
`GetPerfStoreStats` returned a hardcoded `MaxMB: 1024` regardless of the configured `packetStore.maxMemoryMB`. It now reads from `s.maxMemoryMB`.
Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

81ef51cc5c
fix: debounce distance index rebuild to prevent CPU hot loop (#557)
## Problem
On busy meshes (325K+ transmissions, 50 observers), the distance index rebuild runs on **every ingest poll** (~1s interval), computing haversine distances for 1M+ hop records. Each rebuild takes 2–3 seconds, but new observations arrive faster than it can finish, creating a CPU hot loop that starves the HTTP server.
Discovered on the Cascadia Mesh instance, where `corescope-server` was consuming 15 minutes of CPU time in 10 minutes of uptime, the API was completely unresponsive, and health checks were timing out.
### Server logs showing the hot loop:
```
[store] Built distance index: 1797778 hop records, 207072 path records
[store] Built distance index: 1797806 hop records, 207075 path records
[store] Built distance index: 1797811 hop records, 207075 path records
[store] Built distance index: 1797820 hop records, 207075 path records
```
Every 2 seconds, nonstop.
## Root Cause
`IngestNewObservations` calls `buildDistanceIndex()` synchronously whenever `pickBestObservation` selects a longer path. With 50 observers sending observations every second, paths change on nearly every poll cycle, triggering a full rebuild each time.
## Fix
- Mark the distance index dirty on path changes instead of rebuilding inline
- Rebuild at most every **30 seconds** (tracked via the `distLast` timer)
- Set `distLast` after the initial `Load()` to prevent an immediate re-rebuild on first ingest
- Distance data is at most 30s stale — acceptable for an analytics view
## Testing
- `go build`, `go vet`, `go test` all pass
- No behavioral change for the initial load or the analytics API response shape
- Distance data freshness goes from real-time to 30s max staleness
---------
Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: you <you@example.com>
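A minimal sketch of the dirty-flag debounce described in the fix, using the field names mentioned above (locking omitted; #576 later replaced this mechanism with incremental updates):
```go
package sketch

import "time"

const distRebuildInterval = 30 * time.Second

// distDebounce sketches the debounce state; the real store guards
// these fields with its write lock.
type distDebounce struct {
	dirty bool
	last  time.Time
}

// markDirty is called on path changes instead of rebuilding inline.
func (d *distDebounce) markDirty() { d.dirty = true }

// maybeRebuild runs the full rebuild at most once per interval.
func (d *distDebounce) maybeRebuild(rebuild func()) {
	if !d.dirty || time.Since(d.last) < distRebuildInterval {
		return // nothing to do, or still inside the debounce window
	}
	rebuild()
	d.dirty = false
	d.last = time.Now()
}
```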

9a39198d92
fix: only count repeaters in hash collision analysis (#441) (#548)
Fixes #441
## Summary
Hash collision analysis was including ALL node types, inflating collision counts with irrelevant data. Per MeshCore firmware analysis, **only repeaters matter for collision analysis** — they're the only role that forwards packets and appears in routing `path[]` arrays.
## Root Causes Fixed
1. **`hash_size==0` nodes counted in all buckets** — nodes with unknown hash size were included via `cn.HashSize == bytes || cn.HashSize == 0`, polluting every bucket
2. **Non-repeater roles included** — companions, rooms, sensors, and observers were counted even though their hash collisions never cause routing ambiguity
## Fix
Changed the `computeHashCollisions()` filter from:
```go
// Before: include everything except companions
if cn.HashSize == bytes && cn.Role != "companion" {
```
To:
```go
// After: only include repeaters (per firmware analysis)
if cn.HashSize == bytes && cn.Role == "repeater" {
```
## Why only repeaters?
From [MeshCore firmware analysis](https://github.com/Kpa-clawbot/CoreScope/issues/441#issuecomment-4185218547):
- Only repeaters override `allowPacketForward()` to return `true`
- Only repeaters append their hash to `path[]` during relay
- Companions, rooms, sensors, and observers never forward packets
- Cross-role collisions are benign (the companion silently drops; the real repeater still forwards)
## Tests
- `TestHashCollisionsOnlyRepeaters` — verifies companions, rooms, sensors, and hash_size==0 nodes are all excluded
---------
Co-authored-by: you <you@example.com>

59bff5462c
fix: rate-limit cache invalidation to prevent 0% hit rate (#533) (#546)
## Summary
Fixes #533 — server cache hit rate always 0%.
## Root Cause
`invalidateCachesFor()` is called at the end of every `IngestNewFromDB()` and `IngestNewObservations()` cycle (~2–5s). Since new data arrives continuously, caches are cleared faster than any analytics request can hit them, resulting in a permanent 0% cache hit rate. The cache TTL (15s/60s) is irrelevant because entries are evicted by invalidation long before they expire.
## Fix
Rate-limit cache invalidation with a 10-second cooldown:
- The first call after the cooldown goes through immediately
- Subsequent calls during the cooldown accumulate dirty flags in `pendingInv`
- The next call after the cooldown merges pending + current flags and applies them
- Eviction bypasses the cooldown (data removal requires immediate clearing)
Analytics data may be at most ~10s stale, which is acceptable for a dashboard.
## Changes
- **`store.go`**: Added `lastInvalidated`, `pendingInv`, `invCooldown` fields. Refactored `invalidateCachesFor()` to rate-limit non-eviction invalidation. Extracted an `applyCacheInvalidation()` helper.
- **`cache_invalidation_test.go`**: Added 4 new tests:
  - `TestInvalidationRateLimited` — verifies caches survive during cooldown
  - `TestInvalidationCooldownAccumulatesFlags` — verifies flag merging
  - `TestEvictionBypassesCooldown` — verifies eviction always clears immediately
  - `BenchmarkCacheHitDuringIngestion` — confirms 100% hit rate during rapid ingestion (was 0%)
## Perf Proof
```
BenchmarkCacheHitDuringIngestion-16    3467889    1018 ns/op    100.0 hit%
```
Before: 0% hit rate under continuous ingestion. After: 100% hit rate during cooldown periods.
Co-authored-by: you <you@example.com>
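A minimal sketch of the cooldown-and-accumulate pattern, with illustrative shapes for the flags and the apply step:
```go
package sketch

import "time"

// store sketches the rate-limited invalidation state; names follow
// the description above but are not the repo's exact code.
type store struct {
	lastInvalidated time.Time
	pendingInv      map[string]bool
	invCooldown     time.Duration
}

// invalidateCachesFor applies invalidation immediately for evictions,
// and otherwise accumulates dirty flags during the cooldown window.
func (s *store) invalidateCachesFor(flags map[string]bool, eviction bool, apply func(map[string]bool)) {
	if !eviction && time.Since(s.lastInvalidated) < s.invCooldown {
		for k := range flags {
			s.pendingInv[k] = true // accumulate for the next window
		}
		return
	}
	for k := range s.pendingInv {
		flags[k] = true // merge pending + current flags
	}
	s.pendingInv = map[string]bool{}
	s.lastInvalidated = time.Now()
	apply(flags)
}
```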

8c1cd8a9fe
perf: track advert pubkeys incrementally, eliminate per-request JSON parsing (#360) (#544)
## Summary
`GetPerfStoreStats()` and `GetPerfStoreStatsTyped()` iterated **all** ADVERT packets and called `json.Unmarshal` on each one — under a read lock — on every `/api/perf` and `/api/health` request. With 5K+ adverts, each health check triggered thousands of JSON parses.
## Fix
Added a refcounted `advertPubkeys map[string]int` to `PacketStore` that tracks distinct pubkeys incrementally during `Load()`, `IngestNewFromDB()`, and eviction. The perf/health handlers now just read `len(s.advertPubkeys)` — O(1) with zero allocations.
## Benchmark Results (5K adverts, 200 distinct pubkeys)
| Method | ns/op | allocs/op |
|--------|-------|-----------|
| `GetPerfStoreStatsTyped` | **78** | **0** |
| `GetPerfStoreStats` | **2,565** | **9** |
Before this change, both methods performed O(N) JSON unmarshals per call.
## Tests Added
- `TestAdvertPubkeyTracking` — verifies incremental tracking through the add/evict lifecycle
- `TestAdvertPubkeyPublicKeyField` — covers the `public_key` JSON field variant
- `TestAdvertPubkeyNonAdvert` — ensures non-ADVERT packets don't affect the count
- `BenchmarkGetPerfStoreStats` — 5K adverts benchmark
- `BenchmarkGetPerfStoreStatsTyped` — 5K adverts benchmark
Fixes #360
---------
Co-authored-by: you <you@example.com>
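A minimal sketch of the refcounted tracking described above (names are assumptions):
```go
package sketch

// PacketStore sketches refcounted tracking of distinct advert
// pubkeys so the distinct count is O(1) to read.
type PacketStore struct {
	advertPubkeys map[string]int
}

// trackAdvert is called when an ADVERT packet is added.
func (s *PacketStore) trackAdvert(pubkey string) { s.advertPubkeys[pubkey]++ }

// untrackAdvert is called on eviction; the key is removed only when
// the last referencing packet is gone.
func (s *PacketStore) untrackAdvert(pubkey string) {
	if n := s.advertPubkeys[pubkey]; n <= 1 {
		delete(s.advertPubkeys, pubkey)
	} else {
		s.advertPubkeys[pubkey] = n - 1
	}
}

// DistinctAdvertPubkeys is O(1) with zero allocations.
func (s *PacketStore) DistinctAdvertPubkeys() int { return len(s.advertPubkeys) }
```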

9b9f396af5
perf: replace O(n²) observation dedup with map-based O(n) (#355) (#543)
## Summary
Fixes #355 — replaces O(n²) observation dedup in `Load()`, `IngestNewFromDB()`, and `IngestNewObservations()` with an O(1) map-based lookup.
## Changes
- Added an `obsKeys map[string]bool` field to `StoreTx` for O(1) dedup keyed on `observerID + "|" + pathJSON`
- Replaced all 3 linear-scan dedup sites in `store.go` with map lookups
- Lazy-init `obsKeys` for transmissions created before this change (in `IngestNewFromDB` and `IngestNewObservations`)
- Added a regression test (`TestObsDedupCorrectness`) verifying dedup correctness
- Added a nil-map safety test (`TestObsDedupNilMapSafety`)
- Added a benchmark comparing the map against the linear scan
## Benchmark Results (ARM64, 16 cores)
| Observations | Map (O(1)) | Linear (O(n)) | Speedup |
|---|---|---|---|
| 10 | 34 ns/op | 41 ns/op | 1.2x |
| 50 | 34 ns/op | 186 ns/op | 5.5x |
| 100 | 34 ns/op | 361 ns/op | 10.6x |
| 500 | 34 ns/op | 4,903 ns/op | **146x** |
Map lookup is constant time regardless of observation count. The linear scan degrades quadratically — at 500 observations per transmission (realistic for popular packets seen by many observers), the old code is 146x slower per dedup check.
All existing tests pass.
---------
Co-authored-by: you <you@example.com>
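A minimal sketch of the map-based dedup keyed on `observerID + "|" + pathJSON` (types are illustrative):
```go
package sketch

type StoreObs struct {
	ObserverID string
	PathJSON   string
}

type StoreTx struct {
	obs     []StoreObs
	obsKeys map[string]bool // dedup key -> seen
}

// addObs appends an observation unless an identical one was already
// recorded; the map lookup replaces the old O(n) linear scan.
func (tx *StoreTx) addObs(o StoreObs) bool {
	if tx.obsKeys == nil {
		tx.obsKeys = make(map[string]bool) // lazy init for old entries
	}
	key := o.ObserverID + "|" + o.PathJSON
	if tx.obsKeys[key] {
		return false // duplicate
	}
	tx.obsKeys[key] = true
	tx.obs = append(tx.obs, o)
	return true
}
```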

b472c8de30
perf: replace O(n²) selection sort with sort.Slice (#354) (#542)
## Summary
Fixes #354
Replaces the O(n²) selection sort in `sortedCopy()` with Go's built-in `sort.Float64s()` (O(n log n)).
## Changes
- **`cmd/server/routes.go`**: Replaced the manual nested-loop selection sort with `sort.Float64s(cp)`
- **`cmd/server/helpers_test.go`**: Added a regression test with 1000-element random input + a benchmark
## Benchmark Results (ARM64)
```
BenchmarkSortedCopy/n=256      ~16µs/op    1 alloc
BenchmarkSortedCopy/n=1000     ~95µs/op    1 alloc
BenchmarkSortedCopy/n=10000    ~1.3ms/op   1 alloc
```
With the old O(n²) sort, n=10000 would take ~50ms+. The new implementation scales as O(n log n).
## Testing
- All existing `TestSortedCopy` tests pass (unchanged behavior)
- New `TestSortedCopyLarge` validates correctness on 1000 random elements
- `go test ./...` passes in `cmd/server`
Co-authored-by: you <you@example.com>
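The replacement is essentially this (a sketch consistent with the description; the real helper may differ in detail):
```go
package sketch

import "sort"

// sortedCopy returns an ascending copy without mutating the input;
// sort.Float64s is O(n log n) vs the old O(n²) selection sort.
func sortedCopy(vals []float64) []float64 {
	cp := make([]float64, len(vals))
	copy(cp, vals)
	sort.Float64s(cp)
	return cp
}
```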

709e5a4776
fix: observer filter drops groups in grouped packets view (#464) (#531)
## Summary
- When `groupByHash=true`, each group only carries its representative (best-path) `observer_id`. The client-side filter was checking only that field, silently dropping groups that were seen by the selected observer but had a different representative.
- `loadPackets` now passes the `observer` param to the server so `filterPackets`/`buildGroupedWhere` do the correct "any observation matches" check.
- The client-side observer filter in `renderTableRows` is skipped for grouped mode (the server already filtered correctly).
- Observer filtering in both `db.go` and `store.go` extended to support comma-separated IDs (multi-select UI).
## Test plan
- [ ] Set an observer filter on the Packets screen with grouping enabled — all groups that have **any** observation from the selected observer(s) should appear, not just groups where that observer is the representative
- [ ] Multi-select two observers — groups seen by either should appear
- [ ] Toggle to flat (ungrouped) mode — the per-observation filter still works correctly
- [ ] Existing grouped packets tests pass: `cd cmd/server && go test ./...`
Fixes #464
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: you <you@example.com>

54fab0551e
fix: add home defaults to server theme config (#525) (#526)
## Summary
Fixes #525 — Customizer v2 home section shows empty fields and adding an FAQ kills steps.
## Root Cause
The server returned `home: null` from `/api/config/theme` when no home config existed in config.json or theme.json. The customizer had no built-in defaults, so all home fields appeared empty. When a user added a single override (e.g. FAQ), `computeEffective` started from `home: null`, created `home: {}`, and only applied the user's override — wiping steps and everything else.
## Fix
### Server-side (primary)
In `handleConfigTheme()`, replaced the conditional `home` assignment with `mergeMap` using built-in defaults matching what `home.js` hardcodes:
- `heroTitle`: "CoreScope"
- `heroSubtitle`: "Real-time MeshCore LoRa mesh network analyzer"
- `steps`: 4 default getting-started steps
- `footerLinks`: Packets + Network Map links
Config/theme overrides merge on top, so customization still works.
### Client-side (defense in depth)
Added a `DEFAULT_HOME` constant in `customize-v2.js`. `computeEffective()` now falls back to these defaults when the server returns `home: null`, ensuring the customizer works even without server defaults.
## Tests
- **Go**: `TestConfigThemeHomeDefaults` — verifies `/api/config/theme` returns a non-null home with heroTitle, steps, and footerLinks when no config is set
- **JS**: Two new tests in `test-frontend-helpers.js` — verify `computeEffective` provides defaults when home is null, and that user overrides merge correctly with defaults
Co-authored-by: you <you@example.com>

0e1beac52f
fix: neighbor affinity graph empty results + performance + accessibility (#523) (#524)
## Summary
Fixes the neighbor affinity graph returning empty results despite abundant ADVERT data in the store.
**Root cause:** `extractFromNode()` in `neighbor_graph.go` only checked for `"from_node"` and `"from"` fields in the decoded JSON, but real ADVERT packets store the originator public key as `"pubKey"`. This meant `fromNode` was always empty, so:
- Zero-hop edges (originator↔observer) were never created
- Originator↔path[0] edges were never created
- Only observer↔path[last] edges could be created (and only for non-empty paths)
**Fix:** Check `"pubKey"` first in `extractFromNode()`, then fall through to `"from_node"` and `"from"` for other packet types.
## Bugs Fixed
| Bug | Issue | Fix |
|-----|-------|-----|
| Empty graph results | #522 | `extractFromNode()` now reads the `pubKey` field from ADVERTs |
| 3–4s response time | #523 comment | The graph was rebuilding correctly with a 60s TTL cache — the slow response was due to iterating all packets and finding zero matches. With edges now being found, the cache works as designed. |
| Incomplete visualization | #523 comment | Downstream of bugs 1+2 — fixed by fixing the builder |
| Accessibility | #523 comment | Added a text-based neighbor list, dynamic aria-label, keyboard focus CSS, dashed lines for ambiguous edges, confidence symbols |
## Changes
- **`cmd/server/neighbor_graph.go`** — Fixed `extractFromNode()` to check the `pubKey` field (real ADVERT format)
- **`cmd/server/neighbor_graph_test.go`** — Added 2 new tests: `TestBuildNeighborGraph_AdvertPubKeyField` (real ADVERT format) and `TestBuildNeighborGraph_OneByteHashPrefixes` (1-byte prefix collision scenario)
- **`public/analytics.js`** — Added an accessible text-based neighbor list, dynamic aria-label, dashed line pattern for ambiguous edges
- **`public/style.css`** — Added a `:focus-visible` keyboard focus indicator for the canvas
## Testing
All Go tests pass (`go test ./... -count=1`). New tests verify the fix prevents regression.
Fixes #523, Fixes #522
---------
Co-authored-by: you <you@example.com>

34489e0446
fix: customizer v2 — phantom overrides, missing defaults, stale dark mode (#518) (#520)
Fixes #518, Fixes #514, Fixes #515, Fixes #516
## Summary
Fixes all customizer v2 bugs from the consolidated tracker (#518). Both server and client changes.
## Server Changes (`routes.go`)
- **typeColors defaults** — added all 10 type color defaults matching `roles.js` `TYPE_COLORS`. Previously returned `{}`, causing all type colors to render as black.
- **themeDark defaults** — added 22 dark mode color defaults matching the Default preset. Previously returned `{}`, causing dark mode to have no server-side defaults.
## Client Changes (`customize-v2.js`)
- [x] **P0: Phantom override cleanup on init** — a new `_cleanPhantomOverrides()` runs on startup, scanning `cs-theme-overrides` and removing any values that match server defaults (arrays via `JSON.stringify`, scalars via `===`).
- [x] **P1: `setOverride` auto-prunes matching defaults** — after the debounced write, iterates the delta and removes any key whose value matches the server default. Prevents phantom overrides from accumulating.
- [x] **P1: `_countOverrides` counts only real diffs** — now iterates keys and calls `_isOverridden()` instead of blindly counting `Object.keys().length`. The badge count reflects actual overrides only.
- [x] **P1: `_isOverridden` handles arrays/objects** — uses `JSON.stringify` comparison for non-scalar values (home.steps, home.checklist, etc.).
- [x] **P1: Type color fallback** — `_renderNodes()` falls back to `window.TYPE_COLORS` when effective typeColors are empty, preventing black color swatches.
- [x] **P1: Dark/light toggle re-renders panel** — the MutationObserver on `data-theme` now calls `_refreshPanel()` when the panel is open, so switching modes updates the Theme tab immediately.
## Tests
6 new unit tests added to `test-customizer-v2.js`:
- Phantom scalar overrides cleaned on init
- Phantom array overrides cleaned on init
- Real overrides preserved after cleanup
- `isOverridden` handles matching arrays (returns false)
- `isOverridden` handles differing arrays (returns true)
- `setOverride` prunes a value matching the server default
All 48 tests pass. Go tests pass.
---------
Co-authored-by: you <you@example.com>

58f791266d
feat: affinity debugging tools (#482) — milestone 6 (#521)
## Summary
Milestone 6 of #482: Observability & Debugging tools for the neighbor affinity system. These tools exist because someone will need them at 3 AM when "Show Neighbors is showing the wrong node for C0DE" and they have 5 minutes to diagnose it.
## Changes
### 1. Debug API — `GET /api/debug/affinity`
- Full graph state dump: all edges with weights, observation counts, last-seen timestamps
- Per-prefix resolution log with disambiguation reasoning (Jaccard scores, ratios, thresholds)
- Query params: `?prefix=C0DE` filters to a specific prefix, `?node=<pubkey>` to a specific node's edges
- Protected by API key (same auth as `/api/admin/prune`)
- Response includes: edge count, node count, cache age, last rebuild time
### 2. Debug Overlay on Map
- Toggle-able checkbox "🔍 Affinity Debug" in map controls
- Draws lines between nodes showing affinity edges, color-coded:
  - Green = high confidence (score ≥ 0.6)
  - Yellow = medium (0.3–0.6)
  - Red = ambiguous (< 0.3)
- Line thickness proportional to weight, dashed for ambiguous
- Unresolved prefixes shown as ❓ markers
- Click an edge → popup with observation count, last seen, score, observers
- Hidden behind the `debugAffinity` config flag or `localStorage.setItem('meshcore-affinity-debug', 'true')`
### 3. Per-Node Debug Panel
- Expandable "🔍 Affinity Debug" section in the node detail page (collapsed by default)
- Shows: a neighbor edges table with scores, prefix resolutions with a reasoning trace
- Candidates table with Jaccard scores, highlighting the chosen candidate
- Graph-level stats summary
### 4. Server-Side Structured Logging
- Integrated into `disambiguate()` — logs every resolution decision during graph build
- Format: `[affinity] resolve C0DE: c0dedad4 score=47 Jaccard=0.82 vs c0dedad9 score=3 Jaccard=0.11 → neighbor_affinity (ratio 15.7×)`
- Logs ambiguous decisions: `scores too close (12 vs 9, ratio 1.3×) → ambiguous`
- Gated by the `debugAffinity` config flag
### 5. Dashboard Stats Widget
- Added to the analytics overview tab when debug mode is enabled
- Metrics: total edges/nodes, resolved/ambiguous counts (%), avg confidence, cold-start coverage, cache age, last rebuild
## Files Changed
- `cmd/server/neighbor_debug.go` — new: debug API handler, resolution builder, cold-start coverage
- `cmd/server/neighbor_debug_test.go` — new: 7 tests for the debug API
- `cmd/server/neighbor_graph.go` — added structured logging to `disambiguate()`, a `logFn` field, `BuildFromStoreWithLog`
- `cmd/server/neighbor_api.go` — pass the debug flag through `BuildFromStoreWithLog`
- `cmd/server/config.go` — added a `DebugAffinity` config field
- `cmd/server/routes.go` — registered the `/api/debug/affinity` route, exposed `debugAffinity` in client config
- `cmd/server/types.go` — added `DebugAffinity` to `ClientConfigResponse`
- `public/map.js` — affinity debug overlay layer with edge visualization
- `public/nodes.js` — per-node affinity debug panel
- `public/analytics.js` — dashboard stats widget
- `test-e2e-playwright.js` — 3 Playwright tests for the debug UI
## Tests
- ✅ 7 Go unit tests (API shape, prefix/node filters, auth, structured logging, cold-start coverage)
- ✅ 3 Playwright E2E tests (overlay checkbox, toggle without crash, panel expansion)
- ✅ All existing tests pass (`go test ./cmd/server/... -count=1`)
Part of #482
---------
Co-authored-by: you <you@example.com>

5151030697
feat: affinity-aware hop resolution (#482) — milestone 4 (#511)
## Summary
Milestone 4 of #482: adds affinity-aware hop resolution to improve disambiguation accuracy across all hop resolution in the app.
### What changed
**Backend — `prefixMap.resolveWithContext()` (store.go)**
A new method that applies a 4-tier disambiguation priority when multiple nodes match a hop prefix:
| Priority | Strategy | When it wins |
|----------|----------|-------------|
| 1 | **Affinity graph score** | Neighbor graph has data, score ratio ≥ 3× the runner-up |
| 2 | **Geographic proximity** | Context nodes have GPS; pick the closest candidate |
| 3 | **GPS preference** | At least one candidate has coordinates |
| 4 | **First match** | No signal — the current naive fallback |
The existing `resolve()` method is unchanged for backward compatibility. New callers that have context (originator, observer, adjacent hops) can use `resolveWithContext()` for better results.
**API — `handleResolveHops` (routes.go)**
Enhanced `/api/resolve-hops` endpoint:
- New query params: `from_node`, `observer` — provide context for affinity scoring
- New response field on `HopCandidate`: `affinityScore` (float, 0.0–1.0)
- New response fields on `HopResolution`: `bestCandidate` (pubkey when confident), `confidence` (one of `unique_prefix`, `neighbor_affinity`, `ambiguous`)
- Backward compatible: without context params, behavior is identical to before (it just adds the `confidence` field)
**Types (types.go)**
- `HopCandidate.AffinityScore *float64`
- `HopResolution.BestCandidate *string`
- `HopResolution.Confidence string`
### Tests
- 7 unit tests for `resolveWithContext` covering all 4 priority tiers + edge cases
- 2 unit tests for `geoDistApprox`
- 4 API tests for the enhanced `/api/resolve-hops` response shape
- All existing tests pass (no regressions)
### Impact
This improves ALL hop resolution across the app — analytics, route display, subpath analysis, and any future feature that resolves hop prefixes. The affinity graph (from M1/M2) now feeds directly into disambiguation decisions.
Part of #482
---------
Co-authored-by: you <you@example.com>

e66085092e
feat: neighbor affinity API endpoints (#482) — milestone 2 (#508)
## Summary
Milestone 2 of the neighbor affinity graph (#482). Adds two API endpoints that expose the neighbor graph built in M1 (PR #507).
### Endpoints
#### `GET /api/nodes/{pubkey}/neighbors`
Returns neighbors for a specific node with affinity scores.
**Query params:** `min_count` (default 1), `min_score` (default 0.0), `include_ambiguous` (default true)
**Response shape:**
```json
{
  "node": "pubkey",
  "neighbors": [
    {
      "pubkey": "...",
      "prefix": "BB",
      "name": "...",
      "role": "repeater",
      "count": 847,
      "score": 0.95,
      "first_seen": "...",
      "last_seen": "...",
      "avg_snr": -8.2,
      "observers": ["obs1"],
      "ambiguous": false
    }
  ],
  "total_observations": 847
}
```
Ambiguous entries have a `candidates` array; unresolved prefixes have `unresolved: true`.
#### `GET /api/analytics/neighbor-graph`
Returns a full graph summary for analytics/visualization.
**Query params:** `min_count` (default 5), `min_score` (default 0.1), `region` (IATA code filter)
**Response shape:**
```json
{
  "nodes": [{ "pubkey": "...", "name": "...", "role": "...", "neighbor_count": 5 }],
  "edges": [{ "source": "...", "target": "...", "weight": 847, "score": 0.95, "ambiguous": false }],
  "stats": { "total_nodes": 42, "total_edges": 87, "ambiguous_edges": 3, "avg_cluster_size": 4.2 }
}
```
### Wiring
- `NeighborGraph` + `neighborMu` added to the `Server` struct
- Lazy initialization: the graph is built on the first API call, cached with a 60s TTL
- Node name/role lookups via the existing `getCachedNodesAndPM()`
- Region filtering via the existing `resolveRegionObservers()`
### Tests (15 tests)
- Empty graph, single neighbor, multiple neighbors (sorted by score)
- Ambiguous candidates with candidate list
- Unresolved prefix (orphan) with `unresolved: true`
- `min_count` filter, `min_score` filter, `include_ambiguous=false` filter
- Unknown node returns 200 with empty neighbors
- Graph endpoint: empty, with edges, default min_count, ambiguous count
- Region filter (graceful when no store)
- Response shape validation (all required keys present)
All existing tests continue to pass.
Part of #482
---------
Co-authored-by: you <you@example.com>

4a56be0b48
feat: neighbor affinity graph builder (#482) — milestone 1 (#507)
## Summary
Milestone 1 of 7 for the neighbor affinity graph feature (#482). Implements the core `NeighborGraph` data structure and the `BuildFromStore()` algorithm.
**Spec:** `docs/specs/neighbor-affinity-graph.md` on the `spec/482-neighbor-affinity` branch.
## What's Built
### `cmd/server/neighbor_graph.go`
- **`NeighborGraph` struct** — thread-safe (sync.RWMutex) in-memory graph with an edge map and a per-node index
- **`BuildFromStore(*PacketStore)`** — iterates all packets/observations to extract first-hop edges:
  - `originator ↔ path[0]` for ADVERT packets only (originator identity known)
  - `observer ↔ path[last]` for ALL packet types
  - Zero-hop ADVERTs: `originator ↔ observer` direct edge
- **Affinity scoring** — `score = min(1.0, count/100) × exp(-λ × hours)` with a 7-day half-life (see the sketch after this list)
- **Jaccard disambiguation** — resolves ambiguous hash prefixes using mutual-neighbor overlap
- **Confidence threshold** — auto-resolve only when best ≥ 3× second-best AND ≥ 3 observations
- **Transitivity poisoning guard** — only fully resolved edges are used as evidence
- **Orphan prefix handling** — unknown prefixes stored as unresolved markers
- **Cache management** — 60s TTL, `IsStale()` check for rebuild triggering
### `cmd/server/neighbor_graph_test.go`
22 unit tests covering all spec requirements:
| Test | What it validates |
|------|-------------------|
| EmptyStore | Empty graph from empty store |
| AdvertSingleHopPath | Both edge types from a single-hop ADVERT |
| AdvertMultiHopPath | originator↔path[0] + observer↔path[last] |
| AdvertZeroHop | Direct originator↔observer edge |
| NonAdvertEmptyPath | No edges from a non-ADVERT empty path |
| NonAdvertOnlyObserverEdge | Only observer↔last_hop for non-ADVERTs |
| NonAdvertSingleHop | observer↔path[0] only |
| HashCollision | Ambiguous edge with candidates |
| JaccardScoring | Jaccard coefficient computation |
| ConfidenceAutoResolve | Auto-resolve when ratio ≥ 3× |
| EqualScoresAmbiguous | Remains ambiguous with equal scores |
| ObserverSelfEdgeGuard | No self-edges |
| OrphanPrefix | Unresolved prefix handling |
| AffinityScore_Fresh | Score ≈ 1.0 for fresh high-count |
| AffinityScore_Decayed | Score ≈ 0.5 at the 7-day half-life |
| AffinityScore_LowCount | Score ≈ 0.05 for count=5 |
| AffinityScore_StaleAndLow | Score ≈ 0 for old low-count |
| CountAccumulation | 5 observations → count=5 |
| MultipleObservers | Observer set tracks all witnesses |
| TimeDecayOldObservations | Month-old edge scores very low |
| ADVERTOnlyConstraint | Non-ADVERTs don't create originator edges |
| CacheTTL | Stale detection works correctly |
## Not in scope (future milestones)
- API endpoints (M2)
- Frontend integration (M3–M5)
- Debug tools (M6)
- Analytics visualization (M7)
Part of #482
---------
Co-authored-by: you <you@example.com>
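The scoring formula above pins down the decay constant: a 7-day half-life is 168 hours, so λ = ln 2 / 168. A small sketch consistent with the listed test expectations:
```go
package sketch

import "math"

// lambda gives a 7-day (168 h) half-life: exp(-lambda*168) = 0.5.
const lambda = math.Ln2 / 168.0 // per hour

// affinityScore follows the spec formula quoted above: observation
// count saturates at 100, then exponential time decay applies.
func affinityScore(count int, hoursSinceLastSeen float64) float64 {
	c := math.Min(1.0, float64(count)/100.0)
	return c * math.Exp(-lambda*hoursSinceLastSeen)
}
```
Spot checks against the tests: count=100 and hours=0 gives ≈ 1.0 (Fresh); hours=168 halves it (Decayed); count=5 fresh gives 0.05 (LowCount).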

b1d89d7d9f
fix: apply region filter in GetNodes — was silently ignored (#496) (#497)
## Summary
- `db.GetNodes` accepted a `region` param from the HTTP handler but never used it — every region-filter selection was silently ignored and all nodes were always returned
- Added a subquery filtering `nodes.public_key` against ADVERT transmissions (payload_type=4) observed by observers with matching IATA codes
- Handles both v2 (`observer_id TEXT`) and v3 (`observer_idx INT`) schemas
## Test plan
- [x] 4 new subtests added to `TestGetNodesFiltering`: SJC (1 node), SFO (1 node), SJC,SFO multi (1 node, deduped), AMS unknown (0 nodes)
- [x] All existing Go tests still pass
- [x] Deploy to staging, open `/nodes`, select a region in the filter bar — only nodes observed by observers in that region should appear
Closes #496
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: you <you@example.com>

c173ab7e80
perf: skip JSON parse in indexByNode when no pubkey fields present (#376) (#499)
## Summary
- `indexByNode` was calling `json.Unmarshal` for every packet during `Load()` and `IngestNewFromDB()`, even channel messages and other payloads that can never contain node pubkey fields
- All three target fields (`"pubKey"`, `"destPubKey"`, `"srcPubKey"`) share the common substring `"ubKey"` — added a `strings.Contains` pre-check that skips the JSON parse entirely for packets that don't match
- At 30K+ packets on startup, this eliminates the majority of `json.Unmarshal` calls in `indexByNode` (channel messages, status packets, etc. all bypass it)
## Test plan
- [x] 5 new subtests in `TestIndexByNodePreCheck`: ADVERT with pubKey indexed, destPubKey indexed, channel message skipped, empty JSON skipped, duplicate hash deduped
- [x] All existing Go tests pass
- [x] Deploy to staging and verify node-filtered packet queries still work correctly
Closes #376
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: you <you@example.com>
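A minimal sketch of the pre-check (the real `indexByNode` does more than extract a map):
```go
package sketch

import (
	"encoding/json"
	"strings"
)

// extractPubkeys parses decoded JSON only when it can possibly
// contain a pubkey field: "pubKey", "destPubKey", and "srcPubKey"
// all share the substring "ubKey".
func extractPubkeys(decodedJSON string) map[string]any {
	if !strings.Contains(decodedJSON, "ubKey") {
		return nil // no pubkey fields possible; skip json.Unmarshal
	}
	var m map[string]any
	if err := json.Unmarshal([]byte(decodedJSON), &m); err != nil {
		return nil
	}
	return m
}
```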

4664c90db4
fix: skip zero-hop adverts when checking node hash size (#493)
Fixes router IDs flapping between 1-byte and multi-byte, as described in https://github.com/Kpa-clawbot/CoreScope/issues/303, with a minimal patch + test coverage. This fix is critical for regions using multi-byte IDs.
Closes https://github.com/Kpa-clawbot/CoreScope/issues/303
---------
Co-authored-by: you <you@example.com>

2755dc3875
test: push ingestor coverage from 70% to 84% (#344) (#492)
## Summary
Push Go ingestor test coverage from **70.2% → 84.0%** (92.8% excluding the untestable `main()` and `init()` functions).
Part of #344 — ingestor coverage
## What Changed
Added `coverage_boost_test.go` with 60+ new test functions covering previously untested code paths:
### Coverage Before → After by Function
| Function | Before | After |
|----------|--------|-------|
| `NodeDaysOrDefault` | 0% | 100% |
| `MoveStaleNodes` | 0% | 76.5% |
| `NodePassesGeoFilter` | 40% | 100% |
| `handleMessage` | 41.4% | 92.1% |
| `ResolvedSources` | 71.4% | 100% |
| `extractObserverMeta` | 100% | 100% |
| `decodeAdvert` | 88.2% | 94.1% |
| `decryptChannelMessage` | 88.4% | 93.0% |
| **Total** | **70.2%** | **84.0%** |
### Test Categories Added
- **Config**: `NodeDaysOrDefault` all branches, broker scheme normalization (`mqtt://` → `tcp://`, `mqtts://` → `ssl://`)
- **Database**: `MoveStaleNodes` (stale/fresh/replace), duplicate transmission handling, default timestamps, telemetry updates, schema migration verification
- **Decoder**: Sensor telemetry parsing, location + features with truncated data, `countNonPrintable` with invalid UTF-8, `decryptChannelMessage` error paths (invalid key/MAC/ciphertext/alignment), short payload handling
- **Geo Filter**: All branches (nil filter, nil coords, inside/outside)
- **Message Handler**: Channel messages (with/without sender, empty text), direct messages, geo-filtered adverts, corrupted adverts (all-zero pubkey), non-advert packets, `Score`/`Direction` case-insensitive fallbacks, status messages with full hardware metadata
### Why Not 90%+
The remaining ~16% of uncovered statements are:
- The `main()` function (68 blocks) — program entry point with MQTT client setup, signal handling, goroutines — not unit-testable without major refactoring
- The `init()` function — `--version` flag + `os.Exit(0)` — kills the test process
- `prepareStatements()` error returns — only trigger on corrupted/incompatible SQLite databases
- `applySchema()` migration error paths — only trigger on filesystem/SQLite failures
Excluding `main()` and `init()`, effective coverage is **92.8%**.
## Test Results
All 100+ tests pass (existing + new):
```
ok  github.com/corescope/ingestor  25.945s  coverage: 84.0% of statements
```
---------
Co-authored-by: you <you@example.com>

a45ac71508
fix: restore color-coded hex breakdown in packet detail (#329) (#500)
## Summary
- `BuildBreakdown` was never ported from the deleted Node.js
`decoder.js` to Go — the server has returned `breakdown: {}` since the
Go migration (commit `742ed865`), so `createColoredHexDump()` and
`buildHexLegend()` in the frontend always received an empty `ranges`
array and rendered everything as monochrome
- Implemented `BuildBreakdown()` in `decoder.go` — computes labeled byte
ranges matching the frontend's `LABEL_CLASS` map: `Header`, `Transport
Codes`, `Path Length`, `Path`, `Payload`; ADVERT packets get sub-ranges:
`PubKey`, `Timestamp`, `Signature`, `Flags`, `Latitude`, `Longitude`,
`Name` (see the sketch after this list)
- Wired into `handlePacketDetail` (was `struct{}{}`)
- Also adds per-section color classes to the field breakdown table
(`section-header`, `section-transport`, `section-path`,
`section-payload`) so the table rows get matching background tints
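A sketch of the shape `BuildBreakdown()` likely takes, under assumed offsets; the actual MeshCore wire layout (including where the Transport Codes range sits) is not spelled out in this PR, so only the structure is meant to be accurate:

```go
package decoder

// ByteRange is a labeled span the frontend colorizes; field names are
// assumed to match what createColoredHexDump() expects.
type ByteRange struct {
	Label string `json:"label"`
	Start int    `json:"start"`
	End   int    `json:"end"` // exclusive
}

// BuildBreakdown sketch. Assumed layout: 1 header byte, 1 path-length
// byte, pathLen hop bytes, then payload. The Transport Codes range and
// the ADVERT sub-ranges (PubKey/Timestamp/Signature/Flags/Latitude/
// Longitude/Name) are elided here.
func BuildBreakdown(raw []byte, pathLen int) []ByteRange {
	pathStart := 2
	return []ByteRange{
		{Label: "Header", Start: 0, End: 1},
		{Label: "Path Length", Start: 1, End: pathStart},
		{Label: "Path", Start: pathStart, End: pathStart + pathLen},
		{Label: "Payload", Start: pathStart + pathLen, End: len(raw)},
	}
}
```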
## Test plan
- [x] Open any packet detail pane — hex dump should show color-coded
sections (red header, orange path length, blue transport codes, green
path hops, yellow/colored payload)
- [x] Legend below action buttons should appear with color swatches
- [x] ADVERT packets: PubKey/Timestamp/Signature/Flags each get their
own distinct color
- [x] Field breakdown table section header rows should be tinted per
section
- [x] 8 new Go tests: all pass
Closes #329
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
|
||
|
|
bf2e721dd7 |
feat: auto-inject cache busters at server startup — eliminates merge conflicts (#481)
## Problem

Every PR that touches `public/` files requires manually bumping cache buster timestamps in `index.html` (e.g. `?v=1775111407`). Since all PRs change the same lines in the same file, this causes **constant merge conflicts** — it's been the #1 source of unnecessary PR friction.

## Solution

Replace all hardcoded `?v=TIMESTAMP` values in `index.html` with a `?v=__BUST__` placeholder. The Go server replaces `__BUST__` with the current Unix timestamp **once at startup** when it reads `index.html`, then serves the pre-processed HTML from memory. Every server restart automatically picks up fresh cache busters — no manual intervention needed.

## What changed

| File | Change |
|------|--------|
| `public/index.html` | All `v=1775111407` → `v=__BUST__` (28 occurrences) |
| `cmd/server/main.go` | `spaHandler` reads index.html at init, replaces `__BUST__` with Unix timestamp, serves from memory for `/`, `/index.html`, and SPA fallback |
| `cmd/server/helpers_test.go` | New `TestSpaHandlerCacheBust` — verifies placeholder replacement works for root, SPA fallback, and direct `/index.html` requests. Also added tests for root `/` and `/index.html` routes |
| `AGENTS.md` | Rule 3 updated: cache busters are now automatic, agents should not manually edit them |

## Testing

- `go build ./...` — compiles cleanly
- `go test ./...` — all tests pass (including new cache-bust tests)
- `node test-frontend-helpers.js && node test-packet-filter.js && node test-aging.js` — all frontend tests pass
- No hardcoded timestamps remain in `index.html`

---------

Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: you <you@example.com> |
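A minimal sketch of the startup-time injection; the real `spaHandler` also covers the SPA fallback route, which is elided, and `newSPAHandler` is an illustrative name:

```go
package server

import (
	"fmt"
	"net/http"
	"os"
	"strings"
	"time"
)

// newSPAHandler reads index.html once, substitutes the cache-bust
// placeholder, and serves the result from memory thereafter.
func newSPAHandler(indexPath string) (http.HandlerFunc, error) {
	raw, err := os.ReadFile(indexPath)
	if err != nil {
		return nil, err
	}
	// One replacement at startup; every restart yields a fresh value.
	bust := fmt.Sprintf("%d", time.Now().Unix())
	html := strings.ReplaceAll(string(raw), "__BUST__", bust)
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/html; charset=utf-8")
		w.Write([]byte(html)) // served from memory, no per-request I/O
	}, nil
}
```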
||
|
|
f9cfad9cd4 |
fix: update observer last_seen on packet ingestion (#479)
## Summary

Related to #463 (partial fix — addresses the packet path; the status message path still needs investigation) — observers incorrectly showing as offline despite actively forwarding packets.

## Root Cause

Observer `last_seen` was only updated when status topic messages (`meshcore/<region>/<observer_id>/status`) were received via `UpsertObserver`. When packets were ingested from an observer, the observer's `last_seen` was **not** updated — only the `observer_idx` was resolved for the observation record.

This meant observers with low traffic that published status messages less frequently than the 10-minute online threshold would appear offline on the observers page, even though they were clearly alive and forwarding packets.

## Changes

**`cmd/ingestor/db.go`:**
- Added `stmtUpdateObserverLastSeen` prepared statement: `UPDATE observers SET last_seen = ? WHERE rowid = ?`
- In `InsertTransmission`, after resolving `observer_idx`, update the observer's `last_seen` to the packet timestamp
- This ensures any observer actively forwarding traffic stays marked as online

**`cmd/ingestor/db_test.go`:**
- Added `TestInsertTransmissionUpdatesObserverLastSeen` — verifies that inserting a packet from an observer updates its `last_seen` from a backdated value to the packet timestamp

## Performance

The added `UPDATE` is a single-row update by `rowid` (primary key) — O(1) with no index overhead. It runs once per packet insertion when an observer is resolved, which was already doing a `SELECT` by `rowid` anyway. No measurable impact on ingestion throughput.

## Test Results

All existing tests pass:
- `cmd/ingestor`: 26.6s ✅
- `cmd/server`: 3.7s ✅

---------

Co-authored-by: you <you@example.com> |
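A sketch of the prepared statement and the bump it performs. The statement text and field name follow the PR description; `touchObserver` and the surrounding `Store` shape are illustrative, and the `InsertTransmission` context is elided:

```go
package ingestor

import "database/sql"

type Store struct {
	stmtUpdateObserverLastSeen *sql.Stmt
}

func (s *Store) prepare(db *sql.DB) error {
	stmt, err := db.Prepare(`UPDATE observers SET last_seen = ? WHERE rowid = ?`)
	if err != nil {
		return err
	}
	s.stmtUpdateObserverLastSeen = stmt
	return nil
}

// touchObserver runs after observer_idx is resolved for an observation:
// a single-row primary-key UPDATE, O(1) per ingested packet.
func (s *Store) touchObserver(observerIdx int64, packetTS string) error {
	_, err := s.stmtUpdateObserverLastSeen.Exec(packetTS, observerIdx)
	return err
}
```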
||
|
|
96d0bbe487 |
fix: replace Euclidean distance with haversine in analytics hop distances (#478)
## Summary

Fixes #433 — Replace the inaccurate Euclidean distance approximation in `analytics.js` hop distances with proper haversine calculation, matching the server-side computation introduced in PR #415.

## Problem

PR #415 moved collision analysis server-side and switched from the frontend's Euclidean approximation (`dLat×111, dLon×85`) to proper haversine. However, the **hop distance** calculation in `analytics.js` (subpath detail panel) still used the old Euclidean formula. This caused:

- **Inconsistent distances** between hop distances and collision distances
- **Significant errors at high latitudes** — e.g., Oslo→Stockholm: Euclidean gives ~627km, haversine gives ~415km (51% error)
- The `dLon×85` constant assumes ~40° latitude; at 60° latitude the real scale factor is ~55.5km/degree, not 85

## Changes

| File | Change |
|------|--------|
| `public/analytics.js` | Replace `dLat*111, dLon*85` Euclidean with `HopResolver.haversineKm()` (with inline fallback) |
| `public/hop-resolver.js` | Export `haversineKm` in the public API for reuse |
| `test-frontend-helpers.js` | Add 4 tests: export check, zero distance, SF→LA accuracy, Euclidean vs haversine divergence |
| `cmd/server/helpers_test.go` | Add `TestHaversineKm`: zero, SF→LA, symmetry, Oslo→Stockholm accuracy |
| `public/index.html` | Cache buster bump |

## Performance

No performance impact — `haversineKm` replaces an inline arithmetic expression with another inline arithmetic expression of identical O(1) complexity. Only called per hop pair in the subpath detail panel (typically <10 hops).

## Testing

- `node test-frontend-helpers.js` — 248 passed, 0 failed
- `go test -run TestHaversineKm` — PASS

Co-authored-by: you <you@example.com> |
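For reference, a haversine sketch in Go matching what `TestHaversineKm` exercises; the signature is assumed, since the PR only names the function:

```go
package server

import "math"

// haversineKm: great-circle distance in kilometers between two
// latitude/longitude points, using a 6371 km mean Earth radius.
func haversineKm(lat1, lon1, lat2, lon2 float64) float64 {
	const earthRadiusKm = 6371.0
	toRad := func(deg float64) float64 { return deg * math.Pi / 180 }
	dLat := toRad(lat2 - lat1)
	dLon := toRad(lon2 - lon1)
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(toRad(lat1))*math.Cos(toRad(lat2))*
			math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * earthRadiusKm * math.Asin(math.Sqrt(a))
}
```

Unlike the `dLat×111, dLon×85` approximation, the `cos(lat)` terms scale longitude correctly at any latitude, which is exactly where the Oslo→Stockholm divergence came from.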
||
|
|
6712da7d7c |
fix: add region filtering to hash-collisions endpoint (#477)
## Summary

The `/api/analytics/hash-collisions` endpoint always returned global results, ignoring the active region filter. Every other analytics endpoint (RF, topology, hash-sizes, channels, distance, subpaths) respected the `?region=` query parameter — this was the only one that didn't.

Fixes #438

## Changes

### Backend (`cmd/server/`)

- **routes.go**: Extract `region` query param and pass to `GetAnalyticsHashCollisions(region)`
- **store.go**:
  - `collisionCache` changed from `*cachedResult` → `map[string]*cachedResult` (keyed by region, `""` = global) — consistent with `rfCache`, `topoCache`, etc.
  - `GetAnalyticsHashCollisions(region)` and `computeHashCollisions(region)` now accept a region parameter
  - When region is specified, resolves regional observers, scans packets for nodes seen by those observers, and filters the node list before computing collisions
  - Cache invalidation updated to clear the map (not set to nil)

### Frontend (`public/`)

- **analytics.js**: The hash-collisions fetch was missing `+ sep` (the region query string). All other fetches in the same `Promise.all` block had it — this was simply overlooked in PR #415.
- **index.html**: Cache busters bumped

### Tests (`cmd/server/routes_test.go`)

- `TestHashCollisionsRegionParamIgnored` → renamed to `TestHashCollisionsRegionParam` with updated comments reflecting that region is now accepted (with no configured regional observers, results match global — which the test verifies)

## Performance

No new hot-path work. Region filtering adds one scan of `s.packets` (same as every other region-filtered analytics endpoint) only when `?region=` is provided. Results are cached per-region with the existing 60s TTL. Without `?region=`, behavior is unchanged.

Co-authored-by: you <you@example.com> |
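A sketch of the region-keyed cache pattern described above; `cachedResult` follows the PR's naming, while the `store` shape, the injected `compute` callback, and the locking are illustrative:

```go
package server

import (
	"sync"
	"time"
)

type cachedResult struct {
	data     any
	computed time.Time
}

type store struct {
	mu             sync.Mutex
	collisionCache map[string]*cachedResult // "" = global
}

func (s *store) GetAnalyticsHashCollisions(region string, ttl time.Duration,
	compute func(region string) any) any {
	s.mu.Lock()
	defer s.mu.Unlock()
	if c, ok := s.collisionCache[region]; ok && time.Since(c.computed) < ttl {
		return c.data // cache hit for this region key
	}
	data := compute(region)
	s.collisionCache[region] = &cachedResult{data: data, computed: time.Now()}
	return data
}

// Invalidation replaces the map with a fresh one rather than setting it
// to nil, so later writes to the map cannot panic.
func (s *store) invalidateCollisions() {
	s.mu.Lock()
	s.collisionCache = make(map[string]*cachedResult)
	s.mu.Unlock()
}
```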
||
|
|
623ebc879b |
fix: add mutex synchronization to PerfStats to eliminate data races (#469)
## Summary

Fixes #361 — `perfMiddleware()` wrote to shared `PerfStats` fields (`Requests`, `TotalMs`, `Endpoints` map, `SlowQueries` slice) without any synchronization, causing data races under concurrent HTTP requests.

## Changes

### `cmd/server/routes.go`

- **Added `sync.Mutex` to `PerfStats` struct** — single mutex protects all fields
- **`perfMiddleware`** — all shared state mutations (counter increments, endpoint map access, slice appends) now happen under lock. Key normalization (regex, mux route lookup) moved outside the lock since it uses no shared state
- **`handleHealth`** — snapshots `Requests`, `TotalMs`, `SlowQueries` under lock before building response
- **`handlePerf`** — copies all endpoint data and slow queries under lock into local snapshots, then does expensive work (sorting, percentile calculation) outside the lock
- **`handlePerfReset`** — resets fields in-place instead of replacing the pointer (avoids unlocking a different mutex)

### `cmd/server/perfstats_race_test.go` (new)

- Regression test: 50 concurrent writer goroutines + 10 concurrent reader goroutines hammering `PerfStats` simultaneously
- Verifies no race conditions (via `-race` flag) and counter consistency

## Design Decisions

- **Single mutex over atomics**: The issue suggested `atomic.Int64` for counters, but since slices/maps need a mutex anyway, a single mutex is simpler and the critical section is small (microseconds). No measurable contention at CoreScope's scale.
- **Copy-under-lock pattern**: Expensive operations (sorting, percentile computation) happen outside the lock to minimize hold time.
- **In-place reset**: `handlePerfReset` clears fields rather than replacing the `PerfStats` pointer, ensuring the mutex remains valid for concurrent goroutines.

## Testing

- `go test -race -count=1 ./cmd/server/...` — **PASS** (all existing tests + new race test)
- New `TestPerfStatsConcurrentAccess` specifically validates concurrent access patterns

Co-authored-by: you <you@example.com> |
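A condensed sketch of the copy-under-lock pattern; the `PerfStats` fields mirror the PR description, while `record` and `snapshot` are illustrative stand-ins for the middleware and handler paths:

```go
package server

import "sync"

type PerfStats struct {
	mu          sync.Mutex
	Requests    int64
	TotalMs     float64
	Endpoints   map[string]int64
	SlowQueries []string
}

// record is the middleware's write path: key normalization happens
// before this call, outside the lock, since it touches no shared state.
func (p *PerfStats) record(key string, ms float64) {
	p.mu.Lock()
	p.Requests++
	p.TotalMs += ms
	p.Endpoints[key]++
	p.mu.Unlock()
}

// snapshot is the handler's read path: copy under lock, then do the
// expensive work (sorting, percentiles) on the local copies afterward.
func (p *PerfStats) snapshot() (reqs int64, slow []string) {
	p.mu.Lock()
	reqs = p.Requests
	slow = append([]string(nil), p.SlowQueries...) // copy, don't alias
	p.mu.Unlock()
	return reqs, slow
}
```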
||
|
|
f87eb3601c |
fix: graceful container shutdown for reliable deployments (#453)
## Summary

Fixes #450 — staging deployment flaky due to container not shutting down cleanly.

## Root Causes

1. **Server never closed DB on shutdown** — SQLite WAL lock held indefinitely, blocking new container startup
2. **`httpServer.Close()` instead of `Shutdown()`** — abruptly kills connections instead of draining them
3. **No `stop_grace_period` in compose configs** — Docker sends SIGTERM then immediately SIGKILL (default 10s is often not enough for WAL checkpoint)
4. **Supervisor didn't forward SIGTERM** — missing `stopsignal`/`stopwaitsecs` meant Go processes got SIGKILL instead of graceful shutdown
5. **Deploy scripts used default `docker stop` timeout** — only 10s grace period

## Changes

### Go Server (`cmd/server/`)

- **Graceful HTTP shutdown**: `httpServer.Shutdown(ctx)` with 15s context timeout — drains in-flight requests before closing
- **WebSocket cleanup**: New `Hub.Close()` method sends `CloseGoingAway` frames to all connected clients
- **DB close on shutdown**: Explicitly closes DB after HTTP server stops (was never closed before)
- **WAL checkpoint**: `PRAGMA wal_checkpoint(TRUNCATE)` before DB close — flushes WAL to main DB file and removes WAL/SHM lock files

### Go Ingestor (`cmd/ingestor/`)

- **WAL checkpoint on shutdown**: New `Store.Checkpoint()` method, called before `Close()`
- **Longer MQTT disconnect timeout**: 5s (was 1s) to allow in-flight messages to drain

### Docker Compose (all 4 variants)

- Added `stop_grace_period: 30s` and `stop_signal: SIGTERM`

### Supervisor Configs (both variants)

- Added `stopsignal=TERM` and `stopwaitsecs=20` to server and ingestor programs

### Deploy Scripts

- `deploy-staging.sh`: `docker stop -t 30` with explicit grace period
- `deploy-live.sh`: `docker stop -t 30` with explicit grace period

## Shutdown Sequence (after fix)

1. Docker sends SIGTERM to supervisord (PID 1)
2. Supervisord forwards SIGTERM to server + ingestor (waits up to 20s each)
3. Server: stops poller → drains HTTP (15s) → closes WS clients → checkpoints WAL → closes DB
4. Ingestor: stops tickers → disconnects MQTT (5s) → checkpoints WAL → closes DB
5. Docker waits up to 30s total before SIGKILL

## Tests

All existing tests pass:
- `cd cmd/server && go test ./...` ✅
- `cd cmd/ingestor && go test ./...` ✅

---------

Co-authored-by: you <you@example.com>
Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com> |
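A minimal sketch of the server-side shutdown sequence, assuming a standard `net/http` server and `database/sql` handle; the poller, `Hub.Close()`, and error handling are elided, and `runWithGracefulShutdown` is an illustrative name:

```go
package server

import (
	"context"
	"database/sql"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func runWithGracefulShutdown(srv *http.Server, db *sql.DB) {
	go srv.ListenAndServe()

	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGTERM, syscall.SIGINT)
	<-sig // Docker / supervisord delivers SIGTERM here

	// Drain in-flight HTTP requests, bounded by a 15s timeout.
	ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
	defer cancel()
	srv.Shutdown(ctx)

	// Flush the WAL into the main DB file so the WAL/SHM lock files
	// disappear and the next container can open the database cleanly.
	db.Exec(`PRAGMA wal_checkpoint(TRUNCATE)`)
	db.Close()
}
```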
||
|
|
01ca843309 |
perf: move collision analysis to server-side endpoint (fixes #386) (#415)
## Summary

Moves the hash collision analysis from the frontend to a new server-side endpoint, eliminating a major performance bottleneck on the analytics collision tab.

Fixes #386

## Problem

The collision tab was:

1. **Downloading all nodes** (`/nodes?limit=2000`) — ~500KB+ of data
2. **Running O(n²) pairwise distance calculations** on the browser main thread (~2M comparisons with 2000 nodes)
3. **Building prefix maps client-side** (`buildOneBytePrefixMap`, `buildTwoBytePrefixInfo`, `buildCollisionHops`) iterating all nodes multiple times

## Solution

### New endpoint: `GET /api/analytics/hash-collisions`

Returns pre-computed collision analysis with:

- `inconsistent_nodes` — nodes with varying hash sizes
- `by_size` — per-byte-size (1, 2, 3) collision data:
  - `stats` — node counts, space usage, collision counts
  - `collisions` — pre-computed collisions with pairwise distances and classifications (local/regional/distant/incomplete)
  - `one_byte_cells` — 256-cell prefix map for 1-byte matrix rendering
  - `two_byte_cells` — first-byte-grouped data for 2-byte matrix rendering

### Caching

Uses the existing `cachedResult` pattern with a new `collisionCache` map. Invalidated on `hasNewTransmissions` (same trigger as the hash-sizes cache) and on eviction.

### Frontend changes

- `renderCollisionTab` now accepts pre-fetched `collisionData` from the parallel API load
- New `renderHashMatrixFromServer` and `renderCollisionsFromServer` functions consume server-computed data directly
- No more `/nodes?limit=2000` fetch from the collision tab
- Old client-side functions (`buildOneBytePrefixMap`, etc.) preserved for test helper exports

## Test results

- `go test ./...` (server): ✅ pass
- `go test ./...` (ingestor): ✅ pass
- `test-packet-filter.js`: ✅ 62 passed
- `test-aging.js`: ✅ 29 passed
- `test-frontend-helpers.js`: ✅ 227 passed

## Performance impact

| Metric | Before | After |
|--------|--------|-------|
| Data transferred | ~500KB (all nodes) | ~50KB (collision data only) |
| Client computation | O(n²) distance calc | None (server-cached) |
| Main thread blocking | Yes (2000 nodes × pairwise) | No |
| Server caching | N/A | 15s TTL, invalidated on new transmissions |

---------

Co-authored-by: you <you@example.com>
Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com> |
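A sketch of how the 256-cell `one_byte_cells` prefix map might be computed server-side; deriving the 1-byte routing hash from the first hex byte of the pubkey is an assumption, and `nodeRec`/`buildOneByteCells` are illustrative names:

```go
package server

import "strconv"

type nodeRec struct {
	PubKey string // hex-encoded public key
}

// buildOneByteCells buckets nodes by their first hash byte into 256
// cells; a cell count greater than 1 means a 1-byte hash collision.
func buildOneByteCells(nodes []nodeRec) [256]int {
	var cells [256]int
	for _, n := range nodes {
		if len(n.PubKey) < 2 {
			continue
		}
		b, err := strconv.ParseUint(n.PubKey[:2], 16, 8)
		if err != nil {
			continue // malformed pubkey, skip
		}
		cells[b]++
	}
	return cells
}
```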
||
|
|
47d081c705 |
perf: targeted analytics cache invalidation (fixes #375) (#379)
## Problem

Every time new data is ingested (`IngestNewFromDB`, `IngestNewObservations`, `EvictStale`), **all 6 analytics caches** are wiped by creating new empty maps — regardless of what kind of data actually changed. With the poller running every 1 second, this means the 15s cache TTL is effectively bypassed because caches are cleared far more frequently than they expire.

## Fix

Introduces a `cacheInvalidation` flags struct and `invalidateCachesFor()` method that selectively clears only the caches affected by the ingested data:

| Flag | Caches Cleared |
|------|----------------|
| `hasNewObservations` | RF (SNR/RSSI data changed) |
| `hasNewPaths` | Topology, Distance, Subpaths |
| `hasNewTransmissions` | Hash sizes |
| `hasChannelData` | Channels (GRP_TXT payload_type 5) + channels list cache |
| `eviction` | All (data removed, everything potentially stale) |

### Impact

For a typical ingest cycle with ADVERT/ACK/TXT_MSG packets (no GRP_TXT):
- **Before:** All 6 caches cleared every cycle
- **After:** Channel cache preserved (most common case), hash cache preserved on observation-only ingestion

For observation-only ingestion (`IngestNewObservations`):
- **Before:** All 6 caches cleared
- **After:** Only RF cache cleared (+ topo/dist/subpath if paths actually changed)

## Tests

7 new unit tests in `cache_invalidation_test.go` covering:
- Eviction clears all caches
- Observation-only ingest preserves non-RF caches
- Transmission-only ingest clears only hash cache
- Channel data clears only channel cache
- Path changes clear topo/dist/subpath
- Combined flags work correctly
- No flags = no invalidation

All existing tests pass.

### Post-rebase fix

Restored `channelsCacheRes` invalidation that was accidentally dropped during the refactor. The old code cleared this separate channels list cache on every ingest, but `invalidateCachesFor()` didn't include it. Now cleared on `hasChannelData` and `eviction`.

Fixes #375

---------

Co-authored-by: you <you@example.com> |
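A sketch of the flags struct and selective clearing; the flag names follow the PR, while the `analyticsCaches` container stands in for the store's actual cache fields:

```go
package server

type cacheInvalidation struct {
	hasNewObservations  bool
	hasNewPaths         bool
	hasNewTransmissions bool
	hasChannelData      bool
	eviction            bool
}

type analyticsCaches struct {
	rf, topo, dist, subpath, hash, channels map[string]any
	channelsListRes                         any // separate channels list cache
}

func (c *analyticsCaches) invalidateCachesFor(f cacheInvalidation) {
	if f.eviction { // data removed: everything is potentially stale
		f.hasNewObservations, f.hasNewPaths = true, true
		f.hasNewTransmissions, f.hasChannelData = true, true
	}
	if f.hasNewObservations {
		c.rf = make(map[string]any) // SNR/RSSI aggregates changed
	}
	if f.hasNewPaths {
		c.topo = make(map[string]any)
		c.dist = make(map[string]any)
		c.subpath = make(map[string]any)
	}
	if f.hasNewTransmissions {
		c.hash = make(map[string]any)
	}
	if f.hasChannelData {
		c.channels = make(map[string]any)
		c.channelsListRes = nil
	}
}
```

With no flags set, nothing is cleared, which is what lets the 15s TTL actually do its job between 1-second poller ticks.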
||
|
|
be313f60cb |
fix: extract score/direction from MQTT, strip units, fix type safety issues (#371)
## Summary

Fixes #353 — addresses all 5 findings from the CoreScope code analysis.

## Changes

### Finding 1 (Major): `score` field never extracted from MQTT

- Added `Score *float64` field to `PacketData` and `MQTTPacketMessage` structs
- Extract `msg["score"]` with `msg["Score"]` case fallback via `toFloat64` in all three MQTT handlers (raw packet, channel message, direct message)
- Pass through to DB observation insert instead of hardcoded `nil`

### Finding 2 (Major): `direction` field never extracted from MQTT

- Added `Direction *string` field to `PacketData` and `MQTTPacketMessage` structs
- Extract `msg["direction"]` with `msg["Direction"]` case fallback as string in all three MQTT handlers
- Pass through to DB observation insert instead of hardcoded `nil`

### Finding 3 (Minor): `toFloat64` doesn't strip units

- Added `stripUnitSuffix()` that removes common RF/signal unit suffixes (dBm, dB, mW, km, mi, m) case-insensitively before `ParseFloat`
- Values like `"-110dBm"` or `"5.5dB"` now parse correctly

### Finding 4 (Minor): Bare type assertions in store.go

- Changed `firstSeen` and `lastSeen` from `interface{}` to typed `string` variables at `store.go:5020`
- Removed unsafe `.(string)` type assertions in comparisons

### Finding 5 (Minor): `distHopRecord.SNR` typed as `interface{}`

- Changed `distHopRecord.SNR` from `interface{}` to `*float64`
- Updated assignment (removed intermediate `snrVal` variable, pass `tx.SNR` directly)
- Updated output serialization to use `floatPtrOrNil(h.SNR)` for consistent JSON output

## Tests Added

- `TestBuildPacketDataScoreAndDirection` — verifies Score/Direction flow through BuildPacketData
- `TestBuildPacketDataNilScoreDirection` — verifies nil handling when fields absent
- `TestInsertTransmissionWithScoreAndDirection` — end-to-end: inserts with score/direction, verifies DB values
- `TestStripUnitSuffix` — covers all supported suffixes, case insensitivity, and passthrough
- `TestToFloat64WithUnits` — verifies unit-bearing strings parse correctly

All existing tests pass.

Co-authored-by: you <you@example.com> |
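A sketch of `stripUnitSuffix` under the suffix list from Finding 3; the exact implementation may differ, but the longest-first ordering matters so that "dBm" wins before the bare "m" case:

```go
package ingestor

import "strings"

// stripUnitSuffix removes a trailing RF/signal unit (dBm, dB, mW, km,
// mi, m) case-insensitively, so values like "-110dBm" parse as floats.
func stripUnitSuffix(s string) string {
	s = strings.TrimSpace(s)
	// Longest suffixes first: "dbm" must be tried before "db" and "m".
	for _, suffix := range []string{"dbm", "db", "mw", "km", "mi", "m"} {
		if len(s) > len(suffix) &&
			strings.EqualFold(s[len(s)-len(suffix):], suffix) {
			return strings.TrimSpace(s[:len(s)-len(suffix)])
		}
	}
	return s
}
```

For example, `stripUnitSuffix("-110dBm")` yields `"-110"` and `stripUnitSuffix("5.5dB")` yields `"5.5"`, both of which then survive `strconv.ParseFloat`.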
||
|
|
8a0862523d |
fix: add migration for missing observations.timestamp index (#332)
## Problem

On installations where the database predates the `idx_observations_timestamp` index, `/api/stats` takes 30s+ because `GetStoreStats()` runs two full table scans:

```sql
SELECT COUNT(*) FROM observations WHERE timestamp > ?  -- last hour
SELECT COUNT(*) FROM observations WHERE timestamp > ?  -- last 24h
```

The index is only created in the `if !obsExists` block, so any database where the `observations` table already existed before that code was added never gets it.

## Fix

Adds a one-time migration (`obs_timestamp_index_v1`) that runs at ingestor startup:

```sql
CREATE INDEX IF NOT EXISTS idx_observations_timestamp ON observations(timestamp)
```

On large installations this index creation may take a few seconds on first startup after the upgrade, but subsequent stats queries become instant.

## Test plan

- [ ] Restart ingestor on an older database and confirm `[migration] observations timestamp index created` appears in logs
- [ ] Confirm `/api/stats` response time drops from 30s+ to <100ms

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> |
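A sketch of a named one-time migration, assuming a simple `migrations(name)` bookkeeping table; the ingestor's real schema tracking may differ, and `applyObsTimestampIndex` is an illustrative name:

```go
package ingestor

import "database/sql"

func applyObsTimestampIndex(db *sql.DB) error {
	const name = "obs_timestamp_index_v1"
	var done int
	err := db.QueryRow(
		`SELECT COUNT(*) FROM migrations WHERE name = ?`, name).Scan(&done)
	if err != nil {
		return err
	}
	if done > 0 {
		return nil // already applied on a previous startup
	}
	if _, err := db.Exec(
		`CREATE INDEX IF NOT EXISTS idx_observations_timestamp
		 ON observations(timestamp)`); err != nil {
		return err
	}
	// Record the migration so future startups skip the index build.
	_, err = db.Exec(`INSERT INTO migrations (name) VALUES (?)`, name)
	return err
}
```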
||
|
|
7e8b30aa1f |
perf: fix slow /api/packets and /api/channels on large stores (#328)
## Problem

Two endpoints were slow on larger installations:

**`/packets?limit=50000&groupByHash=true` — 16s+**

`QueryGroupedPackets` did two expensive things on every request:
1. O(n × observations) scan per packet to find the `latest` timestamp
2. Held `s.mu.RLock()` during the O(n log n) sort, blocking all concurrent reads

**`/channels` — 13s+**

`GetChannels` iterated all payload-type-5 packets and JSON-unmarshaled each one while holding `s.mu.RLock()`, blocking all concurrent reads for the full duration.

## Fix

**Packets (`QueryGroupedPackets`):**
- Add `LatestSeen string` to `StoreTx`, maintained incrementally in all three observation write paths. Eliminates the per-packet observation scan at query time.
- Build output maps under the read lock, sort the local copy after releasing it.
- Cache the full sorted result for 3 seconds keyed by filter params.

**Channels (`GetChannels`):**
- Copy only the fields needed (firstSeen, decodedJSON, region match) under the read lock, then release before JSON unmarshaling.
- Cache the result for 15 seconds keyed by region param.
- Invalidate cache on new packet ingestion.

## Test plan

- [ ] Open packets page on a large store — load time should drop from 16s to <1s
- [ ] Open channels page — should load in <100ms instead of 13s+
- [ ] `[SLOW API]` warnings gone for both endpoints
- [ ] Packet/channel data is correct (hashes, counts, observer counts)
- [ ] Filters (region, type, since/until) still work correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> |
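A sketch of the "copy under RLock, sort after release" pattern at the heart of the packets fix; the `group` type, its fields, and the caching layer are illustrative:

```go
package server

import (
	"sort"
	"sync"
)

type group struct {
	Hash       string
	LatestSeen string // maintained incrementally on each observation write
}

type store struct {
	mu     sync.RWMutex
	groups map[string]group
}

func (s *store) QueryGroupedPackets() []group {
	s.mu.RLock()
	out := make([]group, 0, len(s.groups))
	for _, g := range s.groups {
		out = append(out, g) // cheap value copies only, under the read lock
	}
	s.mu.RUnlock()

	// The O(n log n) sort runs on the local slice after the lock is
	// released, so concurrent readers are never blocked by it.
	sort.Slice(out, func(i, j int) bool {
		return out[i].LatestSeen > out[j].LatestSeen
	})
	return out
}
```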
||
|
|
b2279b230b |
fix: handle string, uint, and uint64 types in toFloat64 (#352)
## Summary

Fixes #350 — `toFloat64()` silently drops SNR/RSSI values when bridges send strings instead of numbers.

## Problem

Some MQTT bridges serialize numeric fields (SNR, RSSI, battery_mv, etc.) as JSON strings like `"-7.5"` instead of numbers. The existing `toFloat64()` switch only handled `float64`, `float32`, `int`, `int64`, and `json.Number`, so string values fell through to the default case returning `(0, false)` — silently dropping the data.

## Changes

- **`cmd/ingestor/main.go`**: Added `string`, `uint`, and `uint64` cases to `toFloat64()`
  - `string`: uses `strconv.ParseFloat(strings.TrimSpace(n), 64)` to handle whitespace-padded numeric strings
  - `uint` / `uint64`: straightforward numeric conversion
  - Added `strconv` import
- **`cmd/ingestor/main_test.go`**: Updated `TestToFloat64` with new cases:
  - Valid string (`"3.14"`), string with spaces (`" -7.5 "`), string integer (`"42"`)
  - Invalid string (`"hello"`), empty string
  - `uint(10)`, `uint64(999)`

## Testing

All ingestor tests pass (`go test ./...`).

Co-authored-by: you <you@example.com> |
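A sketch of the extended switch; the pre-existing cases are reconstructed from the PR description, so the real function body may differ in detail:

```go
package ingestor

import (
	"encoding/json"
	"strconv"
	"strings"
)

// toFloat64 sketch: coerce the loosely typed values MQTT bridges send
// into a float64, reporting success via the second return value.
func toFloat64(v any) (float64, bool) {
	switch n := v.(type) {
	case float64:
		return n, true
	case float32:
		return float64(n), true
	case int:
		return float64(n), true
	case int64:
		return float64(n), true
	case uint:
		return float64(n), true
	case uint64:
		return float64(n), true
	case json.Number:
		f, err := n.Float64()
		return f, err == nil
	case string:
		// Some bridges send "-7.5" or " 42 " as JSON strings.
		f, err := strconv.ParseFloat(strings.TrimSpace(n), 64)
		return f, err == nil
	}
	return 0, false // unknown type: caller treats the value as absent
}
```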
||
|
|
738d5fef39 |
fix: poller uses store max IDs to prevent replaying entire DB
When GetMaxTransmissionID() fails silently (e.g., a corrupted DB returns 0 from COALESCE), the poller starts from ID 0 and replays the entire database over WebSocket — broadcasting thousands of old packets per second.

Fix: after querying the DB, use the in-memory store's MaxTransmissionID and MaxObservationID as a floor. Since Load() already read the full DB successfully, the store has the correct max IDs.

Root cause discovered on staging: DB corruption caused the MAX(id) query to fail, returning 0. The poller log showed 'starting from transmission ID 0' followed by 1000-2000 broadcasts per tick walking through 76K rows.

Also adds MaxObservationID() to PacketStore for observation cursor safety.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
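A sketch of the floor logic; the PacketStore method names follow the commit message, while `startCursor` and the surrounding poller context are illustrative:

```go
package server

type PacketStore struct {
	maxTxID, maxObsID int64
}

func (p *PacketStore) MaxTransmissionID() int64 { return p.maxTxID }
func (p *PacketStore) MaxObservationID() int64  { return p.maxObsID }

// startCursor floors the DB-reported maxima with the in-memory ones.
// Load() already walked the whole DB, so the store's maxima are
// trustworthy even when the MAX(id) query silently returns 0.
func startCursor(dbMaxTxID, dbMaxObsID int64, store *PacketStore) (int64, int64) {
	txStart, obsStart := dbMaxTxID, dbMaxObsID
	if m := store.MaxTransmissionID(); m > txStart {
		txStart = m
	}
	if m := store.MaxObservationID(); m > obsStart {
		obsStart = m
	}
	return txStart, obsStart
}
```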
||
|
|
4cdc554b40 |
fix: use latest advert for node hash size instead of historical mode (#341)
## Summary

Fixes #303 — Repeater hash stats now reflect the **latest advert** instead of the historical mode (most frequent value). When a node is reconfigured (e.g. from a 1-byte to a 2-byte hash size), the analytics and node detail pages now show the updated value immediately after the next advert is received.

## Changes

### cmd/server/store.go

1. **computeNodeHashSizeInfo** — Changed hash size determination from statistical mode to latest advert. The most recent advert in chronological order now determines hash_size. The hash_sizes_seen and hash_size_inconsistent tracking is preserved for multi-byte analytics.
2. **computeAnalyticsHashSizes** — Two fixes:
   - **Per-node aggregation keyed by pubKey** instead of name, so same-name nodes with different public keys are counted separately in distributionByRepeaters.
   - **Zero-hop adverts included** — advert originator tracking now happens before the hops check, so zero-hop adverts contribute to per-node stats.

### cmd/server/routes_test.go

Added 4 new tests:
- TestGetNodeHashSizeInfoLatestWins — 4 historical 1-byte adverts + 1 recent 2-byte advert → hash size should be 2 (not 1 from the mode)
- TestGetNodeHashSizeInfoNoAdverts — node with no ADVERT packets → graceful nil, no crash
- TestAnalyticsHashSizeSameNameDifferentPubkey — two nodes named "SameName" with different pubkeys → counted as 2 separate entries
- Updated the TestGetNodeHashSizeInfoDominant comment to reflect the new behavior

## Context

Community report from contributor @kizniche: after reconfiguring a repeater from 1-byte to 2-byte hash and sending a flood advert, the analytics page still showed 1-byte. The root cause was the mode-based computation, which required many new adverts to shift the majority. The upstream firmware bug causing stale path bytes (meshcore-dev/MeshCore#2154) has been fixed, making the latest advert reliable.

## Testing

- `go vet ./...` — clean
- `go test ./... -count=1` — all tests pass (including the 4 new ones)
- `cmd/ingestor` tests — pass

--------

Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
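A sketch of latest-advert-wins selection; the advert record shape is illustrative and timestamps are assumed to be lexically sortable strings, so this shows only the structure of the change from mode to latest:

```go
package server

type advertRec struct {
	Timestamp string
	HashSize  int
}

// nodeHashSize returns the hash size of the chronologically latest
// advert, while still collecting every size seen for the inconsistency
// tracking (hash_sizes_seen / hash_size_inconsistent in the real code).
func nodeHashSize(adverts []advertRec) (latest int, seen map[int]bool) {
	seen = make(map[int]bool)
	latestTS := ""
	for _, a := range adverts {
		seen[a.HashSize] = true
		if a.Timestamp >= latestTS {
			latestTS, latest = a.Timestamp, a.HashSize
		}
	}
	return latest, seen
}
```

Unlike the mode-based approach, a single fresh advert immediately flips the reported size, which is exactly the reconfiguration case from the community report.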
## Summary Fixes #303 — Repeater hash stats now reflect the **latest advert** instead of the historical mode (most frequent). When a node is reconfigured (e.g. from 1-byte to 2-byte hash size), the analytics and node detail pages now show the updated value immediately after the next advert is received. ## Changes ### cmd/server/store.go 1. **computeNodeHashSizeInfo** — Changed hash size determination from statistical mode to latest advert. The most recent advert in chronological order now determines hash_size. The hash_sizes_seen and hash_size_inconsistent tracking is preserved for multi-byte analytics. 2. **computeAnalyticsHashSizes** — Two fixes: - **yNode keyed by pubKey** instead of name, so same-name nodes with different public keys are counted separately in distributionByRepeaters. - **Zero-hop adverts included** — advert originator tracking now happens before the hops check, so zero-hop adverts contribute to per-node stats. ### cmd/server/routes_test.go Added 4 new tests: - TestGetNodeHashSizeInfoLatestWins — 4 historical 1-byte adverts + 1 recent 2-byte advert → hash size should be 2 (not 1 from mode) - TestGetNodeHashSizeInfoNoAdverts — node with no ADVERT packets → graceful nil, no crash - TestAnalyticsHashSizeSameNameDifferentPubkey — two nodes named "SameName" with different pubkeys → counted as 2 separate entries - Updated TestGetNodeHashSizeInfoDominant comment to reflect new behavior ## Context Community report from contributor @kizniche: after reconfiguring a repeater from 1-byte to 2-byte hash and sending a flood advert, the analytics page still showed 1-byte. Root cause was the mode-based computation which required many new adverts to shift the majority. The upstream firmware bug causing stale path bytes (meshcore-dev/MeshCore#2154) has been fixed, making the latest advert reliable. ## Testing - `go vet ./...` — clean - `go test ./... -count=1` — all tests pass (including 4 new ones) - `cmd/ingestor` tests — pass --------- Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |