mirror of https://github.com/Kpa-clawbot/meshcore-analyzer.git
synced 2026-04-27 06:25:50 +00:00 at commit 1d449eabc79c4a0dfbc216bff9cdceec07973255
217 Commits

42ff5a291b
fix(#866): full-page obs-switch — update hex + path + direction per observation (#870)
## Problem On `/#/packets/<hash>?obs=<id>`, clicking a different observation updated summary fields (Observer, SNR/RSSI, Timestamp) but **not** hex payload or path details. Sister bug to #849 (fixed in #851 for the detail dialog). ## Root Causes | Cause | Impact | |-------|--------| | `selectPacket` called `renderDetail` without `selectedObservationId` | Initial render missed observation context on some code paths | | `ObservationResp` missing `direction`, `resolved_path`, `raw_hex` | Frontend obs-switch lost direction and resolved_path context | | `obsPacket` construction omitted `direction` field | Direction not preserved when switching observations | ## Fix - `selectPacket` explicitly passes `selectedObservationId` to `renderDetail` - `ObservationResp` gains `Direction`, `ResolvedPath`, `RawHex` fields - `mapSliceToObservations` copies the three new fields - `obsPacket` spreads include `direction` from the observation ## Tests 7 new tests in `test-frontend-helpers.js`: - Observation switch updates `effectivePkt` path - `raw_hex` preserved from packet when obs has none - `raw_hex` from obs overrides when API provides it - `direction` carried through observation spread - `resolved_path` carried through observation spread - `getPathLenOffset` cross-check for transport routes - URL hash `?obs=` round-trip encoding All 584 frontend + 62 filter + 29 aging tests pass. Go server tests pass. Fixes #866 Co-authored-by: you <you@example.com> |

441409203e
feat(#845): bimodal_clock severity — surface flaky-RTC nodes instead of hiding as 'No Clock' (#850)
## Problem Nodes with flaky RTC (firmware emitting interleaved good and nonsense timestamps) were classified as `no_clock` because the broken samples poisoned the recent median. Operators lost visibility into these nodes — they showed "No Clock" even though ~60% of their adverts had valid timestamps. Observed on staging: a node with 31K samples where recent adverts interleave good skew (-6.8s, -13.6s) with firmware nonsense (-56M, -60M seconds). Under the old logic, median of the mixed window → `no_clock`. ## Solution New `bimodal_clock` severity tier that surfaces flaky-RTC nodes with their real (good-sample) skew value. ### Classification order (first match wins) | Severity | Good Fraction | Description | |----------|--------------|-------------| | `no_clock` | < 10% | Essentially no real clock | | `bimodal_clock` | 10–80% (and bad > 0) | Mixed good/bad — flaky RTC | | `ok`/`warn`/`critical`/`absurd` | ≥ 80% | Normal classification | "Good" = `|skew| <= 1 hour`; "bad" = likely uninitialized RTC nonsense. When `bimodal_clock`, `recentMedianSkewSec` is computed from **good samples only**, so the dashboard shows the real working-clock value (e.g. -7s) instead of the broken median. ### Backend changes - New constant `BimodalSkewThresholdSec = 3600` - New severity `bimodal_clock` in classification logic - New API fields: `goodFraction`, `recentBadSampleCount`, `recentSampleCount` ### Frontend changes - Amber `Bimodal` badge with tooltip showing bad-sample percentage - Bimodal nodes render skew value like ok/warn/severe (not the "No Clock" path) - Warning line below sparkline: "⚠️ X of last Y adverts had nonsense timestamps (likely RTC reset)" ### Tests - 3 new Go unit tests: bimodal (60% good → bimodal_clock), all-bad (→ no_clock), 90%-good (→ ok) - 1 new frontend test: bimodal badge rendering with tooltip - Existing `TestReporterScenario_789` passes unchanged Builds on #789 (recent-window severity). Closes #845 --------- Co-authored-by: you <you@example.com> |
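
For reference, the classification order above reduces to roughly the following. This is a minimal sketch, not the repository's code: `classifyRecentWindow` and the struct-free signature are assumed names, while `BimodalSkewThresholdSec = 3600`, the 10%/80% cut points, and the ok/warn/critical/absurd tiers follow the thresholds stated in this PR and the clock-skew M1 PR further down.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

const BimodalSkewThresholdSec = 3600 // a "good" sample has |skew| <= 1 hour

// classifyRecentWindow applies the first-match-wins order from the PR:
// <10% good -> no_clock, 10-80% good with bad samples present -> bimodal_clock,
// otherwise fall through to the normal severity tiers.
// For bimodal_clock the returned median uses good samples only.
func classifyRecentWindow(recentSkews []float64) (severity string, medianSkew float64) {
	var good []float64
	bad := 0
	for _, s := range recentSkews {
		if math.Abs(s) <= BimodalSkewThresholdSec {
			good = append(good, s)
		} else {
			bad++
		}
	}
	frac := float64(len(good)) / float64(len(recentSkews))
	switch {
	case frac < 0.10:
		return "no_clock", median(recentSkews)
	case frac < 0.80 && bad > 0:
		return "bimodal_clock", median(good) // the real working-clock value
	default:
		return classifySkew(math.Abs(median(recentSkews))), median(recentSkews)
	}
}

func classifySkew(abs float64) string {
	switch {
	case abs < 300:
		return "ok"
	case abs < 3600:
		return "warn"
	case abs < 30*86400:
		return "critical"
	default:
		return "absurd"
	}
}

func median(xs []float64) float64 {
	c := append([]float64(nil), xs...)
	sort.Float64s(c)
	n := len(c)
	if n == 0 {
		return 0
	}
	if n%2 == 1 {
		return c[n/2]
	}
	return (c[n/2-1] + c[n/2]) / 2
}

func main() {
	// Staging-style mix: good skews interleaved with firmware nonsense.
	sev, med := classifyRecentWindow([]float64{-6.8, -13.6, -56e6, -7.1, -60e6})
	fmt.Println(sev, med) // bimodal_clock -7.1
}
```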

a371d35bfd
feat(#847): dedupe Top Longest Hops by pair + add obs count and SNR cues (#848)
## Problem The "Top 20 Longest Hops" RF analytics card shows the same repeater pair filling most slots because the query sorts raw hop records by distance with no pair deduplication. A single long link observed 12+ times dominates the leaderboard. ## Fix Dedupe by unordered `(pk1, pk2)` pair. Per pair, keep the max-distance record and compute reliability metrics: | Column | Description | |--------|-------------| | **Obs** | Total observations of this link | | **Best SNR** | Maximum SNR seen (dB) | | **Median SNR** | Median SNR across all observations (dB) | Tooltip on each row shows the timestamp of the best observation. ### Before | # | From | To | Distance | Type | SNR | Packet | |---|------|----|----------|------|-----|--------| | 1 | NodeX | NodeY | 200 mi | R↔R | 5 dB | abc… | | 2 | NodeX | NodeY | 199 mi | R↔R | 6 dB | def… | | 3 | NodeX | NodeY | 198 mi | R↔R | 4 dB | ghi… | ### After | # | From | To | Distance | Type | Obs | Best SNR | Median SNR | Packet | |---|------|----|----------|------|-----|----------|------------|--------| | 1 | NodeX | NodeY | 200 mi | R↔R | 12 | 8.0 dB | 5.2 dB | abc… | | 2 | NodeA | NodeB | 150 mi | C↔R | 3 | 6.5 dB | 6.5 dB | jkl… | ## Changes - **`cmd/server/store.go`**: Group `filteredHops` by unordered pair key, accumulate obs count / best SNR / median SNR per group, sort by max distance, take top 20 - **`cmd/server/types.go`**: Update `DistanceHop` struct — replace `SNR` with `BestSnr`, `MedianSnr`, add `ObsCount` - **`public/analytics.js`**: Replace single SNR column with Obs, Best SNR, Median SNR; add row tooltip with best observation timestamp - **`cmd/server/store_tophops_test.go`**: 3 unit tests — basic dedupe, reverse-pair merge, nil SNR edge case ## Test Coverage - `TestDedupeTopHopsByPair`: 5 records on pair (A,B) + 1 on (C,D) → 2 results, correct obsCount/dist/bestSnr/medianSnr - `TestDedupeTopHopsReversePairMerges`: (B,A) and (A,B) merge into one entry - `TestDedupeTopHopsNilSNR`: all-nil SNR records → bestSnr and medianSnr both nil - Existing `TestAnalyticsRFEndpoint` and `TestAnalyticsRFWithRegion` still pass Closes #847 --------- Co-authored-by: you <you@example.com> |
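
The dedupe itself is a small grouping pass. Below is a sketch under assumed record shapes (`hopRecord`, `pairKey` are illustrative names, not the repository's types); the real aggregation, including the best/median SNR columns, lives in `cmd/server/store.go`.

```go
package main

import (
	"fmt"
	"sort"
)

// hopRecord is a stand-in for the raw hop rows grouped in store.go.
type hopRecord struct {
	PK1, PK2 string
	DistKm   float64
	SNR      *float64 // may be nil when the observation had no RF fields
}

type hopGroup struct {
	Best     hopRecord // max-distance record for the pair
	ObsCount int
	snrs     []float64 // best/median SNR for the card would be derived from these
}

// pairKey normalizes (pk1, pk2) so A->B and B->A land in the same bucket.
func pairKey(a, b string) string {
	if b < a {
		a, b = b, a
	}
	return a + "|" + b
}

func dedupeTopHops(records []hopRecord, topN int) []hopGroup {
	groups := map[string]*hopGroup{}
	for _, r := range records {
		k := pairKey(r.PK1, r.PK2)
		g, ok := groups[k]
		if !ok {
			g = &hopGroup{Best: r}
			groups[k] = g
		}
		g.ObsCount++
		if r.DistKm > g.Best.DistKm {
			g.Best = r
		}
		if r.SNR != nil {
			g.snrs = append(g.snrs, *r.SNR)
		}
	}
	out := make([]hopGroup, 0, len(groups))
	for _, g := range groups {
		out = append(out, *g)
	}
	// Sort by the max distance seen for the pair, keep the top N.
	sort.Slice(out, func(i, j int) bool { return out[i].Best.DistKm > out[j].Best.DistKm })
	if len(out) > topN {
		out = out[:topN]
	}
	return out
}

func main() {
	snr := func(v float64) *float64 { return &v }
	hops := []hopRecord{
		{"X", "Y", 320, snr(5)}, {"Y", "X", 318, snr(8)}, {"X", "Y", 317, snr(4)},
		{"A", "B", 240, snr(6.5)},
	}
	for _, g := range dedupeTopHops(hops, 20) {
		fmt.Printf("%s-%s max %.0f km, obs=%d\n", g.Best.PK1, g.Best.PK2, g.Best.DistKm, g.ObsCount)
	}
}
```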

3f26dc7190
obs: surface real RSS alongside tracked store bytes in /api/stats (#832) (#835)
Closes #832.

## Root cause confirmed

`trackedMB` (`s.trackedBytes` in `store.go`) only sums per-packet struct + payload sizes recorded at insertion. It excludes the index maps (`byHash`, `byTxID`, `byNode`, `byObserver`, `byPathHop`, `byPayloadType`, hash-prefix maps, name lookups), the analytics LRUs (rfCache/topoCache/hashCache/distCache/subpathCache/chanCache/collisionCache), WS broadcast queues, and Go runtime overhead. It's "useful packet bytes," not RSS — typically 3–5× off on staging.

## Fix (Option C from the issue)

Expose four memory fields on `/api/stats` from a single cached snapshot:

| Field | Source | Semantics |
|---|---|---|
| `storeDataMB` | `s.trackedBytes` | in-store packet bytes; eviction watermark input |
| `goHeapInuseMB` | `runtime.MemStats.HeapInuse` | live Go heap |
| `goSysMB` | `runtime.MemStats.Sys` | total Go-managed memory |
| `processRSSMB` | `/proc/self/status VmRSS` (Linux), falls back to `goSysMB` | what the kernel sees |

`trackedMB` is retained as a deprecated alias for `storeDataMB` so existing dashboards/QA scripts keep working. Field invariants are documented on `MemorySnapshot`: `processRSSMB ≥ goSysMB ≥ goHeapInuseMB ≥ storeDataMB` (typical).

## Performance

Single `getMemorySnapshot` call cached for 1s — `runtime.ReadMemStats` (stop-the-world) and the `/proc/self/status` read are amortized across burst polling. `/proc` read is bounded to 8 KiB, parsed with `strconv` only — no shell-out, no untrusted input. `cgoBytesMB` is omitted: the build uses pure-Go `modernc.org/sqlite`, so there is no cgo allocator to measure. Documented in code comment.

## Tests

`cmd/server/stats_memory_test.go` asserts presence, types, sign, and ordering invariants. Avoids the flaky "matches RSS to ±X%" pattern.

```
$ go test ./... -count=1 -timeout 180s
ok  github.com/corescope/server  19.410s
```

## QA plan

§1.4 now compares `processRSSMB` against procfs RSS (the right invariant); threshold stays at 0.20.

---------

Co-authored-by: MeshCore Agent <meshcore-agent@openclaw.local>
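
A rough sketch of how such a snapshot can be assembled from `runtime.MemStats` plus `/proc/self/status`. The field names mirror the PR description; the struct layout and function signature here are assumptions, and the unbounded `os.ReadFile` is a simplification of the 8 KiB-bounded read described above.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime"
	"strconv"
)

// MemorySnapshot mirrors the four fields described above.
// Typical ordering: ProcessRSSMB >= GoSysMB >= GoHeapInuseMB >= StoreDataMB.
type MemorySnapshot struct {
	StoreDataMB   float64 // tracked in-store packet bytes (eviction input)
	GoHeapInuseMB float64
	GoSysMB       float64
	ProcessRSSMB  float64
}

func getMemorySnapshot(trackedBytes int64) MemorySnapshot {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms) // stop-the-world; callers should cache the result
	snap := MemorySnapshot{
		StoreDataMB:   float64(trackedBytes) / (1 << 20),
		GoHeapInuseMB: float64(ms.HeapInuse) / (1 << 20),
		GoSysMB:       float64(ms.Sys) / (1 << 20),
	}
	if rss, ok := readVmRSSKiB(); ok {
		snap.ProcessRSSMB = float64(rss) / 1024
	} else {
		snap.ProcessRSSMB = snap.GoSysMB // non-Linux fallback
	}
	return snap
}

// readVmRSSKiB parses the "VmRSS:  12345 kB" line from /proc/self/status.
func readVmRSSKiB() (int64, bool) {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		return 0, false
	}
	for _, line := range bytes.Split(data, []byte("\n")) {
		if bytes.HasPrefix(line, []byte("VmRSS:")) {
			fields := bytes.Fields(line) // ["VmRSS:", "12345", "kB"]
			if len(fields) >= 2 {
				if kb, err := strconv.ParseInt(string(fields[1]), 10, 64); err == nil {
					return kb, true
				}
			}
		}
	}
	return 0, false
}

func main() {
	fmt.Printf("%+v\n", getMemorySnapshot(833<<20))
}
```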

886aabf0ae
fix(#827): /api/packets/{hash} falls back to DB when in-memory store misses (#831)
Closes #827. ## Problem `/api/packets/{hash}` only consulted the in-memory `PacketStore`. When a packet aged out of memory, the handler 404'd — even though SQLite still had it and `/api/nodes/{pubkey}` `recentAdverts` (which reads from the DB) was actively surfacing the hash. Net effect: the **Analyze →** link on older adverts in the node detail page led to a dead "Not found". Two-store inconsistency: DB has the packet, in-memory doesn't, node detail surfaces it from DB → packet detail can't serve it. ## Fix In `handlePacketDetail`: - After in-memory miss, fall back to `db.GetPacketByHash` (already existed) for hash lookups, and `db.GetTransmissionByID` for numeric IDs. - Track when the result came from the DB; if so and the store has no observations, populate from DB via a new `db.GetObservationsForHash` so the response shows real observations instead of the misleading `observation_count = 1` fallback. ## Tests - `TestPacketDetailFallsBackToDBWhenStoreMisses` — insert a packet directly into the DB after `store.Load()`, confirm store doesn't have it, assert 200 + populated observations. - `TestPacketDetail404WhenAbsentFromBoth` — neither store nor DB → 404 (no false positives). - `TestPacketDetailPrefersStoreOverDB` — both have it; store result wins (no double-fetch). - `TestHandlePacketDetailNoStore` updated: it previously asserted the old buggy 404 behavior; now asserts the correct DB-fallback 200. All `go test ./... -run "PacketDetail|Packet|GetPacket"` and the full `cmd/server` suite pass. ## Out of scope The `/api/packets?hash=` filter is the live in-memory list endpoint and intentionally store-only for performance. Not touched here — happy to file a follow-up if you'd rather harmonise. ## Repro context Verified against prod with a recently-adverting repeater whose recent advert hash lives in `recentAdverts` (DB) but had been evicted from the in-memory store; pre-fix 404, post-fix 200 with full observations. Co-authored-by: you <you@example.com> |

a0fddb50aa
fix(#789): severity from recent samples; Theil-Sen drift with outlier rejection (#828)
Closes #789. ## The two bugs 1. **Severity from stale median.** `classifySkew(absMedian)` used the all-time `MedianSkewSec` over every advert ever recorded for the node. A repeater that was off for hours and then GPS-corrected stayed pinned to `absurd` because hundreds of historical bad samples poisoned the median. Reporter's case: `medianSkewSec: -59,063,561.8` while `lastSkewSec: -0.8` — current health was perfect, dashboard said catastrophic. 2. **Drift from a single correction jump.** Drift used OLS over every `(ts, skew)` pair, with no outlier rejection. A single GPS-correction event (skew jumps millions of seconds in ~30s) dominated the regression and produced `+1,793,549.9 s/day` — physically nonsense; the existing `maxReasonableDriftPerDay` cap then zeroed it (better than absurd, but still useless). ## The two fixes 1. **Recent-window severity.** New field `recentMedianSkewSec` = median over the last `N=5` samples or last `1h`, whichever is narrower (more current view). Severity now derives from `abs(recentMedianSkewSec)`. `MeanSkewSec`, `MedianSkewSec`, `LastSkewSec` are preserved unchanged so the frontend, fleet view, and any external consumers continue to work. 2. **Theil-Sen drift with outlier filter.** Drift now uses the Theil-Sen estimator (median of all pairwise slopes — textbook robust regression, ~29% breakdown point) on a series pre-filtered to drop samples whose skew jumps more than `maxPlausibleSkewJumpSec = 60s` from the previous accepted point. Real µC drift is fractions of a second per advert; clock corrections fall well outside. Capped at `theilSenMaxPoints = 200` (most-recent) so O(n²) stays bounded for chatty nodes. ## What stays the same - Epoch-0 / out-of-range advert filter (PR #769). - `minDriftSamples = 5` floor. - `maxReasonableDriftPerDay = 86400` hard backstop. - API shape: only additions (`recentMedianSkewSec`); no fields removed or renamed. ## Tests All in `cmd/server/clock_skew_test.go`: - `TestSeverityUsesRecentNotMedian` — 100 bad samples (-60s) + 5 good (-1s) → severity = `ok`, historical median still huge. - `TestDriftRejectsCorrectionJump` — 30 min of clean linear drift + one 1000s jump → drift small (~12 s/day). - `TestTheilSenMatchesOLSWhenClean` — clean linear data, Theil-Sen within ~1% of OLS. - `TestReporterScenario_789` — exact reproducer: 1662 samples, 1657 @ -683 days then 5 @ -1s → severity `ok`, `recentMedianSkewSec ≈ 0`, drift bounded; legacy `medianSkewSec` preserved as historical context. `go test ./... -count=1` (cmd/server) and `node test-frontend-helpers.js` both pass. --------- Co-authored-by: clawbot <bot@corescope.local> Co-authored-by: you <you@example.com> |
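
The drift estimator described above reduces to a jump filter followed by a median of pairwise slopes. This sketch uses assumed data shapes (`sample`, `driftPerDay`); the constants `maxPlausibleSkewJumpSec`, `theilSenMaxPoints`, and the 5-sample floor come from the PR text.

```go
package main

import (
	"fmt"
	"sort"
)

const (
	maxPlausibleSkewJumpSec = 60.0 // drop samples that jump more than this from the last accepted one
	theilSenMaxPoints       = 200  // bound the O(n^2) pairwise-slope work
)

type sample struct {
	T    float64 // unix seconds
	Skew float64 // seconds
}

// driftPerDay estimates clock drift in seconds/day using the Theil-Sen
// estimator (median of all pairwise slopes) after rejecting correction jumps.
func driftPerDay(samples []sample) (float64, bool) {
	// 1. Outlier rejection: keep points whose skew stays close to the
	//    previously accepted point; GPS corrections jump far outside this.
	var kept []sample
	for _, s := range samples {
		if len(kept) == 0 || abs(s.Skew-kept[len(kept)-1].Skew) <= maxPlausibleSkewJumpSec {
			kept = append(kept, s)
		}
	}
	if len(kept) > theilSenMaxPoints {
		kept = kept[len(kept)-theilSenMaxPoints:] // most recent points only
	}
	if len(kept) < 5 { // minDriftSamples floor
		return 0, false
	}
	// 2. Theil-Sen: median slope over all pairs (seconds of skew per second).
	var slopes []float64
	for i := 0; i < len(kept); i++ {
		for j := i + 1; j < len(kept); j++ {
			if dt := kept[j].T - kept[i].T; dt != 0 {
				slopes = append(slopes, (kept[j].Skew-kept[i].Skew)/dt)
			}
		}
	}
	sort.Float64s(slopes)
	return slopes[len(slopes)/2] * 86400, true
}

func abs(x float64) float64 {
	if x < 0 {
		return -x
	}
	return x
}

func main() {
	// 30 min of clean linear drift (+0.01 s per 60 s advert) plus one 1000 s correction jump.
	var s []sample
	for i := 0; i < 30; i++ {
		s = append(s, sample{T: float64(i * 60), Skew: 0.01 * float64(i)})
	}
	s = append(s, sample{T: 1860, Skew: 1000})
	d, ok := driftPerDay(s)
	fmt.Println(ok, d) // jump rejected; drift stays near 0.01*86400/60 = 14.4 s/day
}
```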

cad1f11073
fix: bypass IATA filter for status messages, fill SNR on duplicate obs (#694) (#802)
## Problems Two independent ingestor bugs identified in #694: ### 1. IATA filter drops status messages from out-of-region observers The IATA filter ran at the top of `handleMessage()` before any message-type discrimination. Status messages carrying observer metadata (`noise_floor`, battery, airtime) from observers outside the configured IATA regions were silently discarded before `UpsertObserver()` and `InsertMetrics()` ran. **Impact:** Observers running `meshcoretomqtt/1.0.8.0` in BFL and LAX — the only client versions that include `noise_floor` in status messages — had their health data dropped entirely on prod instances filtering to SJC. **Fix:** Moved the IATA filter to the packet path only (after the `parts[3] == "status"` branch). Status messages now always populate observer health data regardless of configured region filter. ### 2. `INSERT OR IGNORE` discards SNR/RSSI on late arrival When the same `(transmission_id, observer_idx, path_json)` observation arrived twice — first without RF fields, then with — `INSERT OR IGNORE` silently discarded the SNR/RSSI from the second arrival. **Fix:** Changed to `ON CONFLICT(...) DO UPDATE SET snr = COALESCE(excluded.snr, snr), rssi = ..., score = ...`. A later arrival with SNR fills in a `NULL`; a later arrival without SNR does not overwrite an existing value. ## Tests - `TestIATAFilterDoesNotDropStatusMessages` — verifies BFL status message is processed when IATA filter includes only SJC, and that BFL packet is still filtered - `TestInsertObservationSNRFillIn` — verifies SNR fills in on second arrival, and is not overwritten by a subsequent null arrival ## Related Partially addresses #694 (upstream client issue of missing SNR in packet messages is out of scope) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> |

7f024b7aa7
fix(#673): replace raw JSON text search with byNode index for node packet queries (#803)
## Summary

Fixes #673

- GRP_TXT packets whose message text contains a node's pubkey were incorrectly counted as packets for that node, inflating packet counts and type breakdowns
- Two code paths in `store.go` used `strings.Contains` on the full `DecodedJSON` blob — this matched pubkeys appearing anywhere in the JSON, including inside chat message text
- `filterPackets` slow path (combined node + other filters): replaced substring search with a hash-set membership check against `byNode[nodePK]`
- `GetNodeAnalytics`: removed the full-packet-scan + text search branch entirely; always uses the `byNode` index (which already covers `pubKey`/`destPubKey`/`srcPubKey` via structured field indexing)

## Test Plan

- [x] `TestGetNodeAnalytics_ExcludesGRPTXTWithPubkeyInText` — verifies a GRP_TXT packet with the node's pubkey in its text field is not counted in that node's analytics
- [x] `TestFilterPackets_NodeQueryDoesNotMatchChatText` — verifies the combined-filter slow path of `filterPackets` returns only the indexed ADVERT, not the chat packet

Both tests were written as failing tests against the buggy code and pass after the fix.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

2460e33f94
fix(#810): /health.recentPackets resolved_path falls back to longest sibling obs (#821)
## What + why
`fetchResolvedPathForTxBest` (used by every API path that fills the
top-level `resolved_path`, including
`/api/nodes/{pk}/health.recentPackets`) picked the observation with the
longest `path_json` and queried SQL for that single obs ID. When the
longest-path obs had `resolved_path` NULL but a shorter sibling had one,
the helper returned nil and the top-level field was dropped — even
though the data exists. QA #809 §2.1 caught it on the health endpoint
because that page surfaces it per-tx.
Fix: keep the LRU-friendly fast path (try the longest-path obs), then
fall back to scanning all observations of the tx and picking the longest
`path_json` that actually has a stored `resolved_path`.
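
The two-step lookup reads roughly like the sketch below. The helper name follows the PR, but the observation shape and signature are assumptions for illustration; the real code (including the LRU and SQL access) lives in `resolved_index.go`.

```go
package main

import "fmt"

type obsRow struct {
	ID           int64
	PathJSON     string
	ResolvedPath []string // nil when the column is NULL
}

// fetchResolvedPathForTxBest tries the longest-path observation first (the
// LRU-friendly fast path), then falls back to any sibling observation that
// actually has a stored resolved_path, preferring the longest path_json.
func fetchResolvedPathForTxBest(obs []obsRow) []string {
	if len(obs) == 0 {
		return nil
	}
	best := obs[0]
	for _, o := range obs[1:] {
		if len(o.PathJSON) > len(best.PathJSON) {
			best = o
		}
	}
	if best.ResolvedPath != nil { // fast path: longest obs already resolved
		return best.ResolvedPath
	}
	// Fallback: scan siblings, pick the longest path_json that has a value.
	var fallback []string
	fallbackLen := -1
	for _, o := range obs {
		if o.ResolvedPath != nil && len(o.PathJSON) > fallbackLen {
			fallback, fallbackLen = o.ResolvedPath, len(o.PathJSON)
		}
	}
	return fallback
}

func main() {
	obs := []obsRow{
		{ID: 1, PathJSON: `["aa","bb","cc"]`},                                // longest path, NULL resolved_path
		{ID: 2, PathJSON: `["aa","bb"]`, ResolvedPath: []string{"pubA", "pubB"}},
	}
	fmt.Println(fetchResolvedPathForTxBest(obs)) // [pubA pubB] via the fallback
}
```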
## Changes
- `cmd/server/resolved_index.go`: extend `fetchResolvedPathForTxBest`
with a fallback through `fetchResolvedPathsForTx`.
- `cmd/server/issue810_repro_test.go`: regression test — seeds a tx
whose longest-path obs lacks `resolved_path` and a shorter sibling has
it, then asserts `/api/packets` and
`/api/nodes/{pk}/health.recentPackets` agree.
## Tests
`go test ./... -count=1` from `cmd/server` — PASS (full suite, ~19s).
## Perf
Fast path unchanged (single LRU/SQL lookup, dominant case). Fallback
only runs when the longest-path obs has NULL `resolved_path` — one
indexed query per affected tx, bounded by observations-per-tx (small).
Closes #810
---------
Co-authored-by: you <you@example.com>

d7fe24e2db
Fix channel filter on Packets page (UI + API) — #812 (#816)
Closes #812 ## Root causes **Server (`/api/packets?channel=…` returned identical totals):** The handler in `cmd/server/routes.go` never read the `channel` query parameter into `PacketQuery`, so it was silently ignored by both the SQLite path (`db.go::buildTransmissionWhere`) and the in-memory path (`store.go::filterPackets`). The codebase already had everything else in place — the `channel_hash` column with an index from #762, decoded `channel` / `channelHashHex` fields on each packet — it just wasn't wired up. **UI (`/#/packets` had no channel filter):** `public/packets.js` rendered observer / type / time-window / region filters but no channel control, and didn't read `?channel=` from the URL. ## Fix ### Server - New `Channel` field on `PacketQuery`; `handlePackets` reads `r.URL.Query().Get("channel")`. - DB path filters by the indexed `channel_hash` column (exact match). - In-memory path: helper `packetMatchesChannel` matches `decoded.channel` (plaintext, e.g. `#test`, `public`) or `enc_<HEX>` against `channelHashHex` for undecryptable GRP_TXT. Uses cached `ParsedDecoded()` so it's O(1) after first parse. Fast-path index guards and the grouped-cache key updated to include channel. - Regression test (`channel_filter_test.go`): `channel=#test` returns ≥1 GRP_TXT packet and fewer than baseline; `channel=nonexistentchannel` returns `total=0`. ### UI - New `<select id="fChannel">` populated from `/api/channels`. - Round-trips via `?channel=…` on the URL hash (read on init, written on change). - Pre-seeds the current value as an option so encrypted hashes not in `/api/channels` still display as selected on reload. - On change, calls `loadPackets()` so the server-side filter applies before pagination. ## Perf Filter adds at most one cached map lookup per packet (DB path uses indexed column, store path uses `ParsedDecoded()` cache). Staging baseline 149–190 ms for `?channel=#test&limit=50`; the new comparison is negligible. Target ≤ 500 ms preserved. ## Tests `cd cmd/server && go test ./... -count=1 -timeout 120s` → PASS. --------- Co-authored-by: you <you@example.com> |

9e90548637
perf(#800): remove per-StoreTx ResolvedPath, replace with membership index + on-demand decode (#806)
## Summary Remove `ResolvedPath []*string` field from `StoreTx` and `StoreObs` structs, replacing it with a compact membership index + on-demand SQL decode. This eliminates the dominant heap cost identified in profiling (#791, #799). **Spec:** #800 (consolidated from two rounds of expert + implementer review on #799) Closes #800 Closes #791 ## Design ### Removed - `StoreTx.ResolvedPath []*string` - `StoreObs.ResolvedPath []*string` - `TransmissionResp.ResolvedPath`, `ObservationResp.ResolvedPath` struct fields ### Added | Structure | Purpose | Est. cost at 1M obs | |---|---|---:| | `resolvedPubkeyIndex map[uint64][]int` | FNV-1a(pubkey) → []txID forward index | 50–120 MB | | `resolvedPubkeyReverse map[int][]uint64` | txID → []hashes for clean removal | ~40 MB | | `apiResolvedPathLRU` (10K entries) | FIFO cache for on-demand API decode | ~2 MB | ### Decode-window discipline `resolved_path` JSON decoded once per packet. Consumers fed in order, temp slice dropped — never stored on struct: 1. `addToByNode` — relay node indexing 2. `touchRelayLastSeen` — relay liveness DB updates 3. `byPathHop` resolved-key entries 4. `resolvedPubkeyIndex` + reverse insert 5. WebSocket broadcast map (raw JSON bytes) 6. Persist batch (raw JSON bytes for SQL UPDATE) ### Collision safety When the forward index returns candidates, a batched SQL query confirms exact pubkey presence using `LIKE '%"pubkey"%'` on the `resolved_path` column. ### Feature flag `useResolvedPathIndex` (default `true`). Off-path is conservative: all candidates kept, index not consulted. For one-release rollback safety. ## Files changed | File | Changes | |---|---| | `resolved_index.go` | **New** — index structures, LRU cache, on-demand SQL helpers, collision safety | | `store.go` | Remove RP fields, decode-window discipline in Load/Ingest, on-demand txToMap/obsToMap/enrichObs, eviction cleanup via SQL, memory accounting update | | `types.go` | Remove RP fields from TransmissionResp/ObservationResp | | `routes.go` | Replace `nodeInResolvedPath` with `nodeInResolvedPathViaIndex`, remove RP from mapSlice helpers | | `neighbor_persist.go` | Refactor backfill: reverse-map removal → forward+reverse insert → LRU invalidation | ## Tests added (27 new) **Unit:** - `TestStoreTx_ResolvedPathFieldAbsent` — reflection guard - `TestResolvedPubkeyIndex_BuildFromLoad` — forward+reverse consistency - `TestResolvedPubkeyIndex_HashCollision` — SQL collision safety - `TestResolvedPubkeyIndex_IngestUpdate` — maps reflect new ingests - `TestResolvedPubkeyIndex_RemoveOnEvict` — clean removal via reverse map - `TestResolvedPubkeyIndex_PerObsCoverage` — non-best obs pubkeys indexed - `TestAddToByNode_WithoutResolvedPathField` - `TestTouchRelayLastSeen_WithoutResolvedPathField` - `TestWebSocketBroadcast_IncludesResolvedPath` - `TestBackfill_InvalidatesLRU` - `TestEviction_ByNodeCleanup_OnDemandSQL` - `TestExtractResolvedPubkeys`, `TestMergeResolvedPubkeys` - `TestResolvedPubkeyHash_Deterministic` - `TestLRU_EvictionOnFull` **Endpoint:** - `TestPathsThroughNode_NilResolvedPathFallback` - `TestPacketsAPI_OnDemandResolvedPath` - `TestPacketsAPI_OnDemandResolvedPath_LRUHit` - `TestPacketsAPI_OnDemandResolvedPath_Empty` **Feature flag:** - `TestFeatureFlag_OffPath_PreservesOldBehavior` - `TestFeatureFlag_Toggle_NoStateLeak` **Concurrency:** - `TestReverseMap_NoLeakOnPartialFailure` - `TestDecodeWindow_LockHoldTimeBounded` - `TestLivePolling_LRUUnderConcurrentIngest` **Regression:** - `TestRepeaterLiveness_StillAccurate` **Benchmarks:** - `BenchmarkLoad_BeforeAfter` - 
`BenchmarkResolvedPubkeyIndex_Memory` - `BenchmarkPathsThroughNode_Latency` - `BenchmarkLivePolling_UnderIngest` ## Benchmark results ``` BenchmarkResolvedPubkeyIndex_Memory/pubkeys=50K 429ms 103MB 777K allocs BenchmarkResolvedPubkeyIndex_Memory/pubkeys=500K 4205ms 896MB 7.67M allocs BenchmarkLoad_BeforeAfter 65ms 20MB 202K allocs BenchmarkPathsThroughNode_Latency 3.9µs 0B 0 allocs BenchmarkLivePolling_UnderIngest 5.4µs 545B 7 allocs ``` Key: per-obs `[]*string` overhead completely eliminated. At 1M obs with 3 hops average, this saves ~72 bytes/obs × 1M = ~68 MB just from the slice headers + pointers, plus the JSON-decoded string data (~900 MB at scale per profiling). ## Design choices - **FNV-1a instead of xxhash**: stdlib availability, no external dependency. Performance is equivalent for this use case (pubkey strings are short). - **FIFO LRU instead of true LRU**: simpler implementation, adequate for the access pattern (mostly sequential obs IDs from live polling). - **Grouped packets view omits resolved_path**: cold path, not worth SQL round-trip per page render. - **Backfill pending check uses reverse-map presence** instead of per-obs field: if a tx has any indexed pubkeys, its observations are considered resolved. Closes #807 --------- Co-authored-by: you <you@example.com> |
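
A toy version of the forward/reverse membership index described above, to make the eviction bookkeeping concrete. The structure names mirror the PR table; the methods, and the omission of the SQL collision-confirmation step, are simplifications, not the repository's code.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// resolvedIndex: forward maps FNV-1a(pubkey) to the txIDs whose resolved_path
// contains it; reverse records which hashes each tx contributed so eviction
// can prune the forward buckets without a full scan. The real implementation
// confirms forward-index candidates against SQL to guard against collisions.
type resolvedIndex struct {
	forward map[uint64][]int
	reverse map[int][]uint64
}

func newResolvedIndex() *resolvedIndex {
	return &resolvedIndex{forward: map[uint64][]int{}, reverse: map[int][]uint64{}}
}

func pubkeyHash(pk string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(pk))
	return h.Sum64()
}

func (ix *resolvedIndex) add(txID int, resolvedPath []string) {
	for _, pk := range resolvedPath {
		h := pubkeyHash(pk)
		ix.forward[h] = append(ix.forward[h], txID)
		ix.reverse[txID] = append(ix.reverse[txID], h)
	}
}

// remove is the eviction path: the reverse map names exactly the buckets to prune.
func (ix *resolvedIndex) remove(txID int) {
	for _, h := range ix.reverse[txID] {
		ids := ix.forward[h][:0]
		for _, id := range ix.forward[h] {
			if id != txID {
				ids = append(ids, id)
			}
		}
		if len(ids) == 0 {
			delete(ix.forward, h)
		} else {
			ix.forward[h] = ids
		}
	}
	delete(ix.reverse, txID)
}

func (ix *resolvedIndex) candidates(pk string) []int { return ix.forward[pubkeyHash(pk)] }

func main() {
	ix := newResolvedIndex()
	ix.add(1, []string{"pubA", "pubB"})
	ix.add(2, []string{"pubB"})
	fmt.Println(ix.candidates("pubB")) // [1 2]
	ix.remove(1)
	fmt.Println(ix.candidates("pubB")) // [2]
}
```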

a8e1cea683
fix: use payload type bits only in content hash (not full header byte) (#787)
## Problem The firmware computes packet content hash as: ``` SHA256(payload_type_byte + [path_len for TRACE] + payload) ``` Where `payload_type_byte = (header >> 2) & 0x0F` — just the payload type bits (2-5). CoreScope was using the **full header byte** in its hash computation, which includes route type bits (0-1) and version bits (6-7). This meant the same logical packet produced different content hashes depending on route type — breaking dedup and packet lookup. **Firmware reference:** `Packet.cpp::calculatePacketHash()` uses `getPayloadType()` which returns `(header >> PH_TYPE_SHIFT) & PH_TYPE_MASK`. ## Fix - Extract only payload type bits: `payloadType := (headerByte >> 2) & 0x0F` - Include `path_len` byte in hash for TRACE packets (matching firmware behavior) - Applied to both `cmd/server/decoder.go` and `cmd/ingestor/decoder.go` ## Tests Added - **Route type independence:** Same payload with FLOOD vs DIRECT route types produces identical hash - **TRACE path_len inclusion:** TRACE packets with different `path_len` produce different hashes - **Firmware compatibility:** Hash output matches manual computation of firmware algorithm ## Migration Impact Existing packets in the DB have content hashes computed with the old (incorrect) formula. Options: 1. **Recompute hashes** via migration (recommended for clean state) 2. **Dual lookup** — check both old and new hash on queries (backward compat) 3. **Accept the break** — old hashes become stale, new packets get correct hashes Recommend option 1 (migration) as a follow-up. The volume of affected packets depends on how many distinct route types were seen for the same logical packet. Fixes #786 --------- Co-authored-by: you <you@example.com> |
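
The corrected hashing rule is small enough to show in full. This is a sketch of the algorithm as described above, not the decoder's actual function; the TRACE payload-type value (8) matches the constant referenced elsewhere in this log.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

const payloadTypeTrace = 8 // TRACE

// contentHash follows the firmware rule: SHA-256 over the payload type bits
// ((header >> 2) & 0x0F), the path_len byte for TRACE packets, and the payload.
// Route-type and version bits are deliberately excluded so FLOOD and DIRECT
// copies of the same logical packet hash identically.
func contentHash(header byte, pathLen byte, payload []byte) [32]byte {
	payloadType := (header >> 2) & 0x0F
	input := []byte{payloadType}
	if payloadType == payloadTypeTrace {
		input = append(input, pathLen)
	}
	input = append(input, payload...)
	return sha256.Sum256(input)
}

func main() {
	payload := []byte{0xde, 0xad, 0xbe, 0xef}
	flood := contentHash(0x11, 0, payload)  // route type 1 (flood), payload type 4
	direct := contentHash(0x12, 0, payload) // route type 2 (direct), same payload type
	fmt.Println(flood == direct)            // true: route bits no longer affect the hash
}
```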

bf674ebfa2
feat: validate advert signatures on ingest, reject corrupt packets (#794)
## Summary
Validates ed25519 signatures on ADVERT packets during MQTT ingest.
Packets with invalid signatures are rejected before storage, preventing
corrupt/truncated adverts from polluting the database.
## Changes
### Ingestor (`cmd/ingestor/`)
- **Signature validation on ingest**: After decoding an ADVERT, checks
`SignatureValid` from the decoder. Invalid signatures → packet dropped,
never stored.
- **Config flag**: `validateSignatures` (default `true`). Set to `false`
to disable validation for backward compatibility with existing installs.
- **`dropped_packets` table**: New SQLite table recording every rejected
packet with full attribution:
- `hash`, `raw_hex`, `reason`, `observer_id`, `observer_name`,
`node_pubkey`, `node_name`, `dropped_at`
- Indexed on `observer_id` and `node_pubkey` for investigation queries
- **`SignatureDrops` counter**: New atomic counter in `DBStats`, logged
in periodic stats output as `sig_drops=N`
- **Retention**: `dropped_packets` pruned alongside metrics on the same
`retention.metricsDays` schedule
### Server (`cmd/server/`)
- **`GET /api/dropped-packets`** (API key required): Returns recent
drops with optional `?observer=` and `?pubkey=` filters, `?limit=`
(default 100, max 500)
- **`signatureDrops`** field added to `/api/stats` response (count from
`dropped_packets` table)
### Tests (8 new)
| Test | What it verifies |
|------|-----------------|
| `TestSigValidation_ValidAdvertStored` | Valid advert passes validation
and is stored |
| `TestSigValidation_TamperedSignatureDropped` | Tampered signature →
dropped, recorded in `dropped_packets` with correct fields |
| `TestSigValidation_TruncatedAppdataDropped` | Truncated appdata
invalidates signature → dropped |
| `TestSigValidation_DisabledByConfig` | `validateSignatures: false`
skips validation, stores tampered packet |
| `TestSigValidation_DropCounterIncrements` | Counter increments
correctly across multiple drops |
| `TestSigValidation_LogContainsFields` | `dropped_packets` row contains
hash, reason, observer, pubkey, name |
| `TestPruneDroppedPackets` | Old entries pruned, recent entries
retained |
| `TestShouldValidateSignatures_Default` | Config helper returns correct
defaults |
### Config example
```json
{
"validateSignatures": true
}
```
Fixes #793
---------
Co-authored-by: you <you@example.com>

d596becca3
feat: bounded cold load — limit Load() by memory budget (#790)
## Implements #748 M1 — Bounded Cold Load ### Problem `Load()` pulls the ENTIRE database into RAM before eviction runs. On a 1GB database, this means 3+ GB peak memory at startup, regardless of `maxMemoryMB`. This is the root cause of #743 (OOM on 2GB VMs). ### Solution Calculate the maximum number of transmissions that fit within the `maxMemoryMB` budget and use a SQL subquery LIMIT to load only the newest packets. **Two-phase approach** (avoids the JOIN-LIMIT row count problem): ```sql SELECT ... FROM transmissions t LEFT JOIN observations o ON ... WHERE t.id IN (SELECT id FROM transmissions ORDER BY first_seen DESC LIMIT ?) ORDER BY t.first_seen ASC, o.timestamp DESC ``` ### Changes - **`estimateStoreTxBytesTypical(numObs)`** — estimates memory cost of a typical transmission without needing an actual `StoreTx` instance. Used for budget calculation. - **Budget calculation in `Load()`** — `maxPackets = (maxMemoryMB * 1048576) / avgBytesPerPacket` with a floor of 1000 packets. - **Subquery LIMIT** — loads only the newest N transmissions when bounded. - **`oldestLoaded` tracking** — records the oldest packet timestamp in memory so future SQL fallback queries (M2+) know where in-memory data ends. - **Perf stats** — `oldestLoaded` exposed in `/api/perf/store-stats`. - **Logging** — bounded loads show `Loaded X/Y transmissions (limited by ZMB budget)`. ### When `maxMemoryMB=0` (unlimited) Behavior is completely unchanged — no LIMIT clause, all packets loaded. ### Tests (6 new) | Test | Validates | |------|-----------| | `TestBoundedLoad_LimitedMemory` | With 1MB budget, loads fewer than total (hits 1000 minimum) | | `TestBoundedLoad_NewestFirst` | Loaded packets are the newest, not oldest | | `TestBoundedLoad_OldestLoadedSet` | `oldestLoaded` matches first packet's `FirstSeen` | | `TestBoundedLoad_UnlimitedWithZero` | `maxMemoryMB=0` loads all packets | | `TestBoundedLoad_AscendingOrder` | Packets remain in ascending `first_seen` order after bounded load | | `TestEstimateStoreTxBytesTypical` | Estimate grows with observation count, exceeds floor | Plus benchmarks: `BenchmarkLoad_Bounded` vs `BenchmarkLoad_Unlimited`. ### Perf justification On a 5000-transmission test DB with 1MB budget: - Bounded: loads 1000 packets (the minimum) in ~1.3s - The subquery uses SQLite's index on `first_seen` — O(N log N) for the LIMIT, then indexed JOIN for observations - No full table scan needed when bounded ### Next milestones - **M2**: Packet list/search SQL fallback (uses `oldestLoaded` boundary) - **M3**: Node analytics SQL fallback - **M4-M5**: Remaining endpoint fallbacks + live-only memory store --------- Co-authored-by: you <you@example.com> |
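
The budget arithmetic is a one-liner with a floor. A sketch under assumed names and an assumed per-transmission estimate (the formula and the 1000-packet floor come from the PR; `maxPacketsForBudget` is illustrative, the real calculation sits inside `Load()`):

```go
package main

import "fmt"

const minBoundedLoadPackets = 1000 // floor from the PR description

// maxPacketsForBudget converts the maxMemoryMB budget into a LIMIT for the
// bounded-load subquery, using the typical per-transmission estimate.
// maxMemoryMB == 0 means unlimited: no LIMIT clause is emitted at all.
func maxPacketsForBudget(maxMemoryMB int, avgBytesPerPacket int64) (limit int64, bounded bool) {
	if maxMemoryMB <= 0 || avgBytesPerPacket <= 0 {
		return 0, false
	}
	limit = int64(maxMemoryMB) * 1048576 / avgBytesPerPacket
	if limit < minBoundedLoadPackets {
		limit = minBoundedLoadPackets
	}
	return limit, true
}

func main() {
	// e.g. a 256 MB budget with an assumed ~4 KiB per transmission
	limit, bounded := maxPacketsForBudget(256, 4096)
	fmt.Println(bounded, limit) // true 65536: only the newest 65536 transmissions are loaded
}
```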

b9ba447046
feat: add nodeBlacklist config to hide abusive/troll nodes (#742)
## Problem
Some mesh participants set offensive names, report deliberately false
GPS positions, or otherwise troll the network. Instance operators
currently have no way to hide these nodes from public-facing APIs
without deleting the underlying data.
## Solution
Add a `nodeBlacklist` array to `config.json` containing public keys of
nodes to exclude from all API responses.
### Blacklisted nodes are filtered from:
- `GET /api/nodes` — list endpoint
- `GET /api/nodes/search` — search results
- `GET /api/nodes/{pubkey}` — detail (returns 404)
- `GET /api/nodes/{pubkey}/health` — returns 404
- `GET /api/nodes/{pubkey}/paths` — returns 404
- `GET /api/nodes/{pubkey}/analytics` — returns 404
- `GET /api/nodes/{pubkey}/neighbors` — returns 404
- `GET /api/nodes/bulk-health` — filtered from results
### Config example
```json
{
"nodeBlacklist": [
"aabbccdd...",
"11223344..."
]
}
```
### Design decisions
- **Case-insensitive** — public keys normalized to lowercase
- **Whitespace trimming** — leading/trailing whitespace handled
- **Empty entries ignored** — `""` or `" "` do not cause false positives
- **Nil-safe** — `IsBlacklisted()` on nil Config returns false
- **Backward-compatible** — empty/missing `nodeBlacklist` has zero
effect
- **Lazy-cached set** — blacklist converted to `map[string]bool` on
first lookup
### What this does NOT do (intentionally)
- Does **not** delete or modify database data — only filters API
responses
- Does **not** block packet ingestion — data still flows for analytics
- Does **not** filter `/api/packets` — only node-facing endpoints are
affected
## Testing
- Unit tests for `Config.IsBlacklisted()` (case sensitivity, whitespace,
empty entries, nil config)
- Integration tests for `/api/nodes`, `/api/nodes/{pubkey}`,
`/api/nodes/search`
- Full test suite passes with no regressions
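
A sketch of the `IsBlacklisted` helper implied by the design decisions above (nil-safe, case-insensitive, whitespace-trimmed, lazily cached). The struct layout and the use of `sync.Once` are assumptions for illustration, not the repository's code.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Config sketch: only the pieces relevant to the blacklist are shown.
type Config struct {
	NodeBlacklist []string `json:"nodeBlacklist"`

	once     sync.Once
	blackset map[string]bool
}

// IsBlacklisted is nil-safe, normalizes case and whitespace, ignores empty
// entries, and builds the lookup set lazily on first use.
func (c *Config) IsBlacklisted(pubkey string) bool {
	if c == nil {
		return false
	}
	c.once.Do(func() {
		c.blackset = make(map[string]bool, len(c.NodeBlacklist))
		for _, pk := range c.NodeBlacklist {
			pk = strings.ToLower(strings.TrimSpace(pk))
			if pk != "" {
				c.blackset[pk] = true
			}
		}
	})
	return c.blackset[strings.ToLower(strings.TrimSpace(pubkey))]
}

func main() {
	cfg := &Config{NodeBlacklist: []string{" AABBCCDD... ", ""}}
	fmt.Println(cfg.IsBlacklisted("aabbccdd..."))  // true
	fmt.Println((*Config)(nil).IsBlacklisted("x")) // false, nil-safe
}
```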

fa3f623bd6
feat: add observer retention — remove stale observers after configurable days (#764)
## Summary
Observers that stop actively sending data now get removed after a
configurable retention period (default 14 days).
Previously, observers remained in the `observers` table forever. This
meant nodes that were once observers for an instance but are no longer
connected (even if still active in the mesh elsewhere) would continue
appearing in the observer list indefinitely.
## Key Design Decisions
- **Active data requirement**: `last_seen` is only updated when the
observer itself sends packets (via `stmtUpdateObserverLastSeen`). Being
seen by another node does NOT update this field. So an observer must
actively send data to stay listed.
- **Default: 14 days** — observers not seen in 14 days are removed
- **`-1` = keep forever** — for users who want observers to never be
removed
- **`0` = use default (14 days)** — same as not setting the field
- **Runs on startup + daily ticker** — staggered 3 minutes after metrics
prune to avoid DB contention
## Changes
| File | Change |
|------|--------|
| `cmd/ingestor/config.go` | Add `ObserverDays` to `RetentionConfig`,
add `ObserverDaysOrDefault()` |
| `cmd/ingestor/db.go` | Add `RemoveStaleObservers()` — deletes
observers with `last_seen` before cutoff |
| `cmd/ingestor/main.go` | Wire up startup + daily ticker for observer
retention |
| `cmd/server/config.go` | Add `ObserverDays` to `RetentionConfig`, add
`ObserverDaysOrDefault()` |
| `cmd/server/db.go` | Add `RemoveStaleObservers()` (server-side, uses
read-write connection) |
| `cmd/server/main.go` | Wire up startup + daily ticker, shutdown
cleanup |
| `cmd/server/routes.go` | Admin prune API now also removes stale
observers |
| `config.example.json` | Add `observerDays: 14` with documentation |
| `cmd/ingestor/coverage_boost_test.go` | 4 tests: basic removal, empty
store, keep forever (-1), default (0→14) |
| `cmd/server/config_test.go` | 4 tests: `ObserverDaysOrDefault` edge
cases |
## Config Example
```json
{
"retention": {
"nodeDays": 7,
"observerDays": 14,
"packetDays": 30,
"_comment": "observerDays: -1 = keep forever, 0 = use default (14)"
}
}
```
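
The defaulting rule (-1 keeps forever, 0 falls back to 14 days) and the cutoff it produces look roughly like this. A sketch with assumed shapes; only the constants and semantics come from the PR.

```go
package main

import (
	"fmt"
	"time"
)

const defaultObserverDays = 14

type RetentionConfig struct {
	ObserverDays int `json:"observerDays"`
}

// ObserverDaysOrDefault applies the rules above: -1 keeps observers forever,
// 0 (or an absent field) falls back to the 14-day default.
func (r *RetentionConfig) ObserverDaysOrDefault() int {
	if r == nil || r.ObserverDays == 0 {
		return defaultObserverDays
	}
	return r.ObserverDays
}

// observerCutoff returns the last_seen cutoff used to delete stale observers,
// or ok=false when retention is disabled (-1).
func observerCutoff(r *RetentionConfig, now time.Time) (time.Time, bool) {
	days := r.ObserverDaysOrDefault()
	if days < 0 {
		return time.Time{}, false
	}
	return now.AddDate(0, 0, -days), true
}

func main() {
	cutoff, ok := observerCutoff(&RetentionConfig{ObserverDays: 0}, time.Now())
	fmt.Println(ok, cutoff.Format(time.RFC3339)) // true, 14 days ago
}
```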
## Admin API
The `/api/admin/prune` endpoint now also removes stale observers (using
`observerDays` from config) and reports `observers_removed` in the
response alongside `packets_deleted`.
## Test Plan
- [x] `TestRemoveStaleObservers` — old observer removed, recent observer
kept
- [x] `TestRemoveStaleObserversNone` — empty store, no errors
- [x] `TestRemoveStaleObserversKeepForever` — `-1` keeps even year-old
observers
- [x] `TestRemoveStaleObserversDefault` — `0` defaults to 14 days
- [x] `TestObserverDaysOrDefault` (ingestor) —
nil/zero/positive/keep-forever
- [x] `TestObserverDaysOrDefault` (server) —
nil/zero/positive/keep-forever
- [x] Both binaries compile cleanly (`go build`)
- [ ] Manual: verify observer count decreases after retention period on
a live instance

ceea136e97
feat: observer graph representation (M1+M2) (#774)
## Summary Fixes #753 — Milestones M1 and M2: Observer nodes in the neighbor graph are now correctly labeled, colored, and filterable. ### M1: Label + color observers **Backend** (`cmd/server/neighbor_api.go`): - `buildNodeInfoMap()` now queries the `observers` table after building from `nodes` - Observer-only pubkeys (not already in the map as repeaters etc.) get `role: "observer"` and their name from the observers table - Observer-repeaters keep their repeater role (not overwritten) **Frontend**: - CSS variable `--role-observer: #8b5cf6` added to `:root` - `ROLE_COLORS.observer` was already defined in `roles.js` ### M2: Observer filter checkbox (default unchecked) **Frontend** (`public/analytics.js`): - Observer checkbox added to the role filter section, **unchecked by default** - Observers create hub-and-spoke patterns (one observer can have 100+ edges) that drown out the actual repeater topology — hiding them by default keeps the graph clean - Fixed `applyNGFilters()` which previously always showed observers regardless of checkbox state ### Tests - Backend: `TestBuildNodeInfoMap_ObserverEnrichment` — verifies observer-only pubkeys get name+role from observers table, and observer-repeaters keep their repeater role - All existing Go tests pass - All frontend helper tests pass (544/544) --------- Co-authored-by: you <you@example.com> |

ba7cd0fba7
fix: clock skew sanity checks — filter epoch-0, cap drift, min samples (#769)
Nodes with dead RTCs show -690d skew and -3 billion s/day drift. Fix:

1. **No Clock severity**: |skew| > 365d → `no_clock`, skip drift
2. **Drift cap**: |drift| > 86400 s/day → nil (physically impossible)
3. **Min samples**: < 5 samples → no drift regression
4. **Frontend**: 'No Clock' badge, '–' for unreliable drift

Fixes the crazy stats on the Clock Health fleet view.

---------

Co-authored-by: you <you@example.com>

6a648dea11
fix: multi-byte adopters — all node types, role column, advert precedence (#754) (#767)
## Fix: Multi-Byte Adopters Table — Three Bugs (#754) ### Bug 1: Companions in "Unknown" `computeMultiByteCapability()` was repeater-only. Extended to classify **all node types** (companions, rooms, sensors). A companion advertising with 2-byte hash is now correctly "Confirmed". ### Bug 2: No Role Column Added a **Role** column to the merged Multi-Byte Hash Adopters table, color-coded using `ROLE_COLORS` from `roles.js`. Users can now distinguish repeaters from companions without clicking through to node detail. ### Bug 3: Data Source Disagreement When adopter data (from `computeAnalyticsHashSizes`) shows `hashSize >= 2` but capability only found path evidence ("Suspected"), the advert-based adopter data now takes precedence → "Confirmed". The adopter hash sizes are passed into `computeMultiByteCapability()` as an additional confirmed evidence source. ### Changes - `cmd/server/store.go`: Extended capability to all node types, accept adopter hash sizes, prioritize advert evidence - `public/analytics.js`: Added Role column with color-coded badges - `cmd/server/multibyte_capability_test.go`: 3 new tests (companion confirmed, role populated, adopter precedence) ### Tests - All 10 multi-byte capability tests pass - All 544 frontend helper tests pass - All 62 packet filter tests pass - All 29 aging tests pass --------- Co-authored-by: you <you@example.com> |

29157742eb
feat: show collision details in Hash Usage Matrix for all hash sizes (#758)
## Summary Shows which prefixes are colliding in the Hash Usage Matrix, making the "PREFIX COLLISIONS: N" count actionable. Fixes #757 ## Changes ### Frontend (`public/analytics.js`) - **Clickable collision count**: When collisions > 0, the stat card is clickable and scrolls to the collision details section. Shows a `▼` indicator. - **3-byte collision table**: The collision risk section and `renderCollisionsFromServer` now render for all hash sizes including 3-byte (was previously hidden/skipped for 3-byte). - **Helpful hint**: 3-byte panel now says "See collision details below" when collisions exist. ### Backend (`cmd/server/collision_details_test.go`) - Test that collision details include correct prefix and node name/pubkey pairs - Test that collision details are empty when no collisions exist ### Frontend Tests (`test-frontend-helpers.js`) - Test clickable stat card renders `onclick` and `cursor:pointer` when collisions > 0 - Test non-clickable card when collisions = 0 - Test collision table renders correct node links (`#/nodes/{pubkey}`) - Test no-collision message renders correctly ## What was already there The backend already returned full collision details (prefix, nodes with pubkeys/names/coords, distance classification) in the `hash-collisions` API. The frontend already had `renderCollisionsFromServer` rendering a rich table with node links. The gap was: 1. The 3-byte tab hid the collision risk section entirely 2. No visual affordance to navigate from the stat count to the details ## Perf justification No new computation — collision data was already computed and returned by the API. The only change is rendering it for 3-byte (same as 1-byte/2-byte). The collision list is already limited by the backend sort+slice pattern. --------- Co-authored-by: you <you@example.com> |

0e286d85fd
fix: channel query performance — add channel_hash column, SQL-level filtering (#762) (#763)
## Problem

Channel API endpoints scan the entire DB — 2.4s for the channel list, 30s for messages.

## Fix

- Added `channel_hash` column to transmissions (populated on ingest, backfilled on startup)
- `GetChannels()` rewritten to GROUP BY channel_hash (one row per channel vs scanning every packet)
- `GetChannelMessages()` filters by channel_hash at SQL level with proper LIMIT/OFFSET
- 60s cache for channel list
- Index: `idx_tx_channel_hash` for fast lookups

Expected: 2.4s → <100ms for list, 30s → <500ms for messages.

Fixes #762

---------

Co-authored-by: you <you@example.com>

3bdf72b4cf
feat: clock skew UI — node badges, detail sparkline, fleet analytics (#690 M2+M3) (#752)
## Summary Frontend visualizations for clock skew detection. Implements #690 M2 and M3. Does NOT close #690 — M4+M5 remain. ### M2: Node badges + detail sparkline - Severity badges (⏰ green/yellow/orange/red) on node list next to each node - Node detail: Clock Skew section with current value, severity, drift rate - Inline SVG sparkline showing skew history, color-coded by severity zones ### M3: Fleet analytics view - 'Clock Health' section on Analytics page - Sortable table: Name | Skew | Severity | Drift | Last Advert - Filter buttons by severity (OK/Warning/Critical/Absurd) - Summary stats: X nodes OK, Y warning, Z critical - Color-coded rows ### Changes - `public/nodes.js` — badge rendering + detail section - `public/analytics.js` — fleet clock health view - `public/roles.js` — severity color helpers - `public/style.css` — badge + sparkline + fleet table styles - `cmd/server/clock_skew.go` — added fleet summary endpoint - `cmd/server/routes.go` — wired fleet endpoint - `test-frontend-helpers.js` — 11 new tests --------- Co-authored-by: you <you@example.com> |

401fd070f8
fix: improve trackedBytes accuracy for memory estimation (#751)
## Problem Fixes #743 — High memory usage / OOM with relatively small dataset. `trackedBytes` severely undercounted actual per-packet memory because it only tracked base struct sizes and string field lengths, missing major allocations: | Structure | Untracked Cost | Scale Impact | |-----------|---------------|--------------| | `spTxIndex` (O(path²) subpath entries) | 40 bytes × path combos | 50-150MB | | `ResolvedPath` on observations | 24 bytes × elements | ~25MB | | Per-tx maps (`obsKeys`, `observerSet`) | 200 bytes/tx flat | ~11MB | | `byPathHop` index entries | 50 bytes/hop | 20-40MB | This caused eviction to trigger too late (or not at all), leading to OOM. ## Fix Expanded `estimateStoreTxBytes` and `estimateStoreObsBytes` to account for: - **Per-tx maps**: +200 bytes flat for `obsKeys` + `observerSet` map headers - **Path hop index**: +50 bytes per hop in `byPathHop` - **Subpath index**: +40 bytes × `hops*(hops-1)/2` combinations for `spTxIndex` - **Resolved paths**: +24 bytes per `ResolvedPath` element on observations Updated the existing `TestEstimateStoreTxBytes` to match new formula. All existing eviction tests continue to pass — the eviction logic itself is unchanged. Also exposed `avgBytesPerPacket` in the perf API (`/api/perf`) so operators can monitor per-packet memory costs. ## Performance Benchmark confirms negligible overhead (called on every insert): ``` BenchmarkEstimateStoreTxBytes 159M ops 7.5 ns/op 0 B/op 0 allocs BenchmarkEstimateStoreObsBytes 1B ops 1.0 ns/op 0 B/op 0 allocs ``` ## Tests - 6 new tests in `tracked_bytes_test.go`: - Reasonable value ranges for different packet sizes - 10-hop packets estimate significantly more than 2-hop (subpath cost) - Observations with `ResolvedPath` estimate more than without - 15 observations estimate >10x a single observation - `trackedBytes` matches sum of individual estimates after batch insert - Eviction triggers correctly with improved estimates - 2 benchmarks confirming sub-10ns estimate cost - Updated existing `TestEstimateStoreTxBytes` for new formula - Full test suite passes --------- Co-authored-by: you <you@example.com> |
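
The estimate reduces to adding the per-structure costs from the table above. The sketch below uses placeholder base-struct sizes (`txBase`, `obsBase` are assumptions); only the per-map, per-hop, per-subpath, and per-element constants come from this PR.

```go
package main

import "fmt"

// Illustrative per-structure costs from the table above; txBase and obsBase
// are placeholders, not the repository's numbers.
const (
	txBase          = 256 // assumed base StoreTx struct cost
	obsBase         = 128 // assumed base StoreObs struct cost
	perTxMapsBytes  = 200 // obsKeys + observerSet map headers, flat per tx
	perHopIndex     = 50  // byPathHop entry per hop
	perSubpathEntry = 40  // spTxIndex entry per hop pair
	perResolvedElem = 24  // ResolvedPath slice element on an observation
)

func estimateStoreTxBytes(payloadLen, hops int) int {
	subpaths := hops * (hops - 1) / 2 // O(path^2) subpath combinations
	return txBase + payloadLen + perTxMapsBytes + hops*perHopIndex + subpaths*perSubpathEntry
}

func estimateStoreObsBytes(resolvedElems int) int {
	return obsBase + resolvedElems*perResolvedElem
}

func main() {
	// A 10-hop packet carries a much larger subpath-index cost than a 2-hop one.
	fmt.Println(estimateStoreTxBytes(120, 2))  // 2 hops: 1 subpath entry
	fmt.Println(estimateStoreTxBytes(120, 10)) // 10 hops: 45 subpath entries
	fmt.Println(estimateStoreObsBytes(3))      // observation with a 3-element ResolvedPath
}
```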

a815e70975
feat: Clock skew detection — backend computation (M1) (#746)
## Summary Implements **Milestone 1** of #690 — backend clock skew computation for nodes and observers. ## What's New ### Clock Skew Engine (`clock_skew.go`) **Phase 1 — Raw Skew Calculation:** For every ADVERT observation: `raw_skew = advert_timestamp - observation_timestamp` **Phase 2 — Observer Calibration:** Same packet seen by multiple observers → compute each observer's clock offset as the median deviation from the per-packet median observation timestamp. This identifies observers with their own clock drift. **Phase 3 — Corrected Node Skew:** `corrected_skew = raw_skew + observer_offset` — compensates for observer clock error. **Phase 4 — Trend Analysis:** Linear regression over time-ordered skew samples estimates drift rate in seconds/day. Detects crystal drift vs stable offset vs sudden jumps. ### Severity Classification | Level | Threshold | Meaning | |-------|-----------|---------| | ✅ OK | < 5 min | Normal | | ⚠️ Warning | 5 min – 1 hour | Clock drifting | | 🔴 Critical | 1 hour – 30 days | Likely no time source | | 🟣 Absurd | > 30 days | Firmware default or epoch 0 | ### New API Endpoints - `GET /api/nodes/{pubkey}/clock-skew` — per-node skew data (mean, median, last, drift, severity) - `GET /api/observers/clock-skew` — observer calibration offsets - Clock skew also included in `GET /api/nodes/{pubkey}/analytics` response as `clockSkew` field ### Performance - 30-second compute cache avoids reprocessing on every request - Operates on in-memory `byPayloadType[ADVERT]` index — no DB queries - O(n) in total ADVERT observations, O(m log m) for median calculations ## Tests 15 unit tests covering: - Severity classification at all thresholds - Median/mean math helpers - ISO timestamp parsing - Timestamp extraction from decoded JSON (nested and top-level) - Observer calibration with single and multi-observer scenarios - Observer offset correction direction (verified the sign is `+obsOffset`) - Drift estimation: stable, linear, insufficient data, short time span - JSON number extraction edge cases ## What's NOT in This PR - No UI changes (M2–M4) - No customizer integration (M5) - Thresholds are hardcoded constants (will be configurable in M5) Implements #690 M1. --------- Co-authored-by: you <you@example.com> |
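
Phases 1–3 reduce to a little median arithmetic. This sketch is an interpretation of the description above with assumed data shapes; in particular the sign convention for the observer offset (observer timestamp minus per-packet median, so that `corrected = raw + offset`) is my reading of the calibration step, not code taken from `clock_skew.go`.

```go
package main

import (
	"fmt"
	"sort"
)

// obsSample: one observer's view of one ADVERT packet.
type obsSample struct {
	Observer string
	AdvertTS float64 // timestamp claimed inside the advert
	ObsTS    float64 // when the observer actually received it
}

func median(xs []float64) float64 {
	c := append([]float64(nil), xs...)
	sort.Float64s(c)
	if len(c)%2 == 1 {
		return c[len(c)/2]
	}
	return (c[len(c)/2-1] + c[len(c)/2]) / 2
}

// observerOffsets: per packet, each observer's deviation of its own receive
// timestamp from the per-packet median; the per-observer median of those
// deviations is its calibration offset (phase 2).
func observerOffsets(packets [][]obsSample) map[string]float64 {
	devs := map[string][]float64{}
	for _, pkt := range packets {
		var ts []float64
		for _, o := range pkt {
			ts = append(ts, o.ObsTS)
		}
		m := median(ts)
		for _, o := range pkt {
			devs[o.Observer] = append(devs[o.Observer], o.ObsTS-m)
		}
	}
	out := map[string]float64{}
	for obs, d := range devs {
		out[obs] = median(d)
	}
	return out
}

func main() {
	pkt := []obsSample{
		{"obsA", 1000, 1002}, {"obsB", 1000, 1005}, {"obsC", 1000, 1002},
	}
	offsets := observerOffsets([][]obsSample{pkt})
	for _, o := range pkt {
		raw := o.AdvertTS - o.ObsTS            // phase 1
		corrected := raw + offsets[o.Observer] // phase 3: cancels obsB's fast clock
		fmt.Printf("%s raw=%.0f corrected=%.0f\n", o.Observer, raw, corrected)
	}
}
```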

aa84ce1e6a
fix: correct hash_size detection for transport routes and zero-hop adverts (#747)
## Summary Fixes #744 Fixes #722 Three bugs in hash_size computation caused zero-hop adverts to incorrectly report `hash_size=1`, masking nodes that actually use multi-byte hashes. ## Bugs Fixed ### 1. Wrong path byte offset for transport routes (`computeNodeHashSizeInfo`) Transport routes (types 0 and 3) have 4 transport code bytes before the path byte. The code read the path byte from offset 1 (byte index `RawHex[2:4]`) for all route types. For transport routes, the correct offset is 5 (`RawHex[10:12]`). ### 2. Missing RouteTransportDirect skip (`computeNodeHashSizeInfo`) Zero-hop adverts from `RouteDirect` (type 2) were correctly skipped, but `RouteTransportDirect` (type 3) zero-hop adverts were not. Both have locally-generated path bytes with unreliable hash_size bits. ### 3. Zero-hop adverts not skipped in analytics (`computeAnalyticsHashSizes`) `computeAnalyticsHashSizes()` unconditionally overwrote a node's `hashSize` with whatever the latest advert reported. A zero-hop direct advert with `hash_size=1` could overwrite a previously-correct `hash_size=2` from a multi-hop flood advert. Fix: skip hash_size update for zero-hop direct/transport-direct adverts while still counting the packet and updating `lastSeen`. ## Tests Added - `TestHashSizeTransportRoutePathByteOffset` — verifies transport routes read path byte at offset 5, regular flood reads at offset 1 - `TestHashSizeTransportDirectZeroHopSkipped` — verifies both RouteDirect and RouteTransportDirect zero-hop adverts are skipped - `TestAnalyticsHashSizesZeroHopSkip` — verifies analytics hash_size is not overwritten by zero-hop adverts - Fixed 3 existing tests (`FlipFlop`, `Dominant`, `LatestWins`) that used route_type 0 (TransportFlood) header bytes without proper transport code padding ## Complexity All changes are O(1) per packet — no new loops or data structures. The additional offset computation and zero-hop check are constant-time operations within the existing packet scan loop. Co-authored-by: you <you@example.com> |
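
The offset rule and the zero-hop skip are easiest to see as two small helpers. A sketch under the route-type numbering stated above (transport routes are types 0 and 3, direct is 2); the function names are illustrative, the real logic sits inside the packet-scan loops.

```go
package main

import "fmt"

// Route types referenced above.
const (
	routeTransportFlood  = 0
	routeFlood           = 1
	routeDirect          = 2
	routeTransportDirect = 3
)

// pathByteOffset returns the byte offset of the path byte within the raw
// packet: transport routes carry 4 transport-code bytes between the header
// and the path byte, so the offset shifts from 1 to 5 (hex chars 2:4 vs 10:12).
func pathByteOffset(routeType int) int {
	if routeType == routeTransportFlood || routeType == routeTransportDirect {
		return 5
	}
	return 1
}

// skipHashSizeUpdate reproduces the zero-hop rule: locally generated path
// bytes on direct / transport-direct adverts carry unreliable hash_size
// bits, so they must not overwrite a previously learned value.
func skipHashSizeUpdate(routeType, hopCount int) bool {
	return hopCount == 0 && (routeType == routeDirect || routeType == routeTransportDirect)
}

func main() {
	fmt.Println(pathByteOffset(routeFlood))           // 1
	fmt.Println(pathByteOffset(routeTransportDirect)) // 5
	fmt.Println(skipHashSizeUpdate(routeDirect, 0))   // true
}
```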

84f03f4f41
fix: hide undecryptable channel messages by default (#727) (#728)
## Problem Channels page shows 53K 'Unknown' messages — undecryptable GRP_TXT packets with no content. Pure noise. ## Fix - Backend: channels API filters out undecrypted messages by default - `?includeEncrypted=true` param to include them - Frontend: 'Show encrypted' toggle in channels sidebar - Unknown channels grayed out with '(no key)' label - Toggle persists in localStorage Fixes #727 --------- Co-authored-by: you <you@example.com> |

14367488e2
fix: TRACE path_json uses path_sz from flags byte, not header hash_size (#732)
## Summary TRACE packets encode their route hash size in the flags byte (`flags & 0x03`), not the header path byte. The decoder was using `path.HashSize` from the header, which could be wrong or zero for direct-route TRACEs, producing incorrect hop counts in `path_json`. ## Protocol Note Per firmware, TRACE packets are **always direct-routed** (route_type 2 = DIRECT, or 3 = TRANSPORT_DIRECT). FLOOD-routed TRACEs (route_type 1) are anomalous — firmware explicitly rejects TRACE via flood. The decoder handles these gracefully without crashing. ## Changes **`cmd/server/decoder.go` and `cmd/ingestor/decoder.go`:** - Read `pathSz` from TRACE flags byte: `(traceFlags & 0x03) + 1` (0→1byte, 1→2byte, 2→3byte) - Use `pathSz` instead of `path.HashSize` for splitting TRACE payload path data into hops - Update `path.HashSize` to reflect the actual TRACE path size - Added `HopsCompleted` field to ingestor `Path` struct for parity with server - Updated comments to clarify TRACE is always direct-routed per firmware **`cmd/server/decoder_test.go` — 5 new tests:** - `TraceFlags1_TwoBytePathSz`: flags=1 → 2-byte hashes via DIRECT route - `TraceFlags2_ThreeBytePathSz`: flags=2 → 3-byte hashes via DIRECT route - `TracePathSzUnevenPayload`: payload not evenly divisible by path_sz - `TraceTransportDirect`: route_type=3 with transport codes + TRACE path parsing - `TraceFloodRouteGraceful`: anomalous FLOOD+TRACE handled without crash All existing TRACE tests (flags=0, 1-byte hashes) continue to pass. Fixes #731 --------- Co-authored-by: you <you@example.com> |
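
The flags-derived split described above looks roughly like this; the function names are illustrative (the real change lives in both decoders), but the `(flags & 0x03) + 1` mapping is taken directly from the PR.

```go
package main

import "fmt"

// tracePathSz derives the per-hop hash size for a TRACE packet from its
// flags byte: (flags & 0x03) + 1 gives 1, 2, or 3 bytes.
func tracePathSz(traceFlags byte) int {
	return int(traceFlags&0x03) + 1
}

// splitTracePath cuts the TRACE path data into per-hop hashes using the
// flags-derived size rather than the header's hash_size.
func splitTracePath(pathData []byte, traceFlags byte) [][]byte {
	sz := tracePathSz(traceFlags)
	var hops [][]byte
	for i := 0; i+sz <= len(pathData); i += sz {
		hops = append(hops, pathData[i:i+sz])
	}
	return hops // any trailing remainder (uneven payload) is ignored
}

func main() {
	path := []byte{0xAA, 0xBB, 0xCC, 0xDD}
	fmt.Println(len(splitTracePath(path, 0))) // flags=0: 1-byte hashes, 4 hops
	fmt.Println(len(splitTracePath(path, 1))) // flags=1: 2-byte hashes, 2 hops
}
```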

71be54f085
feat: DB-backed channel messages for full history (#725 M1) (#726)
## Summary Switches channel API endpoints to query SQLite instead of the in-memory packet store, giving users access to the full message history. Implements #725 (M1 only — DB-backed channel messages). Does NOT close #725 — M2-M5 (custom channels, PSK, persistence, retroactive decryption) remain. ## Problem Channel endpoints (`/api/channels`, `/api/channels/{hash}/messages`) preferred the in-memory packet store when available. The store is bounded by `packetStore.maxMemoryMB` — typically showing only recent messages. The SQLite database has the complete history (weeks/months of channel messages) but was only used as a fallback when the store was nil (never in production). ## Fix Reversed the preference order: DB first, in-memory store fallback. Region filtering added to the DB path. Co-authored-by: you <you@example.com> |

c233c14156
feat: CLI tool to decrypt and export hashtag channel messages (#724)
## Summary
Adds `corescope-decrypt` — a standalone CLI tool that decrypts and
exports MeshCore hashtag channel messages from a CoreScope SQLite
database.
### What it does
MeshCore hashtag channels use symmetric encryption with keys derived
from the channel name. The CoreScope ingestor stores **all** GRP_TXT
packets, even those it can't decrypt. This tool enables retroactive
decryption — decrypt historical messages for any channel whose name you
learn after the fact.
### Architecture
- **`internal/channel/`** — Shared crypto package extracted from
ingestor logic:
- `DeriveKey()` — `SHA-256("#name")[:16]`
- `ChannelHash()` — 1-byte packet filter (`SHA-256(key)[0]`)
- `Decrypt()` — HMAC-SHA256 MAC verify + AES-128-ECB
- `ParsePlaintext()` — timestamp + flags + "sender: message" parsing
- **`cmd/decrypt/`** — CLI binary with three output formats:
- `--format json` — Full metadata (observers, path, raw hex)
- `--format html` — Self-contained interactive viewer with search/sort
- `--format irc` (or `log`) — Plain-text IRC-style log, greppable
### Usage
```bash
# JSON export
corescope-decrypt --channel "#wardriving" --db meshcore.db
# Interactive HTML viewer
corescope-decrypt --channel wardriving --db meshcore.db --format html --output wardriving.html
# Greppable log
corescope-decrypt --channel "#wardriving" --db meshcore.db --format irc | grep "KE6QR"
# From Docker
docker exec corescope-prod /app/corescope-decrypt --channel "#wardriving" --db /app/data/meshcore.db
```
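
The key-derivation half of `internal/channel/` reduces to a few lines. This sketch follows the formulas listed above (MAC verification and AES-128-ECB decryption are omitted); the automatic `#` prefixing is an assumption based on the usage examples accepting both `wardriving` and `"#wardriving"`.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// DeriveKey derives the 16-byte symmetric key for a hashtag channel from
// its name, per the scheme above: SHA-256("#name")[:16].
func DeriveKey(name string) []byte {
	if !strings.HasPrefix(name, "#") {
		name = "#" + name
	}
	sum := sha256.Sum256([]byte(name))
	return sum[:16]
}

// ChannelHash is the 1-byte filter used to pre-select candidate GRP_TXT
// packets for a channel: SHA-256(key)[0].
func ChannelHash(key []byte) byte {
	sum := sha256.Sum256(key)
	return sum[0]
}

func main() {
	key := DeriveKey("wardriving") // "#" prefix added automatically
	fmt.Printf("key=%x channelHash=%02x\n", key, ChannelHash(key))
}
```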
### Build & deployment
- Statically linked (`CGO_ENABLED=0`) — zero dependencies
- Added to Dockerfile (available at `/app/corescope-decrypt` in
container)
- CI: builds and tests in go-test job
- CI: attaches linux/amd64 and linux/arm64 binaries to GitHub Releases
on tags
### Testing
- `internal/channel/` — 9 tests: key derivation, encrypt/decrypt
round-trip, MAC rejection, wrong-channel rejection, plaintext parsing
- `cmd/decrypt/` — 7 tests: payload extraction, channel hash
consistency, all 3 output formats, JSON parseability, fixture DB
integration
- Verified against real fixture DB: successfully decrypts 17
`#wardriving` messages
### Limitations
- Hashtag channels only (name-derived keys). Custom PSK channels not
supported.
- No DM decryption (asymmetric, per-peer keys).
- Read-only database access.
Fixes #723
---------
Co-authored-by: you <you@example.com>

65482ff6f6
fix: cache invalidation tuning — 7% → 50-80% hit rate (#721)
## Cache Invalidation Tuning — 7% → 50-80% Hit Rate Fixes #720 ### Problem Server-side cache hit rate was 7% (48 hits / 631 misses over 4.7 days). Root causes from the [cache audit report](https://github.com/Kpa-clawbot/CoreScope/issues/720): 1. **`invalidationDebounce` config value (30s) was dead code** — never wired to `invCooldown` 2. **`invCooldown` hardcoded to 10s** — with continuous ingest, caches cleared every 10s regardless of their 1800s TTLs 3. **`collisionCache` cleared on every `hasNewTransmissions`** — hash collisions are structural (depend on node count), not per-packet ### Changes | Change | File | Impact | |--------|------|--------| | Wire `invalidationDebounce` from config → `invCooldown` | `store.go` | Config actually works now | | Default `invCooldown` 10s → 300s (5 min) | `store.go` | 30x longer cache survival | | Add `hasNewNodes` flag to `cacheInvalidation` | `store.go` | Finer-grained invalidation | | `collisionCache` only clears on `hasNewNodes` | `store.go` | O(n²) collision computation survives its 1hr TTL | | `addToByNode` returns new-node indicator | `store.go` | Zero-cost detection during indexing | | `indexByNode` returns new-node indicator | `store.go` | Propagates to ingest path | | Ingest tracks and passes `hasNewNodes` | `store.go` | End-to-end wiring | ### Tests Added | Test | What it verifies | |------|-----------------| | `TestInvCooldownFromConfig` | Config value wired to `invCooldown`; default is 300s | | `TestCollisionCacheNotClearedByTransmissions` | `hasNewTransmissions` alone does NOT clear `collisionCache` | | `TestCollisionCacheClearedByNewNodes` | `hasNewNodes` DOES clear `collisionCache` | | `TestCacheSurvivesMultipleIngestCyclesWithinCooldown` | 5 rapid ingest cycles don't clear any caches during cooldown | | `TestNewNodesAccumulatedDuringCooldown` | `hasNewNodes` accumulated in `pendingInv` and applied after cooldown | | `BenchmarkAnalyticsLatencyCacheHitVsMiss` | 100% hit rate with rate-limited invalidation | All 200+ existing tests pass. Both benchmarks show 100% hit rate. ### Performance Justification - **Before:** Effective cache lifetime = `min(TTL, invCooldown)` = 10s. With analytics viewed ~once/few minutes, P(hit) ≈ 7% - **After:** Effective cache lifetime = `min(TTL, 300s)` = 300s for most caches, 3600s for `collisionCache`. Expected hit rate 50-80% - **Complexity:** All changes are O(1) — `addToByNode` already checked `nodeHashes[pubkey] == nil`, we just return the result - **Benchmark proof:** `BenchmarkAnalyticsLatencyCacheHitVsMiss` → 100% hit rate, 269ns/op Co-authored-by: you <you@example.com> |
||
|
|
7af91f7ef6 |
fix: perf page shows tracked memory instead of heap allocation (#718)
## Summary
The perf page "Memory Used" tile displayed `estimatedMB` (Go `runtime.HeapAlloc`), which includes all Go runtime allocations — not just packet store data. This made the displayed value misleading: it showed ~2.4GB heap when only ~833MB was actual tracked packet data.
## Changes
### Frontend (`public/perf.js`)
- Primary tile now shows `trackedMB` as **"Tracked Memory"** — the self-accounted packet store memory
- Added separate **"Heap (debug)"** tile showing `estimatedMB` for runtime visibility
### Backend
- **`types.go`**: Added `TrackedMB` field to `HealthPacketStoreStats` struct
- **`routes.go`**: Populate `TrackedMB` in `/health` endpoint response from `GetPerfStoreStatsTyped()`
- **`routes_test.go`**: Assert `trackedMB` exists in health endpoint's `packetStore`
- **`testdata/golden/shapes.json`**: Updated shape fixture with new field
### What was already correct
- `/api/perf/stats` already exposed both `estimatedMB` and `trackedMB`
- `trackedMemoryMB()` method already existed in store.go
- Eviction logic already used `trackedBytes` (not HeapAlloc)
## Testing
- All Go tests pass (`go test ./... -count=1`)
- No frontend logic changes beyond template string field swap
Fixes #717
Co-authored-by: you <you@example.com>
|
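For reference, the stats shape might look roughly like this — a sketch only; the JSON tags and the neighboring field are assumptions, only the `TrackedMB` addition is from this PR:

```go
package server

// Sketch only: the json tags and EstimatedMB field are assumptions.
type HealthPacketStoreStats struct {
	EstimatedMB float64 `json:"estimatedMB"` // runtime heap figure, debug-only tile
	TrackedMB   float64 `json:"trackedMB"`   // self-accounted packet store memory
}
```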
||
|
|
f95aa49804 |
fix: exclude TRACE packets from multi-byte capability suspected detection (#715)
## Summary
Exclude TRACE packets (payload_type 8) from the "suspected" multi-byte capability inference logic. TRACE packets carry hash size in their own flags — forwarding repeaters read it from the TRACE header, not their compile-time `PATH_HASH_SIZE`. Pre-1.14 repeaters can forward multi-byte TRACEs without actually supporting multi-byte hashes, creating false positives.
Fixes #714
## Changes
### `cmd/server/store.go`
- In `computeMultiByteCapability()`, skip packets with `payload_type == 8` (TRACE) when scanning `byPathHop` for suspected multi-byte nodes
- "Confirmed" detection (from adverts) is unaffected
### `cmd/server/multibyte_capability_test.go`
- `TestMultiByteCapability_TraceExcluded`: TRACE packet with 2-byte path does NOT mark repeater as suspected
- `TestMultiByteCapability_NonTraceStillSuspected`: Non-TRACE packet with 2-byte path still marks as suspected
- `TestMultiByteCapability_ConfirmedUnaffectedByTraceExclusion`: Confirmed status from advert unaffected by TRACE exclusion
## Testing
All 7 multi-byte capability tests pass. Full `cmd/server` and `cmd/ingestor` test suites pass.
Co-authored-by: you <you@example.com>
|
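A minimal sketch of the exclusion, assuming a hypothetical candidate-scan helper — the payload-type value follows this PR's description, the surrounding types are illustrative:

```go
// Sketch of the TRACE exclusion in the "suspected" scan; the Packet type
// and helper are illustrative, the payload-type value is per this PR.
package capability

const payloadTypeTrace = 8

type Packet struct {
	PayloadType int
	PathHashLen int
}

// suspectedEvidence filters path-hop candidates, skipping TRACE packets whose
// hash size comes from the TRACE header rather than the forwarding repeater.
func suspectedEvidence(candidates []Packet) []Packet {
	out := make([]Packet, 0, len(candidates))
	for _, p := range candidates {
		if p.PayloadType == payloadTypeTrace {
			continue
		}
		out = append(out, p)
	}
	return out
}
```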
||
|
|
4a7e20a8cb |
fix: redesign memory eviction — self-accounting trackedBytes, watermarks, safety cap (#711)
## Problem
`HeapAlloc`-based eviction cascades on large databases — evicts down to near-zero packets because Go runtime overhead exceeds `maxMemoryMB` even with an empty packet store.
## Fix (per Carmack spec on #710)
1. **Self-accounting `trackedBytes`** — running counter maintained on insert/evict, computed from actual struct sizes. No `runtime.ReadMemStats`.
2. **High/low watermark hysteresis** (100%/85%) — evict to 85% of budget, don't re-trigger until 100% crossed again.
3. **25% per-pass safety cap** — never evict more than a quarter of packets in one cycle.
4. **Oldest-first** — evict from sorted head, O(1) candidate selection.
`maxMemoryMB` now means packet store budget, not total process heap.
Fixes #710
Co-authored-by: you <you@example.com>
|
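The watermark/cap interplay might look roughly like this — a sketch under assumed names, not the actual eviction code:

```go
// Sketch of watermark eviction with a per-pass cap; names and the
// oldest-first slice layout are illustrative.
package evict

type packet struct{ size int64 }

type store struct {
	trackedBytes int64
	budgetBytes  int64    // maxMemoryMB converted to bytes
	packets      []packet // kept oldest-first
}

func (s *store) maybeEvict() {
	if s.trackedBytes < s.budgetBytes { // high watermark: 100% of budget
		return
	}
	low := s.budgetBytes * 85 / 100 // low watermark: evict down to 85%
	maxEvict := len(s.packets) / 4  // safety cap: at most 25% per pass
	evicted := 0
	for s.trackedBytes > low && evicted < maxEvict && len(s.packets) > 0 {
		oldest := s.packets[0]
		s.packets = s.packets[1:]
		s.trackedBytes -= oldest.size
		evicted++
	}
}
```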
||
|
|
e893a1b3c4 |
fix: index relay hops in byNode for liveness tracking (#708)
## Problem
Nodes that only appear as relay hops in packet paths (via `resolved_path`) were never indexed in `byNode`, so `last_heard` was never computed for them. This made relay-only nodes show as dead/stale even when actively forwarding traffic.
Fixes #660
## Root Cause
`indexByNode()` only indexed pubkeys from decoded JSON fields (`pubKey`, `destPubKey`, `srcPubKey`). Relay nodes appearing in `resolved_path` were ignored entirely.
## Fix
`indexByNode()` now also iterates:
1. `ResolvedPath` entries from each observation
2. `tx.ResolvedPath` (best observation's resolved path, used for DB-loaded packets)
A per-call `indexed` set prevents double-indexing when the same pubkey appears in both decoded JSON and resolved path. Extracted `addToByNode()` helper to deduplicate the nodeHashes/byNode append logic.
## Scope
**Phase 1 only** — server-side in-memory indexing. No DB changes, no ingestor changes. This makes `last_heard` reflect relay activity with zero risk to persistence.
## Tests
5 new test cases in `TestIndexByNodeResolvedPath`:
- Resolved path pubkeys from observations get indexed
- Null entries in resolved path are skipped
- Relay-only nodes (no decoded JSON match) appear in `byNode`
- Dedup between decoded JSON and resolved path
- `tx.ResolvedPath` indexed when observations are empty
All existing tests pass unchanged.
## Complexity
O(observations × path_length) per packet — typically 1-3 observations × 1-3 hops. No hot-path regression.
---------
Co-authored-by: you <you@example.com>
|
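The dedup-on-index idea can be sketched like this — illustrative types; the real `addToByNode`/`indexByNode` signatures differ:

```go
// Sketch of per-call dedup when indexing decoded pubkeys and resolved hops;
// types are simplified stand-ins for the real store.
package index

type store struct {
	byNode map[string][]int // pubkey -> packet indices
}

func (s *store) indexPacket(idx int, decodedKeys, resolvedPath []string) {
	indexed := map[string]bool{}
	add := func(pk string) {
		if pk == "" || indexed[pk] {
			return // already added for this packet
		}
		indexed[pk] = true
		s.byNode[pk] = append(s.byNode[pk], idx)
	}
	for _, pk := range decodedKeys {
		add(pk)
	}
	for _, pk := range resolvedPath { // relay-only nodes now feed last_heard too
		add(pk)
	}
}
```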
||
|
|
fcba2a9f3d |
fix: set PRAGMA busy_timeout on all RW SQLite connections (#707)
## Problem
`SQLITE_BUSY` contention between the ingestor and server's async persistence goroutine drops `resolved_path` and `neighbor_edges` updates. The DSN parameter `_busy_timeout=10000` may not be honored by the modernc/sqlite driver.
## Fix
- **`openRW()` now sets `PRAGMA busy_timeout = 5000`** after opening the connection, guaranteeing SQLite retries for up to 5 seconds before returning `SQLITE_BUSY`
- **Refactored `PruneOldPackets` and `PruneOldMetrics`** to use `openRW()` instead of duplicating connection setup — all RW connections now get consistent busy_timeout handling
- Added test verifying the pragma is set correctly
## Changes
| File | Change |
|------|--------|
| `cmd/server/neighbor_persist.go` | `openRW()` sets `PRAGMA busy_timeout = 5000` after open |
| `cmd/server/db.go` | `PruneOldPackets` and `PruneOldMetrics` use `openRW()` instead of inline `sql.Open` |
| `cmd/server/neighbor_persist_test.go` | `TestOpenRW_BusyTimeout` verifies pragma is set |
## Performance
No performance impact — `PRAGMA busy_timeout` is a connection-level setting with zero overhead on uncontended writes. Under contention, it converts immediate `SQLITE_BUSY` failures into brief retries (up to 5s), which is strictly better than dropping data.
Fixes #705
---------
Co-authored-by: you <you@example.com>
|
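A minimal sketch of the pattern — open, then set the pragma on the same connection. The driver name is as registered by modernc.org/sqlite; the single-connection pool is an assumption made so the pragma applies to every statement in this sketch:

```go
// Sketch of the openRW pattern; DSN handling and error paths simplified.
package db

import (
	"database/sql"

	_ "modernc.org/sqlite"
)

func openRW(path string) (*sql.DB, error) {
	conn, err := sql.Open("sqlite", path)
	if err != nil {
		return nil, err
	}
	conn.SetMaxOpenConns(1)
	// Retry for up to 5s before surfacing SQLITE_BUSY, regardless of whether
	// the driver honors _busy_timeout in the DSN.
	if _, err := conn.Exec("PRAGMA busy_timeout = 5000"); err != nil {
		conn.Close()
		return nil, err
	}
	return conn, nil
}
```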
||
|
|
ef8bce5002 |
feat: repeater multi-byte capability inference table (#706)
## Summary
Adds a new "Repeater Multi-Byte Capability" section to the Hash Stats analytics tab that classifies each repeater's ability to handle multi-byte hash prefixes (firmware >= v1.14).
Fixes #689
## What Changed
### Backend (`cmd/server/store.go`)
- New `computeMultiByteCapability()` method that infers capability for each repeater using two evidence sources:
  - **Confirmed** (100% reliable): node has advertised with `hash_size >= 2`, leveraging existing `computeNodeHashSizeInfo()` data
  - **Suspected** (<100%): node's prefix appears as a hop in packets with multi-byte path headers, using the `byPathHop` index. Prefix collisions mean this isn't definitive.
  - **Unknown**: no multi-byte evidence — could be pre-1.14 or 1.14+ with default settings
- Extended `/api/analytics/hash-sizes` response with `multiByteCapability` array
### Frontend (`public/analytics.js`)
- New `renderMultiByteCapability()` function on the Hash Stats tab
- Color-coded table: green confirmed, yellow suspected, gray unknown
- Filter buttons to show all/confirmed/suspected/unknown
- Column sorting by name, role, status, evidence, max hash size, last seen
- Clickable rows link to node detail pages
### Tests (`cmd/server/multibyte_capability_test.go`)
- `TestMultiByteCapability_Confirmed`: advert with hash_size=2 → confirmed
- `TestMultiByteCapability_Suspected`: path appearance only → suspected
- `TestMultiByteCapability_Unknown`: 1-byte advert only → unknown
- `TestMultiByteCapability_PrefixCollision`: two nodes sharing prefix, one confirmed via advert, other correctly marked suspected (not confirmed)
## Performance
- `computeMultiByteCapability()` runs once per cache cycle (15s TTL via hash-sizes cache)
- Leverages existing `GetNodeHashSizeInfo()` cache (also 15s TTL) — no redundant advert scanning
- Path hop scan is O(repeaters × prefix lengths) lookups in the `byPathHop` map, with early break on first match per prefix
- Only computed for global (non-regional) requests to avoid unnecessary work
---------
Co-authored-by: you <you@example.com>
|
||
|
|
922ebe54e7 |
BYOP Advert signature validation (#686)
For BYOP mode in the packet analyzer, perform signature validation on advert packets and display whether the signature check succeeded. This was added because we observed many corrupted advert packets that would be easy to detect if signature validation were performed.
At present this MR only adds the status in BYOP mode, so there is minimal impact on the application and no performance penalty from running these checks on all packets. Going forward it probably makes sense to validate all advert packets so that corrupt packets can be ignored in several contexts (node lists, for example). Let me know what you think and I can adjust as needed.
---------
Co-authored-by: you <you@example.com>
|
||
|
|
9917d50622 |
fix: resolve neighbor graph duplicate entries from different prefix lengths (#699)
## Problem
The neighbor graph creates separate entries for the same physical node when observed with different prefix lengths. For example, a 1-byte prefix `B0` (ambiguous, unresolved) and a 2-byte prefix `B05B` (resolved to Busbee) appear as two separate neighbors of the same node.
Fixes #698
## Solution
### Part 1: Post-build resolution pass (Phase 1.5)
New function `resolveAmbiguousEdges(pm, graph)` in `neighbor_graph.go`:
- Called after `BuildFromStore()` completes the full graph, before any API use
- Iterates all ambiguous edges and attempts resolution via `resolveWithContext` with full graph context
- Only accepts high-confidence resolutions (`neighbor_affinity`, `geo_proximity`, `unique_prefix`) — rejects `first_match`/`gps_preference` fallbacks to avoid false positives
- Merges with existing resolved edges (count accumulation, max LastSeen) or updates in-place
- Phase 1 edge collection loop is **unchanged**
### Part 2: API-layer dedup (defense-in-depth)
New function `dedupPrefixEntries()` in `neighbor_api.go`:
- Scans neighbor response for unresolved prefix entries matching resolved pubkey entries
- Merges counts, timestamps, and observers; removes the unresolved entry
- O(n²) on ~50 neighbors per node — negligible cost
### Performance
Phase 1.5 runs O(ambiguous_edges × candidates). Per Carmack's analysis: ~50ms at 2K nodes on the 5-min rebuild cycle. Hot ingest path untouched.
## Tests
9 new tests in `neighbor_dedup_test.go`:
1. **Geo proximity resolution** — ambiguous edge resolved when candidate has GPS near context node
2. **Merge with existing** — ambiguous edge merged into existing resolved edge (count accumulation)
3. **No-match preservation** — ambiguous edge left as-is when prefix has no candidates
4. **API dedup** — unresolved prefix merged with resolved pubkey in response
5. **Integration** — node with both 1-byte and 2-byte prefix observations shows single neighbor entry
6. **Phase 1 regression** — non-ambiguous edge collection unchanged
7. **LastSeen preservation** — merge keeps higher LastSeen timestamp
8. **No-match dedup** — API dedup doesn't merge non-matching prefixes
9. **Benchmark** — Phase 1.5 with 500+ edges
All existing tests pass (server + ingestor).
---------
Co-authored-by: you <you@example.com>
|
||
|
|
2e1a4a2e0d |
fix: handle companion nodes without adverts in My Mesh health cards (#696)
## Summary
Fixes #665 — companion nodes claimed in "My Mesh" showed "Could not load data" because they never sent an advert, so they had no `nodes` table entry, causing the health API to return 404.
## Three-Layer Fix
### 1. API Resilience (`cmd/server/store.go`)
`GetNodeHealth()` now falls back to building a partial response from the in-memory packet store when `GetNodeByPubkey()` returns nil. Returns a synthetic node stub (`role: "unknown"`, `name: "Unknown"`) with whatever stats exist from packets, instead of returning nil → 404.
### 2. Ingestor Cleanup (`cmd/ingestor/main.go`)
Removed phantom sender node creation that used `"sender-" + name` as the pubkey. Channel messages don't carry the sender's real pubkey, so these synthetic entries were unreachable from the claiming/health flow — they just polluted the nodes table with unmatchable keys.
### 3. Frontend UX (`public/home.js`)
The catch block in `loadMyNodes()` now distinguishes 404 (node not in DB yet) from other errors:
- **404**: Shows 📡 "Waiting for first advert — this node has been seen in channel messages but hasn't advertised yet"
- **Other errors**: Shows ❓ "Could not load data" (unchanged)
## Tests
- Added `TestNodeHealthPartialFromPackets` — verifies a node with packets but no DB entry returns 200 with synthetic node stub and stats
- Updated `TestHandleMessageChannelMessage` — verifies channel messages no longer create phantom sender nodes
- All existing tests pass (`cmd/server`, `cmd/ingestor`)
Co-authored-by: you <you@example.com>
|
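Shape of the fallback, roughly — all types and lookups here are simplified stand-ins:

```go
// Sketch of the 404-avoidance fallback described above.
package health

type Node struct{ Pubkey, Name, Role string }

type NodeHealth struct {
	Node        *Node
	PacketCount int
}

type Store struct {
	nodes       map[string]*Node // populated from adverts
	packetCount map[string]int   // derived from the in-memory packet store
}

func (s *Store) GetNodeHealth(pubkey string) *NodeHealth {
	node := s.nodes[pubkey]
	count, seen := s.packetCount[pubkey]
	if node == nil {
		if !seen {
			return nil // nothing known at all -> caller can 404
		}
		// Claimed companion with packets but no advert yet: synthetic stub.
		node = &Node{Pubkey: pubkey, Name: "Unknown", Role: "unknown"}
	}
	return &NodeHealth{Node: node, PacketCount: count}
}
```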
||
|
|
fcad49594b |
fix: include path.hopsCompleted in TRACE WebSocket broadcasts (#695)
## Summary
Fixes #683 — TRACE packets on the live map were showing the full path instead of distinguishing completed vs remaining hops.
## Root Cause
Both WebSocket broadcast builders in `store.go` constructed the `decoded` map with only `header` and `payload` keys — `path` was never included. The frontend reads `decoded.path.hopsCompleted` to split trace routes into solid (completed) and dashed (remaining) segments, but that field was always `undefined`.
## Fix
For TRACE packets (payload type 9), call `DecodePacket()` on the raw hex during broadcast and include the resulting `Path` struct in `decoded["path"]`. This populates `hopsCompleted` which the frontend already knows how to consume.
Both broadcast builders are patched:
- `IngestNewFromDB()` — new transmissions path (~line 1419)
- `IngestNewObservations()` — new observations path (~line 1680)
TRACE packets are infrequent, so the per-packet decode overhead is negligible.
## Testing
- Added `TestIngestTraceBroadcastIncludesPath` — verifies that TRACE broadcast maps include `decoded.path` with correct `hopsCompleted` value
- All existing tests pass (`cmd/server` + `cmd/ingestor`)
Co-authored-by: you <you@example.com>
|
||
|
|
34e7366d7c |
test: add RouteTransportDirect zero-hop cases to ingestor decoder tests (#684)
## Summary
Closes the symmetry gap flagged as a nit in PR #653 review:
> The ingestor decoder tests omit `RouteTransportDirect` zero-hop tests — only the server decoder has those. Since the logic is identical, this is not a blocker, but adding them would make the test suites symmetric.
- Adds `TestZeroHopTransportDirectHashSize` — `pathByte=0x00`, expects `HashSize=0`
- Adds `TestZeroHopTransportDirectHashSizeWithNonZeroUpperBits` — `pathByte=0xC0` (hash_size bits set, hash_count=0), expects `HashSize=0`
Both mirror the equivalent tests already present in `cmd/server/decoder_test.go`.
## Test plan
- [ ] `cd cmd/ingestor && go test -run TestZeroHopTransportDirect -v` → both new tests pass
- [ ] `cd cmd/ingestor && go test ./...` → no regressions
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
|
||
|
|
22bf33700e |
Fix: filter path-hop candidates by resolved_path to prevent prefix collisions (#658)
## Problem
The "Paths Through This Node" API endpoint (`/api/nodes/{pubkey}/paths`)
returns unrelated packets when two nodes share a hex prefix. For
example, querying paths for "Kpa Roof Solar" (`c0dedad4...`) returns 316
packets that actually belong to "C0ffee SF" (`C0FFEEC7...`) because both
share the `c0` prefix in the `byPathHop` index.
Fixes #655
## Root Cause
`handleNodePaths()` in `routes.go` collects candidates from the
`byPathHop` index using 2-char and 4-char hex prefixes for speed, but
never verifies that the target node actually appears in each candidate's
resolved path. The broad index lookup is intentional, but the
**post-filter was missing**.
## Fix
Added `nodeInResolvedPath()` helper in `store.go` that checks whether a
transmission's `resolved_path` (from the neighbor affinity graph via
`resolveWithContext`) contains the target node's full pubkey. The
filter:
- **Includes** packets where `resolved_path` contains the target node's
full pubkey
- **Excludes** packets where `resolved_path` resolved to a different
node (prefix collision)
- **Excludes** packets where `resolved_path` is nil/empty (ambiguous —
avoids false positives)
The check examines both the best observation's resolved_path
(`tx.ResolvedPath`) and all individual observations, so packets are
included if *any* observation resolved the target.
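A simplified sketch of such a helper — the transmission/observation shapes below are illustrative, not the actual store types:

```go
// Sketch of the post-filter described above.
package paths

type Observation struct{ ResolvedPath []*string }

type Transmission struct {
	ResolvedPath []*string // best observation's resolved path
	Observations []Observation
}

// nodeInResolvedPath reports whether any resolved path on the transmission
// contains the target node's full pubkey; nil hops are skipped, so a fully
// unresolved path never matches (ambiguous candidates stay excluded).
func nodeInResolvedPath(tx *Transmission, pubkey string) bool {
	match := func(path []*string) bool {
		for _, hop := range path {
			if hop != nil && *hop == pubkey {
				return true
			}
		}
		return false
	}
	if match(tx.ResolvedPath) {
		return true
	}
	for _, obs := range tx.Observations {
		if match(obs.ResolvedPath) {
			return true
		}
	}
	return false
}
```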
## Tests
- `TestNodeInResolvedPath` — unit test for the helper with 5 cases
(match, different node, nil, all-nil elements, match in observation
only)
- `TestNodePathsPrefixCollisionFilter` — integration test: two nodes
sharing `aa` prefix, verifies the collision packet is excluded from one
and included for the other
- Updated test DB schema to include `resolved_path` column and seed data
with resolved pubkeys
- All existing tests pass (165 additions, 8 modifications)
## Performance
No impact on hot paths. The filter runs once per API call on the
already-collected candidate set (typically small). `nodeInResolvedPath`
is O(observations × hops) per candidate — negligible since observations
per transmission are typically 1–5.
---------
Co-authored-by: you <you@example.com>
|
||
|
|
7d71dc857b |
feat: expose hopsCompleted for TRACE packets, show real path on live map (#656)
## Summary
TRACE packets on the live map previously animated the **full intended route** regardless of how far the trace actually reached. This made it impossible to distinguish a completed route from a failed one — undermining the primary diagnostic purpose of trace packets.
## Changes
### Backend — `cmd/server/decoder.go`
- Added `HopsCompleted *int` field to the `Path` struct
- For TRACE packets, the header path contains SNR bytes (one per hop that actually forwarded). Before overwriting `path.Hops` with the full intended route from the payload, we now capture the header path's `HashCount` as `hopsCompleted`
- This field is included in API responses and WebSocket broadcasts via the existing JSON serialization
### Frontend — `public/live.js`
- For TRACE packets with `hopsCompleted < totalHops`:
  - Animate only the **completed** portion (solid line + pulse)
  - Draw the **unreached** remainder as a dashed/ghosted line (25% opacity, `6,8` dash pattern) with ghost markers
  - Dashed lines and ghost markers auto-remove after 10 seconds
- When `hopsCompleted` is absent or equals total hops, behavior is unchanged
### Tests — `cmd/server/decoder_test.go`
- `TestDecodePacket_TraceHopsCompleted` — partial completion (2 of 4 hops)
- `TestDecodePacket_TraceNoSNR` — zero completion (trace not forwarded yet)
- `TestDecodePacket_TraceFullyCompleted` — all hops completed
## How it works
The MeshCore firmware appends an SNR byte to `pkt->path[]` at each hop that forwards a TRACE packet. The count of these SNR bytes (`path_len`) indicates how far the trace actually got. CoreScope's decoder already parsed the header path, but the TRACE-specific code overwrote it with the payload hops (full intended route) without preserving the progress information. Now we save that count first.
Fixes #651
---------
Co-authored-by: you <you@example.com>
|
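Conceptually, the decoder change amounts to something like this — a sketch; the struct fields follow the PR description, the function itself is hypothetical:

```go
// Sketch of preserving the header path's hash count before the TRACE payload
// hops overwrite it; the surrounding decode flow is illustrative.
package decoder

type Path struct {
	Hops          []string
	HashCount     int
	HopsCompleted *int `json:"hopsCompleted,omitempty"`
}

func applyTracePayload(path *Path, intendedHops []string) {
	completed := path.HashCount // one SNR byte per hop that actually forwarded
	path.HopsCompleted = &completed
	path.Hops = intendedHops // full intended route from the payload
}
```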
||
|
|
088b4381c3 |
Fix: Hash Stats 'By Repeaters' includes non-repeater nodes (#654)
## Summary The "By Repeaters" section on the Hash Stats analytics page was counting **all** node types (companions, room servers, sensors, etc.) instead of only repeaters. This made the "By Repeaters" distribution identical to "Multi-Byte Hash Adopters", defeating the purpose of the breakdown. Fixes #652 ## Root Cause `computeAnalyticsHashSizes()` in `cmd/server/store.go` built its `byNode` map from advert packet data without cross-referencing node roles from the node store. Both `distributionByRepeaters` and `multiByteNodes` consumed this unfiltered map. ## Changes ### `cmd/server/store.go` - Build a `nodeRoleByPK` lookup map from `getCachedNodesAndPM()` at the start of the function - Store `role` in each `byNode` entry when processing advert packets - **`distributionByRepeaters`**: filter to only count nodes whose role contains "repeater" - **`multiByteNodes`**: include `role` field in output so the frontend can filter/group by node type ### `cmd/server/coverage_test.go` - Add `TestHashSizesDistributionByRepeatersFiltersRole`: verifies that companion nodes are excluded from `distributionByRepeaters` but included in `multiByteNodes` with correct role ### `cmd/server/routes_test.go` - Fix `TestHashAnalyticsZeroHopAdvert`: invalidate node cache after DB insert so role lookup works - Fix `TestAnalyticsHashSizeSameNameDifferentPubkey`: insert node records as repeaters + invalidate cache ## Testing All `cmd/server` tests pass (68 insertions, 3 deletions across 3 files). Co-authored-by: you <you@example.com> |
||
|
|
144e98bcdf |
fix: hide hash size for zero-hop direct adverts (#649) (#653)
## Fix: Zero-hop DIRECT packets report bogus hash_size
Closes #649
### Problem
When a DIRECT packet has zero hops (pathByte lower 6 bits = 0), the generic `hash_size = (pathByte >> 6) + 1` formula produces a bogus value (1-4) instead of 0/unknown. This causes incorrect hash size displays and analytics for zero-hop direct adverts.
### Solution
**Frontend (JS):**
- `packets.js` and `nodes.js` now check `(pathByte & 0x3F) === 0` to detect zero-hop packets and suppress bogus hash_size display.
**Backend (Go):**
- Both `cmd/server/decoder.go` and `cmd/ingestor/decoder.go` reset `HashSize=0` for DIRECT packets where `pathByte & 0x3F == 0` (hash_count is zero).
- TRACE packets are excluded since they use hashSize to parse hop data from the payload.
- The condition uses `pathByte & 0x3F == 0` (not `pathByte == 0x00`) to correctly handle the case where hash_size bits are non-zero but hash_count is zero — matching the JS frontend approach.
### Testing
**Backend:**
- Added 4 tests each in `cmd/server/decoder_test.go` and `cmd/ingestor/decoder_test.go`:
  - DIRECT + pathByte 0x00 → HashSize=0 ✅
  - DIRECT + pathByte 0x40 (hash_size bits set, hash_count=0) → HashSize=0 ✅
  - Non-DIRECT + pathByte 0x00 → HashSize=1 (unchanged) ✅
  - DIRECT + pathByte 0x01 (1 hop) → HashSize=1 (unchanged) ✅
- All existing tests pass (`go test ./...` in both cmd/server and cmd/ingestor)
**Frontend:**
- Verified hash size display is suppressed for zero-hop direct adverts
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: you <you@example.com>
|
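The guard reduces to roughly this — a sketch in which the DIRECT route-type constant is a placeholder, not the firmware value; only the bit arithmetic follows the PR:

```go
// Sketch of the zero-hop guard (TRACE handling omitted, per the exclusion
// described above).
package decoder

const routeTypeDirect = 0x01 // placeholder value for this sketch

func hashSizeFor(routeType, pathByte byte) int {
	if routeType == routeTypeDirect && pathByte&0x3F == 0 {
		return 0 // zero hops: the hash_size bits carry no meaning
	}
	return int(pathByte>>6) + 1
}
```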
||
|
|
0f5e2db5cf |
feat: auto-generated OpenAPI 3.0 spec endpoint + Swagger UI (#530) (#632)
## Summary
Auto-generated OpenAPI 3.0.3 spec endpoint (`/api/spec`) and Swagger UI
(`/api/docs`) for the CoreScope API.
## What
- **`cmd/server/openapi.go`** — Route metadata map
(`routeDescriptions()`) + spec builder that walks the mux router to
generate a complete OpenAPI 3.0.3 spec at runtime. Includes:
- All 47 API endpoints grouped by tag (admin, analytics, channels,
config, nodes, observers, packets)
- Query parameter documentation for key endpoints (packets, nodes,
search, resolve-hops)
- Path parameter extraction from mux `{name}` patterns
- `ApiKeyAuth` security scheme for API-key-protected endpoints
- Swagger UI served as a self-contained HTML page using unpkg CDN
- **`cmd/server/openapi_test.go`** — Tests for spec endpoint (validates
JSON structure, required fields, path count, security schemes,
self-exclusion of `/api/spec` and `/api/docs`), Swagger UI endpoint, and
`extractPathParams` helper.
- **`cmd/server/routes.go`** — Stores router reference on `Server`
struct for spec generation; registers `/api/spec` and `/api/docs`
routes.
## Design Decisions
- **Runtime spec generation** vs static YAML: The spec walks the actual
router, so it can never drift from registered routes. Route metadata
(summaries, descriptions, tags, auth flags) is maintained in a parallel
map — the test enforces minimum path count to catch drift.
- **No external dependencies**: Uses only stdlib + existing gorilla/mux.
Swagger UI loaded from unpkg CDN (no vendored assets).
- **Security tagging**: Auth-protected endpoints (those behind
`requireAPIKey` middleware) are tagged with `security: [{ApiKeyAuth:
[]}]` in the spec, matching the actual middleware configuration.
## Testing
- `go test -run TestOpenAPI` — validates spec structure, field presence,
path count ≥ 20, security schemes
- `go test -run TestSwagger` — validates HTML response with swagger-ui
references
- `go test -run TestExtractPathParams` — unit tests for path parameter
extraction
---------
Co-authored-by: you <you@example.com>
|
||
|
|
a068e3e086 |
feat: zero-config defaults + deployment docs (M3-M4, #610) (#631)
## Zero-Config Defaults + Deployment Docs
Make CoreScope start with zero configuration — no `config.json`
required. The ingestor falls back to sensible defaults (local MQTT
broker, standard topics, default DB path) when no config file exists.
### What changed
**`cmd/ingestor/config.go`** — `LoadConfig` no longer errors on missing
config file. Instead it logs a message and uses defaults. If no MQTT
sources are configured (from file or env), defaults to
`mqtt://localhost:1883` with `meshcore/#` topic.
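A sketch of that fallback shape — illustrative only; the real `Config` struct and source fields differ:

```go
// Sketch of a missing-config fallback with a zero-config MQTT default.
package config

import (
	"encoding/json"
	"errors"
	"log"
	"os"
)

type Source struct{ URL, Topic string }

type Config struct {
	Sources []Source `json:"sources"`
}

func LoadConfig(path string) (*Config, error) {
	var cfg Config
	data, err := os.ReadFile(path)
	switch {
	case errors.Is(err, os.ErrNotExist):
		log.Printf("no %s found, using defaults", path) // no longer fatal
	case err != nil:
		return nil, err
	default:
		if err := json.Unmarshal(data, &cfg); err != nil {
			return nil, err
		}
	}
	if len(cfg.Sources) == 0 {
		// Zero-config default: local broker, standard topic.
		cfg.Sources = []Source{{URL: "mqtt://localhost:1883", Topic: "meshcore/#"}}
	}
	return &cfg, nil
}
```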
**`cmd/ingestor/main.go`** — Removed redundant "no MQTT sources" fatal
(now handled in config layer). Improved the "no connections established"
fatal with actionable hints.
**`README.md`** — Replaced "Docker (Recommended)" section with a
one-command quickstart using the pre-built image. No build step, no
config file, just `docker run`.
**`docs/deployment.md`** — New comprehensive deployment guide covering
Docker, Compose, config reference, MQTT setup, TLS/HTTPS, monitoring,
backup, and troubleshooting.
### Zero-config flow
```
docker run -d -p 80:80 -v corescope-data:/app/data ghcr.io/kpa-clawbot/corescope:latest
```
1. No config.json found → defaults used, log message printed
2. No MQTT sources → defaults to `mqtt://localhost:1883`
3. Internal Mosquitto broker already running in container → connection
succeeds
4. Dashboard shows empty, ready for packets
### Review fixes (commit
|
||
|
|
dc5b5ce9a0 |
fix: reject weak/default API keys + startup warning (#532) (#628)
## Summary
Hardens API key security for write endpoints (fixes #532):
1. **Constant-time comparison** — uses `crypto/subtle.ConstantTimeCompare` to prevent timing attacks on API key validation
2. **Weak key blocklist** — rejects known default/example keys (`test`, `password`, `change-me`, `your-secret-api-key-here`, etc.)
3. **Minimum length enforcement** — keys shorter than 16 characters are rejected
4. **Startup warning** — logs a clear warning if the configured key is weak or a known default
5. **Generic error messages** — HTTP 403 response uses opaque "forbidden" message to prevent information leakage about why a key was rejected
### Security Model
- **Empty key** → all write endpoints disabled (403)
- **Weak/default key** → all write endpoints disabled (403), startup warning logged
- **Wrong key** → 401 unauthorized
- **Strong correct key** → request proceeds
### Files Changed
- `cmd/server/config.go` — `IsWeakAPIKey()` function + blocklist
- `cmd/server/routes.go` — constant-time comparison via `constantTimeEqual()`, weak key rejection
- `cmd/server/main.go` — startup warning for weak keys
- `cmd/server/apikey_security_test.go` — comprehensive test coverage
- `cmd/server/routes_test.go` — existing tests updated to use strong keys
### Reviews
- ✅ Self-review: all security properties verified
- ✅ djb Final Review: timing fix correct, blocklist pragmatic, error messages opaque, tests comprehensive. **Verdict: Ship it.**
### Test Results
All existing + new tests pass. Coverage includes: weak key detection (blocklist + length + case-insensitive), empty key handling, strong key acceptance, wrong key rejection, and constant-time comparison.
---------
Co-authored-by: you <you@example.com>
|
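A condensed sketch of the two checks — the helper names match the files-changed list above and the blocklist entries are the ones named in this PR; the package layout and everything else is simplified:

```go
// Sketch of weak-key rejection plus constant-time key comparison.
package auth

import (
	"crypto/subtle"
	"strings"
)

var weakKeys = map[string]bool{
	"test":                     true,
	"password":                 true,
	"change-me":                true,
	"your-secret-api-key-here": true,
}

// IsWeakAPIKey rejects short keys and known default/example keys.
func IsWeakAPIKey(key string) bool {
	return len(key) < 16 || weakKeys[strings.ToLower(key)]
}

// constantTimeEqual compares the presented key against the configured key
// without leaking timing information about where they diverge.
func constantTimeEqual(a, b string) bool {
	return subtle.ConstantTimeCompare([]byte(a), []byte(b)) == 1
}
```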
||
|
|
30e7e9ae3c |
docs: document lock ordering for cacheMu and channelsCacheMu (#624)
## Summary
Documents the lock ordering for all five mutexes in `PacketStore` (`store.go`) to prevent future deadlocks.
## What changed
Added a comment block above the `PacketStore` struct documenting:
- All 5 mutexes (`mu`, `cacheMu`, `channelsCacheMu`, `groupedCacheMu`, `regionObsMu`)
- What each mutex guards
- The required acquisition order (numbered 1–5)
- The nesting relationships that exist today (`cacheMu → channelsCacheMu` in `invalidateCachesFor` and `rebuildAnalyticsCaches`)
- Confirmation that no reverse ordering exists (no deadlock risk)
## Verification
- Grepped all lock acquisition sites to confirm no reverse nesting exists
- `go build ./...` passes — documentation-only change
Fixes #413
---------
Co-authored-by: you <you@example.com>
|
||
|
|
05fbcb09dd |
fix: wire cacheTTL.analyticsHashSizes config to collision cache (#420) (#622)
## Summary
Fixes #420 — wires `cacheTTL` config values to server-side cache durations that were previously hardcoded.
## Problem
`collisionCacheTTL` was hardcoded at 60s in `store.go`. The config has `cacheTTL.analyticsHashSizes: 3600` (1 hour) but it was never read — the `/api/config/cache` endpoint just passed the raw map to the client without applying values server-side.
## Changes
- **`store.go`**: Add `cacheTTLSec()` helper to safely extract duration values from the `cacheTTL` config map. `NewPacketStore` now accepts an optional `cacheTTL` map (variadic, backward-compatible) and wires:
  - `cacheTTL.analyticsHashSizes` → `collisionCacheTTL`
  - `cacheTTL.analyticsRF` → `rfCacheTTL`
- **Default changed**: `collisionCacheTTL` default raised from 60s → 3600s (1 hour). Hash collision computation is expensive and data changes rarely — 60s was causing unnecessary recomputation.
- **`main.go`**: Pass `cfg.CacheTTL` to `NewPacketStore`.
- **Tests**: Added `TestCacheTTLFromConfig` and `TestCacheTTLDefaults` in eviction_test.go. Updated existing `TestHashCollisionsCacheTTL` for the new default.
## Audit of other cacheTTL values
The remaining `cacheTTL` keys (`stats`, `nodeDetail`, `nodeHealth`, `nodeList`, `bulkHealth`, `networkStatus`, `observers`, `channels`, `channelMessages`, `analyticsTopology`, `analyticsChannels`, `analyticsSubpaths`, `analyticsSubpathDetail`, `nodeAnalytics`, `nodeSearch`, `invalidationDebounce`) are **client-side only** — served via `/api/config/cache` and consumed by the frontend. They don't have corresponding server-side caches to wire to.
The only server-side caches (`rfCache`, `topoCache`, `hashCache`, `chanCache`, `distCache`, `subpathCache`, `collisionCache`) all use either `rfCacheTTL` or `collisionCacheTTL`, both now configurable.
## Complexity
O(1) config lookup at store init time. No hot-path impact.
Co-authored-by: you <you@example.com>
|
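The helper described above might look roughly like this — a sketch in which the accepted value types in the map are an assumption (JSON numbers typically decode as float64); only the helper's name and purpose come from the PR:

```go
// Sketch of pulling a duration out of the loosely-typed cacheTTL config map.
package store

import "time"

func cacheTTLSec(cacheTTL map[string]any, key string, def time.Duration) time.Duration {
	if cacheTTL == nil {
		return def
	}
	switch v := cacheTTL[key].(type) {
	case float64: // values decoded from config.json
		if v > 0 {
			return time.Duration(v) * time.Second
		}
	case int:
		if v > 0 {
			return time.Duration(v) * time.Second
		}
	}
	return def
}
```

With something like that in place, the collision cache TTL could be initialized as `cacheTTLSec(cfg, "analyticsHashSizes", 3600*time.Second)` — the exact wiring here follows the description above, not the verified code.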