mirror of
https://github.com/Kpa-clawbot/meshcore-analyzer.git
synced 2026-06-07 20:51:48 +00:00
ea78581eea94caaca03e62cdf7e3500f8031c0ff
1411 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
ea78581eea |
fix(#858): packets/hour chart — bars rendering + x-axis label decimation (#865)
Two bugs in the Overview tab Packets/Hour chart: 1. **Bars not rendering**: `barW` went negative when `data.length` was large (e.g. 720 hours for 30-day range), producing zero-width invisible bars. Fix: `Math.max(1, ...)` floor on bar width. 2. **X-axis labels overlapping**: Every single hour label was emitted (`02h03h04h...`). Fix: decimate labels based on time range — every 6h for ≤24h, every 12h for ≤72h, every 24h beyond. Shows `MM-DD` on midnight boundaries for multi-day ranges. **Scope**: Only touches the Overview tab `Packets / Hour` section and the shared `barChart` floor (one-line change). No modifications to Topology, Channels, Distance, or other tabs. Fixes #858 Co-authored-by: you <you@example.com> |
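The two fixes above can be sketched as follows. This is a minimal illustration, not the project's `barChart` code: the function names, the `gap` parameter, and the use of UTC hours are all assumptions made for the example.

```javascript
// Illustrative helpers; names, the gap parameter, and the UTC choice are
// assumptions, not the project's actual chart code.

// Floor the bar width at 1px so bars stay visible for long ranges.
function barWidth(chartWidth, barCount, gap = 2) {
  return Math.max(1, chartWidth / barCount - gap);
}

// Label spacing in hours for a given time range:
// every 6h for <=24h, every 12h for <=72h, every 24h beyond.
function labelStepHours(rangeHours) {
  if (rangeHours <= 24) return 6;
  if (rangeHours <= 72) return 12;
  return 24;
}

// Returns the label for one hourly bucket, or null when the label is skipped.
function hourLabel(date, rangeHours) {
  const hour = date.getUTCHours();
  if (hour % labelStepHours(rangeHours) !== 0) return null;
  const pad = (n) => String(n).padStart(2, "0");
  // Midnight boundaries in multi-day ranges show MM-DD instead of the hour.
  if (hour === 0 && rangeHours > 24) {
    return `${pad(date.getUTCMonth() + 1)}-${pad(date.getUTCDate())}`;
  }
  return `${pad(hour)}h`;
}
```

With 720 hourly buckets in a 600px chart, the unfloored width would be negative; the floor keeps each bar at 1px.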
||
|
|
b5372d6f73 |
fix(#859): remove opacity gradient from Per-Observer Reachability rows (#863)
Fixes #859 ## What The "Per-Observer Reachability" and "Best Path to Each Node" sections in the Topology tab had inline `opacity` styles on each `.reach-ring` row that decreased with hop count (`1 - hops * 0.06`, floored at 0.3). This made text progressively darker/unreadable toward the bottom. ## Fix Removed the inline `opacity:${opacity}` style from both `renderPerObserverReach()` and `renderBestPath()`. The rows now render at full opacity with text colors governed by CSS variables as intended. ## Changed - `public/analytics.js`: removed opacity computation and inline style in two functions (4 lines removed, 2 added) ## Scope Only touches Per-Observer Reachability and Best Path rendering. No changes to Overview, Channels, or shared helpers. Co-authored-by: you <you@example.com> |
||
|
|
5afed0951b |
fix(#860): cap channel timeline chart to top 8 by volume (#864)
## What & Why The "Messages / Hour by Channel" chart on `/#/analytics` Channels tab rendered all channels in both the SVG and legend, causing legend overflow when 20+ channels are present. ## Fix - Sort channels by total message volume (descending) - Render only the top 8 in the chart and legend - Show "+N more" in the legend when channels are truncated - `maxCount` for Y-axis scaling is computed from visible channels only, so the chart uses its full vertical range Single-file change: `public/analytics.js` — only `renderChannelTimeline()` modified. No shared helpers touched. Fixes #860 Co-authored-by: you <you@example.com> |
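A rough sketch of the truncation logic described above. The channel shape (`name`, `counts` per hour) and the function name `topChannels` are hypothetical; the real `renderChannelTimeline()` may organize its data differently.

```javascript
// Hypothetical data shape: [{ name: "#general", counts: [3, 0, 7, ...] }, ...]
function topChannels(channels, limit = 8) {
  // Total message volume per channel, sorted descending.
  const ranked = channels
    .map((ch) => ({ ...ch, total: ch.counts.reduce((a, b) => a + b, 0) }))
    .sort((a, b) => b.total - a.total);

  const visible = ranked.slice(0, limit);
  const hiddenCount = ranked.length - visible.length;

  // Y-axis scale comes from visible channels only, so the chart uses its
  // full vertical range. The floor of 1 avoids dividing by zero later.
  const maxCount = Math.max(1, ...visible.flatMap((ch) => ch.counts));

  return {
    visible,
    maxCount,
    legendSuffix: hiddenCount > 0 ? `+${hiddenCount} more` : "",
  };
}
```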
||
|
|
3630a32310 |
fix(#852): transport-route path_len offset + var(--muted) → var(--text-muted) (#853)
## Problem Two pre-existing bugs found during expert review of #851: ### 1. `hashSize` derivation ignores transport route types `public/packets.js` hardcoded path-length byte at offset 1: ```js const rawPathByte = pkt.raw_hex ? parseInt(pkt.raw_hex.slice(2, 4), 16) : NaN; ``` For transport routes (`route_type` 0 DIRECT or 3 TRANSPORT_ROUTE_FLOOD), bytes 1–4 are `next_hop` + `last_hop` and path-length is at offset 5. Same bug #846 fixed inside the byte-breakdown function. ### 2. `var(--muted)` CSS variable is undefined Used in 6 places in `public/packets.js`. No `--muted` variable is defined anywhere in `public/*.css` — only `--text-muted` exists. Text styled with `var(--muted)` silently falls through to inherited color, making badges/hints invisible. ## Fix ### Fix 1: transport-route path_len offset ```js const plOff = (pkt.route_type === 0 || pkt.route_type === 3) ? 5 : 1; const rawPathByte = pkt.raw_hex ? parseInt(pkt.raw_hex.slice(plOff * 2, plOff * 2 + 2), 16) : NaN; ``` ### Fix 2: `var(--muted)` → `var(--text-muted)` All 6 occurrences replaced. ## Tests (5 new, 572 total) - `hashSize` extraction for flood route (route_type=1, offset 1) - `hashSize` extraction for direct transport route (route_type=0, offset 5) - `hashSize` extraction for transport route flood (route_type=3, offset 5) - `hashSize` returns null for missing raw_hex - Regression guard: no `var(--muted)` in any `public/` JS/CSS file ## Changes - `public/packets.js`: 7 lines changed (1 offset fix + 6 CSS var fixes) - `test-frontend-helpers.js`: 46 lines added (5 tests) Closes #852 --------- Co-authored-by: you <you@example.com> |
||
|
|
ff05db7367 |
ci: fix staging smoke test port — read STAGING_GO_HTTP_PORT, not hardcoded 82 (#854)
## Problem The "Deploy Staging" job's Smoke Test always fails with `Staging /api/stats did not return engine field`. Root cause: the step hardcodes `http://localhost:82/api/stats`, but `docker-compose.staging.yml:21` publishes the container on `${STAGING_GO_HTTP_PORT:-80}:80`. Default is port 80, not 82. curl gets ECONNREFUSED, `-sf` swallows the error, `grep -q engine` sees empty input → failure. Verified on staging VM: `ss -lntp` shows only `:80` listening; `docker ps` confirms `0.0.0.0:80->80/tcp`. A `curl http://localhost:82` returns connection-refused. ## Fix Read `STAGING_GO_HTTP_PORT` (same default as compose) so the smoke test tracks the port the container was actually launched on. Failure message now includes the resolved port to make future port mismatches self-diagnosing. ## Tested Logic only — the curl + grep pattern is unchanged. If any CI env override sets `STAGING_GO_HTTP_PORT`, the smoke test now follows it. Co-authored-by: Kpa-clawbot <agent@corescope.local> |
||
|
|
441409203e |
feat(#845): bimodal_clock severity — surface flaky-RTC nodes instead of hiding as 'No Clock' (#850)
## Problem Nodes with flaky RTC (firmware emitting interleaved good and nonsense timestamps) were classified as `no_clock` because the broken samples poisoned the recent median. Operators lost visibility into these nodes — they showed "No Clock" even though ~60% of their adverts had valid timestamps. Observed on staging: a node with 31K samples where recent adverts interleave good skew (-6.8s, -13.6s) with firmware nonsense (-56M, -60M seconds). Under the old logic, median of the mixed window → `no_clock`. ## Solution New `bimodal_clock` severity tier that surfaces flaky-RTC nodes with their real (good-sample) skew value. ### Classification order (first match wins) | Severity | Good Fraction | Description | |----------|--------------|-------------| | `no_clock` | < 10% | Essentially no real clock | | `bimodal_clock` | 10–80% (and bad > 0) | Mixed good/bad — flaky RTC | | `ok`/`warn`/`critical`/`absurd` | ≥ 80% | Normal classification | "Good" = `|skew| <= 1 hour`; "bad" = likely uninitialized RTC nonsense. When `bimodal_clock`, `recentMedianSkewSec` is computed from **good samples only**, so the dashboard shows the real working-clock value (e.g. -7s) instead of the broken median. ### Backend changes - New constant `BimodalSkewThresholdSec = 3600` - New severity `bimodal_clock` in classification logic - New API fields: `goodFraction`, `recentBadSampleCount`, `recentSampleCount` ### Frontend changes - Amber `Bimodal` badge with tooltip showing bad-sample percentage - Bimodal nodes render skew value like ok/warn/severe (not the "No Clock" path) - Warning line below sparkline: "⚠️ X of last Y adverts had nonsense timestamps (likely RTC reset)" ### Tests - 3 new Go unit tests: bimodal (60% good → bimodal_clock), all-bad (→ no_clock), 90%-good (→ ok) - 1 new frontend test: bimodal badge rendering with tooltip - Existing `TestReporterScenario_789` passes unchanged Builds on #789 (recent-window severity). Closes #845 --------- Co-authored-by: you <you@example.com> |
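The classification order can be sketched as below. The backend is Go, so this JavaScript version is only an illustration under assumptions: the function names are hypothetical, exactly 80% good is treated as the normal tier (following the `≥ 80%` row), and the `ok`/`warn`/`critical`/`absurd` thresholds are not given in the message, so the normal tier is returned as a single placeholder value.

```javascript
// Mirrors the BimodalSkewThresholdSec constant named in the commit message.
const BIMODAL_SKEW_THRESHOLD_SEC = 3600;

function medianOf(values) {
  if (values.length === 0) return null;
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// recentSkews: skew values in seconds from the recent window.
function classifyClock(recentSkews) {
  const good = recentSkews.filter((s) => Math.abs(s) <= BIMODAL_SKEW_THRESHOLD_SEC);
  const badCount = recentSkews.length - good.length;
  const goodFraction = recentSkews.length ? good.length / recentSkews.length : 0;

  if (goodFraction < 0.1) {
    return { severity: "no_clock", goodFraction, recentMedianSkewSec: medianOf(recentSkews) };
  }
  if (goodFraction < 0.8 && badCount > 0) {
    // Median over good samples only, so the real working-clock skew is shown.
    return { severity: "bimodal_clock", goodFraction, recentMedianSkewSec: medianOf(good) };
  }
  // Placeholder: the real code goes on to pick ok/warn/critical/absurd here.
  return { severity: "normal", goodFraction, recentMedianSkewSec: medianOf(recentSkews) };
}
```

For the staging example in the message, three good samples near -7s mixed with two nonsense samples give a good fraction of 0.6, which lands in the `bimodal_clock` tier.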
||
|
|
a371d35bfd |
feat(#847): dedupe Top Longest Hops by pair + add obs count and SNR cues (#848)
## Problem The "Top 20 Longest Hops" RF analytics card shows the same repeater pair filling most slots because the query sorts raw hop records by distance with no pair deduplication. A single long link observed 12+ times dominates the leaderboard. ## Fix Dedupe by unordered `(pk1, pk2)` pair. Per pair, keep the max-distance record and compute reliability metrics: | Column | Description | |--------|-------------| | **Obs** | Total observations of this link | | **Best SNR** | Maximum SNR seen (dB) | | **Median SNR** | Median SNR across all observations (dB) | Tooltip on each row shows the timestamp of the best observation. ### Before | # | From | To | Distance | Type | SNR | Packet | |---|------|----|----------|------|-----|--------| | 1 | NodeX | NodeY | 200 mi | R↔R | 5 dB | abc… | | 2 | NodeX | NodeY | 199 mi | R↔R | 6 dB | def… | | 3 | NodeX | NodeY | 198 mi | R↔R | 4 dB | ghi… | ### After | # | From | To | Distance | Type | Obs | Best SNR | Median SNR | Packet | |---|------|----|----------|------|-----|----------|------------|--------| | 1 | NodeX | NodeY | 200 mi | R↔R | 12 | 8.0 dB | 5.2 dB | abc… | | 2 | NodeA | NodeB | 150 mi | C↔R | 3 | 6.5 dB | 6.5 dB | jkl… | ## Changes - **`cmd/server/store.go`**: Group `filteredHops` by unordered pair key, accumulate obs count / best SNR / median SNR per group, sort by max distance, take top 20 - **`cmd/server/types.go`**: Update `DistanceHop` struct — replace `SNR` with `BestSnr`, `MedianSnr`, add `ObsCount` - **`public/analytics.js`**: Replace single SNR column with Obs, Best SNR, Median SNR; add row tooltip with best observation timestamp - **`cmd/server/store_tophops_test.go`**: 3 unit tests — basic dedupe, reverse-pair merge, nil SNR edge case ## Test Coverage - `TestDedupeTopHopsByPair`: 5 records on pair (A,B) + 1 on (C,D) → 2 results, correct obsCount/dist/bestSnr/medianSnr - `TestDedupeTopHopsReversePairMerges`: (B,A) and (A,B) merge into one entry - `TestDedupeTopHopsNilSNR`: all-nil SNR records → bestSnr and medianSnr both nil - Existing `TestAnalyticsRFEndpoint` and `TestAnalyticsRFWithRegion` still pass Closes #847 --------- Co-authored-by: you <you@example.com> |
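The pair deduplication can be illustrated with a small sketch. The server-side implementation is Go in `cmd/server/store.go`; the JavaScript below only mirrors the described steps, and the record fields (`pk1`, `pk2`, `dist`, `snr`) and function names are assumptions for the example.

```javascript
function medianOrNull(values) {
  if (values.length === 0) return null;
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// hops: [{ pk1, pk2, dist, snr }] where snr may be null.
function dedupeTopHops(hops, limit = 20) {
  const groups = new Map();
  for (const hop of hops) {
    // Unordered key: (A,B) and (B,A) land in the same group.
    const key = hop.pk1 < hop.pk2 ? `${hop.pk1}|${hop.pk2}` : `${hop.pk2}|${hop.pk1}`;
    let group = groups.get(key);
    if (!group) {
      group = { best: hop, obsCount: 0, snrs: [] };
      groups.set(key, group);
    }
    group.obsCount += 1;
    if (hop.dist > group.best.dist) group.best = hop; // keep max-distance record
    if (hop.snr !== null && hop.snr !== undefined) group.snrs.push(hop.snr);
  }

  return [...groups.values()]
    .map((g) => ({
      pk1: g.best.pk1,
      pk2: g.best.pk2,
      dist: g.best.dist,
      obsCount: g.obsCount,
      bestSnr: g.snrs.length ? Math.max(...g.snrs) : null,
      medianSnr: medianOrNull(g.snrs),
    }))
    .sort((a, b) => b.dist - a.dist)
    .slice(0, limit);
}
```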
||
|
|
7c01a97178 |
fix(#849): Packet Detail dialog — show exact clicked observation, not cross-observer aggregate (#851)
## Problem The Packet Detail dialog summary (Observer, Path, Hops, SNR/RSSI, Timestamp) used the **aggregated cross-observer view** (`_parsedPath` / `getParsedPath(pkt)`), which contradicted the byte breakdown after #844. A packet observed with 2 hops by one observer would show "Path: 7 hops" in the summary because it merged all observers' paths. ## Fix The dialog is now **per-observation**: - `renderDetail` resolves a `currentObservation` from `selectedObservationId` (set when clicking an observation child row) or defaults to `observations[0]` - All summary fields read from the current observation: Observer, SNR/RSSI, Timestamp, Path, Direction - Hop count badge comes from `path_len & 0x3F` of the observation's `raw_hex` (firmware truth, same source as byte breakdown). Cross-checked against `path_json` length — logs a console warning on mismatch - **Observations table** rendered inside the detail panel when multiple observations exist. Clicking a row updates `currentObservation` and re-renders the summary in-place (no dialog close/reopen) - `.observation-current` CSS class highlights the selected observation row ### Cross-observer aggregate (Option B) A read-only "Cross-observer aggregate" section below the observations table shows the longest observed path across all observers. This is **not** the default view — it's always visible as secondary context. ## Tests 8 new tests in `test-frontend-helpers.js`: - Hop count extraction from raw_hex (normal, direct, transport route types) - Inconsistency detection between path_json and raw_hex - Per-observation field override of aggregated packet fields - First observation used when no specific observation selected - Observation row click selects that observation - Null/missing raw_hex handling All 572 tests pass (564 frontend + 62 filter + 29 aging). 
## Acceptance - Summary shows per-observation path/hops/SNR/RSSI/timestamp - Switching observations in the detail updates everything - Cross-observer aggregate available as secondary section - Byte breakdown untouched (owned by #846) ## Related - Closes #849 - Related: #844 (#846) — byte breakdown fix (separate PR, different code region) --------- Co-authored-by: you <you@example.com> |
||
|
|
f1eea9ee3c |
fix(#844): Packet Byte Breakdown — derive hop count from path_len, not aggregated _parsedPath (#846)
## Problem The Packet Detail dialog's "Packet Byte Breakdown" section was using the aggregated `_parsedPath` (longest path observed across all observers) to render hop entries, instead of deriving the hop count from the `path_len` byte in `raw_hex`. This caused: - Wrong hop count (e.g., "Path (7 hops)" when `raw_hex` only contains 2) - Hop values from the aggregated path displayed at incorrect byte offsets - Subsequent fields (pubkey, timestamp, signature) rendered at wrong offsets because `off` was advanced by the wrong amount ## Fix In `buildFieldTable()` (packets.js), the Path section now: 1. Derives `hashCountVal` from `path_len & 0x3F` (firmware truth per `Packet.h:79-83`) 2. Derives `hashSize` from `(path_len >> 6) + 1` 3. Reads each hop's hex value directly from `raw_hex` at the correct byte offset 4. Advances `off` by `hashSize * hashCountVal` 5. Skips the Path section entirely when `hashCountVal === 0` (direct advert) The "Path" summary section above the breakdown (which uses the aggregated path for route visualization) is unchanged — only the byte breakdown is fixed. ## Tests 3 new tests in `test-frontend-helpers.js`: - Verifies 2 hops rendered (not 7) when `path_len=0x42` despite 7-hop aggregated path - Verifies pubkey offset is 6 (not 16) after a 2-hop path - Verifies direct advert (`hashCount=0`) skips Path section Also fixed pre-existing `HopDisplay is not defined` failures in the `#765` transport offset test sandbox (added mock). All 559 tests pass. Closes #844 --------- Co-authored-by: you <you@example.com> |
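The decoding steps listed above can be sketched as a standalone function. This is an illustration rather than the actual `buildFieldTable()` code: the function name `decodePath` and the return shape are assumptions, and the offset rule for transport routes (path-length byte at offset 5 for `route_type` 0 or 3) is taken from the related fix in #853.

```javascript
// rawHex: packet bytes as a hex string; routeType: numeric route_type.
// Returns null when rawHex is missing or the path_len byte cannot be parsed.
function decodePath(rawHex, routeType) {
  if (!rawHex) return null;
  // Transport routes carry 4 extra bytes before path_len (per #853).
  const plOff = routeType === 0 || routeType === 3 ? 5 : 1;
  const pathLen = parseInt(rawHex.slice(plOff * 2, plOff * 2 + 2), 16);
  if (Number.isNaN(pathLen)) return null;

  const hashCount = pathLen & 0x3f;     // low 6 bits: number of hops
  const hashSize = (pathLen >> 6) + 1;  // high 2 bits: bytes per hop hash

  // Read each hop directly from the raw bytes, advancing the byte offset.
  const hops = [];
  let off = plOff + 1;
  for (let i = 0; i < hashCount; i++) {
    hops.push(rawHex.slice(off * 2, (off + hashSize) * 2));
    off += hashSize;
  }
  // nextOffset is where the following field (e.g. pubkey) starts.
  return { hashCount, hashSize, hops, nextOffset: off };
}
```

For `path_len = 0x42` the low six bits give 2 hops and the high two bits give a 2-byte hash, so the next field starts at byte offset 6, which matches the test described in the message.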
||
|
|
f30e6bef28 |
qa(plan): reconcile §8.2/§5.3/§6.2 + add §8.7 (Recent Packets readability) (#838)
Doc-only reconciliation of v3.6.0-rc plan with what actually shipped. ## Changes - **§8.2** — desktop deep link now opens full-screen view (post-#823/#824), not split panel as the plan still asserted. - **§5.3** — pin that severity now derives from `recentMedianSkewSec` (#789), not the all-time `medianSkewSec` — a re-tester needs to know which field drives the badge. - **§6.2** — pin the existing observer-graph element location (`public/analytics.js:2048-2051`). - **New §8.7** — side-panel "Recent Packets" entries must navigate to a valid packet detail (DB-fallback per #827) AND text must be readable in the current theme (explicit color per #829). Both bugs surfaced this session. No CI gates. Co-authored-by: Kpa-clawbot <agent@corescope.local> |
||
|
|
20f456da58 |
fix(#840): map popup 'Show Neighbors' link does nothing on iOS Safari (#841)
Closes #840 ## What Switch the map-popup "Show Neighbors" link from `<a href="#">` to `<a href="javascript:void(0)" role="button">` so iOS Safari doesn't navigate when the document-level click delegation fails to fire. ## Why On iOS Safari, when a user taps the link inside a Leaflet popup: - The document-level click delegation at `public/map.js:927` calls `e.preventDefault()` and triggers `selectReferenceNode`. - BUT inside a Leaflet popup, `L.DomEvent.disableClickPropagation()` is internally applied to popup content — on iOS Safari the click sometimes doesn't bubble to `document`. - When that happens, the browser's default `<a href="#">` action runs: - hash becomes empty (`#`) - `navigate()` in `app.js:458` sees empty hash → defaults to `'packets'` - map page is destroyed mid-tap → user perceives "nothing happened" (or a brief flash if they back-button) `href="javascript:void(0)"` removes the navigation fall-through entirely. The `role="button"` keeps a11y semantics, `cursor:pointer` keeps the visual cue. ## Tested - Headless Chromium desktop + iPhone 13 emulation: tap fires `/api/nodes/{pk}/neighbors?min_count=3`, marker count drops from 441 → 44, `#mcNeighbors` checkbox toggles on, URL stays on `/#/map`. Same as before. - Frontend helpers: 556/0 - Real iOS Safari fix verification needs a physical-device test post-deploy ## Out of scope (follow-up) - Same `<a href="#">` pattern exists for the topright "Close route" control at `public/map.js:389` — uses `L.DomEvent.preventDefault` so should work, but worth auditing if the symptom recurs. Co-authored-by: Kpa-clawbot <agent@corescope.local> |
||
|
|
e31e14cae9 |
qa(plan): apply v3.6.0-rc QA findings (#832/#833/#836) (#837)
Apply v3.6.0-rc QA learnings to the plan. ## Changes - **§1.1** — 1 GB cap is unrealistic on real DBs without `GOMEMLIMIT` + bounded cold-load. Raised target to 3 GB and pointed to follow-up **#836**. (Investigation showed cold-load transient blows past any sub-2GB cap regardless of `maxMemoryMB` setting because `runtime.MemStats.NextGC` ignores cgroup ceilings.) - **§1.4** — `trackedBytes`/`trackedMB` is in-store packet bytes only and under-reports RSS by ~3–5× (no indexes, caches, runtime overhead, cgo). Switched the assertion to use `processRSSMB` exposed by **#832** (PR **#835**). - **§11.1** — noted the Playwright deep-link E2E assertion was updated by **#833** (PR **#834**) to match the post-#823 full-screen behavior. ## Why Three real findings from the QA ops sweep ([§1.4 fail comment](https://github.com/Kpa-clawbot/CoreScope/issues/809#issuecomment-4286113141)). Updating the plan so the next run doesn't replay the same false-fail/false-pass conditions. Co-authored-by: Kpa-clawbot <agent@corescope.local> |
||
|
|
bb0f816a6b |
fix(channels): only show lock for confirmed-encrypted #channel deep links (#825) (#826)
Closes #825 ## Root cause PR #815 added a `#`-prefix branch in `selectChannel` that unconditionally rendered the lock affordance whenever the channel object wasn't in the loaded `channels` list. With the encrypted toggle off, unencrypted channels like `#test` are also absent from the list, so the new branch wrongly locked them instead of falling through to the REST fetch. ## Fix When no stored key matches, refetch `/channels?includeEncrypted=true` and check `ch.encrypted` before locking. Only render the lock when we positively know the channel is encrypted; otherwise fall through to the existing REST messages fetch. This regresses #815's behavior **only for the unencrypted case** (which is the bug). The encrypted-no-key (#811) and encrypted-with-stored-key (#815) paths are preserved. ## Tests 3 new regression tests in `test-frontend-helpers.js`: - `#test` (unencrypted) deep link → REST fetched, no lock - `#private` (encrypted, no key) deep link → lock, no REST (#811 preserved) - `#private` (encrypted, with stored key) deep link → decrypt path (#815 preserved) `node test-frontend-helpers.js` → 556 passed, 0 failed. ## Perf One extra REST call per cold deep link to a `#`-named channel that's not in the toggle-off list — same endpoint already cached via `CLIENT_TTL.channels`, so subsequent navigations are free. --------- Co-authored-by: you <you@example.com> |
||
|
|
3f26dc7190 |
obs: surface real RSS alongside tracked store bytes in /api/stats (#832) (#835)
Closes #832. ## Root cause confirmed `trackedMB` (`s.trackedBytes` in `store.go`) only sums per-packet struct + payload sizes recorded at insertion. It excludes the index maps (`byHash`, `byTxID`, `byNode`, `byObserver`, `byPathHop`, `byPayloadType`, hash-prefix maps, name lookups), the analytics LRUs (rfCache/topoCache/hashCache/distCache/subpathCache/chanCache/collisionCache), WS broadcast queues, and Go runtime overhead. It's "useful packet bytes," not RSS — typically 3–5× off on staging. ## Fix (Option C from the issue) Expose four memory fields on `/api/stats` from a single cached snapshot: | Field | Source | Semantics | |---|---|---| | `storeDataMB` | `s.trackedBytes` | in-store packet bytes; eviction watermark input | | `goHeapInuseMB` | `runtime.MemStats.HeapInuse` | live Go heap | | `goSysMB` | `runtime.MemStats.Sys` | total Go-managed memory | | `processRSSMB` | `/proc/self/status VmRSS` (Linux), falls back to `goSysMB` | what the kernel sees | `trackedMB` is retained as a deprecated alias for `storeDataMB` so existing dashboards/QA scripts keep working. Field invariants are documented on `MemorySnapshot`: `processRSSMB ≥ goSysMB ≥ goHeapInuseMB ≥ storeDataMB` (typical). ## Performance Single `getMemorySnapshot` call cached for 1s — `runtime.ReadMemStats` (stop-the-world) and the `/proc/self/status` read are amortized across burst polling. `/proc` read is bounded to 8 KiB, parsed with `strconv` only — no shell-out, no untrusted input. `cgoBytesMB` is omitted: the build uses pure-Go `modernc.org/sqlite`, so there is no cgo allocator to measure. Documented in code comment. ## Tests `cmd/server/stats_memory_test.go` asserts presence, types, sign, and ordering invariants. Avoids the flaky "matches RSS to ±X%" pattern. ``` $ go test ./... -count=1 -timeout 180s ok github.com/corescope/server 19.410s ``` ## QA plan §1.4 now compares `processRSSMB` against procfs RSS (the right invariant); threshold stays at 0.20. --------- Co-authored-by: MeshCore Agent <meshcore-agent@openclaw.local> |
||
|
|
886aabf0ae |
fix(#827): /api/packets/{hash} falls back to DB when in-memory store misses (#831)
Closes #827. ## Problem `/api/packets/{hash}` only consulted the in-memory `PacketStore`. When a packet aged out of memory, the handler 404'd — even though SQLite still had it and `/api/nodes/{pubkey}` `recentAdverts` (which reads from the DB) was actively surfacing the hash. Net effect: the **Analyze →** link on older adverts in the node detail page led to a dead "Not found". Two-store inconsistency: DB has the packet, in-memory doesn't, node detail surfaces it from DB → packet detail can't serve it. ## Fix In `handlePacketDetail`: - After in-memory miss, fall back to `db.GetPacketByHash` (already existed) for hash lookups, and `db.GetTransmissionByID` for numeric IDs. - Track when the result came from the DB; if so and the store has no observations, populate from DB via a new `db.GetObservationsForHash` so the response shows real observations instead of the misleading `observation_count = 1` fallback. ## Tests - `TestPacketDetailFallsBackToDBWhenStoreMisses` — insert a packet directly into the DB after `store.Load()`, confirm store doesn't have it, assert 200 + populated observations. - `TestPacketDetail404WhenAbsentFromBoth` — neither store nor DB → 404 (no false positives). - `TestPacketDetailPrefersStoreOverDB` — both have it; store result wins (no double-fetch). - `TestHandlePacketDetailNoStore` updated: it previously asserted the old buggy 404 behavior; now asserts the correct DB-fallback 200. All `go test ./... -run "PacketDetail|Packet|GetPacket"` and the full `cmd/server` suite pass. ## Out of scope The `/api/packets?hash=` filter is the live in-memory list endpoint and intentionally store-only for performance. Not touched here — happy to file a follow-up if you'd rather harmonise. ## Repro context Verified against prod with a recently-adverting repeater whose recent advert hash lives in `recentAdverts` (DB) but had been evicted from the in-memory store; pre-fix 404, post-fix 200 with full observations. Co-authored-by: you <you@example.com> |
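The lookup order in the fix can be sketched as follows. The real handler is Go (`handlePacketDetail`); here the in-memory store is modeled as a `Map` and the database as a plain object whose method names are hypothetical stand-ins for `db.GetPacketByHash` and `db.GetObservationsForHash`.

```javascript
// store: Map<hash, packet>; db: object with getPacketByHash / getObservationsForHash.
// Both are stand-ins for illustration only.
function getPacketDetail(hash, store, db) {
  const fromStore = store.get(hash);
  if (fromStore) {
    // Store hit wins; the DB is not consulted at all.
    return { status: 200, source: "store", packet: fromStore };
  }

  const fromDb = db.getPacketByHash(hash);
  if (!fromDb) return { status: 404 };

  // Populate real observations instead of a misleading single-observation fallback.
  const observations = db.getObservationsForHash(hash);
  return {
    status: 200,
    source: "db",
    packet: { ...fromDb, observations, observation_count: observations.length },
  };
}
```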
||
|
|
a0fddb50aa |
fix(#789): severity from recent samples; Theil-Sen drift with outlier rejection (#828)
Closes #789. ## The two bugs 1. **Severity from stale median.** `classifySkew(absMedian)` used the all-time `MedianSkewSec` over every advert ever recorded for the node. A repeater that was off for hours and then GPS-corrected stayed pinned to `absurd` because hundreds of historical bad samples poisoned the median. Reporter's case: `medianSkewSec: -59,063,561.8` while `lastSkewSec: -0.8` — current health was perfect, dashboard said catastrophic. 2. **Drift from a single correction jump.** Drift used OLS over every `(ts, skew)` pair, with no outlier rejection. A single GPS-correction event (skew jumps millions of seconds in ~30s) dominated the regression and produced `+1,793,549.9 s/day` — physically nonsense; the existing `maxReasonableDriftPerDay` cap then zeroed it (better than absurd, but still useless). ## The two fixes 1. **Recent-window severity.** New field `recentMedianSkewSec` = median over the last `N=5` samples or last `1h`, whichever is narrower (more current view). Severity now derives from `abs(recentMedianSkewSec)`. `MeanSkewSec`, `MedianSkewSec`, `LastSkewSec` are preserved unchanged so the frontend, fleet view, and any external consumers continue to work. 2. **Theil-Sen drift with outlier filter.** Drift now uses the Theil-Sen estimator (median of all pairwise slopes — textbook robust regression, ~29% breakdown point) on a series pre-filtered to drop samples whose skew jumps more than `maxPlausibleSkewJumpSec = 60s` from the previous accepted point. Real µC drift is fractions of a second per advert; clock corrections fall well outside. Capped at `theilSenMaxPoints = 200` (most-recent) so O(n²) stays bounded for chatty nodes. ## What stays the same - Epoch-0 / out-of-range advert filter (PR #769). - `minDriftSamples = 5` floor. - `maxReasonableDriftPerDay = 86400` hard backstop. - API shape: only additions (`recentMedianSkewSec`); no fields removed or renamed. 
## Tests All in `cmd/server/clock_skew_test.go`: - `TestSeverityUsesRecentNotMedian` — 100 bad samples (-60s) + 5 good (-1s) → severity = `ok`, historical median still huge. - `TestDriftRejectsCorrectionJump` — 30 min of clean linear drift + one 1000s jump → drift small (~12 s/day). - `TestTheilSenMatchesOLSWhenClean` — clean linear data, Theil-Sen within ~1% of OLS. - `TestReporterScenario_789` — exact reproducer: 1662 samples, 1657 @ -683 days then 5 @ -1s → severity `ok`, `recentMedianSkewSec ≈ 0`, drift bounded; legacy `medianSkewSec` preserved as historical context. `go test ./... -count=1` (cmd/server) and `node test-frontend-helpers.js` both pass. --------- Co-authored-by: clawbot <bot@corescope.local> Co-authored-by: you <you@example.com> |
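A minimal sketch of the drift estimate described above: a jump filter followed by a Theil-Sen slope (median of pairwise slopes). The actual code is Go in `cmd/server`; the function names and the sample shape (`ts` in seconds, `skew` in seconds) are assumptions, as is applying the jump filter before the 200-point cap. Samples are assumed to be sorted by time.

```javascript
// Drop samples whose skew jumps more than maxJumpSec from the previous accepted one.
// Note: the first sample is always accepted, so a bad first sample would
// cause later good samples to be rejected in this simplified version.
function filterJumps(samples, maxJumpSec = 60) {
  const kept = [];
  for (const s of samples) {
    const prev = kept[kept.length - 1];
    if (!prev || Math.abs(s.skew - prev.skew) <= maxJumpSec) kept.push(s);
  }
  return kept;
}

// Returns drift in seconds per day, or null when there are too few samples.
function theilSenDriftPerDay(samples, maxPoints = 200, minSamples = 5) {
  const pts = filterJumps(samples).slice(-maxPoints); // most-recent points only
  if (pts.length < minSamples) return null;

  // All pairwise slopes; O(n^2), bounded by maxPoints.
  const slopes = [];
  for (let i = 0; i < pts.length; i++) {
    for (let j = i + 1; j < pts.length; j++) {
      const dt = pts[j].ts - pts[i].ts;
      if (dt > 0) slopes.push((pts[j].skew - pts[i].skew) / dt);
    }
  }
  if (slopes.length === 0) return null;

  slopes.sort((a, b) => a - b);
  const mid = Math.floor(slopes.length / 2);
  const median = slopes.length % 2 ? slopes[mid] : (slopes[mid - 1] + slopes[mid]) / 2;
  return median * 86400;
}
```

Because the estimate is a median rather than a least-squares fit, a few outlying slopes move the result far less than they would under OLS, and the jump filter removes correction events before they reach the estimator.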
||
|
|
bb09123f34 |
test(#833): update deep-link Playwright assertion for full-screen desktop view (#834)
Closes #833 ## What Update Playwright E2E assertion for desktop deep link to `/#/nodes/{pubkey}`. Now expects `.node-fullscreen` to be present (matches the spec set by PR #824 / issue #823). ## Why The previous assertion encoded the old pre-#823 behavior — "split panel on desktop deep link." PR #824 intentionally removed the `window.innerWidth <= 640` gate so desktop deep links open the full-screen view (matching the Details link path that #779/#785/#824 ultimately made work). The test failed on every PR that rebased onto master, blocking `Deploy Staging`. ## Verified - 1-test diff, no other behavior change - Mobile-viewport assertions elsewhere already exercise the same `.node-fullscreen` selector Co-authored-by: Kpa-clawbot <agent@corescope.local> |
||
|
|
31a0a944f9 |
fix(#829): node-detail side panel Recent Packets text invisible (#830)
Closes #829 ## What Add explicit `color: var(--text)` to `.advert-info` (and `var(--accent)` to its links) so the side-panel "Recent Packets" entries stay readable in all themes. ## Why `.advert-info` had only `font-size` + `line-height` rules — text inherited from ancestors. In default light/dark themes the inherited color happens to differ enough from `--card-bg`. Under custom themes where they collide, text becomes invisible — only the colored `.advert-dot` shows. Operator screenshot confirmed the symptom. Same class of bug as the existing fix at `style.css:660` ("Bug 7 fix: neighbor table text inherits accent color — force readable text") which forced `color: var(--text)` on `.node-detail-section .data-table td`. The advert timeline doesn't use a data-table, so it fell through. ## Verified - DOM contains correct text — only the rendered color was wrong - `getComputedStyle(.advert-info).color` previously matched `--card-bg` under affected themes - After fix: `.advert-info` resolves to `var(--text)` regardless of inherited chain - Frontend helpers: 553/0 - Full-screen `node-full-card` view (separate `.node-activity-item` markup) unaffected Co-authored-by: Kpa-clawbot <agent@corescope.local> |
||
|
|
cad1f11073 |
fix: bypass IATA filter for status messages, fill SNR on duplicate obs (#694) (#802)
## Problems Two independent ingestor bugs identified in #694: ### 1. IATA filter drops status messages from out-of-region observers The IATA filter ran at the top of `handleMessage()` before any message-type discrimination. Status messages carrying observer metadata (`noise_floor`, battery, airtime) from observers outside the configured IATA regions were silently discarded before `UpsertObserver()` and `InsertMetrics()` ran. **Impact:** Observers running `meshcoretomqtt/1.0.8.0` in BFL and LAX — the only client versions that include `noise_floor` in status messages — had their health data dropped entirely on prod instances filtering to SJC. **Fix:** Moved the IATA filter to the packet path only (after the `parts[3] == "status"` branch). Status messages now always populate observer health data regardless of configured region filter. ### 2. `INSERT OR IGNORE` discards SNR/RSSI on late arrival When the same `(transmission_id, observer_idx, path_json)` observation arrived twice — first without RF fields, then with — `INSERT OR IGNORE` silently discarded the SNR/RSSI from the second arrival. **Fix:** Changed to `ON CONFLICT(...) DO UPDATE SET snr = COALESCE(excluded.snr, snr), rssi = ..., score = ...`. A later arrival with SNR fills in a `NULL`; a later arrival without SNR does not overwrite an existing value. ## Tests - `TestIATAFilterDoesNotDropStatusMessages` — verifies BFL status message is processed when IATA filter includes only SJC, and that BFL packet is still filtered - `TestInsertObservationSNRFillIn` — verifies SNR fills in on second arrival, and is not overwritten by a subsequent null arrival ## Related Partially addresses #694 (upstream client issue of missing SNR in packet messages is out of scope) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> |
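The fill-in behavior of the second fix can be modeled outside SQL. The sketch below imitates `COALESCE(excluded.snr, snr)` on an in-memory `Map`; it is not the ingestor's code, the field names are hypothetical, and only `snr` is shown because the message elides the `rssi` and `score` expressions.

```javascript
// First non-null argument, like SQL COALESCE.
function coalesce(...values) {
  for (const v of values) {
    if (v !== null && v !== undefined) return v;
  }
  return null;
}

// table: Map keyed by (transmissionId, observerIdx, pathJson), standing in
// for the unique constraint on the observations table.
function upsertObservation(table, obs) {
  const key = JSON.stringify([obs.transmissionId, obs.observerIdx, obs.pathJson]);
  const existing = table.get(key);
  if (!existing) {
    table.set(key, { ...obs });
    return;
  }
  // Incoming value wins when present; otherwise the stored value is kept.
  existing.snr = coalesce(obs.snr, existing.snr);
}
```

With the earlier `INSERT OR IGNORE` behavior, the second call would have been a no-op and the SNR would have stayed `NULL`.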
||
|
|
7f024b7aa7 |
fix(#673): replace raw JSON text search with byNode index for node packet queries (#803)
## Summary Fixes #673 - GRP_TXT packets whose message text contains a node's pubkey were incorrectly counted as packets for that node, inflating packet counts and type breakdowns - Two code paths in `store.go` used `strings.Contains` on the full `DecodedJSON` blob — this matched pubkeys appearing anywhere in the JSON, including inside chat message text - `filterPackets` slow path (combined node + other filters): replaced substring search with a hash-set membership check against `byNode[nodePK]` - `GetNodeAnalytics`: removed the full-packet-scan + text search branch entirely; always uses the `byNode` index (which already covers `pubKey`/`destPubKey`/`srcPubKey` via structured field indexing) ## Test Plan - [x] `TestGetNodeAnalytics_ExcludesGRPTXTWithPubkeyInText` — verifies a GRP_TXT packet with the node's pubkey in its text field is not counted in that node's analytics - [x] `TestFilterPackets_NodeQueryDoesNotMatchChatText` — verifies the combined-filter slow path of `filterPackets` returns only the indexed ADVERT, not the chat packet Both tests were written as failing tests against the buggy code and pass after the fix. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> |
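To see why the substring search over-counted, compare it with an index lookup. The sketch below is illustrative only: the real store is Go, and the packet shape, `buildByNode`, and `packetsForNode` are assumptions modeled on the description of the `byNode` index.

```javascript
// Index packets by the structured pubkey fields only.
function buildByNode(packets) {
  const byNode = new Map();
  for (const p of packets) {
    for (const pk of [p.decoded.pubKey, p.decoded.destPubKey, p.decoded.srcPubKey]) {
      if (!pk) continue;
      if (!byNode.has(pk)) byNode.set(pk, new Set());
      byNode.get(pk).add(p.hash);
    }
  }
  return byNode;
}

// Membership check against the index instead of scanning the JSON text.
function packetsForNode(packets, byNode, nodePK) {
  const hashes = byNode.get(nodePK) ?? new Set();
  return packets.filter((p) => hashes.has(p.hash));
}

// The old approach, for comparison: matches the pubkey anywhere in the blob.
function packetsForNodeBySubstring(packets, nodePK) {
  return packets.filter((p) => JSON.stringify(p.decoded).includes(nodePK));
}
```

A chat packet whose text merely mentions the pubkey matches the substring version but is absent from the index, so only the advert is returned.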
||
|
|
ddd18cb12f |
fix(nodes): Details link opens full-screen on desktop (#823) (#824)
Closes #823 ## What Remove the `window.innerWidth <= 640` gate on the `directNode` full-screen branch in `init()` so the 🔍 Details link works on desktop. ## Why - #739 (`e6ace95`) gated full-screen to mobile so desktop **deep links** would land on the split panel. - But the same gate broke the **Details link** flow (#779/#785): the click handler calls `init(app, pubkey)` directly. On desktop the gated branch was skipped, the list re-rendered with `selectedKey = pubkey`, and the side panel was already open → no visible change. - Dropping the gate makes the directNode branch the single, unambiguous path to full-screen for both the Details link and any deep link. ## Why the desktop split-panel UX is still preserved Row clicks call `selectNode()`, which uses `history.replaceState` — no `hashchange` event, no router re-init, no `directNode` set. Only the Details link handler (which calls `init()` explicitly) and a fresh deep-link load reach this branch. ## Repro / verify 1. Desktop, viewport > 640px, open `/#/nodes`. 2. Click a node row → split panel opens (unchanged). 3. Click 🔍 Details inside the panel → full-screen single-node view (was broken; now works). 4. Back button / Escape → back to list view. 5. Paste `/#/nodes/{pubkey}` directly → full-screen on both desktop and mobile. ## Tests `node test-frontend-helpers.js` → 553 passed, 0 failed. Co-authored-by: you <you@example.com> |
||
|
|
997bf190ce |
fix(mobile): close button accessible + toolbar scrollable (#797) (#805)
## Summary
- **Node detail `top: 60px` → `64px`**: aligns with other overlay panels, gives proper clearance from the 52px fixed nav bar
- **Mobile bottom sheet `z-index: 1050`**: node detail now renders above the VCR bar (`z-index: 1000`), close button never obscured
- **Mobile `max-height: 60vh` → `60dvh`**: respects iOS Safari browser chrome correctly
- **`.live-toggles` horizontal scroll**: `overflow-x: auto; flex-wrap: nowrap`, so all 8 checkboxes are reachable via horizontal swipe
Fixes #797
## Test plan
- [x] Mobile portrait (<640px): tap a map node → bottom sheet slides up, close button (✕) visible and tappable above VCR bar
- [x] Mobile portrait: scroll the live-header toggles horizontally → all checkboxes reachable
- [x] Desktop/tablet (>640px): node detail panel top-right corner fully below the nav bar
- [x] Desktop: close button functional, panel hides correctly
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
|
||
|
|
5ff4b75a07 |
qa: automate §10.1/§10.2 nodeBlacklist test (#822)
Automates QA plan §10.1 (nodeBlacklist hide) and §10.2 (DB retain), flipping both rows from `human` to `auto`. Stacks on top of #808.
**What**
- New `qa/scripts/blacklist-test.sh`, an env-driven harness:
  - Args: `BASELINE_URL TARGET_URL TEST_PUBKEY`
  - Env: `TARGET_SSH_HOST`, `TARGET_SSH_KEY` (default `/root/.ssh/id_ed25519`), `TARGET_CONFIG_PATH`, `TARGET_CONTAINER`, optional `TARGET_DB_PATH` / `ADMIN_API_TOKEN`.
  - Edits `nodeBlacklist` on target via remote `jq` (python3 fallback), atomic move with preserved perms.
  - Restarts container, waits up to 120 s for `/api/stats == 200`.
  - §10.1 asserts `/api/nodes/{pk}` is 404 **or** absent from `/api/nodes` listing, and `/api/topology` does not reference the pubkey.
  - §10.2 prefers `/api/admin/transmissions` if `ADMIN_API_TOKEN` set, else falls back to `sqlite3` inside the container (and host as last resort).
  - **Teardown is mandatory** (`trap … EXIT INT TERM`): removes pubkey, restarts, verifies the node is visible again. Teardown failures count toward exit code.
  - Exit code = number of failures; per-step ✅/❌ with classified failure modes (`ssh-failed`, `restart-stuck`, `hide-failed`, `retain-failed`, `teardown-failed`).
- `qa/plans/v3.6.0-rc.md` §10.1 / §10.2 mode → `auto (qa/scripts/blacklist-test.sh)`.
**Why**
Manual blacklist verification was the slowest item in the §10 block and the easiest to get wrong (forgetting teardown leaks state into the next QA pass). Now it's a single command, public-repo-safe (zero PII / hardcoded hosts), and the trap guarantees the target is restored.
`bash -n` passes locally. Live run requires staging credentials.
---------
Co-authored-by: meshcore-agent <agent@meshcore>
Co-authored-by: meshcore-agent <meshcore@openclaw.local>
|
||
|
|
2460e33f94 |
fix(#810): /health.recentPackets resolved_path falls back to longest sibling obs (#821)
## What + why
`fetchResolvedPathForTxBest` (used by every API path that fills the
top-level `resolved_path`, including
`/api/nodes/{pk}/health.recentPackets`) picked the observation with the
longest `path_json` and queried SQL for that single obs ID. When the
longest-path obs had `resolved_path` NULL but a shorter sibling had one,
the helper returned nil and the top-level field was dropped — even
though the data exists. QA #809 §2.1 caught it on the health endpoint
because that page surfaces it per-tx.
Fix: keep the LRU-friendly fast path (try the longest-path obs), then
fall back to scanning all observations of the tx and picking the longest
`path_json` that actually has a stored `resolved_path`.
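A minimal sketch of the fast-path-then-fallback selection described above, using in-memory values instead of the LRU and SQL lookups. The `obs` type, its fields, and `bestResolvedPath` are hypothetical; the real helper works on observation IDs and database rows.

```go
package main

import "fmt"

// obs is a hypothetical stand-in for one observation of a transmission.
type obs struct {
	ID           int
	PathLen      int     // number of hops in path_json
	ResolvedPath *string // nil models a NULL resolved_path column
}

// bestResolvedPath tries the longest-path observation first and, when its
// resolved_path is NULL, falls back to the longest sibling that has one.
func bestResolvedPath(all []obs) *string {
	if len(all) == 0 {
		return nil
	}
	longest := all[0]
	for _, o := range all[1:] {
		if o.PathLen > longest.PathLen {
			longest = o
		}
	}
	if longest.ResolvedPath != nil { // fast path: dominant case
		return longest.ResolvedPath
	}
	var best *obs
	for i := range all {
		o := &all[i]
		if o.ResolvedPath == nil {
			continue
		}
		if best == nil || o.PathLen > best.PathLen {
			best = o
		}
	}
	if best == nil {
		return nil // no observation carries a resolved path
	}
	return best.ResolvedPath
}

func main() {
	rp := `["aa","bb"]`
	got := bestResolvedPath([]obs{
		{ID: 1, PathLen: 3, ResolvedPath: nil},
		{ID: 2, PathLen: 2, ResolvedPath: &rp},
	})
	fmt.Println("fallback found a path:", got != nil)
}
```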
## Changes
- `cmd/server/resolved_index.go`: extend `fetchResolvedPathForTxBest`
with a fallback through `fetchResolvedPathsForTx`.
- `cmd/server/issue810_repro_test.go`: regression test — seeds a tx
whose longest-path obs lacks `resolved_path` and a shorter sibling has
it, then asserts `/api/packets` and
`/api/nodes/{pk}/health.recentPackets` agree.
## Tests
`go test ./... -count=1` from `cmd/server` — PASS (full suite, ~19s).
## Perf
Fast path unchanged (single LRU/SQL lookup, dominant case). Fallback
only runs when the longest-path obs has NULL `resolved_path` — one
indexed query per affected tx, bounded by observations-per-tx (small).
Closes #810
---------
Co-authored-by: you <you@example.com>
|
||
|
|
f701121672 |
Add qa/ — project-specific QA artifacts for the qa-suite skill (#808)
Adds the CoreScope-side artifacts that pair with the generic [`qa-suite` skill](https://github.com/Kpa-clawbot/ai-sdlc/pull/1).
## Layout
```
qa/
├── README.md
├── plans/
│   └── v3.6.0-rc.md          # 34-commit test plan since v3.5.1
└── scripts/
    └── api-contract-diff.sh  # CoreScope-tuned API contract diff
```
The skill ships the reusable engine + qa-engineer persona + an example plan. This PR adds the CoreScope-tuned plan and the CoreScope-tuned script (correct seed lookups for our `{packets, total}` response shape, our endpoint list, our `resolved_path` requirement). Read by the parent agent at runtime.
## How to use
From chat:
- `qa staging`: runs the latest `qa/plans/v*-rc.md` against staging, files a fresh GH issue with the report
- `qa pr <N>`: uses `qa/plans/pr-<N>.md` if present, else latest RC plan; comments on the PR
- `qa v3.6.0-rc`: runs that specific plan
The qa-engineer subagent walks every step, classifying each as `auto` (script) / `browser` (UI assertion) / `human` (manual) / `browser+auto`. Quantified pass criteria are mandatory; banned phrases: 'visually aligned' / 'fast' / 'no regression'.
## Plan v3.6.0-rc contents
Covers the 34 commits since v3.5.1:
- §1 Memory & Load (#806, #790, #807): heap thresholds, sawtooth pattern
- §2 API contract (#806): every endpoint that should carry `resolved_path`, auto-checked by `api-contract-diff.sh`
- §3 Decoder & hashing (#787, #732, #747, #766, #794, #761)
- §4 Channels (#725 series M1–M5)
- §5 Clock skew (#690 series M1–M3)
- §6 Observers (#764, #774)
- §7 Multi-byte hash adopters (#758, #767)
- §8 Frontend nav & deep linking (#739, #740, #779, #785, #776, #745)
- §9 Geofilter (#735, #734)
- §10 Node blacklist (#742)
- §11 Deploy/ops
Release blockers: §1.2, §2, §3. §4 is the headline-feature gate.
## Adding new plans
Per release: copy `plans/v<last>-rc.md` to `plans/v<new>-rc.md` and update commit-range header, new sections, GO criteria.
Per PR: create `plans/pr-<N>.md` with the bare minimum for that PR's surface area.
Co-authored-by: you <you@example.com>
|
||
|
|
d7fe24e2db |
Fix channel filter on Packets page (UI + API) — #812 (#816)
Closes #812
## Root causes
**Server (`/api/packets?channel=…` returned identical totals):** The handler in `cmd/server/routes.go` never read the `channel` query parameter into `PacketQuery`, so it was silently ignored by both the SQLite path (`db.go::buildTransmissionWhere`) and the in-memory path (`store.go::filterPackets`). The codebase already had everything else in place (the `channel_hash` column with an index from #762, decoded `channel` / `channelHashHex` fields on each packet); it just wasn't wired up.
**UI (`/#/packets` had no channel filter):** `public/packets.js` rendered observer / type / time-window / region filters but no channel control, and didn't read `?channel=` from the URL.
## Fix
### Server
- New `Channel` field on `PacketQuery`; `handlePackets` reads `r.URL.Query().Get("channel")`.
- DB path filters by the indexed `channel_hash` column (exact match).
- In-memory path: helper `packetMatchesChannel` matches `decoded.channel` (plaintext, e.g. `#test`, `public`) or `enc_<HEX>` against `channelHashHex` for undecryptable GRP_TXT. Uses cached `ParsedDecoded()` so it's O(1) after first parse. Fast-path index guards and the grouped-cache key updated to include channel.
- Regression test (`channel_filter_test.go`): `channel=#test` returns ≥1 GRP_TXT packet and fewer than baseline; `channel=nonexistentchannel` returns `total=0`.
### UI
- New `<select id="fChannel">` populated from `/api/channels`.
- Round-trips via `?channel=…` on the URL hash (read on init, written on change).
- Pre-seeds the current value as an option so encrypted hashes not in `/api/channels` still display as selected on reload.
- On change, calls `loadPackets()` so the server-side filter applies before pagination.
## Perf
Filter adds at most one cached map lookup per packet (DB path uses indexed column, store path uses `ParsedDecoded()` cache). Staging baseline 149–190 ms for `?channel=#test&limit=50`; the new comparison is negligible. Target ≤ 500 ms preserved.
## Tests
`cd cmd/server && go test ./... -count=1 -timeout 120s` → PASS.
---------
Co-authored-by: you <you@example.com>
|
||
|
|
a9732e64ae |
fix(nodes): render clock-skew section in side panel (#813) (#814)
Closes #813
## Root cause
The Node detail **side panel** (`renderDetail()`, `public/nodes.js:1145`) was missing both the `#node-clock-skew` placeholder div and the `loadClockSkew()` IIFE loader. Those exist only in the **full-screen** detail page (`loadFullNode`, lines 498 / 632), so any node opened via deep link or click in the listing, which uses the side panel, showed no clock-skew UI even when `/api/nodes/{pk}/clock-skew` returned rich data.
## Fix
Mirror the full-screen template branch and IIFE in `renderDetail`:
- Add `<div class="node-detail-section skew-detail-section" id="node-clock-skew" style="display:none">` to the side-panel template (right above Observers).
- Add an async `loadClockSkewPanel()` IIFE after the panel `innerHTML` is set, using the same severity/badge/drift/sparkline rendering and the `severity === 'no_clock'` branch the full-screen view uses.
No new helpers; reuses existing window globals (`formatSkew`, `formatDrift`, `renderSkewBadge`, `renderSkewSparkline`).
## Verification
- Syntax check: `node -c public/nodes.js` ✓
- `node test-frontend-helpers.js` → 553/553 ✓
- Browser: staging runs master so I couldn't validate the deployed UI yet. Manual repro after deploy:
  1. Open `https://analyzer.00id.net/#/nodes`, click any node with a known skew (e.g. Puppy Solar `a8dde6d7…` shows `⏰ -23d 8h` in listing).
  2. Side panel should show a **⏰ Clock Skew** section with median skew, severity badge, drift line, and sparkline.
  3. For `severity === 'no_clock'` (e.g. SKCE_RS `14531bd2…`), section shows "No Clock" instead of skew value.
---------
Co-authored-by: you <you@example.com>
|
||
|
|
60be48dc5e |
fix(channels): lock affordance on deep link to encrypted channel without key (#815)
Closes #811
## What
Deep linking to `/#/channels/%23private` (encrypted channel, no key configured) now shows the existing 🔒 lock affordance instead of an empty "No messages in this channel yet" pane.
## Why
`selectChannel` only rendered the lock message inside the `if (ch && ch.encrypted)` branch. On a cold deep link:
- `loadChannels` omits encrypted channels unless the toggle is on, so `ch` is `undefined`.
- The hash isn't `user:`-prefixed, so that branch is skipped too.
- Code falls through to the REST fetch, returns 0 messages, and `renderMessages` shows the generic empty state.
## Fix
Add a `#`-prefixed-hash branch immediately before the REST fetch:
- If a stored key matches the channel name → decrypt and render.
- Otherwise → reuse the existing 🔒 "encrypted and no decryption key is configured" message.
## Trace (URL → render)
1. `#/channels/%23private` → `init(routeParam='#private')`
2. `loadChannels()` → `channels` has no `#private` entry (toggle off)
3. `selectChannel('#private')` → `ch` undefined → skips encrypted branches → **new check fires** → lock message
4. With key stored: same check → `decryptAndRender`
## Validation
- `node test-frontend-helpers.js` → 553 passed, 0 failed
- Manual trace above; change is a 15-line localized guard before the REST fetch, no hot-path or perf impact.
Co-authored-by: meshcore-agent <agent@corescope.local>
|
||
|
|
9e90548637 |
perf(#800): remove per-StoreTx ResolvedPath, replace with membership index + on-demand decode (#806)
## Summary
Remove `ResolvedPath []*string` field from `StoreTx` and `StoreObs` structs, replacing it with a compact membership index + on-demand SQL decode. This eliminates the dominant heap cost identified in profiling (#791, #799).
**Spec:** #800 (consolidated from two rounds of expert + implementer review on #799)
Closes #800
Closes #791
## Design
### Removed
- `StoreTx.ResolvedPath []*string`
- `StoreObs.ResolvedPath []*string`
- `TransmissionResp.ResolvedPath`, `ObservationResp.ResolvedPath` struct fields
### Added
| Structure | Purpose | Est. cost at 1M obs |
|---|---|---:|
| `resolvedPubkeyIndex map[uint64][]int` | FNV-1a(pubkey) → []txID forward index | 50–120 MB |
| `resolvedPubkeyReverse map[int][]uint64` | txID → []hashes for clean removal | ~40 MB |
| `apiResolvedPathLRU` (10K entries) | FIFO cache for on-demand API decode | ~2 MB |
### Decode-window discipline
`resolved_path` JSON decoded once per packet. Consumers fed in order, temp slice dropped, never stored on struct:
1. `addToByNode`: relay node indexing
2. `touchRelayLastSeen`: relay liveness DB updates
3. `byPathHop` resolved-key entries
4. `resolvedPubkeyIndex` + reverse insert
5. WebSocket broadcast map (raw JSON bytes)
6. Persist batch (raw JSON bytes for SQL UPDATE)
### Collision safety
When the forward index returns candidates, a batched SQL query confirms exact pubkey presence using `LIKE '%"pubkey"%'` on the `resolved_path` column.
### Feature flag
`useResolvedPathIndex` (default `true`). Off-path is conservative: all candidates kept, index not consulted. For one-release rollback safety.
## Files changed
| File | Changes |
|---|---|
| `resolved_index.go` | **New**: index structures, LRU cache, on-demand SQL helpers, collision safety |
| `store.go` | Remove RP fields, decode-window discipline in Load/Ingest, on-demand txToMap/obsToMap/enrichObs, eviction cleanup via SQL, memory accounting update |
| `types.go` | Remove RP fields from TransmissionResp/ObservationResp |
| `routes.go` | Replace `nodeInResolvedPath` with `nodeInResolvedPathViaIndex`, remove RP from mapSlice helpers |
| `neighbor_persist.go` | Refactor backfill: reverse-map removal → forward+reverse insert → LRU invalidation |
## Tests added (27 new)
**Unit:**
- `TestStoreTx_ResolvedPathFieldAbsent`: reflection guard
- `TestResolvedPubkeyIndex_BuildFromLoad`: forward+reverse consistency
- `TestResolvedPubkeyIndex_HashCollision`: SQL collision safety
- `TestResolvedPubkeyIndex_IngestUpdate`: maps reflect new ingests
- `TestResolvedPubkeyIndex_RemoveOnEvict`: clean removal via reverse map
- `TestResolvedPubkeyIndex_PerObsCoverage`: non-best obs pubkeys indexed
- `TestAddToByNode_WithoutResolvedPathField`
- `TestTouchRelayLastSeen_WithoutResolvedPathField`
- `TestWebSocketBroadcast_IncludesResolvedPath`
- `TestBackfill_InvalidatesLRU`
- `TestEviction_ByNodeCleanup_OnDemandSQL`
- `TestExtractResolvedPubkeys`, `TestMergeResolvedPubkeys`
- `TestResolvedPubkeyHash_Deterministic`
- `TestLRU_EvictionOnFull`
**Endpoint:**
- `TestPathsThroughNode_NilResolvedPathFallback`
- `TestPacketsAPI_OnDemandResolvedPath`
- `TestPacketsAPI_OnDemandResolvedPath_LRUHit`
- `TestPacketsAPI_OnDemandResolvedPath_Empty`
**Feature flag:**
- `TestFeatureFlag_OffPath_PreservesOldBehavior`
- `TestFeatureFlag_Toggle_NoStateLeak`
**Concurrency:**
- `TestReverseMap_NoLeakOnPartialFailure`
- `TestDecodeWindow_LockHoldTimeBounded`
- `TestLivePolling_LRUUnderConcurrentIngest`
**Regression:**
- `TestRepeaterLiveness_StillAccurate`
**Benchmarks:**
- `BenchmarkLoad_BeforeAfter`
- `BenchmarkResolvedPubkeyIndex_Memory`
- `BenchmarkPathsThroughNode_Latency`
- `BenchmarkLivePolling_UnderIngest`
## Benchmark results
```
BenchmarkResolvedPubkeyIndex_Memory/pubkeys=50K     429ms   103MB   777K allocs
BenchmarkResolvedPubkeyIndex_Memory/pubkeys=500K   4205ms   896MB  7.67M allocs
BenchmarkLoad_BeforeAfter                            65ms    20MB   202K allocs
BenchmarkPathsThroughNode_Latency                   3.9µs      0B      0 allocs
BenchmarkLivePolling_UnderIngest                    5.4µs    545B      7 allocs
```
Key: per-obs `[]*string` overhead completely eliminated. At 1M obs with 3 hops average, this saves ~72 bytes/obs × 1M = ~68 MB just from the slice headers + pointers, plus the JSON-decoded string data (~900 MB at scale per profiling).
## Design choices
- **FNV-1a instead of xxhash**: stdlib availability, no external dependency. Performance is equivalent for this use case (pubkey strings are short).
- **FIFO LRU instead of true LRU**: simpler implementation, adequate for the access pattern (mostly sequential obs IDs from live polling).
- **Grouped packets view omits resolved_path**: cold path, not worth SQL round-trip per page render.
- **Backfill pending check uses reverse-map presence** instead of per-obs field: if a tx has any indexed pubkeys, its observations are considered resolved.
Closes #807
---------
Co-authored-by: you <you@example.com>
|
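The forward/reverse index pair from the commit above can be sketched with the standard library's `hash/fnv`. The map shapes follow the commit message; the type and method names (`pubkeyIndex`, `add`, `remove`, `candidates`) are hypothetical, and the SQL collision check is only noted in a comment.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pubkeyHash returns the 64-bit FNV-1a hash of a pubkey string.
func pubkeyHash(pubkey string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(pubkey)) // Write on a hash.Hash never returns an error
	return h.Sum64()
}

// pubkeyIndex pairs a forward map (hash -> tx IDs) with a reverse map
// (tx ID -> hashes) so a transmission can be removed without a full scan.
type pubkeyIndex struct {
	forward map[uint64][]int
	reverse map[int][]uint64
}

func newPubkeyIndex() *pubkeyIndex {
	return &pubkeyIndex{forward: map[uint64][]int{}, reverse: map[int][]uint64{}}
}

func containsHash(hs []uint64, h uint64) bool {
	for _, x := range hs {
		if x == h {
			return true
		}
	}
	return false
}

// add indexes every distinct pubkey hash seen on txID's resolved path.
func (ix *pubkeyIndex) add(txID int, pubkeys []string) {
	for _, pk := range pubkeys {
		h := pubkeyHash(pk)
		if containsHash(ix.reverse[txID], h) {
			continue // already indexed for this transmission
		}
		ix.forward[h] = append(ix.forward[h], txID)
		ix.reverse[txID] = append(ix.reverse[txID], h)
	}
}

// remove drops txID from every forward bucket listed in its reverse entry.
func (ix *pubkeyIndex) remove(txID int) {
	for _, h := range ix.reverse[txID] {
		ids := ix.forward[h]
		kept := ids[:0]
		for _, id := range ids {
			if id != txID {
				kept = append(kept, id)
			}
		}
		if len(kept) == 0 {
			delete(ix.forward, h)
		} else {
			ix.forward[h] = kept
		}
	}
	delete(ix.reverse, txID)
}

// candidates returns tx IDs whose indexed hash equals the pubkey's hash.
// Hash collisions are possible, so callers still need an exact check
// (the commit confirms candidates with a SQL query).
func (ix *pubkeyIndex) candidates(pubkey string) []int {
	return ix.forward[pubkeyHash(pubkey)]
}

func main() {
	ix := newPubkeyIndex()
	ix.add(1, []string{"aa", "bb"})
	ix.add(2, []string{"bb"})
	ix.remove(1)
	fmt.Println(ix.candidates("aa"), ix.candidates("bb"))
}
```

Keeping the reverse map is what makes eviction cheap: removal touches only the buckets a transmission actually appears in.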
||
|
|
a8e1cea683 |
fix: use payload type bits only in content hash (not full header byte) (#787)
## Problem
The firmware computes packet content hash as:
```
SHA256(payload_type_byte + [path_len for TRACE] + payload)
```
Where `payload_type_byte = (header >> 2) & 0x0F`, i.e. just the payload type bits (2-5).
CoreScope was using the **full header byte** in its hash computation, which includes route type bits (0-1) and version bits (6-7). This meant the same logical packet produced different content hashes depending on route type, breaking dedup and packet lookup.
**Firmware reference:** `Packet.cpp::calculatePacketHash()` uses `getPayloadType()` which returns `(header >> PH_TYPE_SHIFT) & PH_TYPE_MASK`.
## Fix
- Extract only payload type bits: `payloadType := (headerByte >> 2) & 0x0F`
- Include `path_len` byte in hash for TRACE packets (matching firmware behavior)
- Applied to both `cmd/server/decoder.go` and `cmd/ingestor/decoder.go`
## Tests Added
- **Route type independence:** Same payload with FLOOD vs DIRECT route types produces identical hash
- **TRACE path_len inclusion:** TRACE packets with different `path_len` produce different hashes
- **Firmware compatibility:** Hash output matches manual computation of firmware algorithm
## Migration Impact
Existing packets in the DB have content hashes computed with the old (incorrect) formula. Options:
1. **Recompute hashes** via migration (recommended for clean state)
2. **Dual lookup**: check both old and new hash on queries (backward compat)
3. **Accept the break**: old hashes become stale, new packets get correct hashes
Recommend option 1 (migration) as a follow-up. The volume of affected packets depends on how many distinct route types were seen for the same logical packet.
Fixes #786
---------
Co-authored-by: you <you@example.com>
|
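A minimal sketch of the corrected hash input, following the formula quoted in the commit above. The function name `contentHash`, the TRACE constant value, the single-byte `path_len`, and returning the full hex digest (rather than any truncated form the firmware or decoder may use) are assumptions for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// payloadTypeTrace is an assumed value for the TRACE payload type; check the
// firmware headers for the real constant.
const payloadTypeTrace = 0x09

// contentHash hashes only the payload type bits (2-5) of the header, the
// path_len byte for TRACE packets, and the payload. Route type bits (0-1)
// and version bits (6-7) are deliberately excluded.
func contentHash(header byte, pathLen byte, payload []byte) string {
	payloadType := (header >> 2) & 0x0F
	h := sha256.New()
	h.Write([]byte{payloadType})
	if payloadType == payloadTypeTrace {
		h.Write([]byte{pathLen})
	}
	h.Write(payload)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	payload := []byte("hello")
	// 0x11 and 0x12 share payload type 4 but differ in route type bits.
	fmt.Println(contentHash(0x11, 0, payload) == contentHash(0x12, 0, payload))
}
```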
||
|
|
bf674ebfa2 |
feat: validate advert signatures on ingest, reject corrupt packets (#794)
## Summary
Validates ed25519 signatures on ADVERT packets during MQTT ingest.
Packets with invalid signatures are rejected before storage, preventing
corrupt/truncated adverts from polluting the database.
## Changes
### Ingestor (`cmd/ingestor/`)
- **Signature validation on ingest**: After decoding an ADVERT, checks
`SignatureValid` from the decoder. Invalid signatures → packet dropped,
never stored.
- **Config flag**: `validateSignatures` (default `true`). Set to `false`
to disable validation for backward compatibility with existing installs.
- **`dropped_packets` table**: New SQLite table recording every rejected
packet with full attribution:
- `hash`, `raw_hex`, `reason`, `observer_id`, `observer_name`,
`node_pubkey`, `node_name`, `dropped_at`
- Indexed on `observer_id` and `node_pubkey` for investigation queries
- **`SignatureDrops` counter**: New atomic counter in `DBStats`, logged
in periodic stats output as `sig_drops=N`
- **Retention**: `dropped_packets` pruned alongside metrics on the same
`retention.metricsDays` schedule
### Server (`cmd/server/`)
- **`GET /api/dropped-packets`** (API key required): Returns recent
drops with optional `?observer=` and `?pubkey=` filters, `?limit=`
(default 100, max 500)
- **`signatureDrops`** field added to `/api/stats` response (count from
`dropped_packets` table)
### Tests (8 new)
| Test | What it verifies |
|------|-----------------|
| `TestSigValidation_ValidAdvertStored` | Valid advert passes validation
and is stored |
| `TestSigValidation_TamperedSignatureDropped` | Tampered signature →
dropped, recorded in `dropped_packets` with correct fields |
| `TestSigValidation_TruncatedAppdataDropped` | Truncated appdata
invalidates signature → dropped |
| `TestSigValidation_DisabledByConfig` | `validateSignatures: false`
skips validation, stores tampered packet |
| `TestSigValidation_DropCounterIncrements` | Counter increments
correctly across multiple drops |
| `TestSigValidation_LogContainsFields` | `dropped_packets` row contains
hash, reason, observer, pubkey, name |
| `TestPruneDroppedPackets` | Old entries pruned, recent entries
retained |
| `TestShouldValidateSignatures_Default` | Config helper returns correct
defaults |
### Config example
```json
{
"validateSignatures": true
}
```
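A sketch of how a default-on flag and the drop decision might fit together, using the standard library's `crypto/ed25519`. The `config` type, `acceptAdvert`, `demoAdvert`, and the bytes that are signed are all hypothetical; the actual advert signing layout is defined by the firmware and decoder, not by this example.

```go
package main

import (
	"crypto/ed25519"
	"fmt"
)

// config models only the flag discussed above; a nil pointer means "not set".
type config struct {
	ValidateSignatures *bool `json:"validateSignatures"`
}

// shouldValidateSignatures defaults to true when the config or field is absent.
func (c *config) shouldValidateSignatures() bool {
	if c == nil || c.ValidateSignatures == nil {
		return true
	}
	return *c.ValidateSignatures
}

// acceptAdvert reports whether an advert should be stored. msg stands in for
// whatever bytes the advert signature covers (an assumption here).
func acceptAdvert(c *config, pub, msg, sig []byte) bool {
	if !c.shouldValidateSignatures() {
		return true // validation disabled: store everything
	}
	if len(pub) != ed25519.PublicKeySize {
		return false // ed25519.Verify panics on wrong-sized keys
	}
	return ed25519.Verify(ed25519.PublicKey(pub), msg, sig)
}

// demoAdvert builds a correctly signed message from a fixed all-zero seed,
// purely so the example has something to verify.
func demoAdvert() (pub, msg, sig []byte) {
	priv := ed25519.NewKeyFromSeed(make([]byte, ed25519.SeedSize))
	pub = priv.Public().(ed25519.PublicKey)
	msg = []byte("demo advert payload")
	sig = ed25519.Sign(priv, msg)
	return pub, msg, sig
}

func main() {
	pub, msg, sig := demoAdvert()
	fmt.Println("valid stored:", acceptAdvert(nil, pub, msg, sig))
	sig[0] ^= 0xFF // tamper with the signature
	fmt.Println("tampered stored:", acceptAdvert(nil, pub, msg, sig))
}
```

Using a pointer for the flag is one way to distinguish "unset" from an explicit `false`, which is what lets the default be `true`.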
Fixes #793
---------
Co-authored-by: you <you@example.com>
|
||
|
|
d596becca3 |
feat: bounded cold load — limit Load() by memory budget (#790)
## Implements #748 M1: Bounded Cold Load
### Problem
`Load()` pulls the ENTIRE database into RAM before eviction runs. On a 1GB database, this means 3+ GB peak memory at startup, regardless of `maxMemoryMB`. This is the root cause of #743 (OOM on 2GB VMs).
### Solution
Calculate the maximum number of transmissions that fit within the `maxMemoryMB` budget and use a SQL subquery LIMIT to load only the newest packets.
**Two-phase approach** (avoids the JOIN-LIMIT row count problem):
```sql
SELECT ... FROM transmissions t
LEFT JOIN observations o ON ...
WHERE t.id IN (SELECT id FROM transmissions ORDER BY first_seen DESC LIMIT ?)
ORDER BY t.first_seen ASC, o.timestamp DESC
```
### Changes
- **`estimateStoreTxBytesTypical(numObs)`**: estimates memory cost of a typical transmission without needing an actual `StoreTx` instance. Used for budget calculation.
- **Budget calculation in `Load()`**: `maxPackets = (maxMemoryMB * 1048576) / avgBytesPerPacket` with a floor of 1000 packets.
- **Subquery LIMIT**: loads only the newest N transmissions when bounded.
- **`oldestLoaded` tracking**: records the oldest packet timestamp in memory so future SQL fallback queries (M2+) know where in-memory data ends.
- **Perf stats**: `oldestLoaded` exposed in `/api/perf/store-stats`.
- **Logging**: bounded loads show `Loaded X/Y transmissions (limited by ZMB budget)`.
### When `maxMemoryMB=0` (unlimited)
Behavior is completely unchanged: no LIMIT clause, all packets loaded.
### Tests (6 new)
| Test | Validates |
|------|-----------|
| `TestBoundedLoad_LimitedMemory` | With 1MB budget, loads fewer than total (hits 1000 minimum) |
| `TestBoundedLoad_NewestFirst` | Loaded packets are the newest, not oldest |
| `TestBoundedLoad_OldestLoadedSet` | `oldestLoaded` matches first packet's `FirstSeen` |
| `TestBoundedLoad_UnlimitedWithZero` | `maxMemoryMB=0` loads all packets |
| `TestBoundedLoad_AscendingOrder` | Packets remain in ascending `first_seen` order after bounded load |
| `TestEstimateStoreTxBytesTypical` | Estimate grows with observation count, exceeds floor |
Plus benchmarks: `BenchmarkLoad_Bounded` vs `BenchmarkLoad_Unlimited`.
### Perf justification
On a 5000-transmission test DB with 1MB budget:
- Bounded: loads 1000 packets (the minimum) in ~1.3s
- The subquery uses SQLite's index on `first_seen`: O(N log N) for the LIMIT, then indexed JOIN for observations
- No full table scan needed when bounded
### Next milestones
- **M2**: Packet list/search SQL fallback (uses `oldestLoaded` boundary)
- **M3**: Node analytics SQL fallback
- **M4-M5**: Remaining endpoint fallbacks + live-only memory store
---------
Co-authored-by: you <you@example.com>
|
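The budget arithmetic from the commit above, `maxPackets = (maxMemoryMB * 1048576) / avgBytesPerPacket` with a floor of 1000, can be sketched as follows. The function name, the convention that a return value of 0 means "no LIMIT", and the 4096-byte average used in the example are assumptions.

```go
package main

import "fmt"

// minLoadedPackets is the floor described in the commit message.
const minLoadedPackets = 1000

// maxPacketsForBudget converts a memory budget into a transmission count.
// It returns 0 to signal "unbounded" when the budget is 0 (unlimited).
func maxPacketsForBudget(maxMemoryMB int, avgBytesPerPacket int64) int64 {
	if maxMemoryMB <= 0 || avgBytesPerPacket <= 0 {
		return 0
	}
	n := int64(maxMemoryMB) * 1048576 / avgBytesPerPacket
	if n < minLoadedPackets {
		return minLoadedPackets
	}
	return n
}

func main() {
	// 4096 bytes per packet is a made-up average for illustration.
	fmt.Println(maxPacketsForBudget(1, 4096))    // 256 fits, raised to the floor
	fmt.Println(maxPacketsForBudget(1024, 4096)) // 1 GiB budget
	fmt.Println(maxPacketsForBudget(0, 4096))    // unlimited
}
```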
||
|
|
b9ba447046 |
feat: add nodeBlacklist config to hide abusive/troll nodes (#742)
## Problem
Some mesh participants set offensive names, report deliberately false
GPS positions, or otherwise troll the network. Instance operators
currently have no way to hide these nodes from public-facing APIs
without deleting the underlying data.
## Solution
Add a `nodeBlacklist` array to `config.json` containing public keys of
nodes to exclude from all API responses.
### Blacklisted nodes are filtered from:
- `GET /api/nodes` — list endpoint
- `GET /api/nodes/search` — search results
- `GET /api/nodes/{pubkey}` — detail (returns 404)
- `GET /api/nodes/{pubkey}/health` — returns 404
- `GET /api/nodes/{pubkey}/paths` — returns 404
- `GET /api/nodes/{pubkey}/analytics` — returns 404
- `GET /api/nodes/{pubkey}/neighbors` — returns 404
- `GET /api/nodes/bulk-health` — filtered from results
### Config example
```json
{
"nodeBlacklist": [
"aabbccdd...",
"11223344..."
]
}
```
### Design decisions
- **Case-insensitive** — public keys normalized to lowercase
- **Whitespace trimming** — leading/trailing whitespace handled
- **Empty entries ignored** — `""` or `" "` do not cause false positives
- **Nil-safe** — `IsBlacklisted()` on nil Config returns false
- **Backward-compatible** — empty/missing `nodeBlacklist` has zero
effect
- **Lazy-cached set** — blacklist converted to `map[string]bool` on
first lookup
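The design decisions above can be sketched as a small lookup helper. `IsBlacklisted` and the `map[string]bool` set are named in the commit; the `Config` layout here and the use of `sync.Once` for the lazy build are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Config holds only the field relevant to this sketch.
type Config struct {
	NodeBlacklist []string `json:"nodeBlacklist"`

	once sync.Once       // guards the one-time set build (assumed mechanism)
	set  map[string]bool // normalized pubkeys, built on first lookup
}

// IsBlacklisted is nil-safe, case-insensitive, and ignores surrounding
// whitespace and empty entries.
func (c *Config) IsBlacklisted(pubkey string) bool {
	if c == nil {
		return false
	}
	c.once.Do(func() {
		c.set = make(map[string]bool, len(c.NodeBlacklist))
		for _, k := range c.NodeBlacklist {
			k = strings.ToLower(strings.TrimSpace(k))
			if k != "" {
				c.set[k] = true
			}
		}
	})
	key := strings.ToLower(strings.TrimSpace(pubkey))
	return key != "" && c.set[key]
}

func main() {
	cfg := &Config{NodeBlacklist: []string{" AABBCCDD ", "", "  "}}
	fmt.Println(cfg.IsBlacklisted("aabbccdd"), cfg.IsBlacklisted(""))

	var missing *Config
	fmt.Println(missing.IsBlacklisted("aabbccdd"))
}
```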
### What this does NOT do (intentionally)
- Does **not** delete or modify database data — only filters API
responses
- Does **not** block packet ingestion — data still flows for analytics
- Does **not** filter `/api/packets` — only node-facing endpoints are
affected
## Testing
- Unit tests for `Config.IsBlacklisted()` (case sensitivity, whitespace,
empty entries, nil config)
- Integration tests for `/api/nodes`, `/api/nodes/{pubkey}`,
`/api/nodes/search`
- Full test suite passes with no regressions
|
||
|
|
b8846c2db2 |
fix: show lock message for encrypted channels without key on deep link (#783)
## Problem
Deep-linking to an encrypted channel (e.g. `#/channels/42`) when the user has no client-side decryption key falls through to the plaintext API fetch, displaying gibberish base64/binary content instead of a meaningful message.
## Root Cause
In `selectChannel()`, the encrypted channel key-matching loop iterates all stored keys. If none match, execution falls through to the normal plaintext message fetch, which returns raw encrypted data rendered as gibberish.
## Fix
After the key-matching loop for encrypted channels, return early with the lock message instead of falling through. **3 lines added** in `public/channels.js`, **108 lines** regression test in `test-frontend-helpers.js`.
## Investigation: Sidebar Display
The sidebar filtering is already correct:
- DB path: SQL filters out `enc_` prefix channel hashes
- In-memory path: Only returns `type: CHAN` (server-decrypted) channels, with `hasGarbageChars` validation
- Server-side decryption: MAC verification (2-byte HMAC) + UTF-8 + non-printable character validation prevents false-positive decryptions
- Encrypted channels only appear when the toggle is explicitly enabled
## Testing
- All existing tests pass
- New regression test verifies: lock message shown, messages API NOT called for encrypted channels without key
Fixes #781
---------
Co-authored-by: you <you@example.com>
|
||
|
|
34b8dc8961 |
fix: improve #778 detail link — call init() directly instead of router teardown (#785)
Improves the fix for #778 (replaces #779's approach).
## Problem
When clicking "Details" in the node side panel, the hash is already `#/nodes/{pubkey}` (set by `replaceState` in `selectNode`). The link targets the same hash → no `hashchange` event → router never fires → detail view never renders.
## What was wrong with #779
PR #779 used `replaceState('#/')` + `location.hash = target` synchronously, which forces a full SPA router teardown/rebuild cycle just to re-render the same page. This is wasteful and can cause visual flicker.
## This fix
**Detail link** (`#/nodes/{pubkey}`): Calls `init(app, pubkey)` directly: no router teardown, no page flash. The `init()` function already handles rendering the detail view when `routeParam` is set.
**Analytics link** (`#/nodes/{pubkey}/analytics`): Uses `setTimeout` to ensure reliable `hashchange` firing, since this routes to a different page (`node-analytics`) that requires the full SPA router.
## Testing
- Frontend helper tests: 552/552 ✅
- Packet filter tests: 62/62 ✅
- Aging tests: 29/29 ✅
- Go server tests: pass ✅
- Go ingestor tests: pass ✅
---------
Co-authored-by: you <you@example.com>
|
||
|
|
fa3f623bd6 |
feat: add observer retention — remove stale observers after configurable days (#764)
## Summary
Observers that stop actively sending data now get removed after a
configurable retention period (default 14 days).
Previously, observers remained in the `observers` table forever. This
meant nodes that were once observers for an instance but are no longer
connected (even if still active in the mesh elsewhere) would continue
appearing in the observer list indefinitely.
## Key Design Decisions
- **Active data requirement**: `last_seen` is only updated when the
observer itself sends packets (via `stmtUpdateObserverLastSeen`). Being
seen by another node does NOT update this field. So an observer must
actively send data to stay listed.
- **Default: 14 days** — observers not seen in 14 days are removed
- **`-1` = keep forever** — for users who want observers to never be
removed
- **`0` = use default (14 days)** — same as not setting the field
- **Runs on startup + daily ticker** — staggered 3 minutes after metrics
prune to avoid DB contention
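The `observerDays` semantics above (0 means the 14-day default, `-1` means keep forever) can be sketched as below. `ObserverDaysOrDefault` and `RetentionConfig` are named in this commit; the struct layout, the `observerCutoffUnix` helper, and treating every negative value like `-1` are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

const defaultObserverDays = 14

// RetentionConfig holds only the field relevant to this sketch.
type RetentionConfig struct {
	ObserverDays int `json:"observerDays"`
}

// ObserverDaysOrDefault maps a missing config or 0 to the 14-day default and
// passes through positive values and -1 (keep forever).
func (r *RetentionConfig) ObserverDaysOrDefault() int {
	if r == nil || r.ObserverDays == 0 {
		return defaultObserverDays
	}
	return r.ObserverDays
}

// observerCutoffUnix returns the Unix time before which an observer's
// last_seen counts as stale. ok is false when nothing should be removed.
// Assumption: any negative value is treated as "keep forever".
func observerCutoffUnix(r *RetentionConfig, nowUnix int64) (cutoff int64, ok bool) {
	days := r.ObserverDaysOrDefault()
	if days < 0 {
		return 0, false
	}
	return nowUnix - int64(days)*86400, true
}

func main() {
	now := time.Now().Unix()
	cutoff, ok := observerCutoffUnix(&RetentionConfig{ObserverDays: 0}, now)
	fmt.Println(ok, now-cutoff == 14*86400)

	_, ok = observerCutoffUnix(&RetentionConfig{ObserverDays: -1}, now)
	fmt.Println("prune when -1:", ok)
}
```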
## Changes
| File | Change |
|------|--------|
| `cmd/ingestor/config.go` | Add `ObserverDays` to `RetentionConfig`,
add `ObserverDaysOrDefault()` |
| `cmd/ingestor/db.go` | Add `RemoveStaleObservers()` — deletes
observers with `last_seen` before cutoff |
| `cmd/ingestor/main.go` | Wire up startup + daily ticker for observer
retention |
| `cmd/server/config.go` | Add `ObserverDays` to `RetentionConfig`, add
`ObserverDaysOrDefault()` |
| `cmd/server/db.go` | Add `RemoveStaleObservers()` (server-side, uses
read-write connection) |
| `cmd/server/main.go` | Wire up startup + daily ticker, shutdown
cleanup |
| `cmd/server/routes.go` | Admin prune API now also removes stale
observers |
| `config.example.json` | Add `observerDays: 14` with documentation |
| `cmd/ingestor/coverage_boost_test.go` | 4 tests: basic removal, empty
store, keep forever (-1), default (0→14) |
| `cmd/server/config_test.go` | 4 tests: `ObserverDaysOrDefault` edge
cases |
## Config Example
```json
{
"retention": {
"nodeDays": 7,
"observerDays": 14,
"packetDays": 30,
"_comment": "observerDays: -1 = keep forever, 0 = use default (14)"
}
}
```
## Admin API
The `/api/admin/prune` endpoint now also removes stale observers (using
`observerDays` from config) and reports `observers_removed` in the
response alongside `packets_deleted`.
## Test Plan
- [x] `TestRemoveStaleObservers` — old observer removed, recent observer
kept
- [x] `TestRemoveStaleObserversNone` — empty store, no errors
- [x] `TestRemoveStaleObserversKeepForever` — `-1` keeps even year-old
observers
- [x] `TestRemoveStaleObserversDefault` — `0` defaults to 14 days
- [x] `TestObserverDaysOrDefault` (ingestor) —
nil/zero/positive/keep-forever
- [x] `TestObserverDaysOrDefault` (server) —
nil/zero/positive/keep-forever
- [x] Both binaries compile cleanly (`go build`)
- [ ] Manual: verify observer count decreases after retention period on
a live instance
|
||
|
|
dfe383cc51 |
fix: node detail panel Details/Analytics links don't navigate (#779)
Fixes #778
## Problem
The Details and Analytics links in the node side panel don't navigate when clicked. This is a regression from #739 (desktop node deep linking).
**Root cause:** When a node is selected, `selectNode()` uses `history.replaceState()` to set the URL to `#/nodes/{pubkey}`. The Details link has `href="#/nodes/{pubkey}"`, the same hash. Clicking an anchor with the same hash as the current URL doesn't fire the `hashchange` event, so the SPA router never triggers navigation.
## Fix
Added a click handler on the `nodesRight` panel that intercepts clicks on `.btn-primary` navigation links:
1. `e.preventDefault()` to stop the default anchor behavior
2. If the current hash already matches the target, temporarily clear it via `replaceState`
3. Set `location.hash` to the target, which fires `hashchange` and triggers the SPA router
This handles both the Details link (`#/nodes/{pubkey}`) and the Analytics link (`#/nodes/{pubkey}/analytics`).
## Testing
- All frontend helper tests pass (552/552)
- All packet filter tests pass (62/62)
- All aging tests pass (29/29)
- Go server tests pass
---------
Co-authored-by: you <you@example.com>
|
||
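The same-hash workaround from the fix for #778 above can be sketched as a small helper. This is a minimal sketch, not the code in the repository: the name `navigateToHash` and the injectable `win` parameter are assumptions, chosen so the logic can be exercised outside a browser.

```javascript
// `win` is any object shaped like the browser `window`
// (it needs `location.hash`, `location.pathname`, `location.search`
// and `history.replaceState`). In a page, pass `window`.
function navigateToHash(win, target) {
  if (win.location.hash === target) {
    // Drop the fragment without adding a history entry, so the
    // assignment below is a real change and `hashchange` fires.
    win.history.replaceState(null, '', win.location.pathname + win.location.search);
  }
  win.location.hash = target;
}

// Browser usage (the `.btn-primary` selector comes from the commit
// message; the rest of the wiring is assumed):
//
// panel.addEventListener('click', (e) => {
//   const link = e.target.closest('a.btn-primary');
//   if (!link) return;
//   e.preventDefault();
//   navigateToHash(window, link.getAttribute('href'));
// });
```

Using `replaceState` for the intermediate step keeps the Back button from landing on a hash-less URL.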
|
|
fa348efe2a |
fix: force-remove staging container before deploy — handles both compose and docker-run containers
The deploy step used only 'docker compose down' which can't remove containers created via 'docker run'. Now explicitly stops+removes the named container first, then runs compose down as cleanup. Permanent fix for the recurring CI deploy failure. |
||
|
|
a9a18ff051 |
fix: neighbor graph slider persists to localStorage, default 0.7 (#776)
## Summary
The neighbor graph min score slider didn't persist its value to localStorage, so it reset to 0.10 on every page load, and 0.10 was a poor default for most use cases.
## Changes
- **Default changed from 0.10 to 0.70**: a more useful starting point that filters out low-confidence edges
- **localStorage persistence**: slider value saved on change, restored on page load
- **3 new tests** in `test-frontend-helpers.js` verifying default value, load behavior, and save behavior
## Testing
- `node test-frontend-helpers.js`: 547 passed, 0 failed
- `node test-packet-filter.js`: 62 passed, 0 failed
- `node test-aging.js`: 29 passed, 0 failed
---------
Co-authored-by: you <you@example.com> |
||
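The load/save behavior described for the slider above might look roughly like this. It is a sketch under stated assumptions: the storage key `ngMinScore` and both function names are hypothetical, and the storage object is passed in so the helpers also run outside a browser (in a page, pass `window.localStorage`).

```javascript
const NG_MIN_SCORE_KEY = 'ngMinScore'; // hypothetical key name
const NG_MIN_SCORE_DEFAULT = 0.7;      // default stated in the commit

// Restore the slider value, falling back to the default when nothing
// usable is stored (missing key or a non-numeric string).
function loadMinScore(storage) {
  const raw = storage.getItem(NG_MIN_SCORE_KEY);
  const value = raw === null ? NaN : parseFloat(raw);
  return Number.isFinite(value) ? value : NG_MIN_SCORE_DEFAULT;
}

// Persist the slider value; Web Storage only stores strings.
function saveMinScore(storage, value) {
  storage.setItem(NG_MIN_SCORE_KEY, String(value));
}
```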
|
|
ceea136e97 |
feat: observer graph representation (M1+M2) (#774)
## Summary
Fixes #753 (Milestones M1 and M2): Observer nodes in the neighbor graph are now correctly labeled, colored, and filterable.
### M1: Label + color observers
**Backend** (`cmd/server/neighbor_api.go`):
- `buildNodeInfoMap()` now queries the `observers` table after building from `nodes`
- Observer-only pubkeys (not already in the map as repeaters etc.) get `role: "observer"` and their name from the observers table
- Observer-repeaters keep their repeater role (not overwritten)
**Frontend**:
- CSS variable `--role-observer: #8b5cf6` added to `:root`
- `ROLE_COLORS.observer` was already defined in `roles.js`
### M2: Observer filter checkbox (default unchecked)
**Frontend** (`public/analytics.js`):
- Observer checkbox added to the role filter section, **unchecked by default**
- Observers create hub-and-spoke patterns (one observer can have 100+ edges) that drown out the actual repeater topology; hiding them by default keeps the graph clean
- Fixed `applyNGFilters()`, which previously always showed observers regardless of checkbox state
### Tests
- Backend: `TestBuildNodeInfoMap_ObserverEnrichment` verifies observer-only pubkeys get name+role from observers table, and observer-repeaters keep their repeater role
- All existing Go tests pass
- All frontend helper tests pass (544/544)
---------
Co-authored-by: you <you@example.com> |
||
|
|
99dc4f805a |
fix: E2E neighbor test — use hash evaluation instead of page.goto for reliable SPA navigation
page.goto with hash-only change may not reliably trigger hashchange in Playwright, causing the mobile full-screen node view to never render. Use page.evaluate to set location.hash directly, which guarantees the SPA router fires. Also increase timeout from 10s to 15s for CI margin. |
||
|
|
ba7cd0fba7 |
fix: clock skew sanity checks — filter epoch-0, cap drift, min samples (#769)
Nodes with dead RTCs show -690d skew and -3 billion s/day drift. Fix:
1. **No Clock severity**: |skew| > 365d → `no_clock`, skip drift
2. **Drift cap**: |drift| > 86400 s/day → nil (physically impossible)
3. **Min samples**: < 5 samples → no drift regression
4. **Frontend**: 'No Clock' badge, '–' for unreliable drift
Fixes the crazy stats on the Clock Health fleet view.
---------
Co-authored-by: you <you@example.com> |
||
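The three backend checks listed in the clock skew commit above can be sketched as a single guard function. The thresholds come from the commit message; the function name, the argument order, and the returned object shape are assumptions, and the real checks live in the Go server rather than in JavaScript.

```javascript
const SECONDS_PER_DAY = 86400;
const MAX_SKEW_SECONDS = 365 * SECONDS_PER_DAY; // |skew| > 365d means no usable clock
const MAX_DRIFT_PER_DAY = 86400;                // |drift| > 86400 s/day is discarded
const MIN_DRIFT_SAMPLES = 5;                    // fewer samples means no drift estimate

// Hypothetical helper: decide whether a node has a usable clock and
// whether its measured drift is trustworthy enough to report.
function sanitizeClockStats(skewSeconds, rawDriftPerDay, sampleCount) {
  if (Math.abs(skewSeconds) > MAX_SKEW_SECONDS) {
    return { noClock: true, driftPerDay: null }; // skip drift entirely
  }
  if (sampleCount < MIN_DRIFT_SAMPLES) {
    return { noClock: false, driftPerDay: null };
  }
  if (Math.abs(rawDriftPerDay) > MAX_DRIFT_PER_DAY) {
    return { noClock: false, driftPerDay: null };
  }
  return { noClock: false, driftPerDay: rawDriftPerDay };
}
```

A `null` drift corresponds to the '–' placeholder the frontend shows for unreliable drift.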
|
|
6a648dea11 |
fix: multi-byte adopters — all node types, role column, advert precedence (#754) (#767)
## Fix: Multi-Byte Adopters Table, Three Bugs (#754)
### Bug 1: Companions in "Unknown"
`computeMultiByteCapability()` was repeater-only. Extended to classify **all node types** (companions, rooms, sensors). A companion advertising with 2-byte hash is now correctly "Confirmed".
### Bug 2: No Role Column
Added a **Role** column to the merged Multi-Byte Hash Adopters table, color-coded using `ROLE_COLORS` from `roles.js`. Users can now distinguish repeaters from companions without clicking through to node detail.
### Bug 3: Data Source Disagreement
When adopter data (from `computeAnalyticsHashSizes`) shows `hashSize >= 2` but capability only found path evidence ("Suspected"), the advert-based adopter data now takes precedence → "Confirmed". The adopter hash sizes are passed into `computeMultiByteCapability()` as an additional confirmed evidence source.
### Changes
- `cmd/server/store.go`: Extended capability to all node types, accept adopter hash sizes, prioritize advert evidence
- `public/analytics.js`: Added Role column with color-coded badges
- `cmd/server/multibyte_capability_test.go`: 3 new tests (companion confirmed, role populated, adopter precedence)
### Tests
- All 10 multi-byte capability tests pass
- All 544 frontend helper tests pass
- All 62 packet filter tests pass
- All 29 aging tests pass
---------
Co-authored-by: you <you@example.com> |
||
|
|
29157742eb |
feat: show collision details in Hash Usage Matrix for all hash sizes (#758)
## Summary
Shows which prefixes are colliding in the Hash Usage Matrix, making the "PREFIX COLLISIONS: N" count actionable.
Fixes #757
## Changes
### Frontend (`public/analytics.js`)
- **Clickable collision count**: When collisions > 0, the stat card is clickable and scrolls to the collision details section. Shows a `▼` indicator.
- **3-byte collision table**: The collision risk section and `renderCollisionsFromServer` now render for all hash sizes including 3-byte (was previously hidden/skipped for 3-byte).
- **Helpful hint**: 3-byte panel now says "See collision details below" when collisions exist.
### Backend (`cmd/server/collision_details_test.go`)
- Test that collision details include correct prefix and node name/pubkey pairs
- Test that collision details are empty when no collisions exist
### Frontend Tests (`test-frontend-helpers.js`)
- Test clickable stat card renders `onclick` and `cursor:pointer` when collisions > 0
- Test non-clickable card when collisions = 0
- Test collision table renders correct node links (`#/nodes/{pubkey}`)
- Test no-collision message renders correctly
## What was already there
The backend already returned full collision details (prefix, nodes with pubkeys/names/coords, distance classification) in the `hash-collisions` API. The frontend already had `renderCollisionsFromServer` rendering a rich table with node links. The gap was:
1. The 3-byte tab hid the collision risk section entirely
2. No visual affordance to navigate from the stat count to the details
## Perf justification
No new computation: collision data was already computed and returned by the API. The only change is rendering it for 3-byte (same as 1-byte/2-byte). The collision list is already limited by the backend sort+slice pattern.
---------
Co-authored-by: you <you@example.com> |
||
|
|
ed19a19473 |
fix: correct field table offsets for transport routes (#766)
## Summary
Fixes #765: packet detail field table showed wrong byte offsets for transport routes.
## Problem
`buildFieldTable()` hardcoded `path_length` at byte 1 for ALL packet types. For `TRANSPORT_FLOOD` (route_type=0) and `TRANSPORT_DIRECT` (route_type=3), transport codes occupy bytes 1-4, pushing `path_length` to byte 5. This caused:
- Wrong offset numbers in the field table for transport packets
- Transport codes displayed AFTER path length (wrong byte order)
- `Advertised Hash Size` row referenced wrong byte
## Fix
- Use dynamic `offset` tracking that accounts for transport codes
- Render transport code rows before path length (matching actual wire format)
- Store `pathLenOffset` for correct reference in ADVERT payload section
- Reuse already-parsed `pathByte0` for hash size calculation in path section
## Tests
Added 4 regression tests in `test-frontend-helpers.js`:
- FLOOD (route_type=1): path_length at byte 1, no transport codes
- TRANSPORT_FLOOD (route_type=0): transport codes at bytes 1-4, path_length at byte 5
- TRANSPORT_DIRECT (route_type=3): same offsets as TRANSPORT_FLOOD
- Field table row order matches byte layout for transport routes
All existing tests pass (538 frontend helpers, 62 packet filter, 29 aging).
Co-authored-by: you <you@example.com> |
||
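The dynamic offset tracking described in the fix for #765 above can be illustrated with a stripped-down layout function. This is a sketch, not the project's `buildFieldTable()`: the function name, the row shape, and the treatment of the four transport-code bytes as a single row are assumptions. The route type values and byte positions are the ones stated in the commit.

```javascript
// Route type values as given in the commit message.
const ROUTE_TRANSPORT_FLOOD = 0;
const ROUTE_FLOOD = 1;
const ROUTE_TRANSPORT_DIRECT = 3;

// Hypothetical helper: list the rows that follow byte 0, in wire order.
function layoutAfterFirstByte(routeType) {
  const rows = [];
  let offset = 1; // byte 0 precedes these fields and is not modeled here

  const hasTransportCodes =
    routeType === ROUTE_TRANSPORT_FLOOD || routeType === ROUTE_TRANSPORT_DIRECT;
  if (hasTransportCodes) {
    // Transport codes occupy bytes 1-4 and come before path_length.
    rows.push({ name: 'transport_codes', start: offset, end: offset + 3 });
    offset += 4;
  }

  // path_length lands at byte 1 for plain routes, byte 5 for transport routes.
  rows.push({ name: 'path_length', start: offset, end: offset });
  return { rows, pathLenOffset: offset };
}

console.log(layoutAfterFirstByte(ROUTE_FLOOD).pathLenOffset);            // 1
console.log(layoutAfterFirstByte(ROUTE_TRANSPORT_FLOOD).pathLenOffset);  // 5
```

Advancing one running `offset` instead of hardcoding positions keeps row order and offset numbers in agreement by construction.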
|
|
d27a7a653e |
fix case on channel key so Public decode/display works right (#761)
Simple change. Before this change, the Public channel wasn't showing up in the channels display because of a case mismatch on the channel key. |
||
|
|
0e286d85fd |
fix: channel query performance — add channel_hash column, SQL-level filtering (#762) (#763)
## Problem
Channel API endpoints scan the entire DB: 2.4s for the channel list, 30s for messages.
## Fix
- Added `channel_hash` column to transmissions (populated on ingest, backfilled on startup)
- `GetChannels()` rewritten to GROUP BY channel_hash (one row per channel instead of scanning every packet)
- `GetChannelMessages()` filters by channel_hash at SQL level with proper LIMIT/OFFSET
- 60s cache for channel list
- Index: `idx_tx_channel_hash` for fast lookups
Expected: 2.4s → <100ms for list, 30s → <500ms for messages.
Fixes #762
---------
Co-authored-by: you <you@example.com> |
||
|
|
bffcbdaa0b |
feat: add channel UX — visible button, hint, status feedback (#760)
## Fixes #759
The "Add Channel" input was a bare text field with no visible submit button and no feedback, so users didn't know how to submit or whether it worked.
### Changes
**`public/channels.js`**
- Replaced bare `<input>` with structured form: label, input + button row, hint text, status div
- Added `showAddStatus()` helper for visual feedback during/after channel add
- Status messages: loading → success (with decrypted message count) / warning (no messages) / error
- Auto-hide status after 5 seconds
- Fallback click handler on the `+` button for browsers that don't fire form submit
**`public/style.css`**
- `.ch-add-form`: form container
- `.ch-add-label`: bold 13px label
- `.ch-add-row`: flex row for input + button
- `.ch-add-btn`: 32×32 accent-colored submit button
- `.ch-add-hint`: muted helper text
- `.ch-add-status`: feedback with success/warn/error/loading variants
**`test-channel-add-ux.js`**: 20 tests validating HTML structure, CSS classes, and feedback logic
### Before / After
**Before:** Bare input field, no button, no hint, no feedback
**After:** Labeled section with visible `+` button, format hint, and status messages showing decryption results
---------
Co-authored-by: you <you@example.com> |
||
|
|
3bdf72b4cf |
feat: clock skew UI — node badges, detail sparkline, fleet analytics (#690 M2+M3) (#752)
## Summary
Frontend visualizations for clock skew detection. Implements #690 M2 and M3. Does NOT close #690; M4+M5 remain.
### M2: Node badges + detail sparkline
- Severity badges (⏰ green/yellow/orange/red) on node list next to each node
- Node detail: Clock Skew section with current value, severity, drift rate
- Inline SVG sparkline showing skew history, color-coded by severity zones
### M3: Fleet analytics view
- 'Clock Health' section on Analytics page
- Sortable table: Name | Skew | Severity | Drift | Last Advert
- Filter buttons by severity (OK/Warning/Critical/Absurd)
- Summary stats: X nodes OK, Y warning, Z critical
- Color-coded rows
### Changes
- `public/nodes.js`: badge rendering + detail section
- `public/analytics.js`: fleet clock health view
- `public/roles.js`: severity color helpers
- `public/style.css`: badge + sparkline + fleet table styles
- `cmd/server/clock_skew.go`: added fleet summary endpoint
- `cmd/server/routes.go`: wired fleet endpoint
- `test-frontend-helpers.js`: 11 new tests
---------
Co-authored-by: you <you@example.com> |
||
|
|
401fd070f8 |
fix: improve trackedBytes accuracy for memory estimation (#751)
## Problem
Fixes #743: high memory usage / OOM with relatively small dataset.
`trackedBytes` severely undercounted actual per-packet memory because it only tracked base struct sizes and string field lengths, missing major allocations:
| Structure | Untracked Cost | Scale Impact |
|-----------|---------------|--------------|
| `spTxIndex` (O(path²) subpath entries) | 40 bytes × path combos | 50-150MB |
| `ResolvedPath` on observations | 24 bytes × elements | ~25MB |
| Per-tx maps (`obsKeys`, `observerSet`) | 200 bytes/tx flat | ~11MB |
| `byPathHop` index entries | 50 bytes/hop | 20-40MB |
This caused eviction to trigger too late (or not at all), leading to OOM.
## Fix
Expanded `estimateStoreTxBytes` and `estimateStoreObsBytes` to account for:
- **Per-tx maps**: +200 bytes flat for `obsKeys` + `observerSet` map headers
- **Path hop index**: +50 bytes per hop in `byPathHop`
- **Subpath index**: +40 bytes × `hops*(hops-1)/2` combinations for `spTxIndex`
- **Resolved paths**: +24 bytes per `ResolvedPath` element on observations
Updated the existing `TestEstimateStoreTxBytes` to match new formula. All existing eviction tests continue to pass; the eviction logic itself is unchanged.
Also exposed `avgBytesPerPacket` in the perf API (`/api/perf`) so operators can monitor per-packet memory costs.
## Performance
Benchmark confirms negligible overhead (called on every insert):
```
BenchmarkEstimateStoreTxBytes 159M ops 7.5 ns/op 0 B/op 0 allocs
BenchmarkEstimateStoreObsBytes 1B ops 1.0 ns/op 0 B/op 0 allocs
```
## Tests
- 6 new tests in `tracked_bytes_test.go`:
  - Reasonable value ranges for different packet sizes
  - 10-hop packets estimate significantly more than 2-hop (subpath cost)
  - Observations with `ResolvedPath` estimate more than without
  - 15 observations estimate >10x a single observation
  - `trackedBytes` matches sum of individual estimates after batch insert
  - Eviction triggers correctly with improved estimates
- 2 benchmarks confirming sub-10ns estimate cost
- Updated existing `TestEstimateStoreTxBytes` for new formula
- Full test suite passes
---------
Co-authored-by: you <you@example.com> |
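To make the added terms concrete, the overhead formula from the Fix section above can be worked through for a couple of path lengths. The constants are the ones quoted in the commit message; the function names are hypothetical, the base struct and string sizes that the real Go estimators also count are left out, and JavaScript is used here purely for illustration.

```javascript
// Per-item costs quoted in the commit message.
const PER_TX_MAP_BYTES = 200;       // obsKeys + observerSet map headers, flat
const PER_HOP_INDEX_BYTES = 50;     // byPathHop entry per hop
const PER_SUBPATH_BYTES = 40;       // spTxIndex entry per subpath combination
const PER_RESOLVED_ELEM_BYTES = 24; // ResolvedPath element on an observation

// Hypothetical helper: index overhead added per transmission,
// excluding base struct and string sizes.
function txIndexOverheadBytes(hops) {
  const subpathCombos = (hops * (hops - 1)) / 2;
  return (
    PER_TX_MAP_BYTES +
    PER_HOP_INDEX_BYTES * hops +
    PER_SUBPATH_BYTES * subpathCombos
  );
}

// Hypothetical helper: overhead added per observation for its resolved path.
function obsResolvedPathBytes(resolvedElements) {
  return PER_RESOLVED_ELEM_BYTES * resolvedElements;
}

console.log(txIndexOverheadBytes(2));  // 200 + 100 + 40*1  = 340
console.log(txIndexOverheadBytes(10)); // 200 + 500 + 40*45 = 2500
```

The quadratic subpath term is why a 10-hop packet costs far more than five times a 2-hop one under this formula.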