meshcore-analyzer

mirror of https://github.com/Kpa-clawbot/meshcore-analyzer.git synced 2026-06-05 06:41:20 +00:00

Author	SHA1	Message	Date
Kpa-clawbot	c1d94f7db5	fix(#1273 ): collapse QR overlay wrap to content height (#1277 ) ## Summary Fixes #1273 — `.node-top-row .node-qr-wrap` was 2-3× taller than the QR canvas inside it, leaving empty translucent space below the QR. ## Root cause Three compounding issues: 1. SVG intrinsic height not constrained. `qrcode-generator` emits an SVG with fixed `width`/`height` attributes (e.g. 147×147). The CSS rule `.node-qr svg { max-width: 100px }` (and 72px mobile) constrains width only, so the svg's intrinsic height (147px) is preserved and the wrap is sized to that. 2. Flex stretch. `.node-top-row` is `display:flex` with default `align-items:stretch`, so the QR column was forced to match the map column's height (~280px) on desktop. 3. Excess padding/margin added another ~24px above and below the visible QR. ## Fix Three small CSS changes in `public/style.css`: \| change \| effect \| \|---\|---\| \| `.node-qr svg { height: auto; }` \| svg height scales with constrained width \| \| `.node-top-row .node-qr-wrap { align-self: flex-start; }` \| wrap sizes to content, not column \| \| `.node-top-row .node-qr-wrap { padding: 8px; }` + zero inner `.node-qr` margin-top \| tight hug \| ## Measurements (real-data fixture, full node detail page) \| viewport \| wrap.height before \| wrap.height after \| QR canvas \| \|---\|---\|---\|---\| \| 375×800 (mobile overlay) \| 165px \| 82px \| 72×72 \| \| 1280×800 (desktop side-by-side) \| 217px \| 154px \| 100×100 (+ 28px caption) \| Overlay remains `position:absolute` top-right on mobile; the original #1243 behavior is preserved. ## TDD - RED: `test-issue-1273-qr-overlay-height-e2e.js` asserts wrap height ≤ visible QR + caption + 32px at 375×800 and 1280×800. Failed on master with deltas of 93px (mobile) and 89px (desktop). - GREEN: both viewports pass after the CSS fix. Wired into the deploy workflow alongside the other `test-issue-*-e2e.js` runs. 
## Acceptance checklist - [x] Container height ≈ QR canvas height + 16-24px padding total - [x] No empty translucent space below the QR - [x] E2E asserts at 375×800 and 1280×800 - [x] Desktop layout unchanged (overlay position preserved; column no longer stretches but the QR card is the same width) - [x] All colors via CSS variables - [x] #1243 overlay behavior preserved (still top-right on mobile, still rendered) ## Commits - `e9d75c92` test(#1273): RED - `13899270` fix(#1273): collapse QR overlay wrap --------- Co-authored-by: openclaw-bot <bot@openclaw.local>	2026-05-18 22:51:29 -07:00
Kpa-clawbot	21b6eb0d63	fix(live legend): document ACK/RESPONSE/PATH + white-ring repeater convention (#1274 ) (#1276 ) RED commit `ac1fb4c3` (Playwright E2E asserts legend rows for ACK / RESPONSE / PATH text + "ring" + "repeater" — fails on master). CI: https://github.com/Kpa-clawbot/CoreScope/actions?query=branch%3Afix%2Fissue-1274 ## What The Live legend rendered five packet-type rows but the codebase defines eight `TYPE_COLORS`. The three gray-area types (ACK, RESPONSE, PATH) had no swatch in the legend, leaving operators guessing what gray dots meant — they're either ACKs or unknown payload types. Separately, the L.circleMarker styling block uses a brighter white ring to mark repeaters vs. all other roles; that convention was nowhere on screen. ## Changes - `public/live.js` legend HTML — adds rows for RESPONSE, PATH and a combined Ack / Other row (covering both ACK and the unknown-type fallback that share `#6b7280`). Adds a new MARKER STYLES subsection below NODE ROLES with two entries: bright white ring = repeater, faded ring = other. - `public/live.css` — adds `.live-ring` / `.live-ring--repeater` / `.live-ring--other` swatches. Background uses `var(--text-muted)`; only the white border + opacity differ between the two, matching the actual circleMarker weights (1.5 / 0.5) and opacities (0.6 / 0.3). - `test-issue-1274-legend-coverage-e2e.js` — Playwright E2E (desktop + mobile attached-DOM) asserting all four new pieces. ## Notes - All colors via `TYPE_COLORS` — no hardcoded hex in HTML. - Legend is `display:none` at ≤640px (existing #279 behavior), so no mobile CSS tweak required for the longer list. - Does not touch the legend toggle (#1219), mobile single-row header (#1234), or VCR visibility (#1269). Fixes #1274. --------- Co-authored-by: corescope-bot <bot@meshcore.local>	2026-05-18 22:51:26 -07:00
Kpa-clawbot	8bf7709970	feat(repeater): usefulness score — bridge axis (#672 axis 2 of 4) (#1275 ) RED test commit: `fd661569` — CI will fail on this (stub returns empty map; assertions fail by design). GREEN: `bf4b8592`. ## What Implements axis 2 of 4 for the repeater usefulness score per #672 ([status comment](https://github.com/Kpa-clawbot/CoreScope/issues/672#issuecomment-4484635378)). The Bridge axis measures structural importance: how many shortest paths between other nodes route through this one. A high-traffic redundant node and a low-traffic critical bridge will no longer look identical. ## Algorithm Brandes' weighted betweenness centrality with Dijkstra for shortest paths (`cmd/server/bridge_score.go`). - Nodes: pubkeys in the `neighbor_edges` graph - Edge weight: `Score(now) * Confidence()` — per the convention from #1235 (count + recency decay scaled by observer-diversity confidence). Geo-rejected edges already excluded at graph build time (#1230) so we don't re-filter here. - Dijkstra distance: `1 / max(epsilon, weight)` — high affinity = cheap cost. - Normalize: divide by max observed centrality so output is in `[0, 1]`. Cost: `O(V · (E + V log V))`. Staging-scale (~600 nodes / ~2 000 edges) ≈ ~4.8M ops, completes in milliseconds. ## Where it lives - `cmd/server/bridge_score.go` — pure algorithm, no locks - `cmd/server/bridge_recomputer.go` — background recomputer (mirrors #1240/#1262 pattern), 5-min default interval, initial sync prewarm, snapshot stored in `s.bridgeScoreMap atomic.Pointer[map[string]float64]` - `cmd/server/routes.go` — `handleNodes` adds `node["bridge_score"]` on repeater/room rows; node-detail handler adds it on the single-node path - `public/nodes.js` — separate Bridge row in the node detail panel, alongside the existing Usefulness (Traffic) row. Distinct colour-coded bar. 
## What's NOT in this PR (still pending for #672) - Coverage axis (axis 3) — unique observer-pair connectivity - Redundancy axis (axis 4) — simulated node-removal impact - Composite — once all 4 axes ship, swap the `usefulness_score` formula from "traffic-only" to the weighted composite `Refs #672` (not `Fixes` — issue stays open until all 4 axes + composite ship). ## Tests - `TestComputeBridgeScores_LineGraph` — 4-node line: middles non-zero, leaves zero, max normalized to 1.0 - `TestComputeBridgeScores_TriangleNoBridge` — clique has zero bridges - `TestComputeBridgeScores_Empty` — defensive nil-safety - `TestComputeBridgeScores_WeightSensitive` — mutation guard: revert the `1/w` inversion and this test fails - `TestBridgeScore_HandleNodesSurface` — integration: `/api/nodes` returns `bridge_score` on repeater rows; middle nodes > 0, ends == 0 --------- Co-authored-by: clawbot <bot@meshcore.local>	2026-05-18 22:51:23 -07:00
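The Bridge axis described in commit 8bf7709970 can be illustrated with a minimal sketch, assuming a plain edge list in place of the real `neighbor_edges` graph. `Edge`, `bridgeScores`, and the `epsilon` value are hypothetical names and values chosen for illustration; this is not the code in `cmd/server/bridge_score.go`. It applies the steps named in the message: cost `1 / max(epsilon, weight)`, Brandes' accumulation over Dijkstra, then division by the maximum observed centrality.

```go
package main

import (
	"container/heap"
	"fmt"
	"math"
)

// Edge is a hypothetical undirected link; Weight stands in for Score(now) * Confidence().
type Edge struct {
	A, B   string
	Weight float64
}

// item is a (node, value) pair: an adjacency arc with its cost, or a queue entry.
type item struct {
	node string
	val  float64
}

type minHeap []item

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].val < h[j].val }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(item)) }
func (h *minHeap) Pop() any {
	old := *h
	it := old[len(old)-1]
	*h = old[:len(old)-1]
	return it
}

const epsilon = 1e-9 // assumed floor; the source does not give the real value

// bridgeScores runs Brandes' algorithm over Dijkstra and divides by the maximum.
func bridgeScores(edges []Edge) map[string]float64 {
	adj, score := map[string][]item{}, map[string]float64{}
	for _, e := range edges {
		cost := 1 / math.Max(epsilon, e.Weight) // high affinity = cheap hop
		adj[e.A] = append(adj[e.A], item{e.B, cost})
		adj[e.B] = append(adj[e.B], item{e.A, cost})
		score[e.A], score[e.B] = 0, 0
	}
	for s := range adj {
		dist := map[string]float64{s: 0}
		sigma := map[string]float64{s: 1} // shortest-path counts from s
		preds, done := map[string][]string{}, map[string]bool{}
		var order []string // settled nodes, nearest first
		pq := &minHeap{{s, 0}}
		for pq.Len() > 0 {
			cur := heap.Pop(pq).(item)
			if done[cur.node] {
				continue // stale queue entry
			}
			done[cur.node] = true
			order = append(order, cur.node)
			for _, a := range adj[cur.node] {
				if done[a.node] {
					continue
				}
				nd := cur.val + a.val
				old, seen := dist[a.node]
				if !seen || nd < old {
					dist[a.node], sigma[a.node] = nd, sigma[cur.node]
					preds[a.node] = []string{cur.node}
					heap.Push(pq, item{a.node, nd})
				} else if nd == old { // exact tie: another shortest path
					sigma[a.node] += sigma[cur.node]
					preds[a.node] = append(preds[a.node], cur.node)
				}
			}
		}
		delta := map[string]float64{}
		for i := len(order) - 1; i >= 0; i-- { // farthest first
			w := order[i]
			for _, v := range preds[w] {
				delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
			}
			if w != s {
				score[w] += delta[w]
			}
		}
	}
	maxScore := 0.0
	for _, c := range score {
		maxScore = math.Max(maxScore, c)
	}
	for v := range score {
		if maxScore > 0 {
			score[v] /= maxScore
		}
	}
	return score
}

func main() {
	fmt.Println(bridgeScores([]Edge{{"a", "b", 1}, {"b", "c", 1}, {"c", "d", 1}}))
}
```

On the 4-node line the two middle nodes carry every path between the ends, while a triangle with equal weights has no bridge at all. The sketch counts each unordered pair twice (once per direction), which the final normalization cancels out. Ties are detected with exact float equality, which is adequate for a demo but may need a tolerance on real weights.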
Kpa-clawbot	c09fec56ff	ci: update go-server-coverage.json [skip ci]	2026-05-19 01:42:02 +00:00
Kpa-clawbot	6dbfd331a6	ci: update go-ingestor-coverage.json [skip ci]	2026-05-19 01:42:01 +00:00
Kpa-clawbot	a00e1c0e18	ci: update frontend-tests.json [skip ci]	2026-05-19 01:42:00 +00:00
Kpa-clawbot	763d4f707c	ci: update frontend-coverage.json [skip ci]	2026-05-19 01:41:59 +00:00
Kpa-clawbot	ad467daeeb	ci: update e2e-tests.json [skip ci]	2026-05-19 01:41:57 +00:00
Kpa-clawbot	46ce9590f1	fix(#1270 ): Prefix Tool Network Overview shows configured-hash-size counts, not math-only slices (#1271 ) Red commit: `6b68080c24106301b6bfc25f8a05484f07d0612d` (test added that fails on master). CI: see Checks tab on this PR. Fixes #1270. ## Problem Two analytics surfaces told contradictory stories about prefix usage: - Prefix Tool → Network Overview showed e.g. `168 / 65,536` for the 2-byte tier — a pure math fact: every repeater pubkey sliced to 2 bytes yields N distinct values. Because collisions are rare, this number always equals (or nearly equals) the repeater count, making it look like the whole network uses 2-byte hashing. - Hash Stats → By Repeaters showed configured-hash-size counts straight from `/api/analytics/hash-sizes` `distributionByRepeaters` — usually a minority on 2-byte and near-zero on 3-byte. The Prefix Tool was presenting a math fact as if it were operational truth. ## Fix `renderPrefixTool` now also fetches `/api/analytics/hash-sizes` and restructures each tier card into three labeled stats with explicit hierarchy: 1. Primary — `X of Y repeaters configured` (from `distributionByRepeaters`). Same source the Hash Stats tab uses, so the two pages agree exactly. 2. Operational collisions — colliding slices among repeaters configured for this hash size only (matches Hash Issues semantics). 3. Theoretical (secondary, smaller, dashed-rule footnote) — `X unique N-byte slices across all repeater pubkeys (of Y possible)`. The math fact is preserved as educational info, no longer impersonating operational truth. The "Total repeaters" card now also notes how many have a known configured hash size. The "About these numbers" footer was rewritten to explain the three numbers and link to both Hash Stats and Hash Issues. The prefix collision detector (Check / Generate panels) is unchanged — it still scans every repeater pubkey because that is its job. 
## Test Added `#1270 Prefix Tool primary counts match Hash Stats By Repeaters` to `test-e2e-playwright.js`. It fetches `/api/analytics/hash-sizes` for the ground-truth `distributionByRepeaters`, then visits `#/analytics?tab=prefix-tool`, opens Network Overview, and scrapes the primary count via a new `data-pt-configured="<bytes>"` `data-value="<count>"` marker on each tier card, asserting exact equality for 1/2/3-byte. - Red commit `6b68080c` (test only): fails on master with `NO data-pt-configured marker`. - Green commit `12ed2789` (fix): test passes; full E2E suite `123/126 passed, 3 skipped`. ## Acceptance - [x] Prefix Tool Network Overview shows configured-hash-size repeater counts as the primary number - [x] "Unique slices" math is shown as secondary/educational - [x] Two pages tell the same story (E2E asserts byte-equal match) - [x] E2E asserts the configured-count matches what Hash-Sizes tab shows at the same point in time	2026-05-18 18:20:29 -07:00
Kpa-clawbot	0022c8fd1f	ci: update go-server-coverage.json [skip ci]	2026-05-18 22:43:47 +00:00
Kpa-clawbot	7827c8e778	ci: update go-ingestor-coverage.json [skip ci]	2026-05-18 22:43:46 +00:00
Kpa-clawbot	cb0218fc4d	ci: update frontend-tests.json [skip ci]	2026-05-18 22:43:46 +00:00
Kpa-clawbot	385f49b3d8	ci: update frontend-coverage.json [skip ci]	2026-05-18 22:43:45 +00:00
Kpa-clawbot	f4ecc96ccc	ci: update e2e-tests.json [skip ci]	2026-05-18 22:43:44 +00:00
Kpa-clawbot	78b666c248	fix(#1267): mobile VCR bar invisible — JS height clobbered bottom-nav reserve (#1269) ## Summary Mobile-only regression: on the Live page at ≤768px viewports the VCR bar was rendered behind the fixed bottom-nav and never visible to the user. iOS Safari screenshot at 375x812 showed: top header strip, full-height map, bottom-nav — no VCR row at all. Fixes #1267. ## Root cause `public/live.js` `initResizeHandler` (the existing JS height override) was setting `page.style.height = window.innerHeight + 'px'`, which clobbered the CSS rule that already subtracts `--bottom-nav-reserve` from the live-page height. Because `.live-page` then spanned the full viewport, the VCR bar (`position:absolute; bottom:0; z-index:1000`) was painted underneath `.bottom-nav` (`position:fixed; z-index:1200`). The VCR bar element WAS in the DOM, WAS `display: flex`, and HAD `height: 53px` — it just sat at y=758..812 underneath the bottom-nav at y=754..812. CSS-only checks for `display:none` would never catch this; the test asserts the bar's bottom edge is at or above the bottom-nav's top edge. ## Fix One-liner in spirit: subtract the bottom-nav height before applying `page.style.height`. The implementation measures the rendered `.bottom-nav` (with a fallback to a hidden probe that resolves the `--bottom-nav-reserve` token), so it survives safe-area inset and the bottom-nav's 1px border. ```js const reserve = /* measure .bottom-nav, fall back to --bottom-nav-reserve token */; const h = Math.max(0, window.innerHeight - reserve); ``` Desktop is unchanged: `.bottom-nav` is `display: none`, the probe resolves to 0, and `h === window.innerHeight` exactly as before. ## TDD - RED (commit 1): `test-e2e-1267-mobile-vcr.js` — Playwright at iPhone 375x812 asserts `.vcr-bar` has `display !== 'none'`, `visibility !== 'hidden'`, `height > 0`, `top < viewport.height`, and (the key check) `bottom <= bottom-nav.top`. 
Fails on `master` with: "VCR bar bottom 812 overlaps bottom-nav top 754". - GREEN (commit 2): the fix above. Test passes: "VCR bar bottom 754 ≤ bottom-nav top 754". ## Verification - ✅ Mobile (375x812) repro reproduced against `master` (bar at y=758..812, behind bottom-nav) - ✅ Mobile (375x812) E2E green after fix (bar at y=700..754, flush above bottom-nav) - ✅ Desktop (1440x900) unaffected — bottom-nav hidden, page height = viewport height as before, VCR bar at viewport bottom - ✅ #1234 (top-nav hidden on /live), #1246 (single-row VCR), #1206/#1213 (VCR/feed clearance) unchanged — none touched ## Files - `public/live.js` — single function (`initResizeHandler`) modified - `test-e2e-1267-mobile-vcr.js` — new mobile-viewport Playwright regression test Run: `BASE_URL=http://localhost:13581 node test-e2e-1267-mobile-vcr.js` --------- Co-authored-by: openclaw-bot <bot@openclaw.local>	2026-05-18 15:27:05 -07:00
Kpa-clawbot	d4d569278d	ci: update go-server-coverage.json [skip ci]	2026-05-18 19:46:11 +00:00
Kpa-clawbot	ee21eafa66	ci: update go-ingestor-coverage.json [skip ci]	2026-05-18 19:46:10 +00:00
Kpa-clawbot	c6a90e9896	ci: update frontend-tests.json [skip ci]	2026-05-18 19:46:09 +00:00
Kpa-clawbot	92518ab234	ci: update frontend-coverage.json [skip ci]	2026-05-18 19:46:08 +00:00
Kpa-clawbot	cd84f51f8a	ci: update e2e-tests.json [skip ci]	2026-05-18 19:46:08 +00:00
Kpa-clawbot	4cd8445233	perf(#1265 ): wire /api/observers/clock-skew + /api/nodes/clock-skew into analytics recomputer (#1266 ) RED: `97f49a0c` · CI: https://github.com/Kpa-clawbot/CoreScope/actions/runs/26046530920 Fixes #1265. ## Problem On staging two clock-skew endpoints serve compute-on-request: - `/api/observers/clock-skew` — 3.3s - `/api/nodes/clock-skew` — 8.9s Both drive a full `clockSkew.Recompute` over 100k+ adverts while holding `s.mu.RLock`, blocking under concurrent reader load. ## Fix Wire both endpoints into the established `analytics_recomputer.go` pattern (PRs #1248 / #1259 / #1263). Two new slots: - `recompObserversClockSkew` — wraps `computeObserverCalibrations()` - `recompNodesClockSkew` — wraps `computeFleetClockSkew()` Accessors `GetObserverCalibrations` / `GetFleetClockSkew` now prefer the atomic-pointer snapshot; on-request compute is fallback-only for the brief window before initial sync compute lands (and for tests that skip the recomputer). Default interval 300s, overridable via: ```json "analytics": { "recomputeIntervalSeconds": { "observersClockSkew": 300, "nodesClockSkew": 300 } } ``` `config.example.json` + the `_comment_analytics` doc updated. ## TDD - RED `97f49a0c` — `TestClockSkewRecomputersRegistered` + `TestClockSkewHandlersSteadyStateLatency` (8 concurrent readers × 25 reqs per endpoint, p99 < 100ms gate). Fails on master: recomputer slots nil. - GREEN `19599375` — wire + accessor switch. p99 well under 5ms on the test fixture. ## Verification ``` cd cmd/server && go test ./... -count=1 # ok 42s bash ~/.openclaw/skills/pr-preflight/scripts/run-all.sh origin/master # all gates pass ``` --------- Co-authored-by: CoreScope Bot <bot@corescope.local>	2026-05-18 12:27:44 -07:00
Kpa-clawbot	ae17a2be12	perf(#1262 ): /api/nodes?limit=2000 cold-miss 15.7s → <100ms — prewarm repeater enrichment cache (#1263 ) RED commit: `22ce5736066142583017cad7303fa48d9e00ccf0` — CI on red: https://github.com/Kpa-clawbot/CoreScope/actions?query=branch%3Afix%2Fissue-1262 ## Problem After #1260 added a 15s-TTL bulk cache for repeater enrichment in `handleNodes`, `/api/nodes` (default limit) dropped to ~500ms. But `/api/nodes?limit=2000` — called by `public/live.js` at SPA startup for hop resolution — still took 15.7s cold on staging (75k tx, 600 nodes). Warm hits were ~40ms. Root cause: the bulk cache was lazily populated on the first request after TTL expiry. The rebuild ran on the request-serving goroutine. Every cold SPA load triggered the rebuild and ate 15s. ## Fix Add `StartRepeaterEnrichmentRecomputer` — a steady-state background recomputer that mirrors the `analytics_recomputer.go` pattern from #1240: - Prewarm: initial synchronous compute on Start so the first request hits a populated cache. - Steady-state: ticker refreshes the snapshot every 5min (configurable via the existing analytics recompute interval knob). - Panic-safe + idempotent Start. Wired into `main.go` right after `StartAnalyticsRecomputers`, using `cfg.GetHealthThresholds().RelayActiveHours` as the window. ## Test `TestHandleNodesLimit2000ColdMiss` — seeds 600 nodes + 150k non-advert tx with repeaters indexed under a shared 1-byte hop prefix (matches production hop-prefix collisions), starts the recomputer, then issues `/api/nodes?limit=2000` with no HTTP warmup. \| State \| Latency \| \|---\|---\| \| Before (master, on-thread rebuild) \| 3.37s \| \| After (prewarm + steady-state) \| 56ms \| \| Budget \| 2s \| Staging end-to-end: 15.7s → expected sub-100ms on the same call path. Red commit (`22ce5736066142583017cad7303fa48d9e00ccf0`) compiles with a no-op stub of the new method so the test fails on the latency assertion, not a missing symbol. 
Fixes #1262 --------- Co-authored-by: corescope-bot <bot@corescope.local>	2026-05-18 09:22:27 -07:00
Kpa-clawbot	094a96bd6c	perf(#1258 ): /#/perf — parallel health fetch, sort endpoints, pause refresh while hidden (#1261 ) Fixes #1258 — Perf dashboard (/#/perf) was slow because of three frontend issues; backend APIs were never the problem. ## Findings 1. `/api/health` fetched sequentially after `Promise.all` in `refresh()` — added a full RTT (~50-200ms) on every 5s tick on top of the parallel batch. 2. Endpoints table not actually sorted despite the heading "sorted by total time". JSON shape is `map[string]EndpointStatsResp` (no defined order); frontend rendered map iteration order. Visible correctness bug surfaced during investigation. 3. `setInterval(refresh, 5000)` kept firing while tab was hidden, rebuilding the entire ~10-section `innerHTML` (cards + 3 tables) in the background. On tab return the user saw a backlog thrash + felt the page was "slow to render". ## Fix (`public/perf.js`) - Move `/api/health` into the same `Promise.all` as the other 4 endpoints — saves one RTT per refresh. - Sort `Object.entries(server.endpoints)` by `count * avgMs` DESC client-side. - Add `document.hidden` guard in the interval tick + `visibilitychange` listener that refreshes once on return; `destroy()` removes the listener. ## Tests `test-perf-render-1258.js` (new): - All 5 initial fetches issued in parallel (including `/api/health`) - Refresh suppressed while `document.hidden` - Endpoints table sorted by total time DESC, regardless of input map order RED commit first (`6b54f9e8`, 0/3 pass) → GREEN commit (`be81303b`, 3/3 pass). Existing `test-perf-go-runtime.js` (13/13) and `test-perf-disk-io-1120.js` (15/15) still green. ## Investigation exemption No Playwright timing test — sandbox can't run a real browser. Static analysis + render-shape unit tests cover the three identified bottlenecks. Documented per AGENTS "investigation surfaces" exemption. 
## Measurement Before: refresh = parallel batch (~max(server-side)) + sequential `/api/health` (~50ms) + full innerHTML rebuild every 5s including hidden tabs. After: refresh = single parallel batch, runs only while visible. Expected improvement on tab-return ≈ -1 RTT per refresh + zero background work. --------- Co-authored-by: corescope-bot <bot@corescope.local>	2026-05-18 08:02:27 -07:00
Kpa-clawbot	1efe93d7f6	perf(#1257 ): bulk-cache repeater enrichment in /api/nodes — 32s → <500ms (#1260 ) RED commit `a2879e12` — perf regression test; CI run: see Actions tab. Fixes #1257. ## Root cause `handleNodes` looped over the response page and called `store.GetRepeaterRelayInfo(pk, win)` + `store.GetRepeaterUsefulnessScore(pk)` for every repeater/room. Each call: - grabbed its own `s.mu.RLock`, - walked `byPathHop[pk]` (+ the matching 1-byte raw-prefix bucket, which on busy networks fans out to nearly the entire non-advert tx set), - and re-parsed every `tx.FirstSeen` with `parseRelayTS`. Default page is the 50 most-recently-seen nodes — almost all hot repeaters — so the request did O(50) lock acquisitions and hundreds of thousands of timestamp parses on the same set of txs. That's the classic load-then-paginate / per-row N+1 shape called out in the issue (same family as #1226). The `?limit=2000` variant looks faster relatively only because per-node enrichment dwarfs serialization; on staging both still bottleneck on the same loop. ## Fix Two new bulk methods on `PacketStore`: - `GetRepeaterRelayInfoMap(windowHours)` → `pubkey → RepeaterRelayInfo` - `GetRepeaterUsefulnessScoreMap()` → `pubkey → 0..1` Both snapshot `byPathHop` under a single `RLock`, pre-parse each `FirstSeen` exactly once (a tx that appears in N hop buckets used to be parsed N times), and emit one entry per hop key. Cached 15s — same TTL as `GetNodeHashSizeInfo` / `GetMultiByteCapMap`, same status-column freshness budget. `handleNodes` is one map-lookup per node; behavior, output schema, and `RelayActive` / `RelayCount{1h,24h}` / `LastRelayed` / `usefulness_score` semantics are preserved. ## Why no `limit` default change The issue mentioned a default-limit knob. Investigated: `queryInt(r, "limit", 50)` already defaults to 50 — frontends calling `/api/nodes` (no limit) get a 50-row page today. 
Capping further would change behavior (live.js already passes `?limit=2000` when it wants more); the cost was per-repeater enrichment, not page size. Fixing the N+1 is the correct lever and preserves backward compat. ## Perf Regression test `TestHandleNodesPerfLargeFleet` (600 nodes, 150k non-advert tx, repeaters indexed under `byPathHop`): \| \| elapsed \| vs 2s budget \| \|---\|---\|---\| \| before (master) \| 4.72s \| ✗ \| \| after \| ~4ms \| ✓ (~1000×) \| ## TDD - RED: `a2879e12` — test fails at 4.72s on master. - GREEN: `c529d29a` — fix; full `cmd/server` + `cmd/ingestor` suites green. --------- Co-authored-by: corescope-bot <bot@corescope>	2026-05-18 07:36:33 -07:00
Kpa-clawbot	f81ed5b3cf	perf(#1256 ): wire /api/analytics/roles into steady-state recomputer (#1259 ) RED commit: `0190466d` — failing CI: https://github.com/Kpa-clawbot/CoreScope/actions (will populate after PR creation) ## Problem On staging (commit `d69d9fb`, 78k tx, 2.3M obs), `curl http://localhost/api/analytics/roles` times out at 60s with 0 bytes — the Roles tab is unusable. Issue #1256. PR #1248's steady-state recomputer fan-out (topology / rf / distance / channels / hash-collisions / hash-sizes) didn't include roles. The legacy handler: 1. Holds `s.mu.RLock` for the entire compute. 2. Calls `GetFleetClockSkew()`, which drives `clockSkew.Recompute(s)` over all ADVERT transmissions — O(78k) per request. 3. Concurrent ingest writers compound the latency through writer-starvation. Result: every request hits the cold path; the response never comes back inside the 60 s HTTP budget. ## Fix Add `roles` as the 7th endpoint in the recomputer fan-out — same pattern as #1248: - `PacketStore.recompRoles` slot, registered in `StartAnalyticsRecomputers` with default 5-min interval. - `PacketStore.GetAnalyticsRoles()` → atomic-pointer load from the snapshot (sub-ms), with a `computeAnalyticsRoles()` fallback only for the brief startup window before the initial sync compute completes. - Handler is now a thin wrapper — no lock-held work on the request path. - New optional `roles` key under `analytics.recomputeIntervalSeconds` in config; `config.example.json` and `_comment_analytics` updated. ## Latency (unit-scope benchmark) - Worst-of-50 handler latency: <100 ms (test budget; well under the 2 s p99 acceptance). - Compute itself is bounded by the existing 5-min recompute window — it runs once in the background, never on the request path. ## Tests - RED `0190466d`: asserts `recompRoles` is registered and the handler returns under the latency budget. Fails on master with `recompRoles not registered`. 
- GREEN `d7784f76`: registers the recomputer + snapshot accessor — both tests pass. Fixes #1256 --------- Co-authored-by: openclaw-bot <bot@openclaw.local>	2026-05-18 07:36:28 -07:00
Kpa-clawbot	d69d9fbf8e	perf(#1247 ): surgical fix for resolveWithContext tier-1 hot path (4.6× speedup) (#1253 ) ## Summary Surgical fix for #1247: analytics endpoints regressed 3-9× between prod `d818527` and master. pprof against staging traced the regression to `resolveWithContext` tier-1 affinity loop running on every analytics `resolveHop` call (post-#1198 plumbing) with redundant per-(cand, ctx) work. Result: 4.6× speedup on the synthetic hot-shape benchmark (202µs → 44µs / op). ## Root cause - PR #1198 (`353c5264`) lit up `resolveWithContext` tier 1 from every analytics resolveHop closure (previously they passed `contextPubkeys=nil` and short-circuited the entire tier-1 block). - The inner loop did `N_cand × N_ctx` iterations where each one did: - `graph.Neighbors(strings.ToLower(ctxPK))` — graph RLock + ToLower allocation per candidate, redundantly - `strings.ToLower(cand.PublicKey)` per `ctxPK` - `strings.EqualFold(otherPK, ctxPK)` + `EqualFold(otherPK, candPK)` — both sides were already lowercased (`NeighborEdge.NodeA/B` via `makeEdgeKey`; `contextPubkeys` via `buildHopContextPubkeys`) - At staging scale (5k+ contextPubkeys × 30k+ resolveHop calls) this dominated `computeAnalyticsTopology` (37% of its CPU) and `computeAnalyticsRF` (55%). ## pprof attribution (staging, region-keyed queries bypassing #1240 cache) ``` computeAnalyticsTopology cum: 19.24% (5.45s / 28.32s sampled) └─ resolveWithContext 37% ├─ strings.ToLower 41% ├─ strings.EqualFold 28% └─ graph.Neighbors 24% computeAnalyticsRF cum: 10.38% ``` ## Fix (~80 LoC in `cmd/server/store.go`) 1. Lowercase `contextPubkeys` once per call, skipped entirely when already lowercased (the analytics fast path). 2. Lowercase candidate pubkeys once per call. 3. Invert the loop nesting: outer-ctx / inner-edge / candidate-map lookup. `graph.Neighbors` is called once per context pubkey instead of `N_cand` times. 4. Raw `==` instead of `strings.EqualFold` for pubkey comparisons (both sides lowercased by step 1/2). 
5. Added a tiny `hasUpperASCII` byte-loop helper next to `isHexLower` for the fast-path check. Behavior preserved: same `Score × Confidence` formula, same tier-1 ratio + min-observations gate, same per-candidate "best edge wins" semantics. No change to tiers 2/3/4. ## TDD evidence - Red commit (`5f8d1564`): `TestResolveWithContextTier1Floor` asserts `<100 µs/call` on the hot shape. 199 µs/call on regressed master → FAIL. - Green commit (`e3bdbc65`): surgical fix lands. 44 µs/call → PASS. - Reverification: locally stashed the fix, ran the test → 199.5 µs FAIL; popped fix → 44 µs PASS. `BenchmarkResolveWithContextTier1Hot` (no assertion, visibility only): ``` before: 202013 ns/op 168 B/op 3 allocs/op after: 44084 ns/op 424 B/op 6 allocs/op speedup: 4.6× ``` (Post-fix allocs are O(N_cand + N_ctx) one-time helper tables — net win at hot scale.) ## Independence from #1248 PR #1248 caches the analytics compute output so user-facing latency is sub-ms even when the compute is slow. That's correct for UX but it masks the regression. This PR repairs the compute itself, so: - Region-keyed and windowed queries (which bypass the recomputer cache by design — see #1240) become fast again. - Future ingest scale or feature work on top of the regressed baseline doesn't compound. ## Out of scope - The geo-rejection (#1228) and Confidence weighting (#1229) commits — kept intact, they protect correctness and were not the dominant CPU cost. - Reverting any suspect commit — surgical only. 
## Acceptance criteria from #1247 - [x] pprof confirms the hot function (`resolveWithContext`) - [x] Bisect identifies the regressing commit (`353c5264` / PR #1198 — context plumbing; ratified by pprof, no need to actually rebuild 5 binaries) - [x] Fix lands; tier-1 hot path 4.6× faster - [x] No regression in disambiguator correctness — full `go test ./...` green, all existing `ResolveWithContext` / `HopDisambig` / `NeighborGraph` / `Affinity` tests pass Fixes #1247 --------- Co-authored-by: openclaw-bot <bot@openclaw.local>	2026-05-17 16:42:01 -07:00
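The five steps listed for commit d69d9fbf8e can be sketched in isolation, assuming a much simpler scoring rule than the real tier-1 logic. `hasUpperASCII` is named in the message, but its body here is a guess at the obvious byte loop; `lowerAll`, `Edge`, and `bestEdgePerCandidate` are hypothetical, and a bare `Weight` replaces the `Score × Confidence` formula and the ratio / min-observations gate.

```go
package main

import (
	"fmt"
	"strings"
)

// hasUpperASCII reports whether s contains any byte in 'A'..'Z'.
func hasUpperASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if c := s[i]; c >= 'A' && c <= 'Z' {
			return true
		}
	}
	return false
}

// lowerAll lowercases keys once. When every key is already lowercase it
// returns the input slice untouched and allocates nothing.
func lowerAll(keys []string) []string {
	for i, k := range keys {
		if !hasUpperASCII(k) {
			continue
		}
		out := make([]string, len(keys))
		copy(out, keys[:i])
		for j := i; j < len(keys); j++ {
			out[j] = strings.ToLower(keys[j])
		}
		return out
	}
	return keys
}

// Edge is a hypothetical neighbor edge whose endpoints are stored lowercased.
type Edge struct {
	A, B   string
	Weight float64
}

// bestEdgePerCandidate keeps, for each candidate, the strongest edge to any
// context pubkey. neighbors is called once per context pubkey.
func bestEdgePerCandidate(neighbors func(pk string) []Edge, cands, ctx []string) map[string]float64 {
	cands, ctx = lowerAll(cands), lowerAll(ctx)
	isCand := make(map[string]bool, len(cands))
	for _, c := range cands {
		isCand[c] = true
	}
	best := make(map[string]float64, len(cands))
	for _, ctxPK := range ctx { // outer: context
		for _, e := range neighbors(ctxPK) { // inner: that pubkey's edges
			other := e.A
			if other == ctxPK { // raw ==: both sides are lowercase already
				other = e.B
			}
			if isCand[other] && e.Weight > best[other] {
				best[other] = e.Weight
			}
		}
	}
	return best
}

func main() {
	graph := map[string][]Edge{"ctx1": {{"ctx1", "ab12", 0.9}, {"ctx1", "ab34", 0.2}}}
	neighbors := func(pk string) []Edge { return graph[pk] }
	fmt.Println(bestEdgePerCandidate(neighbors, []string{"AB12", "ab34"}, []string{"ctx1"}))
}
```

With the nesting inverted, the number of `neighbors` calls depends only on the context size, and the candidate check becomes a map lookup instead of a per-pair `EqualFold`. The small helper tables are the kind of one-time allocation the benchmark note in the message attributes the extra allocs to.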
Kpa-clawbot	1d33ac53b0	fix(#1254 ): trim .badge-iata h-padding on mobile to clear 1.25px clip (#1255 ) Fixes #1254. Master CI Playwright fail-fast on every push since #1252: ``` ❌ Mobile viewport (375px): observer IATA badge stays visible — not clipped: .badge-iata right edge 376.25 exceeds 375px viewport ``` ## Root cause After #1252 unhid `.col-observer` at narrow widths so the IATA pill from #1188 renders on mobile, at 375px the cell padding + truncated observer name (10 chars in grouped rows) + `.badge-iata` pill (`padding: 1px 5px` + `margin-left: 4px`) sums to ~376.25px — overflowing the viewport by 1.25px. Same class of failure as #1250/#1251 (VCR LCD-clip). ## Fix `public/style.css` — inside the existing `@media (max-width: 640px)` block, shrink `.badge-iata` `padding: 1px 5px → 1px 3px` and `margin-left: 4px → 2px`. Reclaims ~6px horizontally, well clear of the 1.25px overflow. Desktop (≥641px) styling untouched. ## TDD The failing E2E sub-test in `test-observer-iata-1188-e2e.js` (added in #1189 R1) IS the red. Mutation verified locally: \| Variant \| Result \| \|--------------------\|--------\| \| WITHOUT this fix \| ❌ `.badge-iata right edge 376.25 exceeds 375px viewport` \| \| WITH this fix \| ✅ all 3 sub-tests pass \| ## Local verification ``` $ go build -o /tmp/corescope-server ./cmd/server $ /tmp/corescope-server -port 13581 -db test-fixtures/e2e-fixture.db -public public & $ CHROMIUM_PATH=/usr/bin/chromium BASE_URL=http://localhost:13581 \ node test-observer-iata-1188-e2e.js Running observer-IATA E2E tests against http://localhost:13581 ✅ Packets table renders an IATA badge in an observer cell ✅ Filter grammar: observer_iata == "<code>" narrows the table ✅ Mobile viewport (375px): observer IATA badge stays visible — not clipped All observer-IATA E2E tests passed. ``` ## Constraints honored - All colors via existing CSS variables (no theming illusions; only `padding` / `margin-left` change inside `@media (max-width: 640px)`). - No JS changes. 
- Desktop badge display unaffected (selector scoped to narrow viewport). - `config.example.json`: no config field added. - PII preflight: clean. Co-authored-by: OpenClaw Bot <bot@openclaw.local>	2026-05-17 16:26:51 -07:00
Kpa-clawbot	43203b09b7	fix(#1249 ): IATA badge missing on fixture + mobile clipping (#1252 ) Failing test commit: `bdb4eefb` (added in #1189 R1) — original CI failure: https://github.com/Kpa-clawbot/CoreScope/actions/runs/25995819598 Fixes #1249. ## Root cause Two independent bugs surfaced by the same E2E test: 1. Fixture join broken. `scripts/capture-fixture.sh` wrote the text observer hash into `observations.observer_idx`, but the v3 join in `cmd/server` is `observers.rowid = observations.observer_idx`. The join silently nulled out `observer_id` / `observer_iata` for every packet. 2. Mobile clipping. `.col-observer` had `data-priority=3` (hides at ≤1024px) and was in the narrow-viewport `defaultHidden` list, so at 375px the cell collapsed to `display:none` and `.badge-iata` had a 0×0 box. ## Changes - `test-fixtures/e2e-fixture.db`: remap `observer_idx` text hash → integer rowid (500/500 rows resolved). - `scripts/capture-fixture.sh`: build an `observer_id → rowid` map before insert; skip rows whose observer isn't in the fixture. Comment explains the trap. - `public/packets.js`: bump `.col-observer` priority `3 → 1` and drop `observer` from narrow-viewport `defaultHidden`. ## Verification All three sub-tests in `test-observer-iata-1188-e2e.js` pass locally against the freshened fixture. `curl /api/packets?limit=5` returns real IATA codes (OAK / MRY / SFO) instead of empty strings. Co-authored-by: OpenClaw Bot <bot@openclaw.local>	2026-05-17 20:06:25 +00:00
Kpa-clawbot	45872c8371	fix(#1250 ): trim mobile VCR bar h-padding 8px→4px to clear 0.83px LCD clip (#1251 ) Red: master CI run https://github.com/Kpa-clawbot/CoreScope/actions/runs/25995768081 already fails on `test-e2e-playwright.js` `#1221 LCD clipped on right (right=375.828125, vw=375)`. No new test commit — the existing E2E assertion is the gate. Root cause. PR #1222's mobile rule set `.vcr-bar { padding: 4px 8px }`. The flex row holds three `flex-shrink: 0` children (controls + scope-btns + lcd) and one `flex: 1 1 0` absorber (`.vcr-timeline-container`, `min-width: 40px`). At 375px viewport the absorber hits its floor, so the intrinsic widths of the shrink-frozen children spill 0.83px past the padding box. Fix. Drop horizontal padding 8px → 4px inside the `@media (max-width: 640px)` block. That's 8px of new slack — order of magnitude above the 0.83px clip — keeping LCD's `getBoundingClientRect().right ≤ 375`. Desktop layout untouched (rule is mobile-scoped). VCR/feed overlap (#1206/#1213) not reintroduced because `--vcr-bar-height` is JS-measured by the ResizeObserver, not pinned in CSS. Fixes #1250 Co-authored-by: openclaw-bot <bot@openclaw.local>	2026-05-17 12:58:27 -07:00
Kpa-clawbot	356f001027	perf(#1240 ): steady-state background recompute for analytics endpoints (#1248 ) RED commit: `27630f6a` — adds latency test that fails on master (p99=225ms > 50ms budget) and a stub `StartAnalyticsRecomputers` that returns a no-op so the assertion (not a build error) gates the change. GREEN commit: `20fbbceb` — wires real background recompute infrastructure. Test passes at p99=~1µs. ## What changed Replaces the on-request "compute-then-cache" pattern for the default-shape analytics queries with a steady-state background recompute loop. Reads always hit an `atomic.Value` snapshot in <1µs regardless of compute cost or writer contention. Operator principle: serving slightly stale data quickly beats real-time data slowly. ## Endpoints converted (default 5min interval each) \| Endpoint \| Cold compute \| Recomputer interval \| \|---\|---\|---\| \| `/api/analytics/topology` \| ~5s \| 5 min \| \| `/api/analytics/rf` \| ~4s \| 5 min \| \| `/api/analytics/distance` \| ~3s \| 5 min \| \| `/api/analytics/channels` \| ~0.5s \| 5 min \| \| `/api/analytics/hash-collisions` \| ~0.5s \| 5 min \| \| `/api/analytics/hash-sizes` \| ~22ms \| 5 min \| All intervals configurable per-endpoint via `analytics.recomputeIntervalSeconds.<name>` in `config.json`; documented in `config.example.json`. Default override via `analytics.defaultIntervalSeconds`. ## Scope: default query only Only the canonical shape `(region="", window=zero)` is precomputed. Region- or window-filtered requests fall back to the legacy TTL cache + on-request compute — keeps recomputer count bounded (6, not 6×N×M). ## Latency Test `TestAnalyticsRecomputerSteadyStateLatency`: 100 concurrent readers + 4 writers churning `s.mu.Lock` on 20k distHops. 
- Before: p50=188ms p99=225ms (assertion failed) - After: p50=240ns p99=1.1µs (atomic load + map return) ## Shutdown integration `StartAnalyticsRecomputers` returns a stop closure invoked from `main.go`'s SIGTERM handler BEFORE `dbClose()` so any in-flight SQLite compute drains cleanly. `TestAnalyticsRecomputerShutdownNoLeak` confirms all 6 goroutines are reaped (Δ=6 within 2s). ## Safety details - Initial compute is synchronous in `Start()` — first read after startup never sees nil. - `recover()` inside `runOnce` keeps a compute panic from killing the goroutine; previous snapshot remains valid. - `analyticsRecomputerMu` is a sync.RWMutex; recomputer pointers are read-locked in the hot path. The atomic.Value swap inside `runOnce` is lock-free. Fixes #1240. --------- Co-authored-by: OpenClaw Bot <bot@openclaw.local>	2026-05-17 17:33:30 +00:00
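The snapshot pattern described in this commit (synchronous first compute, periodic background recompute, lock-free reads from an `atomic.Value`, `recover()` around each run, and a stop closure that waits for the goroutine) can be sketched roughly as follows. This is a minimal illustration, not the code in `cmd/server`: `Recomputer`, `NewRecomputer`, `Load`, and the `map[string]int` result type are hypothetical names chosen for the example; only `runOnce` and the general behavior come from the commit text.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Recomputer (hypothetical name) recomputes a result in the background and
// publishes it through an atomic.Value so readers never wait on the compute.
type Recomputer struct {
	snapshot atomic.Value // always holds a map[string]int once stored
	compute  func() map[string]int
}

// NewRecomputer wraps a compute function; the result type is an assumption.
func NewRecomputer(compute func() map[string]int) *Recomputer {
	return &Recomputer{compute: compute}
}

// runOnce recomputes and swaps the snapshot. If compute panics, the panic is
// recovered and the previously stored snapshot stays in place.
func (r *Recomputer) runOnce() {
	defer func() { _ = recover() }()
	r.snapshot.Store(r.compute())
}

// Start runs the first compute synchronously, then refreshes on every tick.
// The returned stop closure must be called once; it blocks until the
// background goroutine has exited.
func (r *Recomputer) Start(interval time.Duration) (stop func()) {
	r.runOnce()
	done := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				r.runOnce()
			case <-done:
				return
			}
		}
	}()
	return func() {
		close(done)
		wg.Wait()
	}
}

// Load returns the latest snapshot, or nil if no compute has succeeded yet.
func (r *Recomputer) Load() map[string]int {
	m, _ := r.snapshot.Load().(map[string]int)
	return m
}

func main() {
	r := NewRecomputer(func() map[string]int {
		return map[string]int{"nodes": 42} // stand-in for an expensive query
	})
	stop := r.Start(50 * time.Millisecond)
	defer stop() // in a server, call this before closing the database
	fmt.Println(r.Load()["nodes"])
}
```

Readers must treat the returned map as read-only, since every caller shares the same snapshot until the next swap.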
Kpa-clawbot	b881a09f02	feat(#1188 ): show observer IATA on packets + filter grammar (#1189 ) Red commit: `4ed272761b` (CI run: https://github.com/Kpa-clawbot/CoreScope/actions/runs/25651898290) Fixes #1188 — observer IATA on packets in three UI surfaces + filter grammar. cross-stack: justified — feature spans API shape (Go), store, filter grammar (JS), three packets UI surfaces. ## Scope shipped - Packets table row: `.badge-iata` pill inline next to observer name - Expanded observation rows: per-observation IATA badge - Detail pane: Observer dd + per-observation list both render the badge - Filter grammar: `observer_iata` field + `iata` alias; `==`/`!=`/`contains`, plus a new `in (a, b, c)` list operator. Both names appear in autocomplete with descriptions. ## TDD red→green pairs 1. `271d72f` filter-grammar tests → `2c182eb` evaluator + suggest entries 2. `4ed2727` backend `observer_iata` API tests → `7856914` SQL join + struct/store wiring 3. `0e09371` display E2E → `7a3f45d` packets.js + style.css badge (E2E swapped for string-contract unit test in `ee414b4` — fixture `observations.observer_idx` stores text pubkeys, blocking the join the badge depends on) ## Backend - `cmd/server/db.go`: SELECT `obs.iata AS observer_iata` in `transmissionBaseSQL`, grouped query, observations-by-transmissions - `cmd/server/store.go`: `ObserverIATA` on `StoreTx`/`StoreObs`, load via all three ingest paths, surface in `txToMap`/`enrichObs`/`groupedTxsToPage` - `cmd/server/types.go`: field added to `TransmissionResp`/`ObservationResp`/`GroupedPacketResp` - Test fixture schemas declare `iata` on observers ## Perf Per #383, `obsIataBadge(packet)` reads `packet.observer_iata` directly (server-joined). Falls back to `observerMap.get(id).iata` only if absent — hot row-render loop avoids per-row Map lookup on fresh data. ## Display rules Missing IATA: nothing inline (Region column still shows `—`). No new hex — `.badge-iata` uses `var(--nav-bg)` / `var(--nav-text)`. 
E2E assertion added: test-observer-iata-1188.js:51 --------- Co-authored-by: OpenClaw Bot <bot@openclaw.dev> Co-authored-by: openclaw-bot <bot@openclaw.local>	2026-05-17 16:13:11 +00:00
Kpa-clawbot	e395c471ed	fix(#1244 ): live mobile VCR single row + disable orphan gesture-hint pills on /live (#1246 ) Red commit: `58b307228e` (CI run pending; URL added after first workflow run posts). Fixes #1244 ## Sub-issue A — VCR controls still 2 rows on mobile `public/live.css` mobile `@media (max-width:640px)` block had `flex-wrap: wrap` plus `.vcr-timeline-container { width:100%; flex:none }`, which guaranteed a 2-row layout (controls + LCD on row 1, scope buttons + scrubber on row 2) — the exact bug #1234 was supposed to eliminate. Fix: switched `.vcr-bar` to `flex-wrap: nowrap`, gave `.vcr-timeline-container` `flex: 1 1 0` so it absorbs leftover width, and shrunk `.vcr-btn` / `.vcr-scope-btn` to a 32px touch target (below the 44px of WCAG 2.5.5, which is Level AAA, but above the 24px minimum of WCAG 2.5.8 at Level AA). Reorder on mobile: controls → scopes → timeline → LCD, single row. `.vcr-mode` stays hidden on mobile as before (and `.vcr-lcd` no longer needs `margin-left:auto` because the timeline pushes it right via flex-grow). ## Sub-issue B — Orphan "Got it" hint pills hidden below the fold `public/gesture-hints.js` row-swipe relevance included `/live`, and the pills are bottom-anchored — so they rendered under the absolute-positioned VCR bar + safe-area inset and were only findable by scrolling. Picked option (a) from the issue (simplest, matches user's report): all four hints now early-return on `/#/live*`. Swipe-nav discoverability doesn't apply on Live — map drag, VCR controls, and feed own the touch surface. ## TDD - RED `test-issue-1244-live-vcr-row-hints-e2e.js`: asserts at 375x800 (A) `.vcr-bar` children share a row (≤8px top spread OR `flex-wrap:nowrap`), (B) zero `.gesture-hint` elements on `/live`. Desktop sanity asserts LCD/controls still share a row. - GREEN: the two source fixes. E2E assertion added: `test-issue-1244-live-vcr-row-hints-e2e.js:67` (single-row), `:101` (no hints). Wired into `.github/workflows/deploy.yml` `e2e-test` job. 
Browser verified: pending CI on Playwright fixture run (local Playwright unavailable on this ARM host). Desktop layout untouched — every mobile rule lives under `@media (max-width:640px)`; existing #1221 + #1234 desktop assertions still apply. --------- Co-authored-by: openclaw-bot <bot@openclaw.local>	2026-05-17 16:10:53 +00:00
Kpa-clawbot	74685ac82f	fix(#1243 ): node detail mobile QR overlays map semi-transparently (#1245 ) RED commit `fc9b619a` — CI: https://github.com/Kpa-clawbot/CoreScope/actions Fixes #1243. ## Problem On `#/nodes/<pubkey>` at 375×800, the QR code rendered as a separate ~250px-tall panel below the map. Desktop already overlays the QR semi-transparently via `.node-map-qr-overlay` for the compact view. ## Fix Extend the mobile breakpoint (`@media (max-width: 640px)`) so the full-screen `.node-top-row` mirrors the desktop overlay pattern: - `.node-top-row` → `position: relative`; map wrap expands to 100% - `.node-qr-wrap` → `position: absolute; bottom/right: 8px; z-index: 400` - Semi-transparent background (`rgba(255,255,255,0.85)` light / `0.4` dark) - Caption hidden in overlay (already shown above) Desktop (≥768px) flex layout untouched. ## TDD - RED `fc9b619a` — E2E at 375×800 asserts QR is `position: absolute\|fixed`, overlaps map rect, and bg alpha < 1. - GREEN `ded978c0` — CSS adds overlay rule. ## Verification Preflight clean. Desktop layout unaffected — change is scoped inside `@media (max-width: 640px)`. ## Files - `public/style.css` (+29) - `test-e2e-playwright.js` (+57) --------- Co-authored-by: clawbot <clawbot@local>	2026-05-17 16:03:55 +00:00
Kpa-clawbot	2754251a53	perf(#1239 ): /api/analytics/distance — TTL 15s→60s + drop main RLock around compute (#1241 ) ## Summary Fixes #1239 — `/api/analytics/distance` 15s cold on staging under heavy ingest. Two independent fixes. First commit on this branch is the RED test for Fix B (`a539882`), demonstrating reader/writer contention against the main store lock. CI: see Actions tab for the run on the test-only commit — it asserts >150µs avg writer cycle and fails at 82367µs pre-fix. GREEN commit (`d3938f1`) brings it to 1µs. ## Fix A — TTL bump 15s → 60s (`5eae1e0`) - `rfCacheTTL` default in `cmd/server/store.go` changed from `15 * time.Second` to `60 * time.Second`. This is the shared TTL for RF / topology / distance / hash-sizes / subpath / channel analytics caches. - Per operator clarification (issue thread): distance analytics IS viewed live during analysis sessions, not background-glanced. 60s smooths the cold-miss churn during heavy ingest without freezing data. - `config.example.json`: documented `cacheTTL.analyticsRF` with new default + caveat. - Existing assertions (`TestCacheTTLDefaults`, `TestHashCollisionsCacheTTL`) updated to the new default. ## Fix B — Drop main RLock around compute (`a539882` red, `d3938f1` green) `computeAnalyticsDistance` previously held `s.mu.RLock()` for the entire iteration: region match-set construction, hop/path filtering, sort, dedup, histogram, category stats, time series. Readers serialized writers (ingest, `buildDistanceIndex`). Refactor: hold the RLock only long enough to snapshot the `distHops`/`distPaths` slice headers AND build the region match-set (which reads `tx.Observations`, mutated under `s.mu.Lock`). For `region=""` (the hot cold-call path) the lock hold is just the header snapshot — microseconds. Everything else runs on the locally-captured slices outside the lock. Safety: `distHops`/`distPaths` are append-only via re-slice in `buildDistanceIndex` / `updateDistanceIndexForTxs` (both under `s.mu.Lock`). 
If the backing array reallocates after the snapshot, the snapshot still references the prior array (GC-pinned) at the consistent length captured under the lock. Records are value types — no torn writes. ## Test results `cmd/server/distance_lock_contention_test.go` (8 reader goroutines × 20k synthetic distHops × 200 writer Lock/Unlock cycles): - pre-fix avg writer cycle: 82367µs (16.5s for 200 cycles) - post-fix avg writer cycle: 1µs (279µs for 200 cycles) - ~82000× reduction in writer contention; reader result shape unchanged Full `go test ./cmd/server/...` green with `-race`. ## Out of scope (per issue) - Same lock pattern in topology / RF / hash / subpath analytics — file separately if needed. - Per-region cache key sharding. - WebSocket-driven cache invalidation. --------- Co-authored-by: openclaw-bot <bot@openclaw.local>	2026-05-16 20:56:52 +00:00
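The "snapshot the slice header under the lock, compute outside it" idea from Fix B can be illustrated with a small sketch. It assumes, as the commit states, that the slice is append-only and that existing elements are never mutated in place. `Store`, `Hop`, `Append`, `snapshotHops`, and `TotalKm` are hypothetical names for this example; only `distHops` and the `s.mu` lock come from the commit text.

```go
package main

import (
	"fmt"
	"sync"
)

// Hop is a hypothetical value-type record; value types avoid torn reads of
// elements that were already published.
type Hop struct {
	From, To string
	Km       float64
}

// Store is a simplified stand-in for the packet store.
type Store struct {
	mu       sync.RWMutex
	distHops []Hop // append-only; elements are never modified in place
}

// Append adds a hop under the write lock, as ingest would.
func (s *Store) Append(h Hop) {
	s.mu.Lock()
	s.distHops = append(s.distHops, h)
	s.mu.Unlock()
}

// snapshotHops holds the read lock only long enough to copy the slice header
// (pointer, length, capacity). The three-index slice caps capacity at the
// current length so a reader can never append into shared backing storage.
func (s *Store) snapshotHops() []Hop {
	s.mu.RLock()
	n := len(s.distHops)
	hops := s.distHops[:n:n]
	s.mu.RUnlock()
	return hops
}

// TotalKm does its (potentially slow) work entirely outside the lock.
func (s *Store) TotalKm() float64 {
	var total float64
	for _, h := range s.snapshotHops() {
		total += h.Km
	}
	return total
}

func main() {
	s := &Store{}
	s.Append(Hop{From: "a", To: "b", Km: 1.5})
	snap := s.snapshotHops()
	s.Append(Hop{From: "b", To: "c", Km: 2.5}) // does not affect snap
	fmt.Println(len(snap), s.TotalKm())
}
```

The snapshot stays consistent even if a later append reallocates the backing array, because the old array remains reachable through the snapshot until the reader drops it.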
Kpa-clawbot	aba20b3eda	fix(#1234 ): Live mobile chrome pass 2 — single-row header, hide top-nav, VCR overflow (#1238 ) ## Summary Live page mobile chrome-reduction pass 2. Three coordinated trims at ≤640px: 1. `.live-header` → single row, ≤44px. Drop the MESH LIVE text label and the chart-icon (📊) header toggle. Promote `.live-stats-row` to a direct child of `.live-header` so beacon + pkts + nodes + active + rate + gear all sit on one row. The (now empty) `.live-header-body` collapses to `display:none`. `.live-controls-toggle` shrinks to 36×36 to fit the strip. 2. Top app navbar hidden on `/live`. `body:has(.live-page) .top-nav { display:none }` — scoped via `:has()` so other routes are unaffected. The `.live-page` height reclaims the freed 52px. 3. VCR scope row: >6h collapsed into `More ▾`. `12h` and `24h` get `.vcr-scope-btn--overflow`; the new `.vcr-scope-more-wrap` dropdown is desktop-hidden, mobile-shown. Dropdown items proxy `.click()` to the underlying scope buttons — single source of truth, existing handler unchanged. ## TDD - RED (`b975c828`): `test-issue-1234-live-chrome-pass2-e2e.js` — one E2E asserting all three acceptance items at 375×800 + desktop sanity at 1280×800. Wired into `deploy.yml`. Fails on master (no More button, navbar visible, MESH LIVE label visible). - GREEN (`1e529e63`): CSS + JS implementation. Updates `test-live-layout-1178-1179-e2e.js` and `test-issue-1204-live-panel-structure-e2e.js` in-place to match the new single-row contract (chart toggle gone, MESH LIVE label gone on mobile, gear shrunk to 36×36). ## Verification (local) - New E2E: 7/7 ✅ - `test-issue-1178-1179`: 10/10 ✅ - `test-issue-1204`: 10/10 ✅ - `test-issue-1205`: 18/18 ✅ - `test-issue-1206`: 7/7 ✅ - `test-live-mql-leak-1180`: 2/2 ✅ - `#1220` empty-chrome guard (in `test-e2e-playwright.js`): header = 38px collapsed ✅ Desktop (1280×800) layout unchanged — top-nav visible, all 4 VCR scopes inline, header behavior identical. Fixes #1234. 
--------- Co-authored-by: corescope-bot <bot@corescope.local>	2026-05-16 20:09:24 +00:00
Kpa-clawbot	4ea1bf8ebc	fix(#1236 ): map mobile — sticky panel header + remove right gutter (#1237 ) RED: `862d7c82` — E2E asserting (A) leaflet-map width == viewport on mobile and (B) sticky panel header. CI URL: see Checks tab. Fixes #1236. ## Sub-issue A — Map Controls panel scroll affordance Root cause: `.map-controls` already had `max-height` + `overflow-y: auto`, but the `<h3>` title was static — once the panel scrolled, the title scrolled away with it and users lost the affordance that they were inside a scroll container. No visual cue, no anchor. Fix: make `.map-controls h3` `position: sticky` at the top of the scroll container (pulled flush to the panel edges with negative margins so it covers the corner radius cleanly), with the panel `--card-bg` background and a `--border` bottom rule. Added `scrollbar-gutter: stable` so the scroll indicator is consistently present. ## Sub-issue B — Map canvas offset left with right gutter Root cause: `.map-side-pane` (Path Inspector) is `flex: 0 0 32px` inside the flexbox `#map-wrap`. At every viewport width that 32px is consumed before the leaflet canvas gets sized, leaving an unused band on the right. Desktop has room for it; mobile (375px viewport) does not — and Path Inspector hex-prefix entry is impractical on a phone anyway. Fix: `display: none` on `.map-side-pane` at `≤640px`. Leaflet canvas now fills 100% of the viewport. ## Verification - E2E `test-issue-1236-map-mobile-e2e.js` covers both at 375x800 + desktop guard at 1280x800. RED commit (`862d7c82`) failed 2/3 mobile assertions; GREEN commit (`85efcba7`) passes 3/3. - Map canvas width at 375x800: 343px → 375px. - Existing channels mobile E2E (#1224) still passes. - Desktop (1280px): panel stays `position: absolute`, Path Inspector pane still present. All colors via CSS variables. No JS changes. --------- Co-authored-by: OpenClaw Bot <bot@openclaw.local>	2026-05-16 20:01:00 +00:00
Kpa-clawbot	2e28aa3e04	fix(#1229 ): source-diversity confidence weighting in neighbor-graph tier-1 resolver (#1235 ) RED `235b65b4` (CI will surface URL after PR open) — `test(#1229): tier-1 must prefer multi-observer edges`. Green: `841fc5de`. ## Summary Implements Option C from issue #1229: edge source-diversity confidence weighting. Each neighbor-graph edge already tracks the set of distinct observers that contributed to it (`NeighborEdge.Observers`). This PR is the first to consume that signal in the disambiguator. Tier-1 score in `pm.resolveWithContext` becomes `Score(now) × Confidence()` where: ``` Confidence() = min(1.0, max(1, \|Observers\|) / 3.0) ``` - 1 observer → 1/3 weight (single-source, suspect) - 2 observers → 2/3 weight - ≥3 observers → 1.0 (saturated, full historical weight) A 6-observer edge (30 obs) now beats a 1-observer edge (25 obs) by 3.6× (vs. 1.2× before) — enough to clear `affinityConfidenceRatio` and skip the tier-2 geo fallback that was misresolving in cross-region cases. Stacks with the geo-rejection filter merged in #1228/#1230 to give two independent defenses against cross-region prefix-collision pollution. ## Why C over A/B - A (per-observer graphs): N×memory cost, biggest refactor surface. - B (per-region/IATA segmented): requires region attribution on every packet + per-region cache plumbing; deferred follow-up. - C: smallest diff (~30 lines), no schema migration, leverages an existing field, composes additively with #1228. A and B remain valid follow-ups if C proves insufficient. ## Backward compatibility (persistence) `neighbor_edges` schema is unchanged. `Observers` is rebuilt by `BuildFromStoreWithOptions` from live observations on every graph refresh (5-min TTL). Persisted rows carry an empty set only during the post-restart warm-up; `Confidence()` defaults n→1 when `\|Observers\|==0`, so legacy rows resolve as single-observer (degraded but non-zero) confidence rather than disappearing. Defensive. 
## Tests - `cmd/server/hop_disambig_confidence_test.go:48` — RED-then-GREEN E2E: two `8a` candidates from the same anchor, candX placed geo-near with 1 observer × 25 obs, candY placed geo-far with 6 observers × 5 obs. Without confidence weighting tier-1 falls through (1.2× ratio) and tier-2 picks the wrong (geo-near) candX. With confidence weighting tier-1 fires and picks candY. Asserts `method == "neighbor_affinity"` to pin the resolver path. - `TestNeighborEdge_ObserverSetIsDistinct` — guards the source-diversity counter against double-counting same-observer contributions and pins the `Confidence()` formula at both endpoints (single → fractional, ≥3 → 1.0). All existing tier-1 tests (`hop_disambig_tier1_test.go`) continue to pass — they seed with a single observer, so their weights drop from 1.0 to 1/3 uniformly across candidates, preserving the ratio guard outcome. Fixes #1229 --------- Co-authored-by: bot <bot@corescope.local>	2026-05-16 19:55:00 +00:00
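The confidence formula above and the 3.6× vs. 1.2× comparison can be checked with a short sketch. The function below takes an observer count directly, and `WeightedScore` uses raw observation counts as a stand-in for `Score(now)`; both simplifications are assumptions made for the example and not the signatures used in the resolver.

```go
package main

import (
	"fmt"
	"math"
)

// Confidence mirrors min(1.0, max(1, |Observers|) / 3.0) from the commit
// text. Taking a plain int is an assumption for this sketch.
func Confidence(observers int) float64 {
	n := observers
	if n < 1 {
		n = 1 // an empty observer set is treated as a single observer
	}
	return math.Min(1.0, float64(n)/3.0)
}

// WeightedScore multiplies a base score by the source-diversity confidence.
// The base score here is a raw observation count, standing in for Score(now).
func WeightedScore(score float64, observers int) float64 {
	return score * Confidence(observers)
}

func main() {
	multi := WeightedScore(30, 6)  // 6 observers, 30 observations
	single := WeightedScore(25, 1) // 1 observer, 25 observations
	fmt.Printf("unweighted ratio: %.1f\n", 30.0/25.0)
	fmt.Printf("weighted ratio:   %.1f\n", multi/single)
}
```

Because every single-observer candidate is scaled by the same 1/3, ratios between such candidates are unchanged, which matches the note about the existing tier-1 tests.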
Kpa-clawbot	b21badbcbd	fix(#1225 ): paginate channel messages at SQL level — 30s → <500ms (#1226 ) ## Summary Fixes #1225 — channel messages endpoint took ~30s on staging. ## Root cause `(*DB).GetChannelMessages` SELECTed every observation row for the channel (one row per observation, not per transmission), JSON-unmarshalled each row into a Go map, dedupe-folded by `(sender, packetHash)`, then sliced the tail in Go for pagination. On staging `#wardriving`: - `transmissions` rows with `channel_hash='#wardriving' AND payload_type=5`: 5,703 - `observations` joined to those: 274,632 (~48× amplification) - `time curl /api/channels/%23wardriving/messages?limit=50`: 30.04s / 31.41s / 31.48s / 35.33s / 34.05s (5 calls before I killed the loop) `EXPLAIN QUERY PLAN` showed the index `idx_tx_channel_hash` was being used — the cost was entirely in fetching, unmarshalling, and folding the full observation set per request even for `limit=50`. Hypothesis #1 from the issue (full table scan on `messages/decoded`) is rejected; #2 (missing index) is rejected; the actual cause was pagination in Go instead of SQL — request cost was O(observations) not O(limit). ## Fix Move pagination into SQL on the `transmissions` table. Because `transmissions.hash` is `UNIQUE` and the original dedup key was `(sender, hash)`, each transmission collapses to exactly one logical message — paginating on transmissions is semantically equivalent to the prior in-Go dedup + tail slice. New shape: 1. `COUNT(*)` on transmissions for total (uses `idx_tx_channel_hash`). 2. `SELECT id FROM transmissions … ORDER BY first_seen DESC LIMIT ? OFFSET ?` to pick the page of newest transmissions. 3. `SELECT … FROM observations WHERE transmission_id IN (…page ids…)` — typically 50 ids → a few hundred observation rows. 4. Reassemble in pageIDs order, preserving the ASC-by-`first_seen` API contract. 
Region filtering, observation-count-as-`repeats`, and "first observation wins for hops/snr/observer" semantics are preserved (observations are scanned `ORDER BY o.id ASC`). ## Perf measurements Before (staging `#wardriving`, limit=50, 5 samples killed mid-loop): 30.04s, 31.41s, 31.48s, 35.33s, 34.05s. Synthetic regression test (`TestGetChannelMessagesPerfLargeChannel`): 3000 tx × 50 obs. - Broken impl: ~4.5s (test fails the 500ms budget — the RED commit). - Fixed impl: well under 500ms (test passes). After (staging): will measure post-deploy and post-comment on issue with numbers. Synthetic scaling: staging is ~2× the test's transmission count, fixed-path cost scales with `limit` (50) + `COUNT(*)` (~5k rows on index) — expect <100ms p99. ## TDD - RED: `697c290d` — perf test asserts <500ms on 3k×50 dataset; fails at ~4.5s. - GREEN: `3f1f82d3` — fix; full suite green, perf test passes. ## Hypotheses status \| # \| Hypothesis \| Verdict \| \|---\|---\|---\| \| 1 \| Endpoint slow on prod-sized data \| CONFIRMED (different mechanism — see root cause) \| \| 2 \| Missing channel_hash index \| Rejected (`idx_tx_channel_hash` exists & used) \| \| 3 \| Frontend re-render storm \| Not investigated (backend was clearly the bottleneck) \| \| 4 \| Decode in request path \| Rejected (decode is at ingest time; JSON unmarshal of cached `decoded_json` is the cost, addressed by reducing row count) \| \| 5 \| WS subscription failure \| Rejected \| \| 6 \| Staging artifact \| Rejected (reproducible) \| ## Out of scope - The in-memory `(*PacketStore).GetChannelMessages` path (used when `s.db == nil`) has the same shape but operates on bounded in-memory data; not touched. If we ever fall back to it in production we'll revisit. --------- Co-authored-by: clawbot <bot@corescope>	2026-05-16 17:28:40 +00:00
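Steps 3 and 4 of the new query shape (fetch observations for one page of transmission ids, then reassemble oldest-first with "first observation wins" and an observation count as `repeats`) can be sketched without a database. Everything here is illustrative: the `Observation` and `Message` types, the `observer` and `snr` column names, and the helper names are assumptions, and the rows are passed in as slices instead of being scanned from SQLite.

```go
package main

import (
	"fmt"
	"strings"
)

// Observation and Message are hypothetical shapes for this sketch.
type Observation struct {
	ID, TransmissionID int64
	Observer           string
	SNR                float64
}

type Message struct {
	TransmissionID int64
	Observer       string
	SNR            float64
	Repeats        int // number of observations of this transmission
}

// placeholders returns "?,?,?" for n bound parameters, or "" when n <= 0.
func placeholders(n int) string {
	if n <= 0 {
		return ""
	}
	return strings.Repeat("?,", n-1) + "?"
}

// observationsQuery builds the per-page observation query. Callers should
// skip the query entirely when the page is empty. Column names are assumed.
func observationsQuery(n int) string {
	return "SELECT id, transmission_id, observer, snr FROM observations " +
		"WHERE transmission_id IN (" + placeholders(n) + ") ORDER BY id ASC"
}

// reassemble expects pageIDs newest-first (as selected with ORDER BY
// first_seen DESC) and obs sorted by id ascending. It returns messages
// oldest-first; the first observation of each transmission supplies
// observer and SNR, and every observation increments Repeats.
func reassemble(pageIDs []int64, obs []Observation) []Message {
	byTx := make(map[int64]*Message, len(pageIDs))
	for _, o := range obs {
		if m, ok := byTx[o.TransmissionID]; ok {
			m.Repeats++
			continue
		}
		byTx[o.TransmissionID] = &Message{o.TransmissionID, o.Observer, o.SNR, 1}
	}
	out := make([]Message, 0, len(pageIDs))
	for i := len(pageIDs) - 1; i >= 0; i-- {
		if m, ok := byTx[pageIDs[i]]; ok {
			out = append(out, *m)
		}
	}
	return out
}

func main() {
	obs := []Observation{
		{ID: 1, TransmissionID: 10, Observer: "A", SNR: 5},
		{ID: 2, TransmissionID: 20, Observer: "B", SNR: 7},
		{ID: 3, TransmissionID: 10, Observer: "C", SNR: 9},
	}
	fmt.Println(observationsQuery(2))
	fmt.Println(reassemble([]int64{20, 10}, obs))
}
```

With this shape the per-request work depends on the page size and the observations of that page only, which is the O(limit) behavior the fix is after.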
Kpa-clawbot	7179afcfde	feat(#1228 ): reject geo-implausible neighbor-graph edges at build time (#1230 ) Fixes #1228 — geo-implausible neighbor-graph edges are rejected at build time. Red commit: `5a6d9660` — failing tests for 4 cases (reject SF↔Berlin, accept local CA, accept no-GPS endpoint, counter increments). Live CI run (latest commit): https://github.com/Kpa-clawbot/CoreScope/actions?query=branch%3Afix%2Fissue-1228 ## Why The disambiguator's tier-1 affinity graph is built blindly from path co-occurrence. On wide-geo MQTT deployments, a single bad hop disambiguation seeds an edge across geographically impossible distances (e.g. Bay Area ↔ Berlin), which then reinforces the same wrong resolution next time. Self-poisoning spiral. ## What changed - `upsertEdge` now consults a per-graph GPS index. When both endpoints have known GPS and their haversine distance exceeds the threshold, the edge is dropped and `NeighborGraph.RejectedEdgesGeoFar` (atomic) is incremented. - Either endpoint missing GPS ⇒ accept (no signal to reject), per acceptance criteria. - Threshold is configurable via `neighborGraph.maxEdgeKm` (default 500 km — well above any plausible terrestrial LoRa hop, including satellite-assisted). 0 ⇒ use default; negative ⇒ disable the filter. Exposed via `Config.NeighborMaxEdgeKm()`. - New `BuildFromStoreWithOptions` carrying the threshold; `BuildFromStore` and `BuildFromStoreWithLog` are kept as thin wrappers. - Stats are surfaced under `GET /api/analytics/neighbor-graph` as `stats.rejected_edges_geo_far`. - All rejection logs PII-truncate pubkeys to 8 hex chars (public repo discipline). - `config.example.json` updated with the new field + comment. ## Follow-up #1229 (per-region scoped affinity graphs) depends on this landing first. --------- Co-authored-by: corescope-bot <bot@corescope.local>	2026-05-16 10:14:44 -07:00
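The rejection rule in this commit (both endpoints have GPS and their haversine distance exceeds the threshold; `0` means default, negative disables) can be sketched as below. This is an illustration under assumptions, not the `upsertEdge` code: `Node`, `Graph`, `AcceptEdge`, and `Rejected` are hypothetical names, the per-graph GPS index is replaced by passing coordinates directly, and the Earth radius of 6371 km is the usual mean value.

```go
package main

import (
	"fmt"
	"math"
	"sync/atomic"
)

const (
	defaultMaxEdgeKm = 500.0  // default from the commit text
	earthRadiusKm    = 6371.0 // mean Earth radius, assumed for this sketch
)

// Node is a hypothetical endpoint with optional GPS.
type Node struct {
	Lat, Lon float64
	HasGPS   bool
}

// Graph carries only the rejection counter in this sketch.
type Graph struct {
	rejectedGeoFar int64 // accessed atomically
}

// haversineKm returns the great-circle distance between two nodes.
func haversineKm(a, b Node) float64 {
	toRad := func(deg float64) float64 { return deg * math.Pi / 180 }
	dLat := toRad(b.Lat - a.Lat)
	dLon := toRad(b.Lon - a.Lon)
	h := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(toRad(a.Lat))*math.Cos(toRad(b.Lat))*math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * earthRadiusKm * math.Asin(math.Min(1, math.Sqrt(h)))
}

// AcceptEdge reports whether the edge between a and b should be kept.
// maxEdgeKm == 0 selects the default; a negative value disables the filter.
func (g *Graph) AcceptEdge(a, b Node, maxEdgeKm float64) bool {
	if maxEdgeKm < 0 {
		return true
	}
	if maxEdgeKm == 0 {
		maxEdgeKm = defaultMaxEdgeKm
	}
	if !a.HasGPS || !b.HasGPS {
		return true // no signal to reject on
	}
	if haversineKm(a, b) > maxEdgeKm {
		atomic.AddInt64(&g.rejectedGeoFar, 1)
		return false
	}
	return true
}

// Rejected returns how many edges were dropped as geo-implausible.
func (g *Graph) Rejected() int64 { return atomic.LoadInt64(&g.rejectedGeoFar) }

func main() {
	sf := Node{Lat: 37.7749, Lon: -122.4194, HasGPS: true}
	berlin := Node{Lat: 52.52, Lon: 13.405, HasGPS: true}
	g := &Graph{}
	fmt.Println(g.AcceptEdge(sf, berlin, 0), g.Rejected())
}
```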
Kpa-clawbot	30ff45ad34	fix(#1220 ): collapse MESH LIVE mobile header into a single ~50px strip (#1223 ) RED commit `c1a8cea` — E2E at 375x800 asserts MESH LIVE header is either ≤60px (collapsed) or ≥60px with a visible body. Fails on master with `height=118, bodyVisible=false, ctrlsVisible=false` — the empty-chrome middle state. CI for red commit: https://github.com/Kpa-clawbot/CoreScope/actions (will populate after push). ## Diagnosis On `(max-width: 768px)`, `#1180` collapses both `.live-header-body` and `.live-controls-body` to `display:none`. But `.live-controls` carries `flex: 0 0 100%` from the wide-viewport rule (introduced for `#1219` so the toggles wrap onto their own row below the title on tablet). On mobile, with the body hidden, that 100% basis still forces the gear button onto a full-width second row inside `#liveHeader`'s flex-wrap, ~60px tall — yielding the `~118-200px` empty panel the bug screenshot shows (the count badge + 📊 toggle on row 1, gear alone on row 2, nothing else). ## Fix — Option C Inside `@media (max-width: 768px)`, when `.live-controls.is-collapsed`: - drop `flex: 0 0 100%` → `flex: 0 0 auto; width: auto` so the gear inlines with the critical strip + 📊 toggle - when the header is also collapsed (`.is-collapsed:has(.live-controls.is-collapsed)`), zero the vertical padding so the strip hugs the 48px tap targets Result: collapsed mobile panel = single ~50px row, three icons inline. Expanded mobile = full toggle list (149px). Desktop unchanged (83px). Why Option C over A/B: a packet-watching mobile user keeps the map dominant and reaches for the gear when they want filters. The compact strip preserves both the WS-down red beacon (always visible) and the pkt count, with one-tap access to expand either body. Does not reintroduce #1204 (counter still attached to header) or #1205 (toggles still children of `#liveHeader`). Fixes #1220 --------- Co-authored-by: openclaw-bot <openclaw-bot@users.noreply.github.com>	2026-05-16 15:54:12 +00:00
Kpa-clawbot	70855249c2	fix(#1224 ): channels page mobile UX overhaul (#1227 ) ## Summary RED test commit: `02652d0042b7cf65d1f9b3e96ce376bbb3064ba6` — CI: https://github.com/Kpa-clawbot/CoreScope/actions Mobile UX overhaul for the Channels page (#1224). At 375x800 the sidebar header was 112px tall (title + button stacked, analytics link + region filter each on their own row) and the channel-name column was clipped to 83px by the inline 📤 Share + ✕ Remove buttons. ## What changed - Header is now ONE row: title + region filter + `+ Add` chip + `📊` analytics overflow chip. Capped to ≤56px on mobile. - `+ Add Channel` → `+ Add` chip (no longer a full-width hero). Verified <65% of sidebar width. - Analytics link is an icon-only chip inside the header (was a full-row link below). - Region filter is inline inside the header (was its own row). - Channel rows: `.ch-item-name` takes `flex:1`, share button is icon-only (📤), remove button shrunk to 32px touch target. Name >150px on the first row. - Empty state is `max-height:30vh; padding:12px` on mobile — no longer dominates the viewport. ## Design decisions - Chose inline chips over an overflow `⋮` menu: header-level controls are few enough (4) that stacking pills + filter dropdown fits comfortably in 375px. Avoids the cost/complexity of a popover and matches the page's existing pill vocabulary (region filter). - Per-row share/remove kept inline but icon-only (`font-size:0` + `::before`) — preserves single-tap access without consuming the row. - Touch targets stay ≥32px (action chips) / 44px (other tappables); WCAG 2.5.5 spirit retained on the dominant interactive paths. - Desktop layout (≥768px) is unchanged — verified by a desktop guard in the E2E (`.ch-layout` flex-direction stays `row` at 1024px). ## Tests - `test-issue-1224-channels-mobile-ux-e2e.js` — 5 assertions at 375x800 + 1 desktop guard at 1024x800. Wired into CI. 
- Existing channel suites still pass: `test-channel-fluid-e2e.js` (11/11), `test-channel-issue-1087-e2e.js` (3/3), `test-channel-issue-1111-e2e.js` (2/2), `test-channel-modal-ux.js` (33/33), `test-channel-ux-followup.js` (29/29), `test-channel-sidebar-layout.js` + `test-channel-fluid-layout.js` (14/14). Fixes #1224 --------- Co-authored-by: clawbot <clawbot@users.noreply.github.com>	2026-05-16 15:50:52 +00:00
Kpa-clawbot	24f277e5c6	fix(#1221 ): VCR LED clock in-row with controls and unclipped on mobile (#1222 ) Red commit: `41d02ffa` (CI run: pending — will fill in after first CI run completes) ## Summary Fixes #1221. VCR LED clock (`.vcr-lcd`) was wrapping to a separate row on mobile (`.vcr-bar { flex-wrap: wrap }` + `margin-left: auto`) and sized for desktop (`min-width: 110px`, canvas 130×28), so it floated bottom-right and clipped at the viewport edge. ## Fix - DOM (`public/live.js`): no move needed — `.vcr-lcd` is already a child of `.vcr-bar`. (Verified by grep.) - CSS (`public/live.css`) mobile `@media (max-width: 640px)`: - Removed `margin-left: auto` on `.vcr-lcd` so it stays in-row with controls. - Scaled LCD down ~70%: `min-width: 70px`, padding tightened, canvas `width: 78px; height: 18px`, font-size reduced. - Removed redundant `display: flex` override. ## Test RED → GREEN E2E at `test-e2e-playwright.js` (around line 2978): viewport 375×800, asserts: - LCD inside `.vcr-bar`, shares parent with `.vcr-controls`. - LCD bounds entirely inside viewport (no clip on any side). - LCD vertically overlaps `.vcr-controls` (same row). - LCD width < 100px on mobile (scaled vs desktop). E2E assertion added: `test-e2e-playwright.js:2978` Browser verified: staging analyzer.00id.net after merge (manual VCR layout sanity) Fixes #1221 --------- Co-authored-by: openclaw-bot <bot@openclaw.local>	2026-05-16 08:36:38 -07:00
Kpa-clawbot	ab34d9fb65	fix(#1206 ): keep VCR bar from occluding the live packet feed (#1213 ) Red commit: `bcfc74de` (CI: https://github.com/Kpa-clawbot/CoreScope/actions?query=branch%3Afix%2Fissue-1206) Fixes #1206. ## Problem On Live Map the VCR (timeline/playback) bar overlays the bottom of the viewport. Bottom-pinned overlays — the live packet feed, the legend, any corner panel — used hard-coded `bottom: 58–88px` offsets that are smaller than the real bar height (two-row mobile layout + `env(safe-area-inset-bottom)` push it to ~80px and beyond). The last N packet-feed rows slid under the bar and became unreadable / unclickable. ## Fix Publish the bar's measured height as a CSS variable on the live page and bind every bottom-anchored overlay to it. - `public/live.js` — new `initVCRHeightTracker()` runs after init; uses `ResizeObserver` + `resize` / `visualViewport.resize` to keep `--vcr-bar-height` on `.live-page` in sync with `#vcrBar`. - `public/live.css` — `.live-feed`, `.feed-show-btn`, and the `.live-overlay[data-position="bl"\|"br"]` corner slots now use `bottom: calc(var(--vcr-bar-height, 58px) + 10px)`. The feed's `max-height` is also capped against `100dvh - top - vcr - margin` so its scroll container can never extend past the bar. - Stale per-breakpoint overrides (the `@supports(env(safe-area-inset))` hard-coded `78px + safe-area` for feed/legend) are removed in favor of the single tracked variable. ## TDD - Red commit `bcfc74de` adds `test-issue-1206-vcr-overlap-e2e.js`: asserts `#liveFeed.getBoundingClientRect().bottom <= #vcrBar.top` (and same for the last row) at desktop 1280x800 and mid 720x800. Verified locally that reverting the green commit makes the feed-bottom assertions fail (feed bottom 742px > VCR top 721px) — see PR body for exact numbers from the local run. - Green commit `1ad17e7f` makes all 5 assertions pass. 
## Browser verified Local Go server with `test-fixtures/e2e-fixture.db`, headless Chromium via the new E2E test — all 5 assertions green. ## E2E assertion added `test-issue-1206-vcr-overlap-e2e.js:84` (bottom-row vs VCR-top) plus container check at `:74`. --------- Co-authored-by: openclaw-bot <bot@openclaw.local> Co-authored-by: clawbot <bot@corescope.local>	2026-05-16 05:55:21 +00:00
Kpa-clawbot	a1f9dca951	fix(live #1205 ): re-anchor settings toggles inside MESH LIVE panel (#1219 ) Red commit: `f80ce5248a` (CI URL appears in the Checks tab once the workflow starts). Supersedes closed PR #1209 with the correct approach (toggles in MESH LIVE panel, not legend). Fixes #1205. ## Problem The Live Map settings toggle row (Heat / Ghosts / Realistic / Color by hash / Matrix / Rain / Audio / Favorites / node filter / region filter — `#liveControls`) rendered as a free-floating sibling `.live-overlay` pinned `position: fixed` at bottom-right with `bottom: calc(78px + var(--bottom-nav-height) + safe-area)`. On many viewports it visually orphaned across the middle of the map, anchored to no panel. ## Regression cause PR #1180 (commit `127a1927` — "compact header, pin controls bottom-right, narrow toggles") extracted `.live-toggles` from inside `.live-header` (the MESH LIVE panel) into a brand-new sibling `.live-overlay.live-controls` cluster. Before #1180 the toggles lived as a direct child of `.live-header`. ## Fix Restore the pre-#1180 structural pattern: `#liveControls` is re-parented as a child of `#liveHeader`, breaking onto its own row via `flex: 0 0 100%`. No more `position: fixed` overlay, no more free-floating cluster — the toggles share the MESH LIVE panel's chrome (background, blur, border, padding). - `public/live.js`: re-parent the `#liveControls` block inside `#liveHeader`, drop the `.live-overlay` class. - `public/live.css`: - `.live-controls`: `position: static`, transparent (header supplies chrome), `flex: 0 0 100%`. - `.live-header`: `flex-wrap: wrap`, `row-gap: 6px`, `max-width: calc(100vw - 24px)`; drop the `max-height: 40px` cap. Why this beats PR #1209: that PR parked toggles inside `#liveLegend`, inverting the data → key → controls hierarchy and pushing the legend to 60vh on mobile. Anchoring back to the MESH LIVE panel keeps controls with the panel that already labels the live surface and inherits its corner / drag affordances. 
## Tests - Red (`test-issue-1205-live-controls-anchor-e2e.js`): asserts `#liveHeader.contains(#liveControls)` AND not contained in `#liveLegend`, parent is not `<body>` / `.live-page` directly, and the controls rect stays within the viewport. Runs at 1440×900, 640×900, 320×800. Fails on master. - Updated `test-live-layout-1178-1179-e2e.js`: - (a) `.live-header-critical` height ≤ 40px (the critical strip stays compact; header itself now wraps). - (b) `.live-controls` `position: static` AND descendant of `#liveHeader` (new contract replacing the retired "fixed/right ≤24px/bottom>0"). - Wired in `.github/workflows/deploy.yml` next to the other live-layout E2Es. ## Acceptance criteria - [x] Settings toggle row renders inside the MESH LIVE panel (`#liveHeader`) - [x] Not parked in `#liveLegend` (rejected by #1209 review) - [x] Tested at desktop + tablet + narrow phone viewport widths - [x] E2E DOM assertion: parent is the MESH LIVE panel, not body / `.live-page` / `#liveLegend` --------- Co-authored-by: meshcore-bot <bot@meshcore.local> Co-authored-by: clawbot <clawbot@users.noreply.github.com>	2026-05-16 05:54:43 +00:00
Kpa-clawbot	170f0ac66d	fix(#1212 ): MQTT per-attempt logging + stall watchdog — prevent silent reconnect-loop death (#1216 ) RED commit: `1cd25f7b` — CI (failing on assertion): https://github.com/Kpa-clawbot/CoreScope/actions?query=sha%3A1cd25f7b1bdd0091f689dd64ce1bfec6d031191f Fixes #1212 ## Root cause NOT that `AutoReconnect` was off — it was set; `MaxReconnectInterval=30s` was set (PR #949); a `SetReconnectingHandler` was wired. The defect was an observability gap: `SetReconnectingHandler` fires only INSIDE paho's reconnect goroutine. If that goroutine never iterates (status race after the recovered handler panic at 21:07:13, or an internal abort), operators see ONLY the `disconnected: pingresp not received` line and then total silence. They cannot distinguish "paho is patiently retrying" from "paho gave up and the goroutine is gone." That ambiguity is what turned a 30s blip into 6h of downtime. ## Changes ### `cmd/ingestor/main.go` — `SetConnectionAttemptHandler` Fires on every TCP/TLS dial — the initial `Connect()` AND every reconnect — independent of paho's internal reconnect-loop state. Logs: ``` MQTT [staging] connection attempt #1 to tcp://broker:1883 MQTT [staging] connection attempt #2 to tcp://broker:1883 ``` Per-source attempt counter via `atomic.AddInt64`. ### `cmd/ingestor/mqtt_watchdog.go` (new) — per-source stall watchdog Satisfies the watchdog acceptance criterion. Even when paho reports `connected`, if no MQTT messages have flowed for >5m, log a WARN line every 60s: ``` MQTT [staging] WATCHDOG: client reports connected to tcp://broker:1883 but no messages received for 7m30s (threshold 5m) — possible half-open socket or upstream stall ``` Catches half-open TCP and broker-accepted-but-not-forwarding scenarios that look "connected" to paho. Hot-path cost: one `atomic.StoreInt64` per inbound message. Watchdog scans the registry once a minute. 
### Tests (`cmd/ingestor/mqtt_reconnect_test.go`, new) - `TestBuildMQTTOpts_InstrumentsConnectionAttempt` — asserts `OnConnectAttempt` is wired in `buildMQTTOpts`. - `TestMQTTStallWatchdog_FiresOnSilentSource` — connected + 10m silent + 5m threshold → stall flagged. - `TestMQTTStallWatchdog_QuietWhenRecent` — recent message → no stall. - `TestMQTTStallWatchdog_QuietWhenDisconnected` — disconnected → no stall (paho's reconnect logging covers it). ## TDD - RED `1cd25f7b` — 2 assertion failures (compile OK, stub returns no-stall, `OnConnectAttempt` nil). - GREEN `2527be6f` — implementation; all ingestor tests pass. ## Out of scope - Slice-bounds decode panic (#1211, separate PR). - A full in-process MQTT broker integration test would require a new dep (mochi-mqtt) — the observability and watchdog behaviors are independently verifiable by the unit tests above, and the reconnect path itself is paho's responsibility (we already test it's configured via `mqtt_opts_test.go`). --------- Co-authored-by: bot <bot@example.com> Co-authored-by: OpenClaw Bot <bot@openclaw.local> Co-authored-by: corescope-bot <bot@corescope.local> Co-authored-by: openclaw-bot <openclaw-bot@users.noreply.github.com>	2026-05-15 22:46:29 -07:00
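The stall check described in the commit above (connected, but silent for longer than the threshold) can be sketched roughly as follows. This is a minimal illustration, not the code in `cmd/ingestor/mqtt_watchdog.go`: `sourceState`, `setConnected`, `markMessage`, `stalled`, and `stallThreshold` are hypothetical names, and timestamps are passed in as Unix nanoseconds so the logic can be exercised without a real clock.

```go
package main

import (
	"sync/atomic"
	"time"
)

// stallThreshold mirrors the 5-minute silence threshold quoted in the commit
// message. The constant name is an assumption made for this sketch.
const stallThreshold = 5 * time.Minute

// sourceState is a hypothetical per-source record kept by the watchdog.
type sourceState struct {
	connected    int32 // 1 while the client reports connected, else 0
	lastMsgNanos int64 // Unix nanoseconds of the last inbound message
}

// setConnected records the client's reported connection status.
func (s *sourceState) setConnected(up bool) {
	var v int32
	if up {
		v = 1
	}
	atomic.StoreInt32(&s.connected, v)
}

// markMessage is the hot-path hook: one atomic store per inbound message.
func (s *sourceState) markMessage(nowNanos int64) {
	atomic.StoreInt64(&s.lastMsgNanos, nowNanos)
}

// stalled reports how long the source has been silent and whether that
// counts as a stall. A disconnected source is never flagged here, on the
// assumption that reconnect logging already covers that case.
func (s *sourceState) stalled(nowNanos int64) (time.Duration, bool) {
	if atomic.LoadInt32(&s.connected) == 0 {
		return 0, false
	}
	silent := time.Duration(nowNanos - atomic.LoadInt64(&s.lastMsgNanos))
	return silent, silent > stallThreshold
}
```

In a real ingestor, the message handler would call `markMessage(time.Now().UnixNano())`, and a once-a-minute ticker would call `stalled` for each registered source and log a WARN line when it returns true.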
Kpa-clawbot	eba9e89a72	fix(#1203 ): path-inspector — singleflight + stale-while-revalidate (#1208 ) Red commit: `c84a8f575a` (CI run: pending push) Fixes #1203 — path-inspector 503 storm. Three sub-fixes, each shipped as red→green per AGENTS TDD: A. Singleflight on rebuild (`ensureNeighborGraph`) Hand-rolled `sync.Mutex + chan` singleflight — no new deps (x/sync was not in cmd/server's go.mod). Concurrent callers attach to one in-flight rebuild instead of N parallel `BuildFromStore` goroutines. - Red: `7340f23b` — test asserts ≤1 build under 10 concurrent callers (saw 10 on master) - Green: `abac6b3c` B. Stale-while-revalidate (`handlePathInspect`) Stale non-nil graph is served immediately with `"stale": true` while a background rebuild runs (deduped by A). The 2s synchronous gate is gone. Stale responses are not cached, so the next request after rebuild lands fresh. - Red: `c84a8f57` — test asserts 200+`stale:true`+rebuild-kickoff (master returned 503) - Green: `5eb86975` C. Cold-start 503 still kicks rebuild True cold start (`graph == nil`) is the only path that still returns 503 `{"retry": true}`, but it now spawns an async `ensureNeighborGraph` so the very next request warms up. - Green test: `f5ac7059` (passed on top of A+B) Singleflight verified: `TestEnsureNeighborGraph_Singleflight` Stale-while-revalidate verified: `TestHandlePathInspect_StaleWhileRevalidate` Cold-start verified: `TestHandlePathInspect_ColdStartKicksRebuild` Acceptance criteria (issue #1203): - [x] Concurrent requests share ONE rebuild - [x] Stale non-nil graph served with `stale:true` async - [x] 503 only on true cold-start - [x] Cold-start 503 kicks rebuild → follow-up warm - [ ] p99 < 500ms under load (not unit-testable; design satisfies it) - [x] No regression in existing tests Out of scope (per issue): 5-min TTL constant, `BuildFromStore` perf, `/api/analytics/topology`, persist-lock contention. No new deps. 
--------- Co-authored-by: corescope-bot <bot@corescope.local> Co-authored-by: corescope-bot <bot@corescope.dev>	2026-05-15 22:46:28 -07:00
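The "hand-rolled `sync.Mutex + chan` singleflight" mentioned in the commit above could look something like the sketch below. It is an assumption-laden illustration rather than the actual `ensureNeighborGraph`: `rebuildFlight`, `ensure`, and the counters are hypothetical, and a `string` stands in for the neighbor graph.

```go
package main

import "sync"

// rebuildFlight lets concurrent callers share one in-flight rebuild.
// All names are hypothetical; the string result stands in for the graph.
type rebuildFlight struct {
	mu       sync.Mutex
	inflight chan struct{} // non-nil while a rebuild is running
	result   string        // most recently completed result
	builds   int           // rebuilds actually executed
	waiters  int           // callers that attached to an in-flight rebuild
}

// ensure runs build at most once at a time. A caller that arrives while a
// rebuild is running waits for it and returns its result instead of
// starting a second build. Note: this sketch does not handle a panicking
// build, which would leave waiters blocked.
func (f *rebuildFlight) ensure(build func() string) string {
	f.mu.Lock()
	if ch := f.inflight; ch != nil {
		f.waiters++
		f.mu.Unlock()
		<-ch // wait for the in-flight rebuild to finish
		f.mu.Lock()
		defer f.mu.Unlock()
		return f.result
	}
	ch := make(chan struct{})
	f.inflight = ch
	f.mu.Unlock()

	r := build() // the expensive work runs outside the lock

	f.mu.Lock()
	f.result = r
	f.builds++
	f.inflight = nil
	f.mu.Unlock()
	close(ch) // release every waiter at once
	return r
}
```

Calls that do not overlap each trigger their own build, so this only deduplicates concurrent work. Serving a stale graph while the rebuild runs would be layered on top, in the handler.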
efiten	11d2026bb1	feat(startup): hot startup — load hotStartupHours synchronously, fill retentionHours in background (#1187 ) Closes #1183 ## Summary - Adds `packetStore.hotStartupHours` config key (float64, default 0 = disabled). When set, `Load()` loads only that many hours of data synchronously, reducing startup time on large DBs. - A background goroutine (`loadBackgroundChunks`) fills the remaining `retentionHours` window in daily chunks after startup completes. Analytics indexes are rebuilt once at the end. - `QueryPackets` and `QueryGroupedPackets` check `oldestLoaded` and fall back to `db.QueryPackets()` for any query whose `Since`/`Until` predates the in-memory window — covering days 8–30 permanently (beyond `retentionHours`) and the background-fill gap during startup. - `/api/perf` gains `hotStartupHours`, `backgroundLoadComplete`, and `backgroundLoadProgress` fields inside `packetStore` so operators can monitor the fill. ### Drive-by fixes - E2E: added `gotoPackets` navigation helper used across packet-related tests - E2E: rewrote stripe assertion to check per-row stripe parity rather than a fragile computed-style comparison - E2E: theme test updated to use `#/home` as the initial route (was `#/`) - `db.go`: removed the RFC3339→unix-timestamp subquery path in `buildTransmissionWhere`; `t.first_seen` is now always compared directly as a string for both RFC3339 and non-RFC3339 inputs ## Configuration ```json "packetStore": { "retentionHours": 168, "hotStartupHours": 24 } ``` `hotStartupHours: 0` (default) preserves existing behavior exactly. A nonzero value is recommended for large DBs to reduce startup time; set it to 0 to disable the feature (loads the full `retentionHours` window at startup, the legacy behavior). 
## Test plan - [x] `TestHotStartupConfig_Clamp` — clamping when `hotStartupHours > retentionHours` - [x] `TestHotStartupConfig_ZeroIsDisabled` — zero leaves feature disabled - [x] `TestHotStartup_LoadsOnlyHotWindow` — only hot-window packets in memory after `Load()` - [x] `TestHotStartup_DisabledWhenZero` — all retention packets loaded when disabled - [x] `TestHotStartup_loadChunk_AddsOlderData` — chunk merges correctly, ASC order maintained - [x] `TestHotStartup_BackgroundFillsToRetention` — background goroutine fills to `retentionHours` - [x] `TestHotStartup_ChunkErrorRecovery` — chunk SQL failure logged and skipped, loop terminates - [x] `TestHotStartup_SQLFallback_TriggeredForOldDate` — query before `oldestLoaded` routes to SQL - [x] `TestHotStartup_SQLFallback_NotTriggeredForRecentDate` — recent query stays in-memory - [x] `TestHotStartup_PerfStats` — new fields present in `GetPerfStoreStats()` (backs the perf endpoint) - [x] `TestHotStartup_PerfStoreHTTP` — HTTP-level: GET /api/perf returns `hotStartupHours`, `backgroundLoadComplete`, `backgroundLoadProgress` in `packetStore` 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: openclaw-bot <bot@openclaw.local> Co-authored-by: CoreScope Bot <bot@corescope.local>	2026-05-15 22:46:25 -07:00
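The interaction between `hotStartupHours`, `retentionHours`, and the daily background chunks in the commit above can be illustrated with a small sketch. It assumes that clamping means capping `hotStartupHours` at `retentionHours` and that each chunk spans 24 hours. `clampHotHours`, `backgroundChunks`, and `chunk` are hypothetical names, not the functions in the repository.

```go
package main

// chunk is a hypothetical description of one background load window,
// expressed as "hours before now".
type chunk struct {
	fromHoursAgo float64
	toHoursAgo   float64
}

// clampHotHours returns the effective hot window. Zero (or a negative
// value, by assumption) disables the feature; values above the retention
// window are capped at it.
func clampHotHours(hot, retention float64) float64 {
	if hot <= 0 {
		return 0
	}
	if hot > retention {
		return retention
	}
	return hot
}

// backgroundChunks lists the daily windows the background fill would still
// need to load after the hot window has been loaded synchronously.
func backgroundChunks(hot, retention float64) []chunk {
	hot = clampHotHours(hot, retention)
	if hot == 0 {
		return nil // feature disabled: everything loads at startup
	}
	var out []chunk
	for from := hot; from < retention; from += 24 {
		to := from + 24
		if to > retention {
			to = retention // last chunk may be shorter than a day
		}
		out = append(out, chunk{from, to})
	}
	return out
}
```

With the configuration shown in the commit (`retentionHours: 168`, `hotStartupHours: 24`), this sketch yields six daily chunks covering hours 24 through 168.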
Kpa-clawbot	3255395bd0	fix(#1204 ): MESH LIVE panel — header inherited column flex from .live-overlay (#1215 ) Red commit: `c159a1153d` (CI run: pending — first CI is on this PR) Fixes #1204. ## Root cause `.live-overlay` (the base class for all overlay panels: feed, legend, node-detail, header) declares `flex-direction: column`. Feed/legend/ node-detail need that for their `.panel-header` + scrollable `.panel-content` stacking — but the header doesn't, it's a horizontal bar. PR #1180 (#16c48e73) split the header from a flat layout into three children: `.live-header-critical` (beacon + `0 pkts`) + collapsible toggle button + `.live-header-body` (title + stats row). Without an explicit `flex-direction` override, those three pieces inherited the column default and stacked vertically — pushing `0 pkts` above the `MESH LIVE` title and clipping the stats row out of the 40px max-height container. Exactly the "detached counter, hollow shell" the issue reports. ## Fix Add `flex-direction: row` to `.live-header` (one line + comment). Single-property CSS change, no JS, no DOM, no behavior outside layout. ## TDD Red commit `c159a115` — E2E `test-issue-1204-live-panel-structure-e2e.js` asserts: 1. `.live-header-critical` and `.live-title` vertically overlap (same row). 2. `#livePktCount` pill and title mid-Y differ by < 8px. 3. `.live-stats-row` is visible (nonzero size). 4. `.live-feed .panel-content` accepts an injected row (column container). Verified failing on master at red commit (3 of 5 fail with the exact "stacked above title" signature). Green commit `b7f57072` flips all to pass. E2E assertion added: `test-issue-1204-live-panel-structure-e2e.js:55` ## Verified - Local `cmd/server` + fresh fixture, viewport 1440×900, headless Chromium: 5/5 pass. - Preflight (`run-all.sh origin/master`): clean. 
## Files - `public/live.css` — `flex-direction: row` on `.live-header` (+ rationale comment) - `test-issue-1204-live-panel-structure-e2e.js` — new E2E (added to `deploy.yml`) --------- Co-authored-by: corescope-bot <bot@corescope.local>	2026-05-15 22:34:22 -07:00
Kpa-clawbot	85e97d2f37	fix(#1211 ): bounds-check path length to prevent slice [218:15] panic in MQTT decode (#1214 ) RED commit: `65d9f57b` (CI run will appear at https://github.com/Kpa-clawbot/CoreScope/actions after PR opens) Fixes #1211 ## Root cause `decodePath()` returns `bytesConsumed = hash_size * hash_count` where both come straight from the wire-supplied `pathByte` (upper 2 bits → `hash_size`, lower 6 bits → `hash_count`). Max claimable: 4 × 63 = 252 bytes. A malformed packet on the wire claimed `pathByte=0xF6` (hash_size=4, hash_count=54 → 216 path bytes) inside a 15-byte buffer. The inner hop-extraction loop in `decodePath` did break early on overflow — but `bytesConsumed` was still returned at face value (216). `DecodePacket` then did `offset += 216` (offset=218) and `payloadBuf := buf[offset:]` panicked with the prod-observed signature: ``` runtime error: slice bounds out of range [218:15] ``` The handler-level `defer/recover` at `cmd/ingestor/main.go:258-263` caught it, but the message was silently dropped with no usable diagnostic. ## Fix Add a `if offset > len(buf)` guard at BOTH decoder sites (same pattern, same panic potential): - `cmd/ingestor/decoder.go` — DecodePacket after decodePath - `cmd/server/decoder.go` — DecodePacket after decodePath Return a descriptive error citing the claimed length and pathByte hex so operators can reproduce. Also: `cmd/ingestor/main.go` decode-error log now includes `topic`, `observer`, and `rawHexLen` so future malformed packets are reproducible without needing to attach a debugger. ## Tests (TDD red → green) Both packages got two new tests: - `TestDecodePacketBoundsFromWire_Issue1211` — feeds the exact wire shape from the prod log (`pathByte=0xF6` inside a 15-byte buf). Asserts `DecodePacket` does NOT panic and returns an error. - `TestDecodePacketFuzzTruncated_Issue1211` — sweeps every `(header, pathByte)` combination with tails 0..19 bytes (≈1.3M inputs). Asserts zero panics. 
### Red commit proof On commit `65d9f57b` (RED), both tests fail with the panic: ``` === RUN TestDecodePacketBoundsFromWire_Issue1211 decoder_test.go:1996: DecodePacket panicked on malformed input: runtime error: slice bounds out of range [218:15] --- FAIL: TestDecodePacketBoundsFromWire_Issue1211 (0.00s) === RUN TestDecodePacketFuzzTruncated_Issue1211 decoder_test.go:2010: DecodePacket panicked during fuzz: runtime error: slice bounds out of range [3:2] --- FAIL: TestDecodePacketFuzzTruncated_Issue1211 (0.01s) ``` On commit `7a6ae52c` (GREEN), full suites pass: - `cmd/ingestor`: `ok 53.988s` - `cmd/server`: `ok 29.456s` ## Acceptance criteria - [x] Identify the slice op producing `[218:15]` — `payloadBuf := buf[offset:]` in `DecodePacket` (decoder.go), where `offset` had been advanced by an unchecked `bytesConsumed` from `decodePath()`. - [x] Bounds check added at the identified site(s) — both ingestor and server decoders. - [x] Test with crafted payload (length-field > remaining buffer) — `TestDecodePacketBoundsFromWire_Issue1211`. - [x] Log topic, observer ID, payload byte length on drop — updated `MQTT [%s] decode error` log line. - [x] Existing tests stay green — confirmed both packages. ## Out of scope Reconnect-after-disconnect (#1212) — handled by a separate subagent. This PR touches NO reconnect logic. --------- Co-authored-by: corescope-bot <bot@corescope.local> Co-authored-by: openclaw-bot <bot@openclaw.local> Co-authored-by: corescope-bot <bot@corescope>	2026-05-15 22:34:21 -07:00
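The arithmetic behind the `[218:15]` panic in the commit above, and the shape of the guard, can be sketched as follows. The bit layout (upper 2 bits plus one for `hash_size`, lower 6 bits for `hash_count`) is inferred from the commit's `0xF6` → 4 × 54 example, and the starting offset of 2 is inferred from 218 = 2 + 216. `pathLen` and `payloadAfterPath` are hypothetical helpers, not the repository's `decodePath` or `DecodePacket`.

```go
package main

import "fmt"

// pathLen decodes the wire-supplied path byte. The "+1" on hashSize is an
// assumption chosen so that 0xF6 yields hash_size=4, as the commit states.
func pathLen(pathByte byte) (hashSize, hashCount, total int) {
	hashSize = int(pathByte>>6) + 1  // upper 2 bits
	hashCount = int(pathByte & 0x3F) // lower 6 bits
	return hashSize, hashCount, hashSize * hashCount
}

// payloadAfterPath advances past the claimed path bytes and returns the
// remaining payload, or an error when the claim overruns the buffer.
// Without the guard, buf[offset:] would panic on malformed input.
func payloadAfterPath(buf []byte, offset int, pathByte byte) ([]byte, error) {
	_, _, n := pathLen(pathByte)
	offset += n
	if offset > len(buf) {
		return nil, fmt.Errorf(
			"path claims %d bytes (pathByte=0x%02X): offset %d exceeds buffer length %d",
			n, pathByte, offset, len(buf))
	}
	return buf[offset:], nil
}
```

For the packet from the production log, `payloadAfterPath(buf, 2, 0xF6)` on a 15-byte buffer returns an error instead of slicing at offset 218.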
Kpa-clawbot	4925770aa4	fix(#1207 ): empty-state placeholder for Live Feed panel (no more orphan chrome) (#1210 ) Red commit: `6c28227884a1e79e277653465028365dc0863171` — CI: https://github.com/Kpa-clawbot/CoreScope/actions?query=branch%3Afix%2Fissue-1207 Fixes #1207 ## Diagnosis The Live Map page renders `#liveFeed` (bottom-left panel) with two header buttons — `◫` (panel-corner-btn) and `✕` (feed-hide-btn) — but its `.panel-content` body has zero children on first paint, before any packets have been ingested via WebSocket. The user-reported "X + book icons, no content" is exactly these two header buttons sitting on an empty body. Verdict: intended panel, missing content due to a data race — the chrome mounts in HTML before the WS pushes its first packet. Not orphaned, not a leftover from #1186. ## Fix - Always render a persistent `.live-feed-empty` placeholder ("Waiting for packets…") inside `#liveFeed .panel-content`. - CSS hides it via `.live-feed .panel-content:has(.live-feed-item) .live-feed-empty { display: none; }` when real feed items exist. - `rebuildFeedList` re-adds the placeholder defensively after a wipe; eviction loop counts `.live-feed-item` only so the placeholder is never trimmed out. All colors via CSS variables (`var(--text-muted)`). ## Test (RED → GREEN) - RED `6c28227884a1e79e277653465028365dc0863171` — `test-e2e-playwright.js` adds a new test ("#1207 Live Feed panel never renders as empty chrome") that wipes `.live-feed-item` children to simulate the empty state and asserts the panel body has visible text or children. Fails on master. - GREEN `a5af80960ac42759ec83fd5ca5a72e81856228d4` — adds the placeholder; test now passes. 
## Acceptance criteria - [x] No empty panel chrome visible on Live Map page - [x] Panel renders "Waiting for packets…" while feed is empty - [x] CSS auto-hides placeholder when packets arrive - [x] E2E assertion in `test-e2e-playwright.js` enforces non-empty `.panel-content` on `#liveFeed` ## Files - `public/live.js` — HTML markup + `rebuildFeedList` re-add + eviction-loop guard - `public/live.css` — `.live-feed-empty` style + `:has()` hide rule - `test-e2e-playwright.js` — regression test --------- Co-authored-by: clawbot <clawbot@kpabap.local> Co-authored-by: openclaw-bot <bot@openclaw.local>	2026-05-15 22:34:17 -07:00

1985 Commits