mirror of
https://github.com/Kpa-clawbot/meshcore-analyzer.git
synced 2026-05-22 02:55:10 +00:00
dbadef3e2fbe4e9372dfc39f29c15985d85ec63c
1992 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
dbadef3e2f |
refactor(db): move all server writes to ingestor; server truly read-only (#1283)
Eliminates the SQLITE_BUSY VACUUM bug from #1283 by making cmd/server truly read-only. The bug surfaced when supervisord launched both ingestor + server in one container: the ingestor took the write lock for INSERTs, then the server's VACUUM-on-startup immediately failed with SQLITE_BUSY. Same race latently affected three other server-side writes. Four write operations moved out of cmd/server/: 1. VACUUM / auto_vacuum migration (cmd/server/vacuum.go, entire file) → cmd/ingestor/db.go Store.CheckAutoVacuum (already existed; ingestor runs it BEFORE the MQTT subscriber starts so there is no contention with concurrent writes). 2. PruneOldPackets (DELETE FROM transmissions) cmd/server/db.go → cmd/ingestor/maintenance.go (new file, Store.PruneOldPackets) + main.go scheduler. 3. PruneOldMetrics (DELETE FROM observer_metrics) cmd/server/db.go → cmd/ingestor/db.go Store.PruneOldMetrics (already existed). 4. RemoveStaleObservers (UPDATE observers SET inactive = 1) cmd/server/db.go → cmd/ingestor/db.go Store.RemoveStaleObservers (already existed). Server-side changes: - vacuum.go deleted; checkAutoVacuum / runIncrementalVacuum gone. - cmd/server/db.go: PruneOldPackets, PruneOldMetrics, RemoveStaleObservers deleted. - cmd/server/main.go: packet/metrics/observer prune schedulers removed; the neighbor-edge prune scheduler (PruneNeighborEdges) is intentionally left in place — outside scope of #1283, tracked separately. - routes.go + openapi.go: /api/admin/prune endpoint removed (prune is scheduled by the ingestor now; operators restart the ingestor for an ad-hoc pass). Ingestor changes: - New cmd/ingestor/maintenance.go with Store.PruneOldPackets. - cmd/ingestor/config.go gains RetentionConfig.PacketDays and Config.PacketDaysOrZero(). - cmd/ingestor/main.go runs PruneOldPackets at startup (if packetDays > 0) and on a 24h ticker. Docs: - AGENTS.md: documents the read/write separation invariant. - config.example.json: notes that retention + vacuumOnStartup are consumed by the ingestor. TDD: - Red: bb1d749a — invariant tests + Store.PruneOldPackets stub. - Green: this commit — real implementation + server-side removals. Note: cachedRW() still has three out-of-scope callers in cmd/server (neighbor_persist.go, ensure_indexes.go, from_pubkey_migration.go). Those are pre-existing write paths not covered by issue #1283 and are left untouched per the issue scope. Future work can relocate them under the same invariant. |
||
|
|
f6290b6373 |
test(#1283): RED — server *DB has no write methods; ingestor owns PruneOldPackets
Enforces issue #1283 architecture: cmd/server is read-only, all write/maintenance ops live on the ingestor's *Store. Three new tests: - TestServerDBHasNoWriteMethods — reflect-asserts PruneOldPackets, PruneOldMetrics, RemoveStaleObservers are NOT methods on cmd/server *DB. Fails on master (all three currently exist + use cachedRW to bypass the server's read-only handle, racing ingestor INSERTs → SQLITE_BUSY). - TestServerDBConnIsReadOnly — opens via OpenDB, asserts INSERT fails. Today this passes via OpenDB(mode=ro), but pinning it as an invariant prevents future regressions if anyone ever drops the ro flag. - TestIngestorPruneOldPackets — exercises new Store.PruneOldPackets that the GREEN commit will implement. Stub returns 0; test asserts 2 rows pruned → fails (RED proof). Plus TestIngestorVacuumOnStartupMigratesNONEtoINCREMENTAL guarding the existing CheckAutoVacuum path so the GREEN commit's deletions in cmd/server cannot regress the vacuum migration. |
||
|
|
6b2bc62fc3 | ci: update go-server-coverage.json [skip ci] | ||
|
|
3d0b3ea551 | ci: update go-ingestor-coverage.json [skip ci] | ||
|
|
0ad6dd2c6d | ci: update frontend-tests.json [skip ci] | ||
|
|
b98f59475f | ci: update frontend-coverage.json [skip ci] | ||
|
|
9d76a91718 | ci: update e2e-tests.json [skip ci] | ||
|
|
c1d94f7db5 |
fix(#1273): collapse QR overlay wrap to content height (#1277)
## Summary Fixes #1273 — `.node-top-row .node-qr-wrap` was 2-3× taller than the QR canvas inside it, leaving empty translucent space below the QR. ## Root cause Three compounding issues: 1. **SVG intrinsic height not constrained.** `qrcode-generator` emits an SVG with fixed `width`/`height` attributes (e.g. 147×147). The CSS rule `.node-qr svg { max-width: 100px }` (and 72px mobile) constrains *width* only, so the svg's intrinsic height (147px) is preserved and the wrap is sized to that. 2. **Flex stretch.** `.node-top-row` is `display:flex` with default `align-items:stretch`, so the QR column was forced to match the map column's height (~280px) on desktop. 3. **Excess padding/margin** added another ~24px above and below the visible QR. ## Fix Three small CSS changes in `public/style.css`: | change | effect | |---|---| | `.node-qr svg { height: auto; }` | svg height scales with constrained width | | `.node-top-row .node-qr-wrap { align-self: flex-start; }` | wrap sizes to content, not column | | `.node-top-row .node-qr-wrap { padding: 8px; }` + zero inner `.node-qr` margin-top | tight hug | ## Measurements (real-data fixture, full node detail page) | viewport | wrap.height before | wrap.height after | QR canvas | |---|---|---|---| | 375×800 (mobile overlay) | 165px | **82px** | 72×72 | | 1280×800 (desktop side-by-side) | 217px | **154px** | 100×100 (+ 28px caption) | Overlay remains `position:absolute` top-right on mobile; the original #1243 behavior is preserved. ## TDD - **RED**: `test-issue-1273-qr-overlay-height-e2e.js` asserts wrap height ≤ visible QR + caption + 32px at 375×800 and 1280×800. Failed on master with deltas of 93px (mobile) and 89px (desktop). - **GREEN**: both viewports pass after the CSS fix. Wired into the deploy workflow alongside the other `test-issue-*-e2e.js` runs. ## Acceptance checklist - [x] Container height ≈ QR canvas height + 16-24px padding total - [x] No empty translucent space below the QR - [x] E2E asserts at 375×800 and 1280×800 - [x] Desktop layout unchanged (overlay position preserved; column no longer stretches but the QR card is the same width) - [x] All colors via CSS variables - [x] #1243 overlay behavior preserved (still top-right on mobile, still rendered) ## Commits - `e9d75c92` test(#1273): RED - `13899270` fix(#1273): collapse QR overlay wrap --------- Co-authored-by: openclaw-bot <bot@openclaw.local> |
||
|
|
21b6eb0d63 |
fix(live legend): document ACK/RESPONSE/PATH + white-ring repeater convention (#1274) (#1276)
RED commit `ac1fb4c3` (Playwright E2E asserts legend rows for ACK / RESPONSE / PATH text + "ring" + "repeater" — fails on master). CI: https://github.com/Kpa-clawbot/CoreScope/actions?query=branch%3Afix%2Fissue-1274 ## What The Live legend rendered five packet-type rows but the codebase defines eight `TYPE_COLORS`. The three gray-area types (ACK, RESPONSE, PATH) had no swatch in the legend, leaving operators guessing what gray dots meant — they're either ACKs or unknown payload types. Separately, the L.circleMarker styling block uses a brighter white ring to mark repeaters vs. all other roles; that convention was nowhere on screen. ## Changes - `public/live.js` legend HTML — adds rows for RESPONSE, PATH and a combined **Ack / Other** row (covering both ACK and the unknown-type fallback that share `#6b7280`). Adds a new **MARKER STYLES** subsection below NODE ROLES with two entries: bright white ring = repeater, faded ring = other. - `public/live.css` — adds `.live-ring` / `.live-ring--repeater` / `.live-ring--other` swatches. Background uses `var(--text-muted)`; only the white border + opacity differ between the two, matching the actual circleMarker weights (1.5 / 0.5) and opacities (0.6 / 0.3). - `test-issue-1274-legend-coverage-e2e.js` — Playwright E2E (desktop + mobile attached-DOM) asserting all four new pieces. ## Notes - All colors via `TYPE_COLORS` — no hardcoded hex in HTML. - Legend is `display:none` at ≤640px (existing #279 behavior), so no mobile CSS tweak required for the longer list. - Does not touch the legend toggle (#1219), mobile single-row header (#1234), or VCR visibility (#1269). Fixes #1274. --------- Co-authored-by: corescope-bot <bot@meshcore.local> |
||
|
|
8bf7709970 |
feat(repeater): usefulness score — bridge axis (#672 axis 2 of 4) (#1275)
RED test commit: `fd661569` — CI will fail on this (stub returns empty map; assertions fail by design). GREEN: `bf4b8592`. ## What Implements **axis 2 of 4** for the repeater usefulness score per #672 ([status comment](https://github.com/Kpa-clawbot/CoreScope/issues/672#issuecomment-4484635378)). The Bridge axis measures *structural importance*: how many shortest paths between other nodes route through this one. A high-traffic redundant node and a low-traffic critical bridge will no longer look identical. ## Algorithm **Brandes' weighted betweenness centrality** with Dijkstra for shortest paths (`cmd/server/bridge_score.go`). - Nodes: pubkeys in the `neighbor_edges` graph - Edge weight: `Score(now) * Confidence()` — per the convention from #1235 (count + recency decay scaled by observer-diversity confidence). Geo-rejected edges already excluded at graph build time (#1230) so we don't re-filter here. - Dijkstra distance: `1 / max(epsilon, weight)` — high affinity = cheap cost. - Normalize: divide by max observed centrality so output is in `[0, 1]`. Cost: `O(V · (E + V log V))`. Staging-scale (~600 nodes / ~2 000 edges) ≈ ~4.8M ops, completes in milliseconds. ## Where it lives - `cmd/server/bridge_score.go` — pure algorithm, no locks - `cmd/server/bridge_recomputer.go` — background recomputer (mirrors #1240/#1262 pattern), 5-min default interval, initial sync prewarm, snapshot stored in `s.bridgeScoreMap atomic.Pointer[map[string]float64]` - `cmd/server/routes.go` — `handleNodes` adds `node["bridge_score"]` on repeater/room rows; node-detail handler adds it on the single-node path - `public/nodes.js` — separate **Bridge** row in the node detail panel, alongside the existing **Usefulness** (Traffic) row. Distinct colour-coded bar. ## What's NOT in this PR (still pending for #672) - **Coverage axis** (axis 3) — unique observer-pair connectivity - **Redundancy axis** (axis 4) — simulated node-removal impact - **Composite** — once all 4 axes ship, swap the `usefulness_score` formula from "traffic-only" to the weighted composite `Refs #672` (not `Fixes` — issue stays open until all 4 axes + composite ship). ## Tests - `TestComputeBridgeScores_LineGraph` — 4-node line: middles non-zero, leaves zero, max normalized to 1.0 - `TestComputeBridgeScores_TriangleNoBridge` — clique has zero bridges - `TestComputeBridgeScores_Empty` — defensive nil-safety - `TestComputeBridgeScores_WeightSensitive` — mutation guard: revert the `1/w` inversion and this test fails - `TestBridgeScore_HandleNodesSurface` — integration: `/api/nodes` returns `bridge_score` on repeater rows; middle nodes > 0, ends == 0 --------- Co-authored-by: clawbot <bot@meshcore.local> |
||
|
|
c09fec56ff | ci: update go-server-coverage.json [skip ci] | ||
|
|
6dbfd331a6 | ci: update go-ingestor-coverage.json [skip ci] | ||
|
|
a00e1c0e18 | ci: update frontend-tests.json [skip ci] | ||
|
|
763d4f707c | ci: update frontend-coverage.json [skip ci] | ||
|
|
ad467daeeb | ci: update e2e-tests.json [skip ci] | ||
|
|
46ce9590f1 |
fix(#1270): Prefix Tool Network Overview shows configured-hash-size counts, not math-only slices (#1271)
Red commit: `6b68080c24106301b6bfc25f8a05484f07d0612d` (test added that fails on master). CI: see Checks tab on this PR. Fixes #1270. ## Problem Two analytics surfaces told contradictory stories about prefix usage: - **Prefix Tool → Network Overview** showed e.g. `168 / 65,536` for the 2-byte tier — a pure math fact: every repeater pubkey sliced to 2 bytes yields N distinct values. Because collisions are rare, this number always equals (or nearly equals) the repeater count, making it look like the whole network uses 2-byte hashing. - **Hash Stats → By Repeaters** showed configured-hash-size counts straight from `/api/analytics/hash-sizes` `distributionByRepeaters` — usually a minority on 2-byte and near-zero on 3-byte. The Prefix Tool was presenting a math fact as if it were operational truth. ## Fix `renderPrefixTool` now also fetches `/api/analytics/hash-sizes` and restructures each tier card into three labeled stats with explicit hierarchy: 1. **Primary** — `X of Y repeaters configured` (from `distributionByRepeaters`). Same source the Hash Stats tab uses, so the two pages agree exactly. 2. **Operational collisions** — colliding slices among repeaters configured for *this* hash size only (matches Hash Issues semantics). 3. **Theoretical** (secondary, smaller, dashed-rule footnote) — `X unique N-byte slices across all repeater pubkeys (of Y possible)`. The math fact is preserved as educational info, no longer impersonating operational truth. The "Total repeaters" card now also notes how many have a known configured hash size. The "About these numbers" footer was rewritten to explain the three numbers and link to both Hash Stats and Hash Issues. The prefix collision detector (Check / Generate panels) is unchanged — it still scans every repeater pubkey because that is its job. ## Test Added `#1270 Prefix Tool primary counts match Hash Stats By Repeaters` to `test-e2e-playwright.js`. It fetches `/api/analytics/hash-sizes` for the ground-truth `distributionByRepeaters`, then visits `#/analytics?tab=prefix-tool`, opens Network Overview, and scrapes the primary count via a new `data-pt-configured="<bytes>"` `data-value="<count>"` marker on each tier card, asserting exact equality for 1/2/3-byte. - Red commit `6b68080c` (test only): fails on master with `NO data-pt-configured marker`. - Green commit `12ed2789` (fix): test passes; full E2E suite `123/126 passed, 3 skipped`. ## Acceptance - [x] Prefix Tool Network Overview shows configured-hash-size repeater counts as the primary number - [x] "Unique slices" math is shown as secondary/educational - [x] Two pages tell the same story (E2E asserts byte-equal match) - [x] E2E asserts the configured-count matches what Hash-Sizes tab shows at the same point in time |
||
|
|
0022c8fd1f | ci: update go-server-coverage.json [skip ci] | ||
|
|
7827c8e778 | ci: update go-ingestor-coverage.json [skip ci] | ||
|
|
cb0218fc4d | ci: update frontend-tests.json [skip ci] | ||
|
|
385f49b3d8 | ci: update frontend-coverage.json [skip ci] | ||
|
|
f4ecc96ccc | ci: update e2e-tests.json [skip ci] | ||
|
|
78b666c248 |
fix(#1267): mobile VCR bar invisible — JS height clobbered bottom-nav reserve (#1269)
## Summary Mobile-only regression: on the Live page at ≤768px viewports the VCR bar was rendered behind the fixed bottom-nav and never visible to the user. iOS Safari screenshot at 375x812 showed: top header strip, full-height map, bottom-nav — **no VCR row at all**. Fixes #1267. ## Root cause `public/live.js` `initResizeHandler` (the existing JS height override) was setting `page.style.height = window.innerHeight + 'px'`, which clobbered the CSS rule that already subtracts `--bottom-nav-reserve` from the live-page height. Because `.live-page` then spanned the full viewport, the VCR bar (`position:absolute; bottom:0; z-index:1000`) was painted underneath `.bottom-nav` (`position:fixed; z-index:1200`). The VCR bar element WAS in the DOM, WAS `display: flex`, and HAD `height: 53px` — it just sat at y=758..812 underneath the bottom-nav at y=754..812. CSS-only checks for `display:none` would never catch this; the test asserts the bar's bottom edge is at or above the bottom-nav's top edge. ## Fix One-liner in spirit: subtract the bottom-nav height before applying `page.style.height`. The implementation measures the rendered `.bottom-nav` (with a fallback to a hidden probe that resolves the `--bottom-nav-reserve` token), so it survives safe-area inset and the bottom-nav's 1px border. ```js const reserve = /* measure .bottom-nav, fall back to --bottom-nav-reserve token */; const h = Math.max(0, window.innerHeight - reserve); ``` Desktop is unchanged: `.bottom-nav` is `display: none`, the probe resolves to 0, and `h === window.innerHeight` exactly as before. ## TDD - **RED** (commit 1): `test-e2e-1267-mobile-vcr.js` — Playwright at iPhone 375x812 asserts `.vcr-bar` has `display !== 'none'`, `visibility !== 'hidden'`, `height > 0`, `top < viewport.height`, and (the key check) `bottom <= bottom-nav.top`. Fails on `master` with: *"VCR bar bottom 812 overlaps bottom-nav top 754"*. - **GREEN** (commit 2): the fix above. Test passes: *"VCR bar bottom 754 ≤ bottom-nav top 754"*. ## Verification - ✅ Mobile (375x812) repro reproduced against `master` (bar at y=758..812, behind bottom-nav) - ✅ Mobile (375x812) E2E green after fix (bar at y=700..754, flush above bottom-nav) - ✅ Desktop (1440x900) unaffected — bottom-nav hidden, page height = viewport height as before, VCR bar at viewport bottom - ✅ #1234 (top-nav hidden on /live), #1246 (single-row VCR), #1206/#1213 (VCR/feed clearance) unchanged — none touched ## Files - `public/live.js` — single function (`initResizeHandler`) modified - `test-e2e-1267-mobile-vcr.js` — new mobile-viewport Playwright regression test Run: `BASE_URL=http://localhost:13581 node test-e2e-1267-mobile-vcr.js` --------- Co-authored-by: openclaw-bot <bot@openclaw.local> |
||
|
|
d4d569278d | ci: update go-server-coverage.json [skip ci] | ||
|
|
ee21eafa66 | ci: update go-ingestor-coverage.json [skip ci] | ||
|
|
c6a90e9896 | ci: update frontend-tests.json [skip ci] | ||
|
|
92518ab234 | ci: update frontend-coverage.json [skip ci] | ||
|
|
cd84f51f8a | ci: update e2e-tests.json [skip ci] | ||
|
|
4cd8445233 |
perf(#1265): wire /api/observers/clock-skew + /api/nodes/clock-skew into analytics recomputer (#1266)
RED:
|
||
|
|
ae17a2be12 |
perf(#1262): /api/nodes?limit=2000 cold-miss 15.7s → <100ms — prewarm repeater enrichment cache (#1263)
RED commit: `22ce5736066142583017cad7303fa48d9e00ccf0` — CI on red: https://github.com/Kpa-clawbot/CoreScope/actions?query=branch%3Afix%2Fissue-1262 ## Problem After #1260 added a 15s-TTL bulk cache for repeater enrichment in `handleNodes`, `/api/nodes` (default limit) dropped to ~500ms. But `/api/nodes?limit=2000` — called by `public/live.js` at SPA startup for hop resolution — still took **15.7s cold** on staging (75k tx, 600 nodes). Warm hits were ~40ms. Root cause: the bulk cache was lazily populated on the first request after TTL expiry. The rebuild ran on the request-serving goroutine. Every cold SPA load triggered the rebuild and ate 15s. ## Fix Add `StartRepeaterEnrichmentRecomputer` — a steady-state background recomputer that mirrors the `analytics_recomputer.go` pattern from #1240: - **Prewarm**: initial synchronous compute on Start so the first request hits a populated cache. - **Steady-state**: ticker refreshes the snapshot every 5min (configurable via the existing analytics recompute interval knob). - **Panic-safe** + idempotent Start. Wired into `main.go` right after `StartAnalyticsRecomputers`, using `cfg.GetHealthThresholds().RelayActiveHours` as the window. ## Test `TestHandleNodesLimit2000ColdMiss` — seeds 600 nodes + 150k non-advert tx with repeaters indexed under a shared 1-byte hop prefix (matches production hop-prefix collisions), starts the recomputer, then issues `/api/nodes?limit=2000` with **no HTTP warmup**. | State | Latency | |---|---| | Before (master, on-thread rebuild) | 3.37s | | After (prewarm + steady-state) | 56ms | | Budget | 2s | Staging end-to-end: 15.7s → expected sub-100ms on the same call path. Red commit (`22ce5736066142583017cad7303fa48d9e00ccf0`) compiles with a no-op stub of the new method so the test fails on the latency **assertion**, not a missing symbol. Fixes #1262 --------- Co-authored-by: corescope-bot <bot@corescope.local> |
||
|
|
094a96bd6c |
perf(#1258): /#/perf — parallel health fetch, sort endpoints, pause refresh while hidden (#1261)
Fixes #1258 — Perf dashboard (/#/perf) was slow because of three frontend issues; backend APIs were never the problem. ## Findings 1. **`/api/health` fetched sequentially after `Promise.all`** in `refresh()` — added a full RTT (~50-200ms) on every 5s tick on top of the parallel batch. 2. **Endpoints table not actually sorted** despite the heading "sorted by total time". JSON shape is `map[string]EndpointStatsResp` (no defined order); frontend rendered map iteration order. Visible correctness bug surfaced during investigation. 3. **`setInterval(refresh, 5000)` kept firing while tab was hidden**, rebuilding the entire ~10-section `innerHTML` (cards + 3 tables) in the background. On tab return the user saw a backlog thrash + felt the page was "slow to render". ## Fix (`public/perf.js`) - Move `/api/health` into the same `Promise.all` as the other 4 endpoints — saves one RTT per refresh. - Sort `Object.entries(server.endpoints)` by `count * avgMs` DESC client-side. - Add `document.hidden` guard in the interval tick + `visibilitychange` listener that refreshes once on return; `destroy()` removes the listener. ## Tests `test-perf-render-1258.js` (new): - All 5 initial fetches issued in parallel (including `/api/health`) - Refresh suppressed while `document.hidden` - Endpoints table sorted by total time DESC, regardless of input map order RED commit first (`6b54f9e8`, 0/3 pass) → GREEN commit (`be81303b`, 3/3 pass). Existing `test-perf-go-runtime.js` (13/13) and `test-perf-disk-io-1120.js` (15/15) still green. ## Investigation exemption No Playwright timing test — sandbox can't run a real browser. Static analysis + render-shape unit tests cover the three identified bottlenecks. Documented per AGENTS "investigation surfaces" exemption. ## Measurement Before: refresh = parallel batch (~max(server-side)) + sequential `/api/health` (~50ms) + full innerHTML rebuild every 5s including hidden tabs. After: refresh = single parallel batch, runs only while visible. Expected improvement on tab-return ≈ -1 RTT per refresh + zero background work. --------- Co-authored-by: corescope-bot <bot@corescope.local> |
||
|
|
1efe93d7f6 |
perf(#1257): bulk-cache repeater enrichment in /api/nodes — 32s → <500ms (#1260)
RED commit `a2879e12` — perf regression test; CI run: see Actions tab. Fixes #1257. ## Root cause `handleNodes` looped over the response page and called `store.GetRepeaterRelayInfo(pk, win)` + `store.GetRepeaterUsefulnessScore(pk)` for every repeater/room. Each call: - grabbed its own `s.mu.RLock`, - walked `byPathHop[pk]` (+ the matching 1-byte raw-prefix bucket, which on busy networks fans out to nearly the entire non-advert tx set), - and re-parsed every `tx.FirstSeen` with `parseRelayTS`. Default page is the 50 most-recently-seen nodes — almost all hot repeaters — so the request did O(50) lock acquisitions and hundreds of thousands of timestamp parses on the same set of txs. That's the classic load-then-paginate / per-row N+1 shape called out in the issue (same family as #1226). The `?limit=2000` variant looks faster relatively only because per-node enrichment dwarfs serialization; on staging both still bottleneck on the same loop. ## Fix Two new bulk methods on `PacketStore`: - `GetRepeaterRelayInfoMap(windowHours)` → `pubkey → RepeaterRelayInfo` - `GetRepeaterUsefulnessScoreMap()` → `pubkey → 0..1` Both snapshot `byPathHop` under a single `RLock`, pre-parse each `FirstSeen` exactly once (a tx that appears in N hop buckets used to be parsed N times), and emit one entry per hop key. Cached 15s — same TTL as `GetNodeHashSizeInfo` / `GetMultiByteCapMap`, same status-column freshness budget. `handleNodes` is one map-lookup per node; behavior, output schema, and `RelayActive` / `RelayCount{1h,24h}` / `LastRelayed` / `usefulness_score` semantics are preserved. ## Why no `limit` default change The issue mentioned a default-limit knob. Investigated: `queryInt(r, "limit", 50)` already defaults to 50 — frontends calling `/api/nodes` (no limit) get a 50-row page today. Capping further would change behavior (live.js already passes `?limit=2000` when it wants more); the cost was per-repeater enrichment, not page size. Fixing the N+1 is the correct lever and preserves backward compat. ## Perf Regression test `TestHandleNodesPerfLargeFleet` (600 nodes, 150k non-advert tx, repeaters indexed under `byPathHop`): | | elapsed | vs 2s budget | |---|---|---| | before (master) | 4.72s | ✗ | | after | ~4ms | ✓ (~1000×) | ## TDD - RED: `a2879e12` — test fails at 4.72s on master. - GREEN: `c529d29a` — fix; full `cmd/server` + `cmd/ingestor` suites green. --------- Co-authored-by: corescope-bot <bot@corescope> |
||
|
|
f81ed5b3cf |
perf(#1256): wire /api/analytics/roles into steady-state recomputer (#1259)
RED commit: `0190466d` — failing CI: https://github.com/Kpa-clawbot/CoreScope/actions (will populate after PR creation) ## Problem On staging (commit `d69d9fb`, 78k tx, 2.3M obs), `curl http://localhost/api/analytics/roles` times out at 60s with 0 bytes — the Roles tab is unusable. Issue #1256. PR #1248's steady-state recomputer fan-out (topology / rf / distance / channels / hash-collisions / hash-sizes) **didn't include roles**. The legacy handler: 1. Holds `s.mu.RLock` for the entire compute. 2. Calls `GetFleetClockSkew()`, which drives `clockSkew.Recompute(s)` over all ADVERT transmissions — O(78k) per request. 3. Concurrent ingest writers compound the latency through writer-starvation. Result: every request hits the cold path; the response never comes back inside the 60 s HTTP budget. ## Fix Add `roles` as the 7th endpoint in the recomputer fan-out — same pattern as #1248: - `PacketStore.recompRoles` slot, registered in `StartAnalyticsRecomputers` with default 5-min interval. - `PacketStore.GetAnalyticsRoles()` → atomic-pointer load from the snapshot (sub-ms), with a `computeAnalyticsRoles()` fallback only for the brief startup window before the initial sync compute completes. - Handler is now a thin wrapper — no lock-held work on the request path. - New optional `roles` key under `analytics.recomputeIntervalSeconds` in config; `config.example.json` and `_comment_analytics` updated. ## Latency (unit-scope benchmark) - Worst-of-50 handler latency: **<100 ms** (test budget; well under the 2 s p99 acceptance). - Compute itself is bounded by the existing 5-min recompute window — it runs once in the background, never on the request path. ## Tests - RED `0190466d`: asserts `recompRoles` is registered and the handler returns under the latency budget. Fails on master with `recompRoles not registered`. - GREEN `d7784f76`: registers the recomputer + snapshot accessor — both tests pass. Fixes #1256 --------- Co-authored-by: openclaw-bot <bot@openclaw.local> |
||
|
|
d69d9fbf8e |
perf(#1247): surgical fix for resolveWithContext tier-1 hot path (4.6× speedup) (#1253)
## Summary Surgical fix for #1247: analytics endpoints regressed 3-9× between prod `d818527` and master. pprof against staging traced the regression to `resolveWithContext` tier-1 affinity loop running on every analytics `resolveHop` call (post-#1198 plumbing) with redundant per-(cand, ctx) work. **Result: 4.6× speedup on the synthetic hot-shape benchmark (202µs → 44µs / op).** ## Root cause - PR #1198 (`353c5264`) lit up `resolveWithContext` tier 1 from every analytics resolveHop closure (previously they passed `contextPubkeys=nil` and short-circuited the entire tier-1 block). - The inner loop did `N_cand × N_ctx` iterations where each one did: - `graph.Neighbors(strings.ToLower(ctxPK))` — graph RLock + ToLower allocation **per candidate**, redundantly - `strings.ToLower(cand.PublicKey)` per `ctxPK` - `strings.EqualFold(otherPK, ctxPK)` + `EqualFold(otherPK, candPK)` — both sides were already lowercased (`NeighborEdge.NodeA/B` via `makeEdgeKey`; `contextPubkeys` via `buildHopContextPubkeys`) - At staging scale (5k+ contextPubkeys × 30k+ resolveHop calls) this dominated `computeAnalyticsTopology` (37% of its CPU) and `computeAnalyticsRF` (55%). ## pprof attribution (staging, region-keyed queries bypassing #1240 cache) ``` computeAnalyticsTopology cum: 19.24% (5.45s / 28.32s sampled) └─ resolveWithContext 37% ├─ strings.ToLower 41% ├─ strings.EqualFold 28% └─ graph.Neighbors 24% computeAnalyticsRF cum: 10.38% ``` ## Fix (~80 LoC in `cmd/server/store.go`) 1. Lowercase `contextPubkeys` **once per call**, skipped entirely when already lowercased (the analytics fast path). 2. Lowercase candidate pubkeys **once per call**. 3. Invert the loop nesting: outer-ctx / inner-edge / candidate-map lookup. `graph.Neighbors` is called once per context pubkey instead of `N_cand` times. 4. Raw `==` instead of `strings.EqualFold` for pubkey comparisons (both sides lowercased by step 1/2). 5. Added a tiny `hasUpperASCII` byte-loop helper next to `isHexLower` for the fast-path check. Behavior preserved: same `Score × Confidence` formula, same tier-1 ratio + min-observations gate, same per-candidate "best edge wins" semantics. No change to tiers 2/3/4. ## TDD evidence - Red commit (`5f8d1564`): `TestResolveWithContextTier1Floor` asserts `<100 µs/call` on the hot shape. **199 µs/call on regressed master → FAIL.** - Green commit (`e3bdbc65`): surgical fix lands. **44 µs/call → PASS.** - Reverification: locally stashed the fix, ran the test → 199.5 µs FAIL; popped fix → 44 µs PASS. `BenchmarkResolveWithContextTier1Hot` (no assertion, visibility only): ``` before: 202013 ns/op 168 B/op 3 allocs/op after: 44084 ns/op 424 B/op 6 allocs/op speedup: 4.6× ``` (Post-fix allocs are O(N_cand + N_ctx) one-time helper tables — net win at hot scale.) ## Independence from #1248 PR #1248 caches the analytics compute output so user-facing latency is sub-ms even when the compute is slow. That's correct for UX but it masks the regression. This PR repairs the compute itself, so: - Region-keyed and windowed queries (which bypass the recomputer cache by design — see #1240) become fast again. - Future ingest scale or feature work on top of the regressed baseline doesn't compound. ## Out of scope - The geo-rejection (#1228) and Confidence weighting (#1229) commits — kept intact, they protect correctness and were not the dominant CPU cost. - Reverting any suspect commit — surgical only. ## Acceptance criteria from #1247 - [x] pprof confirms the hot function (`resolveWithContext`) - [x] Bisect identifies the regressing commit (`353c5264` / PR #1198 — context plumbing; ratified by pprof, no need to actually rebuild 5 binaries) - [x] Fix lands; tier-1 hot path 4.6× faster - [x] No regression in disambiguator correctness — full `go test ./...` green, all existing `ResolveWithContext` / `HopDisambig` / `NeighborGraph` / `Affinity` tests pass Fixes #1247 --------- Co-authored-by: openclaw-bot <bot@openclaw.local> |
||
|
|
1d33ac53b0 |
fix(#1254): trim .badge-iata h-padding on mobile to clear 1.25px clip (#1255)
Fixes #1254. Master CI Playwright fail-fast on every push since #1252: ``` ❌ Mobile viewport (375px): observer IATA badge stays visible — not clipped: .badge-iata right edge 376.25 exceeds 375px viewport ``` ## Root cause After #1252 unhid `.col-observer` at narrow widths so the IATA pill from #1188 renders on mobile, at 375px the cell padding + truncated observer name (10 chars in grouped rows) + `.badge-iata` pill (`padding: 1px 5px` + `margin-left: 4px`) sums to ~376.25px — overflowing the viewport by 1.25px. Same class of failure as #1250/#1251 (VCR LCD-clip). ## Fix `public/style.css` — inside the existing `@media (max-width: 640px)` block, shrink `.badge-iata` `padding: 1px 5px → 1px 3px` and `margin-left: 4px → 2px`. Reclaims ~6px horizontally, well clear of the 1.25px overflow. Desktop (≥641px) styling untouched. ## TDD The failing E2E sub-test in `test-observer-iata-1188-e2e.js` (added in #1189 R1) IS the red. Mutation verified locally: | Variant | Result | |--------------------|--------| | WITHOUT this fix | ❌ `.badge-iata right edge 376.25 exceeds 375px viewport` | | WITH this fix | ✅ all 3 sub-tests pass | ## Local verification ``` $ go build -o /tmp/corescope-server ./cmd/server $ /tmp/corescope-server -port 13581 -db test-fixtures/e2e-fixture.db -public public & $ CHROMIUM_PATH=/usr/bin/chromium BASE_URL=http://localhost:13581 \ node test-observer-iata-1188-e2e.js Running observer-IATA E2E tests against http://localhost:13581 ✅ Packets table renders an IATA badge in an observer cell ✅ Filter grammar: observer_iata == "<code>" narrows the table ✅ Mobile viewport (375px): observer IATA badge stays visible — not clipped All observer-IATA E2E tests passed. ``` ## Constraints honored - All colors via existing CSS variables (no theming illusions; only `padding` / `margin-left` change inside `@media (max-width: 640px)`). - No JS changes. - Desktop badge display unaffected (selector scoped to narrow viewport). - `config.example.json`: no config field added. - PII preflight: clean. Co-authored-by: OpenClaw Bot <bot@openclaw.local> |
||
|
|
43203b09b7 |
fix(#1249): IATA badge missing on fixture + mobile clipping (#1252)
Failing test commit: `bdb4eefb` (added in #1189 R1) — original CI failure: https://github.com/Kpa-clawbot/CoreScope/actions/runs/25995819598 Fixes #1249. ## Root cause Two independent bugs surfaced by the same E2E test: 1. **Fixture join broken.** `scripts/capture-fixture.sh` wrote the text observer hash into `observations.observer_idx`, but the v3 join in `cmd/server` is `observers.rowid = observations.observer_idx`. The join silently nulled out `observer_id` / `observer_iata` for every packet. 2. **Mobile clipping.** `.col-observer` had `data-priority=3` (hides at ≤1024px) and was in the narrow-viewport `defaultHidden` list, so at 375px the cell collapsed to `display:none` and `.badge-iata` had a 0×0 box. ## Changes - `test-fixtures/e2e-fixture.db`: remap `observer_idx` text hash → integer rowid (500/500 rows resolved). - `scripts/capture-fixture.sh`: build an `observer_id → rowid` map before insert; skip rows whose observer isn't in the fixture. Comment explains the trap. - `public/packets.js`: bump `.col-observer` priority `3 → 1` and drop `observer` from narrow-viewport `defaultHidden`. ## Verification All three sub-tests in `test-observer-iata-1188-e2e.js` pass locally against the freshened fixture. `curl /api/packets?limit=5` returns real IATA codes (OAK / MRY / SFO) instead of empty strings. Co-authored-by: OpenClaw Bot <bot@openclaw.local> |
||
|
|
45872c8371 |
fix(#1250): trim mobile VCR bar h-padding 8px→4px to clear 0.83px LCD clip (#1251)
Red: master CI run https://github.com/Kpa-clawbot/CoreScope/actions/runs/25995768081 already fails on `test-e2e-playwright.js` `#1221 LCD clipped on right (right=375.828125, vw=375)`. No new test commit — the existing E2E assertion is the gate. **Root cause.** PR #1222's mobile rule set `.vcr-bar { padding: 4px 8px }`. The flex row holds three `flex-shrink: 0` children (controls + scope-btns + lcd) and one `flex: 1 1 0` absorber (`.vcr-timeline-container`, `min-width: 40px`). At 375px viewport the absorber hits its floor, so the intrinsic widths of the shrink-frozen children spill 0.83px past the padding box. **Fix.** Drop horizontal padding 8px → 4px inside the `@media (max-width: 640px)` block. That's 8px of new slack — order of magnitude above the 0.83px clip — keeping LCD's `getBoundingClientRect().right ≤ 375`. Desktop layout untouched (rule is mobile-scoped). VCR/feed overlap (#1206/#1213) not reintroduced because `--vcr-bar-height` is JS-measured by the ResizeObserver, not pinned in CSS. Fixes #1250 Co-authored-by: openclaw-bot <bot@openclaw.local> |
||
|
|
356f001027 |
perf(#1240): steady-state background recompute for analytics endpoints (#1248)
RED commit: `27630f6a` — adds latency test that fails on master (p99=225ms > 50ms budget) and a stub `StartAnalyticsRecomputers` that returns a no-op so the assertion (not a build error) gates the change. GREEN commit: `20fbbceb` — wires real background recompute infrastructure. Test passes at p99=~1µs. ## What changed Replaces the on-request "compute-then-cache" pattern for the default-shape analytics queries with a steady-state background recompute loop. Reads always hit an `atomic.Value` snapshot in <1µs regardless of compute cost or writer contention. Operator principle: serving slightly stale data quickly beats real-time data slowly. ## Endpoints converted (default 5min interval each) | Endpoint | Cold compute | Recomputer interval | |---|---|---| | `/api/analytics/topology` | ~5s | 5 min | | `/api/analytics/rf` | ~4s | 5 min | | `/api/analytics/distance` | ~3s | 5 min | | `/api/analytics/channels` | ~0.5s | 5 min | | `/api/analytics/hash-collisions` | ~0.5s | 5 min | | `/api/analytics/hash-sizes` | ~22ms | 5 min | All intervals configurable per-endpoint via `analytics.recomputeIntervalSeconds.<name>` in `config.json`; documented in `config.example.json`. Default override via `analytics.defaultIntervalSeconds`. ## Scope: default query only Only the canonical shape `(region="", window=zero)` is precomputed. Region- or window-filtered requests fall back to the legacy TTL cache + on-request compute — keeps recomputer count bounded (6, not 6×N×M). ## Latency Test `TestAnalyticsRecomputerSteadyStateLatency`: 100 concurrent readers + 4 writers churning `s.mu.Lock` on 20k distHops. - Before: p50=188ms p99=225ms (assertion failed) - After: p50=240ns p99=1.1µs (atomic load + map return) ## Shutdown integration `StartAnalyticsRecomputers` returns a stop closure invoked from `main.go`'s SIGTERM handler BEFORE `dbClose()` so any in-flight SQLite compute drains cleanly. `TestAnalyticsRecomputerShutdownNoLeak` confirms all 6 goroutines are reaped (Δ=6 within 2s). ## Safety details - Initial compute is synchronous in `Start()` — first read after startup never sees nil. - `recover()` inside `runOnce` keeps a compute panic from killing the goroutine; previous snapshot remains valid. - `analyticsRecomputerMu` is a sync.RWMutex; recomputer pointers are read-locked in the hot path. The atomic.Value swap inside `runOnce` is lock-free. Fixes #1240. --------- Co-authored-by: OpenClaw Bot <bot@openclaw.local> |
||
|
|
b881a09f02 |
feat(#1188): show observer IATA on packets + filter grammar (#1189)
Red commit:
|
||
|
|
e395c471ed |
fix(#1244): live mobile VCR single row + disable orphan gesture-hint pills on /live (#1246)
Red commit:
|
||
|
|
74685ac82f |
fix(#1243): node detail mobile QR overlays map semi-transparently (#1245)
RED commit `fc9b619a` — CI: https://github.com/Kpa-clawbot/CoreScope/actions Fixes #1243. ## Problem On `#/nodes/<pubkey>` at 375×800, the QR code rendered as a separate ~250px-tall panel below the map. Desktop already overlays the QR semi-transparently via `.node-map-qr-overlay` for the compact view. ## Fix Extend the mobile breakpoint (`@media (max-width: 640px)`) so the full-screen `.node-top-row` mirrors the desktop overlay pattern: - `.node-top-row` → `position: relative`; map wrap expands to 100% - `.node-qr-wrap` → `position: absolute; bottom/right: 8px; z-index: 400` - Semi-transparent background (`rgba(255,255,255,0.85)` light / `0.4` dark) - Caption hidden in overlay (already shown above) Desktop (≥768px) flex layout untouched. ## TDD - RED `fc9b619a` — E2E at 375×800 asserts QR is `position: absolute|fixed`, overlaps map rect, and bg alpha < 1. - GREEN `ded978c0` — CSS adds overlay rule. ## Verification Preflight clean. Desktop layout unaffected — change is scoped inside `@media (max-width: 640px)`. ## Files - `public/style.css` (+29) - `test-e2e-playwright.js` (+57) --------- Co-authored-by: clawbot <clawbot@local> |
||
|
|
2754251a53 |
perf(#1239): /api/analytics/distance — TTL 15s→60s + drop main RLock around compute (#1241)
## Summary Fixes #1239 — `/api/analytics/distance` 15s cold on staging under heavy ingest. Two independent fixes. First commit on this branch is the RED test for Fix B (`a539882`), demonstrating reader/writer contention against the main store lock. CI: see Actions tab for the run on the test-only commit — it asserts >150µs avg writer cycle and fails at 82367µs pre-fix. GREEN commit (`d3938f1`) brings it to 1µs. ## Fix A — TTL bump 15s → 60s (`5eae1e0`) - `rfCacheTTL` default in `cmd/server/store.go` changed from `15 * time.Second` to `60 * time.Second`. This is the shared TTL for RF / topology / distance / hash-sizes / subpath / channel analytics caches. - Per operator clarification (issue thread): distance analytics IS viewed live during analysis sessions, not background-glanced. 60s smooths the cold-miss churn during heavy ingest without freezing data. - `config.example.json`: documented `cacheTTL.analyticsRF` with new default + caveat. - Existing assertions (`TestCacheTTLDefaults`, `TestHashCollisionsCacheTTL`) updated to the new default. ## Fix B — Drop main RLock around compute (`a539882` red, `d3938f1` green) `computeAnalyticsDistance` previously held `s.mu.RLock()` for the entire iteration: region match-set construction, hop/path filtering, sort, dedup, histogram, category stats, time series. Readers serialized writers (ingest, `buildDistanceIndex`). Refactor: hold the RLock only long enough to snapshot the `distHops`/`distPaths` slice headers AND build the region match-set (which reads `tx.Observations`, mutated under `s.mu.Lock`). For `region=""` (the hot cold-call path) the lock hold is just the header snapshot — microseconds. Everything else runs on the locally-captured slices outside the lock. Safety: `distHops`/`distPaths` are append-only via re-slice in `buildDistanceIndex` / `updateDistanceIndexForTxs` (both under `s.mu.Lock`). If the backing array reallocates after the snapshot, the snapshot still references the prior array (GC-pinned) at the consistent length captured under the lock. Records are value types — no torn writes. ## Test results `cmd/server/distance_lock_contention_test.go` (8 reader goroutines × 20k synthetic distHops × 200 writer Lock/Unlock cycles): - pre-fix avg writer cycle: **82367µs** (16.5s for 200 cycles) - post-fix avg writer cycle: **1µs** (279µs for 200 cycles) - ~82000× reduction in writer contention; reader result shape unchanged Full `go test ./cmd/server/...` green with `-race`. ## Out of scope (per issue) - Same lock pattern in topology / RF / hash / subpath analytics — file separately if needed. - Per-region cache key sharding. - WebSocket-driven cache invalidation. --------- Co-authored-by: openclaw-bot <bot@openclaw.local> |
||
|
|
aba20b3eda |
fix(#1234): Live mobile chrome pass 2 — single-row header, hide top-nav, VCR overflow (#1238)
## Summary Live page mobile chrome-reduction pass 2. Three coordinated trims at ≤640px: 1. **`.live-header` → single row, ≤44px.** Drop the MESH LIVE text label and the chart-icon (📊) header toggle. Promote `.live-stats-row` to a direct child of `.live-header` so beacon + pkts + nodes + active + rate + gear all sit on one row. The (now empty) `.live-header-body` collapses to `display:none`. `.live-controls-toggle` shrinks to 36×36 to fit the strip. 2. **Top app navbar hidden on `/live`.** `body:has(.live-page) .top-nav { display:none }` — scoped via `:has()` so other routes are unaffected. The `.live-page` height reclaims the freed 52px. 3. **VCR scope row: >6h collapsed into `More ▾`.** `12h` and `24h` get `.vcr-scope-btn--overflow`; the new `.vcr-scope-more-wrap` dropdown is desktop-hidden, mobile-shown. Dropdown items proxy `.click()` to the underlying scope buttons — single source of truth, existing handler unchanged. ## TDD - **RED** (`b975c828`): `test-issue-1234-live-chrome-pass2-e2e.js` — one E2E asserting all three acceptance items at 375×800 + desktop sanity at 1280×800. Wired into `deploy.yml`. Fails on master (no More button, navbar visible, MESH LIVE label visible). - **GREEN** (`1e529e63`): CSS + JS implementation. Updates `test-live-layout-1178-1179-e2e.js` and `test-issue-1204-live-panel-structure-e2e.js` in-place to match the new single-row contract (chart toggle gone, MESH LIVE label gone on mobile, gear shrunk to 36×36). ## Verification (local) - New E2E: 7/7 ✅ - `test-issue-1178-1179`: 10/10 ✅ - `test-issue-1204`: 10/10 ✅ - `test-issue-1205`: 18/18 ✅ - `test-issue-1206`: 7/7 ✅ - `test-live-mql-leak-1180`: 2/2 ✅ - `#1220` empty-chrome guard (in `test-e2e-playwright.js`): header = 38px collapsed ✅ Desktop (1280×800) layout unchanged — top-nav visible, all 4 VCR scopes inline, header behavior identical. Fixes #1234. --------- Co-authored-by: corescope-bot <bot@corescope.local> |
||
|
|
4ea1bf8ebc |
fix(#1236): map mobile — sticky panel header + remove right gutter (#1237)
RED:
|
||
|
|
2e28aa3e04 |
fix(#1229): source-diversity confidence weighting in neighbor-graph tier-1 resolver (#1235)
RED |
||
|
|
b21badbcbd |
fix(#1225): paginate channel messages at SQL level — 30s → <500ms (#1226)
## Summary Fixes #1225 — channel messages endpoint took ~30s on staging. ## Root cause `(*DB).GetChannelMessages` SELECTed every observation row for the channel (one row per observation, not per transmission), JSON-unmarshalled each row into a Go map, dedupe-folded by `(sender, packetHash)`, then sliced the tail in Go for pagination. On staging `#wardriving`: - `transmissions` rows with `channel_hash='#wardriving' AND payload_type=5`: **5,703** - `observations` joined to those: **274,632** (~48× amplification) - `time curl /api/channels/%23wardriving/messages?limit=50`: **30.04s / 31.41s / 31.48s / 35.33s / 34.05s** (5 calls before I killed the loop) `EXPLAIN QUERY PLAN` showed the index `idx_tx_channel_hash` was being used — the cost was entirely in fetching, unmarshalling, and folding the full observation set per request even for `limit=50`. Hypothesis #1 from the issue (full table scan on `messages/decoded`) is rejected; #2 (missing index) is rejected; the actual cause was **pagination in Go instead of SQL** — request cost was O(observations) not O(limit). ## Fix Move pagination into SQL on the `transmissions` table. Because `transmissions.hash` is `UNIQUE` and the original dedup key was `(sender, hash)`, each transmission collapses to exactly one logical message — paginating on transmissions is semantically equivalent to the prior in-Go dedup + tail slice. New shape: 1. `COUNT(*)` on transmissions for total (uses `idx_tx_channel_hash`). 2. `SELECT id FROM transmissions … ORDER BY first_seen DESC LIMIT ? OFFSET ?` to pick the page of newest transmissions. 3. `SELECT … FROM observations WHERE transmission_id IN (…page ids…)` — typically 50 ids → a few hundred observation rows. 4. Reassemble in pageIDs order, preserving the ASC-by-`first_seen` API contract. Region filtering, observation-count-as-`repeats`, and "first observation wins for hops/snr/observer" semantics are preserved (observations are scanned `ORDER BY o.id ASC`). ## Perf measurements **Before** (staging `#wardriving`, limit=50, 5 samples killed mid-loop): 30.04s, 31.41s, 31.48s, 35.33s, 34.05s. **Synthetic regression test** (`TestGetChannelMessagesPerfLargeChannel`): 3000 tx × 50 obs. - Broken impl: ~4.5s (test fails the 500ms budget — the RED commit). - Fixed impl: well under 500ms (test passes). **After (staging)**: will measure post-deploy and post-comment on issue with numbers. Synthetic scaling: staging is ~2× the test's transmission count, fixed-path cost scales with `limit` (50) + `COUNT(*)` (~5k rows on index) — expect <100ms p99. ## TDD - RED: `697c290d` — perf test asserts <500ms on 3k×50 dataset; fails at ~4.5s. - GREEN: `3f1f82d3` — fix; full suite green, perf test passes. ## Hypotheses status | # | Hypothesis | Verdict | |---|---|---| | 1 | Endpoint slow on prod-sized data | **CONFIRMED** (different mechanism — see root cause) | | 2 | Missing channel_hash index | Rejected (`idx_tx_channel_hash` exists & used) | | 3 | Frontend re-render storm | Not investigated (backend was clearly the bottleneck) | | 4 | Decode in request path | Rejected (decode is at ingest time; JSON unmarshal of cached `decoded_json` is the cost, addressed by reducing row count) | | 5 | WS subscription failure | Rejected | | 6 | Staging artifact | Rejected (reproducible) | ## Out of scope - The in-memory `(*PacketStore).GetChannelMessages` path (used when `s.db == nil`) has the same shape but operates on bounded in-memory data; not touched. If we ever fall back to it in production we'll revisit. --------- Co-authored-by: clawbot <bot@corescope> |
||
|
|
7179afcfde |
feat(#1228): reject geo-implausible neighbor-graph edges at build time (#1230)
Fixes #1228 — geo-implausible neighbor-graph edges are rejected at build time. Red commit: `5a6d9660` — failing tests for 4 cases (reject SF↔Berlin, accept local CA, accept no-GPS endpoint, counter increments). Live CI run (latest commit): https://github.com/Kpa-clawbot/CoreScope/actions?query=branch%3Afix%2Fissue-1228 ## Why The disambiguator's tier-1 affinity graph is built blindly from path co-occurrence. On wide-geo MQTT deployments, a single bad hop disambiguation seeds an edge across geographically impossible distances (e.g. Bay Area ↔ Berlin), which then reinforces the same wrong resolution next time. Self-poisoning spiral. ## What changed - `upsertEdge` now consults a per-graph GPS index. When **both** endpoints have known GPS and their haversine distance exceeds the threshold, the edge is dropped and `NeighborGraph.RejectedEdgesGeoFar` (atomic) is incremented. - Either endpoint missing GPS ⇒ accept (no signal to reject), per acceptance criteria. - Threshold is configurable via `neighborGraph.maxEdgeKm` (default **500 km** — well above any plausible terrestrial LoRa hop, including satellite-assisted). 0 ⇒ use default; negative ⇒ disable the filter. Exposed via `Config.NeighborMaxEdgeKm()`. - New `BuildFromStoreWithOptions` carrying the threshold; `BuildFromStore` and `BuildFromStoreWithLog` are kept as thin wrappers. - Stats are surfaced under `GET /api/analytics/neighbor-graph` as `stats.rejected_edges_geo_far`. - All rejection logs PII-truncate pubkeys to 8 hex chars (public repo discipline). - `config.example.json` updated with the new field + comment. ## Follow-up #1229 (per-region scoped affinity graphs) depends on this landing first. --------- Co-authored-by: corescope-bot <bot@corescope.local> |
||
|
|
30ff45ad34 |
fix(#1220): collapse MESH LIVE mobile header into a single ~50px strip (#1223)
RED commit `c1a8cea` — E2E at 375x800 asserts MESH LIVE header is either ≤60px (collapsed) or ≥60px with a visible body. Fails on master with `height=118, bodyVisible=false, ctrlsVisible=false` — the empty-chrome middle state. CI for red commit: https://github.com/Kpa-clawbot/CoreScope/actions (will populate after push). ## Diagnosis On `(max-width: 768px)`, `#1180` collapses both `.live-header-body` and `.live-controls-body` to `display:none`. But `.live-controls` carries `flex: 0 0 100%` from the wide-viewport rule (introduced for `#1219` so the toggles wrap onto their own row below the title on tablet). On mobile, with the body hidden, that 100% basis still forces the gear button onto a full-width second row inside `#liveHeader`'s flex-wrap, ~60px tall — yielding the `~118-200px` empty panel the bug screenshot shows (the count badge + 📊 toggle on row 1, gear alone on row 2, nothing else). ## Fix — Option C Inside `@media (max-width: 768px)`, when `.live-controls.is-collapsed`: - drop `flex: 0 0 100%` → `flex: 0 0 auto; width: auto` so the gear inlines with the critical strip + 📊 toggle - when the header is also collapsed (`.is-collapsed:has(.live-controls.is-collapsed)`), zero the vertical padding so the strip hugs the 48px tap targets Result: collapsed mobile panel = single ~50px row, three icons inline. Expanded mobile = full toggle list (149px). Desktop unchanged (83px). Why Option C over A/B: a packet-watching mobile user keeps the map dominant and reaches for the gear when they want filters. The compact strip preserves both the WS-down red beacon (always visible) and the pkt count, with one-tap access to expand either body. Does not reintroduce #1204 (counter still attached to header) or #1205 (toggles still children of `#liveHeader`). Fixes #1220 --------- Co-authored-by: openclaw-bot <openclaw-bot@users.noreply.github.com> |
||
|
|
70855249c2 |
fix(#1224): channels page mobile UX overhaul (#1227)
## Summary RED test commit: `02652d0042b7cf65d1f9b3e96ce376bbb3064ba6` — CI: https://github.com/Kpa-clawbot/CoreScope/actions Mobile UX overhaul for the Channels page (#1224). At 375x800 the sidebar header was 112px tall (title + button stacked, analytics link + region filter each on their own row) and the channel-name column was clipped to 83px by the inline 📤 Share + ✕ Remove buttons. ## What changed - **Header is now ONE row**: title + region filter + `+ Add` chip + `📊` analytics overflow chip. Capped to ≤56px on mobile. - **`+ Add Channel` → `+ Add` chip** (no longer a full-width hero). Verified <65% of sidebar width. - **Analytics link** is an icon-only chip inside the header (was a full-row link below). - **Region filter** is inline inside the header (was its own row). - **Channel rows**: `.ch-item-name` takes `flex:1`, share button is icon-only (📤), remove button shrunk to 32px touch target. Name >150px on the first row. - **Empty state** is `max-height:30vh; padding:12px` on mobile — no longer dominates the viewport. ## Design decisions - Chose **inline chips** over an overflow `⋮` menu: header-level controls are few enough (4) that stacking pills + filter dropdown fits comfortably in 375px. Avoids the cost/complexity of a popover and matches the page's existing pill vocabulary (region filter). - Per-row share/remove kept inline but icon-only (`font-size:0` + `::before`) — preserves single-tap access without consuming the row. - Touch targets stay ≥32px (action chips) / 44px (other tappables); WCAG 2.5.5 spirit retained on the dominant interactive paths. - **Desktop layout (≥768px) is unchanged** — verified by a desktop guard in the E2E (`.ch-layout` flex-direction stays `row` at 1024px). ## Tests - `test-issue-1224-channels-mobile-ux-e2e.js` — 5 assertions at 375x800 + 1 desktop guard at 1024x800. Wired into CI. - Existing channel suites still pass: `test-channel-fluid-e2e.js` (11/11), `test-channel-issue-1087-e2e.js` (3/3), `test-channel-issue-1111-e2e.js` (2/2), `test-channel-modal-ux.js` (33/33), `test-channel-ux-followup.js` (29/29), `test-channel-sidebar-layout.js` + `test-channel-fluid-layout.js` (14/14). Fixes #1224 --------- Co-authored-by: clawbot <clawbot@users.noreply.github.com> |
||
|
|
24f277e5c6 |
fix(#1221): VCR LED clock in-row with controls and unclipped on mobile (#1222)
Red commit:
|
||
|
|
ab34d9fb65 |
fix(#1206): keep VCR bar from occluding the live packet feed (#1213)
Red commit: `bcfc74de` (CI: https://github.com/Kpa-clawbot/CoreScope/actions?query=branch%3Afix%2Fissue-1206) Fixes #1206. ## Problem On Live Map the VCR (timeline/playback) bar overlays the bottom of the viewport. Bottom-pinned overlays — the live packet feed, the legend, any corner panel — used hard-coded `bottom: 58–88px` offsets that are smaller than the real bar height (two-row mobile layout + `env(safe-area-inset-bottom)` push it to ~80px and beyond). The last N packet-feed rows slid under the bar and became unreadable / unclickable. ## Fix Publish the bar's measured height as a CSS variable on the live page and bind every bottom-anchored overlay to it. - `public/live.js` — new `initVCRHeightTracker()` runs after init; uses `ResizeObserver` + `resize` / `visualViewport.resize` to keep `--vcr-bar-height` on `.live-page` in sync with `#vcrBar`. - `public/live.css` — `.live-feed`, `.feed-show-btn`, and the `.live-overlay[data-position="bl"|"br"]` corner slots now use `bottom: calc(var(--vcr-bar-height, 58px) + 10px)`. The feed's `max-height` is also capped against `100dvh - top - vcr - margin` so its scroll container can never extend past the bar. - Stale per-breakpoint overrides (the `@supports(env(safe-area-inset))` hard-coded `78px + safe-area` for feed/legend) are removed in favor of the single tracked variable. ## TDD - Red commit `bcfc74de` adds `test-issue-1206-vcr-overlap-e2e.js`: asserts `#liveFeed.getBoundingClientRect().bottom <= #vcrBar.top` (and same for the last row) at desktop 1280x800 and mid 720x800. Verified locally that reverting the green commit makes the feed-bottom assertions fail (feed bottom 742px > VCR top 721px) — see PR body for exact numbers from the local run. - Green commit `1ad17e7f` makes all 5 assertions pass. ## Browser verified Local Go server with `test-fixtures/e2e-fixture.db`, headless Chromium via the new E2E test — all 5 assertions green. ## E2E assertion added `test-issue-1206-vcr-overlap-e2e.js:84` (bottom-row vs VCR-top) plus container check at `:74`. --------- Co-authored-by: openclaw-bot <bot@openclaw.local> Co-authored-by: clawbot <bot@corescope.local> |