Commit Graph

1985 Commits

Author SHA1 Message Date
Kpa-clawbot c1d94f7db5 fix(#1273): collapse QR overlay wrap to content height (#1277)
## Summary
Fixes #1273 — `.node-top-row .node-qr-wrap` was 2-3× taller than the QR
canvas inside it, leaving empty translucent space below the QR.

## Root cause
Three compounding issues:

1. **SVG intrinsic height not constrained.** `qrcode-generator` emits an
SVG with fixed `width`/`height` attributes (e.g. 147×147). The CSS rule
`.node-qr svg { max-width: 100px }` (and 72px mobile) constrains *width*
only, so the svg's intrinsic height (147px) is preserved and the wrap is
sized to that.
2. **Flex stretch.** `.node-top-row` is `display:flex` with default
`align-items:stretch`, so the QR column was forced to match the map
column's height (~280px) on desktop.
3. **Excess padding/margin** added another ~24px above and below the
visible QR.

## Fix
Three small CSS changes in `public/style.css`:

| change | effect |
|---|---|
| `.node-qr svg { height: auto; }` | svg height scales with constrained
width |
| `.node-top-row .node-qr-wrap { align-self: flex-start; }` | wrap sizes
to content, not column |
| `.node-top-row .node-qr-wrap { padding: 8px; }` + zero inner
`.node-qr` margin-top | tight hug |

## Measurements (real-data fixture, full node detail page)

| viewport | wrap.height before | wrap.height after | QR canvas |
|---|---|---|---|
| 375×800 (mobile overlay) | 165px | **82px** | 72×72 |
| 1280×800 (desktop side-by-side) | 217px | **154px** | 100×100 (+ 28px
caption) |

Overlay remains `position:absolute` top-right on mobile; the original
#1243 behavior is preserved.

## TDD
- **RED**: `test-issue-1273-qr-overlay-height-e2e.js` asserts wrap
height ≤ visible QR + caption + 32px at 375×800 and 1280×800. Failed on
master with deltas of 93px (mobile) and 89px (desktop).
- **GREEN**: both viewports pass after the CSS fix.

Wired into the deploy workflow alongside the other `test-issue-*-e2e.js`
runs.

## Acceptance checklist
- [x] Container height ≈ QR canvas height + 16-24px padding total
- [x] No empty translucent space below the QR
- [x] E2E asserts at 375×800 and 1280×800
- [x] Desktop layout unchanged (overlay position preserved; column no
longer stretches but the QR card is the same width)
- [x] All colors via CSS variables
- [x] #1243 overlay behavior preserved (still top-right on mobile, still
rendered)

## Commits
- `e9d75c92` test(#1273): RED
- `13899270` fix(#1273): collapse QR overlay wrap

---------

Co-authored-by: openclaw-bot <bot@openclaw.local>
2026-05-18 22:51:29 -07:00
Kpa-clawbot 21b6eb0d63 fix(live legend): document ACK/RESPONSE/PATH + white-ring repeater convention (#1274) (#1276)
RED commit `ac1fb4c3` (Playwright E2E asserts legend rows for ACK /
RESPONSE / PATH text + "ring" + "repeater" — fails on master).
CI:
https://github.com/Kpa-clawbot/CoreScope/actions?query=branch%3Afix%2Fissue-1274

## What
The Live legend rendered five packet-type rows but the codebase defines
eight `TYPE_COLORS`. The three gray-area types (ACK, RESPONSE, PATH) had
no swatch in the legend, leaving operators guessing what gray dots meant
— they're either ACKs or unknown payload types. Separately, the
L.circleMarker styling block uses a brighter white ring to mark
repeaters vs. all other roles; that convention was nowhere on screen.

## Changes
- `public/live.js` legend HTML — adds rows for RESPONSE, PATH and a
combined **Ack / Other** row (covering both ACK and the unknown-type
fallback that share `#6b7280`). Adds a new **MARKER STYLES** subsection
below NODE ROLES with two entries: bright white ring = repeater, faded
ring = other.
- `public/live.css` — adds `.live-ring` / `.live-ring--repeater` /
`.live-ring--other` swatches. Background uses `var(--text-muted)`; only
the white border + opacity differ between the two, matching the actual
circleMarker weights (1.5 / 0.5) and opacities (0.6 / 0.3).
- `test-issue-1274-legend-coverage-e2e.js` — Playwright E2E (desktop +
mobile attached-DOM) asserting all four new pieces.

## Notes
- All colors via `TYPE_COLORS` — no hardcoded hex in HTML.
- Legend is `display:none` at ≤640px (existing #279 behavior), so no
mobile CSS tweak required for the longer list.
- Does not touch the legend toggle (#1219), mobile single-row header
(#1234), or VCR visibility (#1269).

Fixes #1274.

---------

Co-authored-by: corescope-bot <bot@meshcore.local>
2026-05-18 22:51:26 -07:00
Kpa-clawbot 8bf7709970 feat(repeater): usefulness score — bridge axis (#672 axis 2 of 4) (#1275)
RED test commit: `fd661569` — CI will fail on this (stub returns empty
map; assertions fail by design). GREEN: `bf4b8592`.

## What

Implements **axis 2 of 4** for the repeater usefulness score per #672
([status
comment](https://github.com/Kpa-clawbot/CoreScope/issues/672#issuecomment-4484635378)).
The Bridge axis measures *structural importance*: how many shortest
paths between other nodes route through this one. A high-traffic
redundant node and a low-traffic critical bridge will no longer look
identical.

## Algorithm

**Brandes' weighted betweenness centrality** with Dijkstra for shortest
paths (`cmd/server/bridge_score.go`).

- Nodes: pubkeys in the `neighbor_edges` graph
- Edge weight: `Score(now) * Confidence()` — per the convention from
#1235 (count + recency decay scaled by observer-diversity confidence).
Geo-rejected edges already excluded at graph build time (#1230) so we
don't re-filter here.
- Dijkstra distance: `1 / max(epsilon, weight)` — high affinity = cheap
cost.
- Normalize: divide by max observed centrality so output is in `[0, 1]`.

Cost: `O(V · (E + V log V))`. Staging-scale (~600 nodes / ~2 000 edges)
≈ ~4.8M ops, completes in milliseconds.

## Where it lives

- `cmd/server/bridge_score.go` — pure algorithm, no locks
- `cmd/server/bridge_recomputer.go` — background recomputer (mirrors
#1240/#1262 pattern), 5-min default interval, initial sync prewarm,
snapshot stored in `s.bridgeScoreMap atomic.Pointer[map[string]float64]`
- `cmd/server/routes.go` — `handleNodes` adds `node["bridge_score"]` on
repeater/room rows; node-detail handler adds it on the single-node path
- `public/nodes.js` — separate **Bridge** row in the node detail panel,
alongside the existing **Usefulness** (Traffic) row. Distinct
colour-coded bar.

## What's NOT in this PR (still pending for #672)

- **Coverage axis** (axis 3) — unique observer-pair connectivity
- **Redundancy axis** (axis 4) — simulated node-removal impact
- **Composite** — once all 4 axes ship, swap the `usefulness_score`
formula from "traffic-only" to the weighted composite

`Refs #672` (not `Fixes` — issue stays open until all 4 axes + composite
ship).

## Tests

- `TestComputeBridgeScores_LineGraph` — 4-node line: middles non-zero,
leaves zero, max normalized to 1.0
- `TestComputeBridgeScores_TriangleNoBridge` — clique has zero bridges
- `TestComputeBridgeScores_Empty` — defensive nil-safety
- `TestComputeBridgeScores_WeightSensitive` — mutation guard: revert the
`1/w` inversion and this test fails
- `TestBridgeScore_HandleNodesSurface` — integration: `/api/nodes`
returns `bridge_score` on repeater rows; middle nodes > 0, ends == 0

---------

Co-authored-by: clawbot <bot@meshcore.local>
2026-05-18 22:51:23 -07:00
Kpa-clawbot c09fec56ff ci: update go-server-coverage.json [skip ci] 2026-05-19 01:42:02 +00:00
Kpa-clawbot 6dbfd331a6 ci: update go-ingestor-coverage.json [skip ci] 2026-05-19 01:42:01 +00:00
Kpa-clawbot a00e1c0e18 ci: update frontend-tests.json [skip ci] 2026-05-19 01:42:00 +00:00
Kpa-clawbot 763d4f707c ci: update frontend-coverage.json [skip ci] 2026-05-19 01:41:59 +00:00
Kpa-clawbot ad467daeeb ci: update e2e-tests.json [skip ci] 2026-05-19 01:41:57 +00:00
Kpa-clawbot 46ce9590f1 fix(#1270): Prefix Tool Network Overview shows configured-hash-size counts, not math-only slices (#1271)
Red commit: `6b68080c24106301b6bfc25f8a05484f07d0612d` (test added that
fails on master). CI: see Checks tab on this PR.

Fixes #1270.

## Problem

Two analytics surfaces told contradictory stories about prefix usage:

- **Prefix Tool → Network Overview** showed e.g. `168 / 65,536` for the
2-byte tier — a pure math fact: every repeater pubkey sliced to 2 bytes
yields N distinct values. Because collisions are rare, this number
always equals (or nearly equals) the repeater count, making it look like
the whole network uses 2-byte hashing.
- **Hash Stats → By Repeaters** showed configured-hash-size counts
straight from `/api/analytics/hash-sizes` `distributionByRepeaters` —
usually a minority on 2-byte and near-zero on 3-byte.

The Prefix Tool was presenting a math fact as if it were operational
truth.

## Fix

`renderPrefixTool` now also fetches `/api/analytics/hash-sizes` and
restructures each tier card into three labeled stats with explicit
hierarchy:

1. **Primary** — `X of Y repeaters configured` (from
`distributionByRepeaters`). Same source the Hash Stats tab uses, so the
two pages agree exactly.
2. **Operational collisions** — colliding slices among repeaters
configured for *this* hash size only (matches Hash Issues semantics).
3. **Theoretical** (secondary, smaller, dashed-rule footnote) — `X
unique N-byte slices across all repeater pubkeys (of Y possible)`. The
math fact is preserved as educational info, no longer impersonating
operational truth.

The "Total repeaters" card now also notes how many have a known
configured hash size.

The "About these numbers" footer was rewritten to explain the three
numbers and link to both Hash Stats and Hash Issues.

The prefix collision detector (Check / Generate panels) is unchanged —
it still scans every repeater pubkey because that is its job.

## Test

Added `#1270 Prefix Tool primary counts match Hash Stats By Repeaters`
to `test-e2e-playwright.js`. It fetches `/api/analytics/hash-sizes` for
the ground-truth `distributionByRepeaters`, then visits
`#/analytics?tab=prefix-tool`, opens Network Overview, and scrapes the
primary count via a new `data-pt-configured="<bytes>"`
`data-value="<count>"` marker on each tier card, asserting exact
equality for 1/2/3-byte.

- Red commit `6b68080c` (test only): fails on master with `NO
data-pt-configured marker`.
- Green commit `12ed2789` (fix): test passes; full E2E suite `123/126
passed, 3 skipped`.

## Acceptance

- [x] Prefix Tool Network Overview shows configured-hash-size repeater
counts as the primary number
- [x] "Unique slices" math is shown as secondary/educational
- [x] Two pages tell the same story (E2E asserts byte-equal match)
- [x] E2E asserts the configured-count matches what Hash-Sizes tab shows
at the same point in time
2026-05-18 18:20:29 -07:00
Kpa-clawbot 0022c8fd1f ci: update go-server-coverage.json [skip ci] 2026-05-18 22:43:47 +00:00
Kpa-clawbot 7827c8e778 ci: update go-ingestor-coverage.json [skip ci] 2026-05-18 22:43:46 +00:00
Kpa-clawbot cb0218fc4d ci: update frontend-tests.json [skip ci] 2026-05-18 22:43:46 +00:00
Kpa-clawbot 385f49b3d8 ci: update frontend-coverage.json [skip ci] 2026-05-18 22:43:45 +00:00
Kpa-clawbot f4ecc96ccc ci: update e2e-tests.json [skip ci] 2026-05-18 22:43:44 +00:00
Kpa-clawbot 78b666c248 fix(#1267): mobile VCR bar invisible — JS height clobbered bottom-nav reserve (#1269)
## Summary
Mobile-only regression: on the Live page at ≤768px viewports the VCR bar
was rendered behind the fixed bottom-nav and never visible to the user.
iOS Safari screenshot at 375x812 showed: top header strip, full-height
map, bottom-nav — **no VCR row at all**.

Fixes #1267.

## Root cause
`public/live.js` `initResizeHandler` (the existing JS height override)
was setting `page.style.height = window.innerHeight + 'px'`, which
clobbered the CSS rule that already subtracts `--bottom-nav-reserve`
from the live-page height. Because `.live-page` then spanned the full
viewport, the VCR bar (`position:absolute; bottom:0; z-index:1000`) was
painted underneath `.bottom-nav` (`position:fixed; z-index:1200`).

The VCR bar element WAS in the DOM, WAS `display: flex`, and HAD
`height: 53px` — it just sat at y=758..812 underneath the bottom-nav at
y=754..812. CSS-only checks for `display:none` would never catch this;
the test asserts the bar's bottom edge is at or above the bottom-nav's
top edge.

## Fix
One-liner in spirit: subtract the bottom-nav height before applying
`page.style.height`. The implementation measures the rendered
`.bottom-nav` (with a fallback to a hidden probe that resolves the
`--bottom-nav-reserve` token), so it survives safe-area inset and the
bottom-nav's 1px border.

```js
const reserve = /* measure .bottom-nav, fall back to --bottom-nav-reserve token */;
const h = Math.max(0, window.innerHeight - reserve);
```

Desktop is unchanged: `.bottom-nav` is `display: none`, the probe
resolves to 0, and `h === window.innerHeight` exactly as before.

## TDD
- **RED** (commit 1): `test-e2e-1267-mobile-vcr.js` — Playwright at
iPhone 375x812 asserts `.vcr-bar` has `display !== 'none'`, `visibility
!== 'hidden'`, `height > 0`, `top < viewport.height`, and (the key
check) `bottom <= bottom-nav.top`. Fails on `master` with: *"VCR bar
bottom 812 overlaps bottom-nav top 754"*.
- **GREEN** (commit 2): the fix above. Test passes: *"VCR bar bottom 754
≤ bottom-nav top 754"*.

## Verification
-  Mobile (375x812) repro reproduced against `master` (bar at
y=758..812, behind bottom-nav)
-  Mobile (375x812) E2E green after fix (bar at y=700..754, flush above
bottom-nav)
-  Desktop (1440x900) unaffected — bottom-nav hidden, page height =
viewport height as before, VCR bar at viewport bottom
-  #1234 (top-nav hidden on /live), #1246 (single-row VCR), #1206/#1213
(VCR/feed clearance) unchanged — none touched

## Files
- `public/live.js` — single function (`initResizeHandler`) modified
- `test-e2e-1267-mobile-vcr.js` — new mobile-viewport Playwright
regression test

Run: `BASE_URL=http://localhost:13581 node test-e2e-1267-mobile-vcr.js`

---------

Co-authored-by: openclaw-bot <bot@openclaw.local>
2026-05-18 15:27:05 -07:00
Kpa-clawbot d4d569278d ci: update go-server-coverage.json [skip ci] 2026-05-18 19:46:11 +00:00
Kpa-clawbot ee21eafa66 ci: update go-ingestor-coverage.json [skip ci] 2026-05-18 19:46:10 +00:00
Kpa-clawbot c6a90e9896 ci: update frontend-tests.json [skip ci] 2026-05-18 19:46:09 +00:00
Kpa-clawbot 92518ab234 ci: update frontend-coverage.json [skip ci] 2026-05-18 19:46:08 +00:00
Kpa-clawbot cd84f51f8a ci: update e2e-tests.json [skip ci] 2026-05-18 19:46:08 +00:00
Kpa-clawbot 4cd8445233 perf(#1265): wire /api/observers/clock-skew + /api/nodes/clock-skew into analytics recomputer (#1266)
RED: 97f49a0c · CI:
https://github.com/Kpa-clawbot/CoreScope/actions/runs/26046530920

Fixes #1265.

## Problem
On staging two clock-skew endpoints serve compute-on-request:

- `/api/observers/clock-skew` — 3.3s
- `/api/nodes/clock-skew` — 8.9s

Both drive a full `clockSkew.Recompute` over 100k+ adverts while holding
`s.mu.RLock`, blocking under concurrent reader load.

## Fix
Wire both endpoints into the established `analytics_recomputer.go`
pattern (PRs #1248 / #1259 / #1263). Two new slots:

- `recompObserversClockSkew` — wraps `computeObserverCalibrations()`
- `recompNodesClockSkew` — wraps `computeFleetClockSkew()`

Accessors `GetObserverCalibrations` / `GetFleetClockSkew` now prefer the
atomic-pointer snapshot; on-request compute is fallback-only for the
brief window before initial sync compute lands (and for tests that skip
the recomputer).

Default interval **300s**, overridable via:

```json
"analytics": {
  "recomputeIntervalSeconds": {
    "observersClockSkew": 300,
    "nodesClockSkew": 300
  }
}
```

`config.example.json` + the `_comment_analytics` doc updated.

## TDD
- RED `97f49a0c` — `TestClockSkewRecomputersRegistered` +
`TestClockSkewHandlersSteadyStateLatency` (8 concurrent readers × 25
reqs per endpoint, p99 < 100ms gate). Fails on master: recomputer slots
nil.
- GREEN `19599375` — wire + accessor switch. p99 well under 5ms on the
test fixture.

## Verification
```
cd cmd/server && go test ./... -count=1   # ok 42s
bash ~/.openclaw/skills/pr-preflight/scripts/run-all.sh origin/master   # all gates pass
```

---------

Co-authored-by: CoreScope Bot <bot@corescope.local>
2026-05-18 12:27:44 -07:00
Kpa-clawbot ae17a2be12 perf(#1262): /api/nodes?limit=2000 cold-miss 15.7s → <100ms — prewarm repeater enrichment cache (#1263)
RED commit: `22ce5736066142583017cad7303fa48d9e00ccf0` — CI on red:
https://github.com/Kpa-clawbot/CoreScope/actions?query=branch%3Afix%2Fissue-1262

## Problem
After #1260 added a 15s-TTL bulk cache for repeater enrichment in
`handleNodes`,
`/api/nodes` (default limit) dropped to ~500ms. But
`/api/nodes?limit=2000` —
called by `public/live.js` at SPA startup for hop resolution — still
took
**15.7s cold** on staging (75k tx, 600 nodes). Warm hits were ~40ms.

Root cause: the bulk cache was lazily populated on the first request
after
TTL expiry. The rebuild ran on the request-serving goroutine. Every cold
SPA
load triggered the rebuild and ate 15s.

## Fix
Add `StartRepeaterEnrichmentRecomputer` — a steady-state background
recomputer that mirrors the `analytics_recomputer.go` pattern from
#1240:

- **Prewarm**: initial synchronous compute on Start so the first request
  hits a populated cache.
- **Steady-state**: ticker refreshes the snapshot every 5min
(configurable
  via the existing analytics recompute interval knob).
- **Panic-safe** + idempotent Start.

Wired into `main.go` right after `StartAnalyticsRecomputers`, using
`cfg.GetHealthThresholds().RelayActiveHours` as the window.

## Test
`TestHandleNodesLimit2000ColdMiss` — seeds 600 nodes + 150k non-advert
tx with repeaters indexed under a shared 1-byte hop prefix (matches
production hop-prefix collisions), starts the recomputer, then issues
`/api/nodes?limit=2000` with **no HTTP warmup**.

| State | Latency |
|---|---|
| Before (master, on-thread rebuild) | 3.37s |
| After (prewarm + steady-state) | 56ms |
| Budget | 2s |

Staging end-to-end: 15.7s → expected sub-100ms on the same call path.

Red commit (`22ce5736066142583017cad7303fa48d9e00ccf0`) compiles with a
no-op stub of the new method so the
test fails on the latency **assertion**, not a missing symbol.

Fixes #1262

---------

Co-authored-by: corescope-bot <bot@corescope.local>
2026-05-18 09:22:27 -07:00
Kpa-clawbot 094a96bd6c perf(#1258): /#/perf — parallel health fetch, sort endpoints, pause refresh while hidden (#1261)
Fixes #1258 — Perf dashboard (/#/perf) was slow because of three
frontend issues; backend APIs were never the problem.

## Findings

1. **`/api/health` fetched sequentially after `Promise.all`** in
`refresh()` — added a full RTT (~50-200ms) on every 5s tick on top of
the parallel batch.
2. **Endpoints table not actually sorted** despite the heading "sorted
by total time". JSON shape is `map[string]EndpointStatsResp` (no defined
order); frontend rendered map iteration order. Visible correctness bug
surfaced during investigation.
3. **`setInterval(refresh, 5000)` kept firing while tab was hidden**,
rebuilding the entire ~10-section `innerHTML` (cards + 3 tables) in the
background. On tab return the user saw a backlog thrash + felt the page
was "slow to render".

## Fix (`public/perf.js`)

- Move `/api/health` into the same `Promise.all` as the other 4
endpoints — saves one RTT per refresh.
- Sort `Object.entries(server.endpoints)` by `count * avgMs` DESC
client-side.
- Add `document.hidden` guard in the interval tick + `visibilitychange`
listener that refreshes once on return; `destroy()` removes the
listener.

## Tests

`test-perf-render-1258.js` (new):
- All 5 initial fetches issued in parallel (including `/api/health`)
- Refresh suppressed while `document.hidden`
- Endpoints table sorted by total time DESC, regardless of input map
order

RED commit first (`6b54f9e8`, 0/3 pass) → GREEN commit (`be81303b`, 3/3
pass). Existing `test-perf-go-runtime.js` (13/13) and
`test-perf-disk-io-1120.js` (15/15) still green.

## Investigation exemption

No Playwright timing test — sandbox can't run a real browser. Static
analysis + render-shape unit tests cover the three identified
bottlenecks. Documented per AGENTS "investigation surfaces" exemption.

## Measurement

Before: refresh = parallel batch (~max(server-side)) + sequential
`/api/health` (~50ms) + full innerHTML rebuild every 5s including hidden
tabs.
After: refresh = single parallel batch, runs only while visible.
Expected improvement on tab-return ≈ -1 RTT per refresh + zero
background work.

---------

Co-authored-by: corescope-bot <bot@corescope.local>
2026-05-18 08:02:27 -07:00
Kpa-clawbot 1efe93d7f6 perf(#1257): bulk-cache repeater enrichment in /api/nodes — 32s → <500ms (#1260)
RED commit `a2879e12` — perf regression test; CI run: see Actions tab.

Fixes #1257.

## Root cause

`handleNodes` looped over the response page and called
`store.GetRepeaterRelayInfo(pk, win)` +
`store.GetRepeaterUsefulnessScore(pk)` for every repeater/room. Each
call:

- grabbed its own `s.mu.RLock`,
- walked `byPathHop[pk]` (+ the matching 1-byte raw-prefix bucket, which
on busy networks fans out to nearly the entire non-advert tx set),
- and re-parsed every `tx.FirstSeen` with `parseRelayTS`.

Default page is the 50 most-recently-seen nodes — almost all hot
repeaters — so the request did O(50) lock acquisitions and hundreds of
thousands of timestamp parses on the same set of txs. That's the classic
load-then-paginate / per-row N+1 shape called out in the issue (same
family as #1226).

The `?limit=2000` variant looks faster relatively only because per-node
enrichment dwarfs serialization; on staging both still bottleneck on the
same loop.

## Fix

Two new bulk methods on `PacketStore`:

- `GetRepeaterRelayInfoMap(windowHours)` → `pubkey → RepeaterRelayInfo`
- `GetRepeaterUsefulnessScoreMap()` → `pubkey → 0..1`

Both snapshot `byPathHop` under a single `RLock`, pre-parse each
`FirstSeen` exactly once (a tx that appears in N hop buckets used to be
parsed N times), and emit one entry per hop key. Cached 15s — same TTL
as `GetNodeHashSizeInfo` / `GetMultiByteCapMap`, same status-column
freshness budget.

`handleNodes` is one map-lookup per node; behavior, output schema, and
`RelayActive` / `RelayCount{1h,24h}` / `LastRelayed` /
`usefulness_score` semantics are preserved.

## Why no `limit` default change

The issue mentioned a default-limit knob. Investigated: `queryInt(r,
"limit", 50)` already defaults to 50 — frontends calling `/api/nodes`
(no limit) get a 50-row page today. Capping further would change
behavior (live.js already passes `?limit=2000` when it wants more); the
cost was per-repeater enrichment, not page size. Fixing the N+1 is the
correct lever and preserves backward compat.

## Perf

Regression test `TestHandleNodesPerfLargeFleet` (600 nodes, 150k
non-advert tx, repeaters indexed under `byPathHop`):

| | elapsed | vs 2s budget |
|---|---|---|
| before (master) | 4.72s | ✗ |
| after | ~4ms | ✓ (~1000×) |

## TDD

- RED: `a2879e12` — test fails at 4.72s on master.
- GREEN: `c529d29a` — fix; full `cmd/server` + `cmd/ingestor` suites
green.

---------

Co-authored-by: corescope-bot <bot@corescope>
2026-05-18 07:36:33 -07:00
Kpa-clawbot f81ed5b3cf perf(#1256): wire /api/analytics/roles into steady-state recomputer (#1259)
RED commit: `0190466d` — failing CI:
https://github.com/Kpa-clawbot/CoreScope/actions (will populate after PR
creation)

## Problem
On staging (commit `d69d9fb`, 78k tx, 2.3M obs), `curl
http://localhost/api/analytics/roles` times out at 60s with 0 bytes —
the Roles tab is unusable. Issue #1256.

PR #1248's steady-state recomputer fan-out (topology / rf / distance /
channels / hash-collisions / hash-sizes) **didn't include roles**. The
legacy handler:

1. Holds `s.mu.RLock` for the entire compute.
2. Calls `GetFleetClockSkew()`, which drives `clockSkew.Recompute(s)`
over all ADVERT transmissions — O(78k) per request.
3. Concurrent ingest writers compound the latency through
writer-starvation.

Result: every request hits the cold path; the response never comes back
inside the 60 s HTTP budget.

## Fix
Add `roles` as the 7th endpoint in the recomputer fan-out — same pattern
as #1248:

- `PacketStore.recompRoles` slot, registered in
`StartAnalyticsRecomputers` with default 5-min interval.
- `PacketStore.GetAnalyticsRoles()` → atomic-pointer load from the
snapshot (sub-ms), with a `computeAnalyticsRoles()` fallback only for
the brief startup window before the initial sync compute completes.
- Handler is now a thin wrapper — no lock-held work on the request path.
- New optional `roles` key under `analytics.recomputeIntervalSeconds` in
config; `config.example.json` and `_comment_analytics` updated.

## Latency (unit-scope benchmark)
- Worst-of-50 handler latency: **<100 ms** (test budget; well under the
2 s p99 acceptance).
- Compute itself is bounded by the existing 5-min recompute window — it
runs once in the background, never on the request path.

## Tests
- RED `0190466d`: asserts `recompRoles` is registered and the handler
returns under the latency budget. Fails on master with `recompRoles not
registered`.
- GREEN `d7784f76`: registers the recomputer + snapshot accessor — both
tests pass.

Fixes #1256

---------

Co-authored-by: openclaw-bot <bot@openclaw.local>
2026-05-18 07:36:28 -07:00
Kpa-clawbot d69d9fbf8e perf(#1247): surgical fix for resolveWithContext tier-1 hot path (4.6× speedup) (#1253)
## Summary
Surgical fix for #1247: analytics endpoints regressed 3-9× between prod
`d818527` and master. pprof against staging traced the regression to
`resolveWithContext` tier-1 affinity loop running on every analytics
`resolveHop` call (post-#1198 plumbing) with redundant per-(cand, ctx)
work.

**Result: 4.6× speedup on the synthetic hot-shape benchmark (202µs →
44µs / op).**

## Root cause
- PR #1198 (`353c5264`) lit up `resolveWithContext` tier 1 from every
analytics resolveHop closure (previously they passed
`contextPubkeys=nil` and short-circuited the entire tier-1 block).
- The inner loop did `N_cand × N_ctx` iterations where each one did:
- `graph.Neighbors(strings.ToLower(ctxPK))` — graph RLock + ToLower
allocation **per candidate**, redundantly
  - `strings.ToLower(cand.PublicKey)` per `ctxPK`
- `strings.EqualFold(otherPK, ctxPK)` + `EqualFold(otherPK, candPK)` —
both sides were already lowercased (`NeighborEdge.NodeA/B` via
`makeEdgeKey`; `contextPubkeys` via `buildHopContextPubkeys`)
- At staging scale (5k+ contextPubkeys × 30k+ resolveHop calls) this
dominated `computeAnalyticsTopology` (37% of its CPU) and
`computeAnalyticsRF` (55%).

## pprof attribution (staging, region-keyed queries bypassing #1240
cache)
```
computeAnalyticsTopology cum: 19.24%  (5.45s / 28.32s sampled)
  └─ resolveWithContext      37%
     ├─ strings.ToLower      41%
     ├─ strings.EqualFold    28%
     └─ graph.Neighbors      24%
computeAnalyticsRF cum: 10.38%
```

## Fix (~80 LoC in `cmd/server/store.go`)
1. Lowercase `contextPubkeys` **once per call**, skipped entirely when
already lowercased (the analytics fast path).
2. Lowercase candidate pubkeys **once per call**.
3. Invert the loop nesting: outer-ctx / inner-edge / candidate-map
lookup. `graph.Neighbors` is called once per context pubkey instead of
`N_cand` times.
4. Raw `==` instead of `strings.EqualFold` for pubkey comparisons (both
sides lowercased by step 1/2).
5. Added a tiny `hasUpperASCII` byte-loop helper next to `isHexLower`
for the fast-path check.

Behavior preserved: same `Score × Confidence` formula, same tier-1 ratio
+ min-observations gate, same per-candidate "best edge wins" semantics.
No change to tiers 2/3/4.

## TDD evidence
- Red commit (`5f8d1564`): `TestResolveWithContextTier1Floor` asserts
`<100 µs/call` on the hot shape. **199 µs/call on regressed master →
FAIL.**
- Green commit (`e3bdbc65`): surgical fix lands. **44 µs/call → PASS.**
- Reverification: locally stashed the fix, ran the test → 199.5 µs FAIL;
popped fix → 44 µs PASS.

`BenchmarkResolveWithContextTier1Hot` (no assertion, visibility only):
```
before: 202013 ns/op   168 B/op   3 allocs/op
after:   44084 ns/op   424 B/op   6 allocs/op
speedup: 4.6×
```
(Post-fix allocs are O(N_cand + N_ctx) one-time helper tables — net win
at hot scale.)

## Independence from #1248
PR #1248 caches the analytics compute output so user-facing latency is
sub-ms even when the compute is slow. That's correct for UX but it masks
the regression. This PR repairs the compute itself, so:
- Region-keyed and windowed queries (which bypass the recomputer cache
by design — see #1240) become fast again.
- Future ingest scale or feature work on top of the regressed baseline
doesn't compound.

## Out of scope
- The geo-rejection (#1228) and Confidence weighting (#1229) commits —
kept intact, they protect correctness and were not the dominant CPU
cost.
- Reverting any suspect commit — surgical only.

## Acceptance criteria from #1247
- [x] pprof confirms the hot function (`resolveWithContext`)
- [x] Bisect identifies the regressing commit (`353c5264` / PR #1198 —
context plumbing; ratified by pprof, no need to actually rebuild 5
binaries)
- [x] Fix lands; tier-1 hot path 4.6× faster
- [x] No regression in disambiguator correctness — full `go test ./...`
green, all existing `ResolveWithContext` / `HopDisambig` /
`NeighborGraph` / `Affinity` tests pass

Fixes #1247

---------

Co-authored-by: openclaw-bot <bot@openclaw.local>
2026-05-17 16:42:01 -07:00
Kpa-clawbot 1d33ac53b0 fix(#1254): trim .badge-iata h-padding on mobile to clear 1.25px clip (#1255)
Fixes #1254.

Master CI Playwright fail-fast on every push since #1252:

```
 Mobile viewport (375px): observer IATA badge stays visible — not clipped:
   .badge-iata right edge 376.25 exceeds 375px viewport
```

## Root cause

After #1252 unhid `.col-observer` at narrow widths so the IATA pill from
#1188 renders on mobile, at 375px the cell padding + truncated observer
name (10 chars in grouped rows) + `.badge-iata` pill (`padding: 1px 5px`
+ `margin-left: 4px`) sums to ~376.25px — overflowing the viewport by
1.25px.

Same class of failure as #1250/#1251 (VCR LCD-clip).

## Fix

`public/style.css` — inside the existing `@media (max-width: 640px)`
block, shrink `.badge-iata` `padding: 1px 5px → 1px 3px` and
`margin-left: 4px → 2px`. Reclaims ~6px horizontally, well clear of the
1.25px overflow. Desktop (≥641px) styling untouched.

## TDD

The failing E2E sub-test in `test-observer-iata-1188-e2e.js` (added in
#1189 R1) IS the red. Mutation verified locally:

| Variant            | Result |
|--------------------|--------|
| WITHOUT this fix |  `.badge-iata right edge 376.25 exceeds 375px
viewport` |
| WITH this fix      |  all 3 sub-tests pass |

## Local verification

```
$ go build -o /tmp/corescope-server ./cmd/server
$ /tmp/corescope-server -port 13581 -db test-fixtures/e2e-fixture.db -public public &
$ CHROMIUM_PATH=/usr/bin/chromium BASE_URL=http://localhost:13581 \
    node test-observer-iata-1188-e2e.js
Running observer-IATA E2E tests against http://localhost:13581
   Packets table renders an IATA badge in an observer cell
   Filter grammar: observer_iata == "<code>" narrows the table
   Mobile viewport (375px): observer IATA badge stays visible — not clipped
All observer-IATA E2E tests passed.
```

## Constraints honored

- All colors via existing CSS variables (no theming illusions; only
  `padding` / `margin-left` change inside `@media (max-width: 640px)`).
- No JS changes.
- Desktop badge display unaffected (selector scoped to narrow viewport).
- `config.example.json`: no config field added.
- PII preflight: clean.

Co-authored-by: OpenClaw Bot <bot@openclaw.local>
2026-05-17 16:26:51 -07:00
Kpa-clawbot 43203b09b7 fix(#1249): IATA badge missing on fixture + mobile clipping (#1252)
Failing test commit: `bdb4eefb` (added in #1189 R1) — original CI
failure:
https://github.com/Kpa-clawbot/CoreScope/actions/runs/25995819598

Fixes #1249.

## Root cause

Two independent bugs surfaced by the same E2E test:

1. **Fixture join broken.** `scripts/capture-fixture.sh` wrote the text
observer hash into `observations.observer_idx`, but the v3 join in
`cmd/server` is `observers.rowid = observations.observer_idx`. The join
silently nulled out `observer_id` / `observer_iata` for every packet.

2. **Mobile clipping.** `.col-observer` had `data-priority=3` (hides at
≤1024px) and was in the narrow-viewport `defaultHidden` list, so at
375px the cell collapsed to `display:none` and `.badge-iata` had a 0×0
box.

## Changes

- `test-fixtures/e2e-fixture.db`: remap `observer_idx` text hash →
integer rowid (500/500 rows resolved).
- `scripts/capture-fixture.sh`: build an `observer_id → rowid` map
before insert; skip rows whose observer isn't in the fixture. Comment
explains the trap.
- `public/packets.js`: bump `.col-observer` priority `3 → 1` and drop
`observer` from narrow-viewport `defaultHidden`.

## Verification

All three sub-tests in `test-observer-iata-1188-e2e.js` pass locally
against the freshened fixture. `curl /api/packets?limit=5` returns real
IATA codes (OAK / MRY / SFO) instead of empty strings.

Co-authored-by: OpenClaw Bot <bot@openclaw.local>
2026-05-17 20:06:25 +00:00
Kpa-clawbot 45872c8371 fix(#1250): trim mobile VCR bar h-padding 8px→4px to clear 0.83px LCD clip (#1251)
Red: master CI run
https://github.com/Kpa-clawbot/CoreScope/actions/runs/25995768081
already fails on `test-e2e-playwright.js` `#1221 LCD clipped on right
(right=375.828125, vw=375)`. No new test commit — the existing E2E
assertion is the gate.

**Root cause.** PR #1222's mobile rule set `.vcr-bar { padding: 4px 8px
}`. The flex row holds three `flex-shrink: 0` children (controls +
scope-btns + lcd) and one `flex: 1 1 0` absorber
(`.vcr-timeline-container`, `min-width: 40px`). At 375px viewport the
absorber hits its floor, so the intrinsic widths of the shrink-frozen
children spill 0.83px past the padding box.

**Fix.** Drop horizontal padding 8px → 4px inside the `@media
(max-width: 640px)` block. That's 8px of new slack — order of magnitude
above the 0.83px clip — keeping LCD's `getBoundingClientRect().right ≤
375`. Desktop layout untouched (rule is mobile-scoped). VCR/feed overlap
(#1206/#1213) not reintroduced because `--vcr-bar-height` is JS-measured
by the ResizeObserver, not pinned in CSS.

Fixes #1250

Co-authored-by: openclaw-bot <bot@openclaw.local>
2026-05-17 12:58:27 -07:00
Kpa-clawbot 356f001027 perf(#1240): steady-state background recompute for analytics endpoints (#1248)
RED commit: `27630f6a` — adds latency test that fails on master
(p99=225ms > 50ms budget) and a stub `StartAnalyticsRecomputers` that
returns a no-op so the assertion (not a build error) gates the change.

GREEN commit: `20fbbceb` — wires real background recompute
infrastructure. Test passes at p99=~1µs.

## What changed

Replaces the on-request "compute-then-cache" pattern for the
default-shape analytics queries with a steady-state background recompute
loop. Reads always hit an `atomic.Value` snapshot in <1µs regardless of
compute cost or writer contention. Operator principle: serving slightly
stale data quickly beats real-time data slowly.

## Endpoints converted (default 5min interval each)

| Endpoint | Cold compute | Recomputer interval |
|---|---|---|
| `/api/analytics/topology` | ~5s | 5 min |
| `/api/analytics/rf` | ~4s | 5 min |
| `/api/analytics/distance` | ~3s | 5 min |
| `/api/analytics/channels` | ~0.5s | 5 min |
| `/api/analytics/hash-collisions` | ~0.5s | 5 min |
| `/api/analytics/hash-sizes` | ~22ms | 5 min |

All intervals configurable per-endpoint via
`analytics.recomputeIntervalSeconds.<name>` in `config.json`; documented
in `config.example.json`. Default override via
`analytics.defaultIntervalSeconds`.

## Scope: default query only

Only the canonical shape `(region="", window=zero)` is precomputed.
Region- or window-filtered requests fall back to the legacy TTL cache +
on-request compute — keeps recomputer count bounded (6, not 6×N×M).

## Latency

Test `TestAnalyticsRecomputerSteadyStateLatency`: 100 concurrent readers
+ 4 writers churning `s.mu.Lock` on 20k distHops.
- Before: p50=188ms p99=225ms (assertion failed)
- After:  p50=240ns p99=1.1µs (atomic load + map return)

## Shutdown integration

`StartAnalyticsRecomputers` returns a stop closure invoked from
`main.go`'s SIGTERM handler BEFORE `dbClose()` so any in-flight SQLite
compute drains cleanly. `TestAnalyticsRecomputerShutdownNoLeak` confirms
all 6 goroutines are reaped (Δ=6 within 2s).

## Safety details

- Initial compute is synchronous in `Start()` — first read after startup
never sees nil.
- `recover()` inside `runOnce` keeps a compute panic from killing the
goroutine; previous snapshot remains valid.
- `analyticsRecomputerMu` is a sync.RWMutex; recomputer pointers are
read-locked in the hot path. The atomic.Value swap inside `runOnce` is
lock-free.

Fixes #1240.

---------

Co-authored-by: OpenClaw Bot <bot@openclaw.local>
2026-05-17 17:33:30 +00:00
Kpa-clawbot b881a09f02 feat(#1188): show observer IATA on packets + filter grammar (#1189)
Red commit: 4ed272761b (CI run:
https://github.com/Kpa-clawbot/CoreScope/actions/runs/25651898290)

Fixes #1188 — observer IATA on packets in three UI surfaces + filter
grammar.

cross-stack: justified — feature spans API shape (Go), store, filter
grammar (JS), three packets UI surfaces.

## Scope shipped
- Packets table row: `.badge-iata` pill inline next to observer name
- Expanded observation rows: per-observation IATA badge
- Detail pane: Observer dd + per-observation list both render the badge
- Filter grammar: `observer_iata` field + `iata` alias;
`==`/`!=`/`contains`, plus a new `in (a, b, c)` list operator. Both
names appear in autocomplete with descriptions.

## TDD red→green pairs
1. `271d72f` filter-grammar tests → `2c182eb` evaluator + suggest
entries
2. `4ed2727` backend `observer_iata` API tests → `7856914` SQL join +
struct/store wiring
3. `0e09371` display E2E → `7a3f45d` packets.js + style.css badge
(E2E swapped for string-contract unit test in `ee414b4` — fixture
`observations.observer_idx` stores text pubkeys, blocking the join the
badge depends on)

## Backend
- `cmd/server/db.go`: SELECT `obs.iata AS observer_iata` in
`transmissionBaseSQL`, grouped query, observations-by-transmissions
- `cmd/server/store.go`: `ObserverIATA` on `StoreTx`/`StoreObs`, load
via all three ingest paths, surface in
`txToMap`/`enrichObs`/`groupedTxsToPage`
- `cmd/server/types.go`: field added to
`TransmissionResp`/`ObservationResp`/`GroupedPacketResp`
- Test fixture schemas declare `iata` on observers

## Perf
Per #383, `obsIataBadge(packet)` reads `packet.observer_iata` directly
(server-joined). Falls back to `observerMap.get(id).iata` only if absent
— hot row-render loop avoids per-row Map lookup on fresh data.

## Display rules
Missing IATA: nothing inline (Region column still shows `—`). No new hex
— `.badge-iata` uses `var(--nav-bg)` / `var(--nav-text)`.

E2E assertion added: test-observer-iata-1188.js:51

---------

Co-authored-by: OpenClaw Bot <bot@openclaw.dev>
Co-authored-by: openclaw-bot <bot@openclaw.local>
2026-05-17 16:13:11 +00:00
Kpa-clawbot e395c471ed fix(#1244): live mobile VCR single row + disable orphan gesture-hint pills on /live (#1246)
Red commit: 58b307228e (CI run pending;
URL added after first workflow run posts).

Fixes #1244

## Sub-issue A — VCR controls still 2 rows on mobile
`public/live.css` mobile `@media (max-width:640px)` block had
`flex-wrap: wrap` plus `.vcr-timeline-container { width:100%; flex:none
}`, which guaranteed a 2-row layout (controls + LCD on row 1, scope
buttons + scrubber on row 2) — the exact bug #1234 was supposed to
eliminate.

Fix: switched `.vcr-bar` to `flex-wrap: nowrap`, gave
`.vcr-timeline-container` `flex: 1 1 0` so it absorbs leftover width,
and shrunk `.vcr-btn` / `.vcr-scope-btn` to a 32px touch target (still
WCAG 2.5.5 AA). Reorder on mobile: controls → scopes → timeline → LCD,
single row. `.vcr-mode` stays hidden on mobile as before (and `.vcr-lcd`
no longer needs `margin-left:auto` because the timeline pushes it right
via flex-grow).

## Sub-issue B — Orphan "Got it" hint pills hidden below the fold
`public/gesture-hints.js` row-swipe relevance included `/live`, and the
pills are bottom-anchored — so they rendered under the
absolute-positioned VCR bar + safe-area inset and were only findable by
scrolling.

Picked **option (a)** from the issue (simplest, matches user's report):
all four hints now early-return on `/#/live*`. Swipe-nav discoverability
doesn't apply on Live — map drag, VCR controls, and feed own the touch
surface.

## TDD
- RED `test-issue-1244-live-vcr-row-hints-e2e.js`: asserts at 375x800
(A) `.vcr-bar` children share a row (≤8px top spread OR
`flex-wrap:nowrap`), (B) zero `.gesture-hint` elements on `/live`.
Desktop sanity asserts LCD/controls still share a row.
- GREEN: the two source fixes.

E2E assertion added: `test-issue-1244-live-vcr-row-hints-e2e.js:67`
(single-row), `:101` (no hints). Wired into
`.github/workflows/deploy.yml` `e2e-test` job.

Browser verified: pending CI on Playwright fixture run (local Playwright
unavailable on this ARM host).

Desktop layout untouched — every mobile rule lives under `@media
(max-width:640px)`; existing #1221 + #1234 desktop assertions still
apply.

---------

Co-authored-by: openclaw-bot <bot@openclaw.local>
2026-05-17 16:10:53 +00:00
Kpa-clawbot 74685ac82f fix(#1243): node detail mobile QR overlays map semi-transparently (#1245)
RED commit `fc9b619a` — CI:
https://github.com/Kpa-clawbot/CoreScope/actions

Fixes #1243.

## Problem
On `#/nodes/<pubkey>` at 375×800, the QR code rendered as a separate
~250px-tall panel below the map. Desktop already overlays the QR
semi-transparently via `.node-map-qr-overlay` for the compact view.

## Fix
Extend the mobile breakpoint (`@media (max-width: 640px)`) so the
full-screen `.node-top-row` mirrors the desktop overlay pattern:

- `.node-top-row` → `position: relative`; map wrap expands to 100%
- `.node-qr-wrap` → `position: absolute; bottom/right: 8px; z-index:
400`
- Semi-transparent background (`rgba(255,255,255,0.85)` light / `0.4`
dark)
- Caption hidden in overlay (already shown above)

Desktop (≥768px) flex layout untouched.

## TDD
- RED `fc9b619a` — E2E at 375×800 asserts QR is `position:
absolute|fixed`, overlaps map rect, and bg alpha < 1.
- GREEN `ded978c0` — CSS adds overlay rule.

## Verification
Preflight clean. Desktop layout unaffected — change is scoped inside
`@media (max-width: 640px)`.

## Files
- `public/style.css` (+29)
- `test-e2e-playwright.js` (+57)

---------

Co-authored-by: clawbot <clawbot@local>
2026-05-17 16:03:55 +00:00
Kpa-clawbot 2754251a53 perf(#1239): /api/analytics/distance — TTL 15s→60s + drop main RLock around compute (#1241)
## Summary
Fixes #1239 — `/api/analytics/distance` 15s cold on staging under heavy
ingest. Two independent fixes.

First commit on this branch is the RED test for Fix B (`a539882`),
demonstrating reader/writer contention against the main store lock. CI:
see Actions tab for the run on the test-only commit — it asserts >150µs
avg writer cycle and fails at 82367µs pre-fix. GREEN commit (`d3938f1`)
brings it to 1µs.

## Fix A — TTL bump 15s → 60s (`5eae1e0`)
- `rfCacheTTL` default in `cmd/server/store.go` changed from `15 *
time.Second` to `60 * time.Second`. This is the shared TTL for RF /
topology / distance / hash-sizes / subpath / channel analytics caches.
- Per operator clarification (issue thread): distance analytics IS
viewed live during analysis sessions, not background-glanced. 60s
smooths the cold-miss churn during heavy ingest without freezing data.
- `config.example.json`: documented `cacheTTL.analyticsRF` with new
default + caveat.
- Existing assertions (`TestCacheTTLDefaults`,
`TestHashCollisionsCacheTTL`) updated to the new default.

## Fix B — Drop main RLock around compute (`a539882` red, `d3938f1`
green)
`computeAnalyticsDistance` previously held `s.mu.RLock()` for the entire
iteration: region match-set construction, hop/path filtering, sort,
dedup, histogram, category stats, time series. Readers serialized
writers (ingest, `buildDistanceIndex`).

Refactor: hold the RLock only long enough to snapshot the
`distHops`/`distPaths` slice headers AND build the region match-set
(which reads `tx.Observations`, mutated under `s.mu.Lock`). For
`region=""` (the hot cold-call path) the lock hold is just the header
snapshot — microseconds. Everything else runs on the locally-captured
slices outside the lock.

Safety: `distHops`/`distPaths` are append-only via re-slice in
`buildDistanceIndex` / `updateDistanceIndexForTxs` (both under
`s.mu.Lock`). If the backing array reallocates after the snapshot, the
snapshot still references the prior array (GC-pinned) at the consistent
length captured under the lock. Records are value types — no torn
writes.

## Test results
`cmd/server/distance_lock_contention_test.go` (8 reader goroutines × 20k
synthetic distHops × 200 writer Lock/Unlock cycles):
- pre-fix avg writer cycle: **82367µs** (16.5s for 200 cycles)
- post-fix avg writer cycle: **1µs** (279µs for 200 cycles)
- ~82000× reduction in writer contention; reader result shape unchanged

Full `go test ./cmd/server/...` green with `-race`.

## Out of scope (per issue)
- Same lock pattern in topology / RF / hash / subpath analytics — file
separately if needed.
- Per-region cache key sharding.
- WebSocket-driven cache invalidation.

---------

Co-authored-by: openclaw-bot <bot@openclaw.local>
2026-05-16 20:56:52 +00:00
Kpa-clawbot aba20b3eda fix(#1234): Live mobile chrome pass 2 — single-row header, hide top-nav, VCR overflow (#1238)
## Summary

Live page mobile chrome-reduction pass 2. Three coordinated trims at
≤640px:

1. **`.live-header` → single row, ≤44px.** Drop the MESH LIVE text label
and the chart-icon (📊) header toggle. Promote `.live-stats-row` to a
direct child of `.live-header` so beacon + pkts + nodes + active + rate
+ gear all sit on one row. The (now empty) `.live-header-body` collapses
to `display:none`. `.live-controls-toggle` shrinks to 36×36 to fit the
strip.
2. **Top app navbar hidden on `/live`.** `body:has(.live-page) .top-nav
{ display:none }` — scoped via `:has()` so other routes are unaffected.
The `.live-page` height reclaims the freed 52px.
3. **VCR scope row: >6h collapsed into `More ▾`.** `12h` and `24h` get
`.vcr-scope-btn--overflow`; the new `.vcr-scope-more-wrap` dropdown is
desktop-hidden, mobile-shown. Dropdown items proxy `.click()` to the
underlying scope buttons — single source of truth, existing handler
unchanged.

## TDD

- **RED** (`b975c828`): `test-issue-1234-live-chrome-pass2-e2e.js` — one
E2E asserting all three acceptance items at 375×800 + desktop sanity at
1280×800. Wired into `deploy.yml`. Fails on master (no More button,
navbar visible, MESH LIVE label visible).
- **GREEN** (`1e529e63`): CSS + JS implementation. Updates
`test-live-layout-1178-1179-e2e.js` and
`test-issue-1204-live-panel-structure-e2e.js` in-place to match the new
single-row contract (chart toggle gone, MESH LIVE label gone on mobile,
gear shrunk to 36×36).

## Verification (local)

- New E2E: 7/7 
- `test-issue-1178-1179`: 10/10 
- `test-issue-1204`: 10/10 
- `test-issue-1205`: 18/18 
- `test-issue-1206`: 7/7 
- `test-live-mql-leak-1180`: 2/2 
- `#1220` empty-chrome guard (in `test-e2e-playwright.js`): header =
38px collapsed 

Desktop (1280×800) layout unchanged — top-nav visible, all 4 VCR scopes
inline, header behavior identical.

Fixes #1234.

---------

Co-authored-by: corescope-bot <bot@corescope.local>
2026-05-16 20:09:24 +00:00
Kpa-clawbot 4ea1bf8ebc fix(#1236): map mobile — sticky panel header + remove right gutter (#1237)
RED: 862d7c82 — E2E asserting (A) leaflet-map width == viewport on
mobile and (B) sticky panel header. CI URL: see Checks tab.

Fixes #1236.

## Sub-issue A — Map Controls panel scroll affordance
**Root cause:** `.map-controls` already had `max-height` + `overflow-y:
auto`, but the `<h3>` title was static — once the panel scrolled, the
title scrolled away with it and users lost the affordance that they were
inside a scroll container. No visual cue, no anchor.

**Fix:** make `.map-controls h3` `position: sticky` at the top of the
scroll container (pulled flush to the panel edges with negative margins
so it covers the corner radius cleanly), with the panel `--card-bg`
background and a `--border` bottom rule. Added `scrollbar-gutter:
stable` so the scroll indicator is consistently present.

## Sub-issue B — Map canvas offset left with right gutter
**Root cause:** `.map-side-pane` (Path Inspector) is `flex: 0 0 32px`
inside the flexbox `#map-wrap`. At every viewport width that 32px is
consumed before the leaflet canvas gets sized, leaving an unused band on
the right. Desktop has room for it; mobile (375px viewport) does not —
and Path Inspector hex-prefix entry is impractical on a phone anyway.

**Fix:** `display: none` on `.map-side-pane` at `≤640px`. Leaflet canvas
now fills 100% of the viewport.

## Verification
- E2E `test-issue-1236-map-mobile-e2e.js` covers both at 375x800 +
desktop guard at 1280x800. RED commit (`862d7c82`) failed 2/3 mobile
assertions; GREEN commit (`85efcba7`) passes 3/3.
- Map canvas width at 375x800: **343px → 375px**.
- Existing channels mobile E2E (#1224) still passes.
- Desktop (1280px): panel stays `position: absolute`, Path Inspector
pane still present.

All colors via CSS variables. No JS changes.

---------

Co-authored-by: OpenClaw Bot <bot@openclaw.local>
2026-05-16 20:01:00 +00:00
Kpa-clawbot 2e28aa3e04 fix(#1229): source-diversity confidence weighting in neighbor-graph tier-1 resolver (#1235)
RED 235b65b4 (CI will surface URL after PR open) — `test(#1229): tier-1
must prefer multi-observer edges`. Green: 841fc5de.

## Summary
Implements **Option C** from issue #1229: edge source-diversity
confidence weighting. Each neighbor-graph edge already tracks the set of
distinct observers that contributed to it (`NeighborEdge.Observers`).
This PR is the first to consume that signal in the disambiguator.

Tier-1 score in `pm.resolveWithContext` becomes `Score(now) ×
Confidence()` where:

```
Confidence() = min(1.0, max(1, |Observers|) / 3.0)
```

- 1 observer → 1/3 weight (single-source, suspect)
- 2 observers → 2/3 weight
- ≥3 observers → 1.0 (saturated, full historical weight)

A 6-observer edge (30 obs) now beats a 1-observer edge (25 obs) by 3.6×
(vs. 1.2× before) — enough to clear `affinityConfidenceRatio` and skip
the tier-2 geo fallback that was misresolving in cross-region cases.
Stacks with the geo-rejection filter merged in #1228/#1230 to give two
independent defenses against cross-region prefix-collision pollution.

## Why C over A/B
- **A (per-observer graphs):** N×memory cost, biggest refactor surface.
- **B (per-region/IATA segmented):** requires region attribution on
every packet + per-region cache plumbing; deferred follow-up.
- **C:** smallest diff (~30 lines), no schema migration, leverages an
existing field, composes additively with #1228.

A and B remain valid follow-ups if C proves insufficient.

## Backward compatibility (persistence)
`neighbor_edges` schema is **unchanged**. `Observers` is rebuilt by
`BuildFromStoreWithOptions` from live observations on every graph
refresh (5-min TTL). Persisted rows carry an empty set only during the
post-restart warm-up; `Confidence()` defaults n→1 when `|Observers|==0`,
so legacy rows resolve as single-observer (degraded but non-zero)
confidence rather than disappearing. Defensive.

## Tests
- `cmd/server/hop_disambig_confidence_test.go:48` — RED-then-GREEN E2E:
two `8a` candidates from the same anchor, candX placed geo-near with 1
observer × 25 obs, candY placed geo-far with 6 observers × 5 obs.
Without confidence weighting tier-1 falls through (1.2× ratio) and
tier-2 picks the wrong (geo-near) candX. With confidence weighting
tier-1 fires and picks candY. Asserts `method == "neighbor_affinity"` to
pin the resolver path.
- `TestNeighborEdge_ObserverSetIsDistinct` — guards the source-diversity
counter against double-counting same-observer contributions and pins the
`Confidence()` formula at both endpoints (single → fractional, ≥3 →
1.0).

All existing tier-1 tests (`hop_disambig_tier1_test.go`) continue to
pass — they seed with a single observer, so their weights drop from 1.0
to 1/3 uniformly across candidates, preserving the ratio guard outcome.

Fixes #1229

---------

Co-authored-by: bot <bot@corescope.local>
2026-05-16 19:55:00 +00:00
Kpa-clawbot b21badbcbd fix(#1225): paginate channel messages at SQL level — 30s → <500ms (#1226)
## Summary
Fixes #1225 — channel messages endpoint took ~30s on staging.

## Root cause
`(*DB).GetChannelMessages` SELECTed every observation row for the
channel (one row per observation, not per transmission),
JSON-unmarshalled each row into a Go map, dedupe-folded by `(sender,
packetHash)`, then sliced the tail in Go for pagination.

On staging `#wardriving`:
- `transmissions` rows with `channel_hash='#wardriving' AND
payload_type=5`: **5,703**
- `observations` joined to those: **274,632** (~48× amplification)
- `time curl /api/channels/%23wardriving/messages?limit=50`: **30.04s /
31.41s / 31.48s / 35.33s / 34.05s** (5 calls before I killed the loop)

`EXPLAIN QUERY PLAN` showed the index `idx_tx_channel_hash` was being
used — the cost was entirely in fetching, unmarshalling, and folding the
full observation set per request even for `limit=50`.

Hypothesis #1 from the issue (full table scan on `messages/decoded`) is
rejected; #2 (missing index) is rejected; the actual cause was
**pagination in Go instead of SQL** — request cost was O(observations)
not O(limit).

## Fix
Move pagination into SQL on the `transmissions` table. Because
`transmissions.hash` is `UNIQUE` and the original dedup key was
`(sender, hash)`, each transmission collapses to exactly one logical
message — paginating on transmissions is semantically equivalent to the
prior in-Go dedup + tail slice.

New shape:
1. `COUNT(*)` on transmissions for total (uses `idx_tx_channel_hash`).
2. `SELECT id FROM transmissions … ORDER BY first_seen DESC LIMIT ?
OFFSET ?` to pick the page of newest transmissions.
3. `SELECT … FROM observations WHERE transmission_id IN (…page ids…)` —
typically 50 ids → a few hundred observation rows.
4. Reassemble in pageIDs order, preserving the ASC-by-`first_seen` API
contract.

Region filtering, observation-count-as-`repeats`, and "first observation
wins for hops/snr/observer" semantics are preserved (observations are
scanned `ORDER BY o.id ASC`).

## Perf measurements
**Before** (staging `#wardriving`, limit=50, 5 samples killed mid-loop):
30.04s, 31.41s, 31.48s, 35.33s, 34.05s.
**Synthetic regression test**
(`TestGetChannelMessagesPerfLargeChannel`): 3000 tx × 50 obs.
- Broken impl: ~4.5s (test fails the 500ms budget — the RED commit).
- Fixed impl: well under 500ms (test passes).
**After (staging)**: will measure post-deploy and post-comment on issue
with numbers. Synthetic scaling: staging is ~2× the test's transmission
count, fixed-path cost scales with `limit` (50) + `COUNT(*)` (~5k rows
on index) — expect <100ms p99.

## TDD
- RED: `697c290d` — perf test asserts <500ms on 3k×50 dataset; fails at
~4.5s.
- GREEN: `3f1f82d3` — fix; full suite green, perf test passes.

## Hypotheses status
| # | Hypothesis | Verdict |
|---|---|---|
| 1 | Endpoint slow on prod-sized data | **CONFIRMED** (different
mechanism — see root cause) |
| 2 | Missing channel_hash index | Rejected (`idx_tx_channel_hash`
exists & used) |
| 3 | Frontend re-render storm | Not investigated (backend was clearly
the bottleneck) |
| 4 | Decode in request path | Rejected (decode is at ingest time; JSON
unmarshal of cached `decoded_json` is the cost, addressed by reducing
row count) |
| 5 | WS subscription failure | Rejected |
| 6 | Staging artifact | Rejected (reproducible) |

## Out of scope
- The in-memory `(*PacketStore).GetChannelMessages` path (used when
`s.db == nil`) has the same shape but operates on bounded in-memory
data; not touched. If we ever fall back to it in production we'll
revisit.

---------

Co-authored-by: clawbot <bot@corescope>
2026-05-16 17:28:40 +00:00
Kpa-clawbot 7179afcfde feat(#1228): reject geo-implausible neighbor-graph edges at build time (#1230)
Fixes #1228 — geo-implausible neighbor-graph edges are rejected at build
time.

Red commit: `5a6d9660` — failing tests for 4 cases (reject SF↔Berlin,
accept local CA, accept no-GPS endpoint, counter increments). Live CI
run (latest commit):
https://github.com/Kpa-clawbot/CoreScope/actions?query=branch%3Afix%2Fissue-1228

## Why

The disambiguator's tier-1 affinity graph is built blindly from path
co-occurrence. On wide-geo MQTT deployments, a single bad hop
disambiguation seeds an edge across geographically impossible distances
(e.g. Bay Area ↔ Berlin), which then reinforces the same wrong
resolution next time. Self-poisoning spiral.

## What changed

- `upsertEdge` now consults a per-graph GPS index. When **both**
endpoints have known GPS and their haversine distance exceeds the
threshold, the edge is dropped and `NeighborGraph.RejectedEdgesGeoFar`
(atomic) is incremented.
- Either endpoint missing GPS ⇒ accept (no signal to reject), per
acceptance criteria.
- Threshold is configurable via `neighborGraph.maxEdgeKm` (default **500
km** — well above any plausible terrestrial LoRa hop, including
satellite-assisted). 0 ⇒ use default; negative ⇒ disable the filter.
Exposed via `Config.NeighborMaxEdgeKm()`.
- New `BuildFromStoreWithOptions` carrying the threshold;
`BuildFromStore` and `BuildFromStoreWithLog` are kept as thin wrappers.
- Stats are surfaced under `GET /api/analytics/neighbor-graph` as
`stats.rejected_edges_geo_far`.
- All rejection logs PII-truncate pubkeys to 8 hex chars (public repo
discipline).
- `config.example.json` updated with the new field + comment.

## Follow-up

#1229 (per-region scoped affinity graphs) depends on this landing first.

---------

Co-authored-by: corescope-bot <bot@corescope.local>
2026-05-16 10:14:44 -07:00
Kpa-clawbot 30ff45ad34 fix(#1220): collapse MESH LIVE mobile header into a single ~50px strip (#1223)
RED commit `c1a8cea` — E2E at 375x800 asserts MESH LIVE header is either
≤60px (collapsed) or ≥60px with a visible body. Fails on master with
`height=118, bodyVisible=false, ctrlsVisible=false` — the empty-chrome
middle state.

CI for red commit: https://github.com/Kpa-clawbot/CoreScope/actions
(will populate after push).

## Diagnosis
On `(max-width: 768px)`, `#1180` collapses both `.live-header-body` and
`.live-controls-body` to `display:none`. But `.live-controls` carries
`flex: 0 0 100%` from the wide-viewport rule (introduced for `#1219` so
the toggles wrap onto their own row below the title on tablet). On
mobile, with the body hidden, that 100% basis still forces the gear
button onto a full-width second row inside `#liveHeader`'s flex-wrap,
~60px tall — yielding the `~118-200px` empty panel the bug screenshot
shows (the count badge + 📊 toggle on row 1, gear alone on row 2, nothing
else).

## Fix — Option C
Inside `@media (max-width: 768px)`, when `.live-controls.is-collapsed`:
- drop `flex: 0 0 100%` → `flex: 0 0 auto; width: auto` so the gear
inlines with the critical strip + 📊 toggle
- when the header is also collapsed
(`.is-collapsed:has(.live-controls.is-collapsed)`), zero the vertical
padding so the strip hugs the 48px tap targets

Result: collapsed mobile panel = single ~50px row, three icons inline.
Expanded mobile = full toggle list (149px). Desktop unchanged (83px).

Why Option C over A/B: a packet-watching mobile user keeps the map
dominant and reaches for the gear when they want filters. The compact
strip preserves both the WS-down red beacon (always visible) and the pkt
count, with one-tap access to expand either body.

Does not reintroduce #1204 (counter still attached to header) or #1205
(toggles still children of `#liveHeader`).

Fixes #1220

---------

Co-authored-by: openclaw-bot <openclaw-bot@users.noreply.github.com>
2026-05-16 15:54:12 +00:00
Kpa-clawbot 70855249c2 fix(#1224): channels page mobile UX overhaul (#1227)
## Summary
RED test commit: `02652d0042b7cf65d1f9b3e96ce376bbb3064ba6` — CI:
https://github.com/Kpa-clawbot/CoreScope/actions

Mobile UX overhaul for the Channels page (#1224). At 375x800 the sidebar
header was 112px tall (title + button stacked, analytics link + region
filter each on their own row) and the channel-name column was clipped to
83px by the inline 📤 Share + ✕ Remove buttons.

## What changed
- **Header is now ONE row**: title + region filter + `+ Add` chip + `📊`
analytics overflow chip. Capped to ≤56px on mobile.
- **`+ Add Channel` → `+ Add` chip** (no longer a full-width hero).
Verified <65% of sidebar width.
- **Analytics link** is an icon-only chip inside the header (was a
full-row link below).
- **Region filter** is inline inside the header (was its own row).
- **Channel rows**: `.ch-item-name` takes `flex:1`, share button is
icon-only (📤), remove button shrunk to 32px touch target. Name >150px on
the first row.
- **Empty state** is `max-height:30vh; padding:12px` on mobile — no
longer dominates the viewport.

## Design decisions
- Chose **inline chips** over an overflow `⋮` menu: header-level
controls are few enough (4) that stacking pills + filter dropdown fits
comfortably in 375px. Avoids the cost/complexity of a popover and
matches the page's existing pill vocabulary (region filter).
- Per-row share/remove kept inline but icon-only (`font-size:0` +
`::before`) — preserves single-tap access without consuming the row.
- Touch targets stay ≥32px (action chips) / 44px (other tappables); WCAG
2.5.5 spirit retained on the dominant interactive paths.
- **Desktop layout (≥768px) is unchanged** — verified by a desktop guard
in the E2E (`.ch-layout` flex-direction stays `row` at 1024px).

## Tests
- `test-issue-1224-channels-mobile-ux-e2e.js` — 5 assertions at 375x800
+ 1 desktop guard at 1024x800. Wired into CI.
- Existing channel suites still pass: `test-channel-fluid-e2e.js`
(11/11), `test-channel-issue-1087-e2e.js` (3/3),
`test-channel-issue-1111-e2e.js` (2/2), `test-channel-modal-ux.js`
(33/33), `test-channel-ux-followup.js` (29/29),
`test-channel-sidebar-layout.js` + `test-channel-fluid-layout.js`
(14/14).

Fixes #1224

---------

Co-authored-by: clawbot <clawbot@users.noreply.github.com>
2026-05-16 15:50:52 +00:00
Kpa-clawbot 24f277e5c6 fix(#1221): VCR LED clock in-row with controls and unclipped on mobile (#1222)
Red commit: 41d02ffa (CI run: pending — will fill in after first CI run
completes)

## Summary
Fixes #1221. VCR LED clock (`.vcr-lcd`) was wrapping to a separate row
on mobile (`.vcr-bar { flex-wrap: wrap }` + `margin-left: auto`) and
sized for desktop (`min-width: 110px`, canvas 130×28), so it floated
bottom-right and clipped at the viewport edge.

## Fix
- DOM (`public/live.js`): no move needed — `.vcr-lcd` is already a child
of `.vcr-bar`. (Verified by grep.)
- CSS (`public/live.css`) mobile `@media (max-width: 640px)`:
- Removed `margin-left: auto` on `.vcr-lcd` so it stays in-row with
controls.
- Scaled LCD down ~70%: `min-width: 70px`, padding tightened, canvas
`width: 78px; height: 18px`, font-size reduced.
  - Removed redundant `display: flex` override.

## Test
RED → GREEN E2E at `test-e2e-playwright.js` (around line 2978): viewport
375×800, asserts:
- LCD inside `.vcr-bar`, shares parent with `.vcr-controls`.
- LCD bounds entirely inside viewport (no clip on any side).
- LCD vertically overlaps `.vcr-controls` (same row).
- LCD width < 100px on mobile (scaled vs desktop).

E2E assertion added: `test-e2e-playwright.js:2978`
Browser verified: staging analyzer.00id.net after merge (manual VCR
layout sanity)

Fixes #1221

---------

Co-authored-by: openclaw-bot <bot@openclaw.local>
2026-05-16 08:36:38 -07:00
Kpa-clawbot ab34d9fb65 fix(#1206): keep VCR bar from occluding the live packet feed (#1213)
Red commit: `bcfc74de` (CI:
https://github.com/Kpa-clawbot/CoreScope/actions?query=branch%3Afix%2Fissue-1206)

Fixes #1206.

## Problem
On Live Map the VCR (timeline/playback) bar overlays the bottom of the
viewport. Bottom-pinned overlays — the live packet feed, the legend, any
corner panel — used hard-coded `bottom: 58–88px` offsets that are
smaller than the real bar height (two-row mobile layout +
`env(safe-area-inset-bottom)` push it to ~80px and beyond). The last N
packet-feed rows slid under the bar and became unreadable / unclickable.

## Fix
Publish the bar's measured height as a CSS variable on the live page
and bind every bottom-anchored overlay to it.

- `public/live.js` — new `initVCRHeightTracker()` runs after init; uses
  `ResizeObserver` + `resize` / `visualViewport.resize` to keep
  `--vcr-bar-height` on `.live-page` in sync with `#vcrBar`.
- `public/live.css` — `.live-feed`, `.feed-show-btn`, and the
  `.live-overlay[data-position="bl"|"br"]` corner slots now use
  `bottom: calc(var(--vcr-bar-height, 58px) + 10px)`. The feed's
  `max-height` is also capped against `100dvh - top - vcr - margin`
  so its scroll container can never extend past the bar.
- Stale per-breakpoint overrides (the `@supports(env(safe-area-inset))`
  hard-coded `78px + safe-area` for feed/legend) are removed in favor
  of the single tracked variable.

## TDD
- Red commit `bcfc74de` adds `test-issue-1206-vcr-overlap-e2e.js`:
  asserts `#liveFeed.getBoundingClientRect().bottom <= #vcrBar.top`
  (and same for the last row) at desktop 1280x800 and mid 720x800.
  Verified locally that reverting the green commit makes the feed-bottom
  assertions fail (feed bottom 742px > VCR top 721px) — see PR body for
  exact numbers from the local run.
- Green commit `1ad17e7f` makes all 5 assertions pass.

## Browser verified
Local Go server with `test-fixtures/e2e-fixture.db`, headless Chromium
via the new E2E test — all 5 assertions green.

## E2E assertion added
`test-issue-1206-vcr-overlap-e2e.js:84` (bottom-row vs VCR-top) plus
container check at `:74`.

---------

Co-authored-by: openclaw-bot <bot@openclaw.local>
Co-authored-by: clawbot <bot@corescope.local>
2026-05-16 05:55:21 +00:00
Kpa-clawbot a1f9dca951 fix(live #1205): re-anchor settings toggles inside MESH LIVE panel (#1219)
Red commit: f80ce5248a (CI URL appears in
the Checks tab once the workflow starts).

Supersedes closed PR #1209 with the correct approach (toggles in MESH
LIVE panel, not legend).

Fixes #1205.

## Problem
The Live Map settings toggle row (Heat / Ghosts / Realistic / Color by
hash / Matrix / Rain / Audio / Favorites / node filter / region filter —
`#liveControls`) rendered as a free-floating sibling `.live-overlay`
pinned `position: fixed` at bottom-right with `bottom: calc(78px +
var(--bottom-nav-height) + safe-area)`. On many viewports it visually
orphaned across the middle of the map, anchored to no panel.

## Regression cause
PR **#1180** (commit `127a1927` — "compact header, pin controls
bottom-right, narrow toggles") extracted `.live-toggles` from inside
`.live-header` (the MESH LIVE panel) into a brand-new sibling
`.live-overlay.live-controls` cluster. Before #1180 the toggles lived as
a direct child of `.live-header`.

## Fix
Restore the pre-#1180 structural pattern: `#liveControls` is re-parented
as a child of `#liveHeader`, breaking onto its own row via `flex: 0 0
100%`. No more `position: fixed` overlay, no more free-floating cluster
— the toggles share the MESH LIVE panel's chrome (background, blur,
border, padding).

- `public/live.js`: re-parent the `#liveControls` block inside
`#liveHeader`, drop the `.live-overlay` class.
- `public/live.css`:
- `.live-controls`: `position: static`, transparent (header supplies
chrome), `flex: 0 0 100%`.
- `.live-header`: `flex-wrap: wrap`, `row-gap: 6px`, `max-width:
calc(100vw - 24px)`; drop the `max-height: 40px` cap.

Why this beats PR #1209: that PR parked toggles inside `#liveLegend`,
inverting the *data → key → controls* hierarchy and pushing the legend
to 60vh on mobile. Anchoring back to the MESH LIVE panel keeps controls
with the panel that already labels the live surface and inherits its
corner / drag affordances.

## Tests
- **Red** (`test-issue-1205-live-controls-anchor-e2e.js`): asserts
`#liveHeader.contains(#liveControls)` AND not contained in
`#liveLegend`, parent is not `<body>` / `.live-page` directly, and the
controls rect stays within the viewport. Runs at **1440×900, 640×900,
320×800**. Fails on master.
- **Updated** `test-live-layout-1178-1179-e2e.js`:
- (a) `.live-header-critical` height ≤ 40px (the critical strip stays
compact; header itself now wraps).
- (b) `.live-controls` `position: static` AND descendant of
`#liveHeader` (new contract replacing the retired "fixed/right
≤24px/bottom>0").
- Wired in `.github/workflows/deploy.yml` next to the other live-layout
E2Es.

## Acceptance criteria
- [x] Settings toggle row renders inside the MESH LIVE panel
(`#liveHeader`)
- [x] Not parked in `#liveLegend` (rejected by #1209 review)
- [x] Tested at desktop + tablet + narrow phone viewport widths
- [x] E2E DOM assertion: parent is the MESH LIVE panel, not body /
`.live-page` / `#liveLegend`

---------

Co-authored-by: meshcore-bot <bot@meshcore.local>
Co-authored-by: clawbot <clawbot@users.noreply.github.com>
2026-05-16 05:54:43 +00:00
Kpa-clawbot 170f0ac66d fix(#1212): MQTT per-attempt logging + stall watchdog — prevent silent reconnect-loop death (#1216)
RED commit: `1cd25f7b` — CI (failing on assertion):
https://github.com/Kpa-clawbot/CoreScope/actions?query=sha%3A1cd25f7b1bdd0091f689dd64ce1bfec6d031191f

Fixes #1212

## Root cause

NOT that `AutoReconnect` was off — it was set;
`MaxReconnectInterval=30s` was set (PR #949); a `SetReconnectingHandler`
was wired. The defect was an **observability gap**:

`SetReconnectingHandler` fires only INSIDE paho's reconnect goroutine.
If that goroutine never iterates (status race after the recovered
handler panic at 21:07:13, or an internal abort), operators see ONLY the
`disconnected: pingresp not received` line and then total silence. They
cannot distinguish "paho is patiently retrying" from "paho gave up and
the goroutine is gone." That ambiguity is what turned a 30s blip into 6h
of downtime.

## Changes

### `cmd/ingestor/main.go` — `SetConnectionAttemptHandler`
Fires on every TCP/TLS dial — the initial `Connect()` AND every
reconnect — independent of paho's internal reconnect-loop state. Logs:

```
MQTT [staging] connection attempt #1 to tcp://broker:1883
MQTT [staging] connection attempt #2 to tcp://broker:1883
```

Per-source attempt counter via `atomic.AddInt64`.

### `cmd/ingestor/mqtt_watchdog.go` (new) — per-source stall watchdog
Satisfies the watchdog acceptance criterion. Even when paho reports
`connected`, if no MQTT messages have flowed for >5m, log a WARN line
every 60s:

```
MQTT [staging] WATCHDOG: client reports connected to tcp://broker:1883 but no messages received for 7m30s (threshold 5m) — possible half-open socket or upstream stall
```

Catches half-open TCP and broker-accepted-but-not-forwarding scenarios
that look "connected" to paho.

Hot-path cost: one `atomic.StoreInt64` per inbound message. Watchdog
scans the registry once a minute.

### Tests (`cmd/ingestor/mqtt_reconnect_test.go`, new)
- `TestBuildMQTTOpts_InstrumentsConnectionAttempt` — asserts
`OnConnectAttempt` is wired in `buildMQTTOpts`.
- `TestMQTTStallWatchdog_FiresOnSilentSource` — connected + 10m silent +
5m threshold → stall flagged.
- `TestMQTTStallWatchdog_QuietWhenRecent` — recent message → no stall.
- `TestMQTTStallWatchdog_QuietWhenDisconnected` — disconnected → no
stall (paho's reconnect logging covers it).

## TDD
- RED `1cd25f7b` — 2 assertion failures (compile OK, stub returns
no-stall, `OnConnectAttempt` nil).
- GREEN `2527be6f` — implementation; all ingestor tests pass.

## Out of scope
- Slice-bounds decode panic (#1211, separate PR).
- A full in-process MQTT broker integration test would require a new dep
(mochi-mqtt) — the observability and watchdog behaviors are
independently verifiable by the unit tests above, and the reconnect path
itself is paho's responsibility (we already test it's configured via
`mqtt_opts_test.go`).

---------

Co-authored-by: bot <bot@example.com>
Co-authored-by: OpenClaw Bot <bot@openclaw.local>
Co-authored-by: corescope-bot <bot@corescope.local>
Co-authored-by: openclaw-bot <openclaw-bot@users.noreply.github.com>
2026-05-15 22:46:29 -07:00
Kpa-clawbot eba9e89a72 fix(#1203): path-inspector — singleflight + stale-while-revalidate (#1208)
Red commit: c84a8f575a (CI run: pending
push)

Fixes #1203 — path-inspector 503 storm.

Three sub-fixes, each shipped as red→green per AGENTS TDD:

**A. Singleflight on rebuild** (`ensureNeighborGraph`)
Hand-rolled `sync.Mutex + chan` singleflight — no new deps (x/sync was
not in cmd/server's go.mod). Concurrent callers attach to one in-flight
rebuild instead of N parallel `BuildFromStore` goroutines.
- Red: `7340f23b` — test asserts ≤1 build under 10 concurrent callers
(saw 10 on master)
- Green: `abac6b3c`

**B. Stale-while-revalidate** (`handlePathInspect`)
Stale non-nil graph is served immediately with `"stale": true` while a
background rebuild runs (deduped by A). The 2s synchronous gate is gone.
Stale responses are not cached, so the next request after rebuild lands
fresh.
- Red: `c84a8f57` — test asserts 200+`stale:true`+rebuild-kickoff
(master returned 503)
- Green: `5eb86975`

**C. Cold-start 503 still kicks rebuild**
True cold start (`graph == nil`) is the only path that still returns 503
`{"retry": true}`, but it now spawns an async `ensureNeighborGraph` so
the very next request warms up.
- Green test: `f5ac7059` (passed on top of A+B)

Singleflight verified: `TestEnsureNeighborGraph_Singleflight`
Stale-while-revalidate verified:
`TestHandlePathInspect_StaleWhileRevalidate`
Cold-start verified: `TestHandlePathInspect_ColdStartKicksRebuild`

**Acceptance criteria (issue #1203):**
- [x] Concurrent requests share ONE rebuild
- [x] Stale non-nil graph served with `stale:true` async
- [x] 503 only on true cold-start
- [x] Cold-start 503 kicks rebuild → follow-up warm
- [ ] p99 < 500ms under load (not unit-testable; design satisfies it)
- [x] No regression in existing tests

**Out of scope (per issue):** 5-min TTL constant, `BuildFromStore` perf,
`/api/analytics/topology`, persist-lock contention.

No new deps.

---------

Co-authored-by: corescope-bot <bot@corescope.local>
Co-authored-by: corescope-bot <bot@corescope.dev>
2026-05-15 22:46:28 -07:00
efiten 11d2026bb1 feat(startup): hot startup — load hotStartupHours synchronously, fill retentionHours in background (#1187)
Closes #1183

## Summary

- Adds `packetStore.hotStartupHours` config key (float64, default 0 =
disabled). When set, `Load()` loads only that many hours of data
synchronously, reducing startup time on large DBs. Background goroutine
fills the remaining `retentionHours` window in daily chunks after
startup completes.
- A background goroutine (`loadBackgroundChunks`) fills the remaining
`retentionHours` window in daily chunks after startup completes.
Analytics indexes are rebuilt once at the end.
- `QueryPackets` and `QueryGroupedPackets` check `oldestLoaded` and fall
back to `db.QueryPackets()` for any query whose `Since`/`Until` predates
the in-memory window — covering days 8–30 permanently (beyond
`retentionHours`) and the background-fill gap during startup.
- `/api/perf` gains `hotStartupHours`, `backgroundLoadComplete`, and
`backgroundLoadProgress` fields inside `packetStore` so operators can
monitor the fill.

### Drive-by fixes

- E2E: added `gotoPackets` navigation helper used across packet-related
tests
- E2E: rewrote stripe assertion to check per-row stripe parity rather
than a fragile computed-style comparison
- E2E: theme test updated to use `#/home` as the initial route (was
`#/`)
- `db.go`: removed the RFC3339→unix-timestamp subquery path in
`buildTransmissionWhere`; `t.first_seen` is now always compared directly
as a string for both RFC3339 and non-RFC3339 inputs

## Configuration

```json
"packetStore": {
  "retentionHours": 168,
  "hotStartupHours": 24
}
```

`hotStartupHours: 0` (default) preserves existing behavior exactly.
Recommended for large DBs to reduce startup time; set to 0 to disable
(loads full retentionHours at startup, legacy behavior).

## Test plan

- [x] `TestHotStartupConfig_Clamp` — clamping when `hotStartupHours >
retentionHours`
- [x] `TestHotStartupConfig_ZeroIsDisabled` — zero leaves feature
disabled
- [x] `TestHotStartup_LoadsOnlyHotWindow` — only hot-window packets in
memory after `Load()`
- [x] `TestHotStartup_DisabledWhenZero` — all retention packets loaded
when disabled
- [x] `TestHotStartup_loadChunk_AddsOlderData` — chunk merges correctly,
ASC order maintained
- [x] `TestHotStartup_BackgroundFillsToRetention` — background goroutine
fills to `retentionHours`
- [x] `TestHotStartup_ChunkErrorRecovery` — chunk SQL failure logged and
skipped, loop terminates
- [x] `TestHotStartup_SQLFallback_TriggeredForOldDate` — query before
`oldestLoaded` routes to SQL
- [x] `TestHotStartup_SQLFallback_NotTriggeredForRecentDate` — recent
query stays in-memory
- [x] `TestHotStartup_PerfStats` — new fields present in
`GetPerfStoreStats()` (backs the perf endpoint)
- [x] `TestHotStartup_PerfStoreHTTP` — HTTP-level: GET /api/perf returns
`hotStartupHours`, `backgroundLoadComplete`, `backgroundLoadProgress` in
`packetStore`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: openclaw-bot <bot@openclaw.local>
Co-authored-by: CoreScope Bot <bot@corescope.local>
2026-05-15 22:46:25 -07:00
Kpa-clawbot 3255395bd0 fix(#1204): MESH LIVE panel — header inherited column flex from .live-overlay (#1215)
Red commit: c159a1153d (CI run: pending —
first CI is on this PR)

Fixes #1204.

## Root cause

`.live-overlay` (the base class for all overlay panels: feed, legend,
node-detail, header) declares `flex-direction: column`. Feed/legend/
node-detail need that for their `.panel-header` + scrollable
`.panel-content` stacking — but the header doesn't, it's a horizontal
bar.

PR #1180 (#16c48e73) split the header from a flat layout into three
children: `.live-header-critical` (beacon + `0 pkts`) + collapsible
toggle button + `.live-header-body` (title + stats row). Without an
explicit `flex-direction` override, those three pieces inherited the
column default and stacked vertically — pushing `0 pkts` above the
`MESH LIVE` title and clipping the stats row out of the 40px max-height
container. Exactly the "detached counter, hollow shell" the issue
reports.

## Fix

Add `flex-direction: row` to `.live-header` (one line + comment).
Single-property CSS change, no JS, no DOM, no behavior outside layout.

## TDD

Red commit `c159a115` — E2E
`test-issue-1204-live-panel-structure-e2e.js`
asserts:
1. `.live-header-critical` and `.live-title` vertically overlap (same
row).
2. `#livePktCount` pill and title mid-Y differ by < 8px.
3. `.live-stats-row` is visible (nonzero size).
4. `.live-feed .panel-content` accepts an injected row (column
container).

Verified failing on master at red commit (3 of 5 fail with the exact
"stacked above title" signature). Green commit `b7f57072` flips all to
pass.

E2E assertion added: `test-issue-1204-live-panel-structure-e2e.js:55`

## Verified

- Local `cmd/server` + fresh fixture, viewport 1440×900, headless
Chromium: 5/5 pass.
- Preflight (`run-all.sh origin/master`): clean.

## Files

- `public/live.css` — `flex-direction: row` on `.live-header` (+
rationale comment)
- `test-issue-1204-live-panel-structure-e2e.js` — new E2E (added to
`deploy.yml`)

---------

Co-authored-by: corescope-bot <bot@corescope.local>
2026-05-15 22:34:22 -07:00
Kpa-clawbot 85e97d2f37 fix(#1211): bounds-check path length to prevent slice [218:15] panic in MQTT decode (#1214)
**RED commit:** `65d9f57b` (CI run will appear at
https://github.com/Kpa-clawbot/CoreScope/actions after PR opens)

Fixes #1211

## Root cause

`decodePath()` returns `bytesConsumed = hash_size * hash_count` where
both come straight from the wire-supplied `pathByte` (upper 2 bits →
`hash_size`, lower 6 bits → `hash_count`). Max claimable: 4 × 63 = 252
bytes.

A malformed packet on the wire claimed `pathByte=0xF6` (hash_size=4,
hash_count=54 → 216 path bytes) inside a 15-byte buffer. The inner
hop-extraction loop in `decodePath` did break early on overflow — but
`bytesConsumed` was still returned at face value (216). `DecodePacket`
then did `offset += 216` (offset=218) and `payloadBuf := buf[offset:]`
panicked with the prod-observed signature:

```
runtime error: slice bounds out of range [218:15]
```

The handler-level `defer/recover` at `cmd/ingestor/main.go:258-263`
caught it, but the message was silently dropped with no usable
diagnostic.

## Fix

Add a `if offset > len(buf)` guard at BOTH decoder sites (same pattern,
same panic potential):

- `cmd/ingestor/decoder.go` — DecodePacket after decodePath
- `cmd/server/decoder.go` — DecodePacket after decodePath

Return a descriptive error citing the claimed length and pathByte hex so
operators can reproduce.

Also: `cmd/ingestor/main.go` decode-error log now includes `topic`,
`observer`, and `rawHexLen` so future malformed packets are reproducible
without needing to attach a debugger.

## Tests (TDD red → green)

Both packages got two new tests:

- **`TestDecodePacketBoundsFromWire_Issue1211`** — feeds the exact wire
shape from the prod log (`pathByte=0xF6` inside a 15-byte buf). Asserts
`DecodePacket` does NOT panic and returns an error.
- **`TestDecodePacketFuzzTruncated_Issue1211`** — sweeps every `(header,
pathByte)` combination with tails 0..19 bytes (≈1.3M inputs). Asserts
zero panics.

### Red commit proof

On commit `65d9f57b` (RED), both tests fail with the panic:
```
=== RUN   TestDecodePacketBoundsFromWire_Issue1211
    decoder_test.go:1996: DecodePacket panicked on malformed input: runtime error: slice bounds out of range [218:15]
--- FAIL: TestDecodePacketBoundsFromWire_Issue1211 (0.00s)
=== RUN   TestDecodePacketFuzzTruncated_Issue1211
    decoder_test.go:2010: DecodePacket panicked during fuzz: runtime error: slice bounds out of range [3:2]
--- FAIL: TestDecodePacketFuzzTruncated_Issue1211 (0.01s)
```

On commit `7a6ae52c` (GREEN), full suites pass:
- `cmd/ingestor`: `ok 53.988s`
- `cmd/server`:   `ok 29.456s`

## Acceptance criteria

- [x] Identify the slice op producing `[218:15]` — `payloadBuf :=
buf[offset:]` in `DecodePacket` (decoder.go), where `offset` had been
advanced by an unchecked `bytesConsumed` from `decodePath()`.
- [x] Bounds check added at the identified site(s) — both ingestor and
server decoders.
- [x] Test with crafted payload (length-field > remaining buffer) —
`TestDecodePacketBoundsFromWire_Issue1211`.
- [x] Log topic, observer ID, payload byte length on drop — updated
`MQTT [%s] decode error` log line.
- [x] Existing tests stay green — confirmed both packages.

## Out of scope

Reconnect-after-disconnect (#1212) — handled by a separate subagent.
This PR touches NO reconnect logic.

---------

Co-authored-by: corescope-bot <bot@corescope.local>
Co-authored-by: openclaw-bot <bot@openclaw.local>
Co-authored-by: corescope-bot <bot@corescope>
2026-05-15 22:34:21 -07:00
Kpa-clawbot 4925770aa4 fix(#1207): empty-state placeholder for Live Feed panel (no more orphan chrome) (#1210)
Red commit: `6c28227884a1e79e277653465028365dc0863171` — CI:
https://github.com/Kpa-clawbot/CoreScope/actions?query=branch%3Afix%2Fissue-1207

Fixes #1207

## Diagnosis

The Live Map page renders `#liveFeed` (bottom-left panel) with two
header buttons — `◫` (panel-corner-btn) and `✕` (feed-hide-btn) — but
its `.panel-content` body has zero children on first paint, before any
packets have been ingested via WebSocket. The user-reported "X + book
icons, no content" is exactly these two header buttons sitting on an
empty body.

**Verdict:** intended panel, missing content due to a data race — the
chrome mounts in HTML before the WS pushes its first packet. Not
orphaned, not a leftover from #1186.

## Fix

- Always render a persistent `.live-feed-empty` placeholder ("Waiting
for packets…") inside `#liveFeed .panel-content`.
- CSS hides it via `.live-feed .panel-content:has(.live-feed-item)
.live-feed-empty { display: none; }` when real feed items exist.
- `rebuildFeedList` re-adds the placeholder defensively after a wipe;
eviction loop counts `.live-feed-item` only so the placeholder is never
trimmed out.

All colors via CSS variables (`var(--text-muted)`).

## Test (RED → GREEN)

- **RED** `6c28227884a1e79e277653465028365dc0863171` —
`test-e2e-playwright.js` adds a new test ("#1207 Live Feed panel never
renders as empty chrome") that wipes `.live-feed-item` children to
simulate the empty state and asserts the panel body has visible text or
children. Fails on master.
- **GREEN** `a5af80960ac42759ec83fd5ca5a72e81856228d4` — adds the
placeholder; test now passes.

## Acceptance criteria

- [x] No empty panel chrome visible on Live Map page
- [x] Panel renders "Waiting for packets…" while feed is empty
- [x] CSS auto-hides placeholder when packets arrive
- [x] E2E assertion in `test-e2e-playwright.js` enforces non-empty
`.panel-content` on `#liveFeed`

## Files

- `public/live.js` — HTML markup + `rebuildFeedList` re-add +
eviction-loop guard
- `public/live.css` — `.live-feed-empty` style + `:has()` hide rule
- `test-e2e-playwright.js` — regression test

---------

Co-authored-by: clawbot <clawbot@kpabap.local>
Co-authored-by: openclaw-bot <bot@openclaw.local>
2026-05-15 22:34:17 -07:00