334 Commits

Author SHA1 Message Date
Michael J. Arcan 096e16409c fix(#1741): wrap test-DB insert loops in a single transaction (#1819)
## Fixes #1741

`TestBoundedLoad_OldestLoadedSet` (and any test building a 5000-row
fixture) hung/timed out, blocking reliable `go test ./cmd/server` and
CI.

## Root cause

The four test-DB builders in `cmd/server/bounded_load_test.go`
(`createTestDBAt`, `createTestDBWithObs`, `createTestDBWithAgedPackets`)
inserted rows in a loop with no `BEGIN`/`COMMIT`. With the pure-Go
`modernc.org/sqlite` driver every `Exec` auto-commits → one fsync per
row → ~2N fsyncs for N transmissions (tx + obs). At
`numTx=5000` that's ~10k fsyncs and the fixture blows past the test
timeout. Sibling tests with `numTx<=3000` happened to stay under the
timeout, so only the 5000-row cases visibly hung.

## Fix

Wrap each insert loop in a single `BEGIN`/`COMMIT` so the whole fixture
build becomes one commit. Fixtures now finish in well under a second
regardless of `numTx`; the tests' actual assertions (`oldestLoaded` set,
newest-first ordering, bounded load) are exercised instead of the
timeout masking them. Also made the
prepared-statement `Exec` calls check their error (previously discarded)
so a failed insert surfaces instead of silently leaving the DB short.
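
A minimal sketch of the single-transaction pattern described above, not the actual `bounded_load_test.go` code. To stay runnable without `modernc.org/sqlite`, it registers a tiny stand-in driver (`fakeDriver`, hypothetical) that only counts simulated commits; the table name, column names, and `insertRows` helper are likewise assumptions made for illustration.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"fmt"
)

// commits counts simulated durable commits: one per Exec issued outside a
// transaction (auto-commit) and one per explicit COMMIT.
var commits int

// fakeDriver is a hypothetical stand-in for a real SQLite driver.
type fakeDriver struct{}
type fakeConn struct{ inTx bool }
type fakeStmt struct{ conn *fakeConn }
type fakeTx struct{ conn *fakeConn }

func (fakeDriver) Open(name string) (driver.Conn, error) { return &fakeConn{}, nil }

func (c *fakeConn) Prepare(q string) (driver.Stmt, error) { return &fakeStmt{conn: c}, nil }
func (c *fakeConn) Close() error                          { return nil }
func (c *fakeConn) Begin() (driver.Tx, error)             { c.inTx = true; return &fakeTx{conn: c}, nil }

func (s *fakeStmt) Close() error  { return nil }
func (s *fakeStmt) NumInput() int { return -1 } // skip argument-count checks
func (s *fakeStmt) Exec(args []driver.Value) (driver.Result, error) {
	if !s.conn.inTx {
		commits++ // auto-commit: every row pays for its own commit
	}
	return driver.RowsAffected(1), nil
}
func (s *fakeStmt) Query(args []driver.Value) (driver.Rows, error) {
	return nil, fmt.Errorf("query not supported by fake driver")
}

func (t *fakeTx) Commit() error   { t.conn.inTx = false; commits++; return nil }
func (t *fakeTx) Rollback() error { t.conn.inTx = false; return nil }

func init() { sql.Register("fake", fakeDriver{}) }

// openFake opens a handle backed by the stand-in driver.
func openFake() *sql.DB {
	db, err := sql.Open("fake", "")
	if err != nil {
		panic(err)
	}
	return db
}

// insertRows builds an n-row fixture inside one transaction and checks
// every Exec error instead of discarding it.
func insertRows(db *sql.DB, n int) (err error) {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer func() {
		if err != nil {
			_ = tx.Rollback() // leave no half-built fixture behind
		}
	}()
	stmt, err := tx.Prepare("INSERT INTO transmissions (id, hash) VALUES (?, ?)")
	if err != nil {
		return err
	}
	defer stmt.Close()
	for i := 0; i < n; i++ {
		if _, err = stmt.Exec(i, fmt.Sprintf("hash-%d", i)); err != nil {
			return fmt.Errorf("insert row %d: %w", i, err)
		}
	}
	return tx.Commit()
}

func main() {
	db := openFake()
	defer db.Close()
	if err := insertRows(db, 5000); err != nil {
		fmt.Println("fixture build failed:", err)
		return
	}
	fmt.Println("simulated commits:", commits)
}
```

With a real driver the same shape applies: the commit cost is paid once at `tx.Commit()` rather than once per row.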

No production code changed; test infrastructure only.

## Verified

- `TestBoundedLoad_OldestLoadedSet`: **0.18s** (was: 30s timeout /
FAIL).
- Full `TestBoundedLoad*` + retention group: passes in ~1.2s.
- `go test ./...` in `cmd/server`: exit 0 (no longer blocks on this
test).

Co-authored-by: Waydroid Builder <build@waydroid.local>
Co-authored-by: Claude <noreply@anthropic.com>
2026-07-03 02:21:08 -07:00
Michael J. Arcan 6a32ec2b2d fix(#1729): preserve firmware-default Public channel (0x11) in analytics (#1817)
## Fixes #1729

The firmware-default **Public** channel (channel-hash byte `0x11` = 17)
was rendered as an opaque **"Encrypted (0x11)"** row at the bottom of
the analytics Channels tab, despite the key being well-known and
builtin.

## Root cause

`computeAnalyticsChannels` applied the #978 rainbow-table validation
(`SHA256(SHA256("#name")[:16])[0]`, the **hashtag** hash scheme) to
every decoded channel name. The Public channel is a **PSK** channel
whose hash byte is key-derived (`SHA256(key)[0]` = 17), not
hashtag-derived (`186` for `#Public`). So the ingestor-decoded name
`"Public"` failed the hashtag check and was discarded, the row forced to
`encrypted=true, name="ch17"`.

## Fix

Trust the ingestor's `decryptionStatus`. The ingestor already persists
`decryptionStatus:"decrypted"` when it decoded a packet with a real key
(PSK), and `"no_key"` / `"decryption_failed"` otherwise. When the packet
is `decrypted`, skip the hashtag hash check and keep the name — it came
from a key-based decryption, not a
rainbow-table lookup. The #978 mismatch rejection still applies to
non-decrypted packets, so rainbow-table collisions are still caught.
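
The decision rule above can be sketched roughly as follows. This is an illustration under assumptions, not the real `computeAnalyticsChannels`: `resolveChannel` and `fallbackName` are hypothetical names, the `#` prefix is added when missing, and treating a non-decrypted name that passes the hashtag check as `encrypted=false` is an assumption.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// hashtagByte computes SHA256(SHA256("#name")[:16])[0], the hashtag-scheme
// channel hash byte quoted in the root-cause section.
func hashtagByte(name string) byte {
	if !strings.HasPrefix(name, "#") {
		name = "#" + name
	}
	inner := sha256.Sum256([]byte(name))
	outer := sha256.Sum256(inner[:16])
	return outer[0]
}

// fallbackName is the opaque label for an unvalidated channel, e.g. "ch17".
func fallbackName(hash byte) string { return fmt.Sprintf("ch%d", hash) }

// resolveChannel (hypothetical) returns the display name and the encrypted
// flag for one channel row.
func resolveChannel(hash byte, name, decryptionStatus string) (string, bool) {
	// A key-based decryption is trusted: skip the hashtag check entirely.
	if decryptionStatus == "decrypted" && name != "" {
		return name, false
	}
	// Otherwise the name must still agree with the hashtag hash scheme.
	if name != "" && hashtagByte(name) == hash {
		return name, false
	}
	return fallbackName(hash), true
}

func main() {
	name, encrypted := resolveChannel(17, "Public", "decrypted")
	fmt.Println(name, encrypted)
	name, encrypted = resolveChannel(17, "Public", "no_key")
	fmt.Println(name, encrypted)
}
```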

Frontend needs no change: `encrypted=false, name="Public"` lands in the
"Network" group (top), not "Encrypted".

## Tests

- `makeGrpTx` gains `makeGrpTxWithStatus` companion to set
`decryptionStatus`.
- `TestComputeAnalyticsChannels_PublicChannelPreserved`: hash 17 /
"Public" / `decrypted` → name stays `"Public"`, `encrypted=false`.
- `TestComputeAnalyticsChannels_UndecryptedNameStillValidated`: a
non-`decrypted` name failing the hashtag check is still downgraded to
`ch17` (#978 regression guard).

All channel-analytics tests pass; `go build ./...` clean.

Co-authored-by: Waydroid Builder <build@waydroid.local>
Co-authored-by: Claude <noreply@anthropic.com>
2026-07-02 19:20:14 -07:00
Kpa-clawbot 750b8742a7 fix(staging-compose): decouple in-container mosquitto from standalone broker (#1813)
Red commit: `3898dbc5` (verified locally — CI run URL pending)

## Problem

A standalone `mqtt-broker` container (`eclipse-mosquitto:2`) was
provisioned out-of-band on the staging VM. It now owns MQTT, is attached
to external docker network `meshcore-net`, and binds host port `8883`.
The current `docker-compose.staging.yml` still:

- Publishes `1883:1883` on the host (dead weight; conflicts the moment
the broker moves to that port).
- Defaults `DISABLE_MOSQUITTO=false`, so the in-container mosquitto
burns RAM and briefly contests the `mqtt-broker` docker DNS name on cold
start.
- Doesn't join `meshcore-net`, so the ingestor can't resolve
`mqtt-broker:1883` via docker DNS without manual surgery.

## Fix (`docker-compose.staging.yml` only)

1. Remove the `1883:1883` host port publish from `staging-go`.
2. Flip `DISABLE_MOSQUITTO` default from `false` to `true`. Operators
can opt back in with the env var.
3. Attach `staging-go` to both `default` and `meshcore-net`; declare
`meshcore-net` as `external: true` so the file never tries to
create/destroy operator state.

Healthcheck and Caddy/443 plumbing untouched (out of scope).

## Test added (TDD framing: Option A — Go shape-asserts)

`cmd/server/staging_compose_broker_test.go:1` adds four regex-based
assertions on the compose file shape:

- staging-go does **not** bind port `1883` in ANY form (quoted/unquoted
short form, or long-form `target: 1883` / `published: 1883`).
- `DISABLE_MOSQUITTO` uses the interpolated default form
`${DISABLE_MOSQUITTO:-true}` (preserves operator override). Bare literal
`true`, or a later `=false` override in the same env block, is rejected.
- Top-level `networks:` declares `meshcore-net` as `external: true`.
- `staging-go` attaches to `meshcore-net` via a real
`services.staging-go.networks:` sub-key (comment-stripped so an
in-comment example can't masquerade).

Regex (not YAML byte-equality) so cosmetic edits don't break the guard.
No new go module deps. Red commit `3898dbc5` fails all 4 assertions on
master. Green commit `38297ff4` makes them pass. Round-1 hardening
commit `9f7155e2` tightens the regexes (per adversarial + kent-beck
must-fixes) and was verified against master's YAML shape — all 4 tests
fail on `origin/master`'s compose, pass on branch, proving the tightened
regexes still gate a real regression.

## Risk

Low, with one intentional semantic change.

- **Semantic change (v3.7+):** `DISABLE_MOSQUITTO` in
`docker-compose.staging.yml` now defaults to `true`. This is a
**deliberate flip** — the standalone `mqtt-broker` container is now
authoritative on the staging host, and running the in-container
mosquitto alongside it wastes RAM and races the docker DNS name
`mqtt-broker` on cold start. Operators who want the pre-v3.7 shape
(in-container mosquitto + host-published `1883`) must explicitly opt
back in via env override AND re-add the `1883:1883` port mapping
(concrete snippet is inline in the compose file and in `DEPLOY.md` under
"Standalone MQTT broker (staging)"). This intent is called out in a
`SEMANTIC CHANGE (v3.7+)` header comment at the top of
`docker-compose.staging.yml`.
- **Deploy prereq:** the external `meshcore-net` docker network MUST
already exist on the host before `docker compose up`. If it doesn't,
compose refuses to start `staging-go`. This is documented inline in the
compose file (with the `docker network create meshcore-net` one-liner)
and in `DEPLOY.md`.
- **Only takes effect where the standalone broker is deployed** — which
it already is on staging today. The legacy `DISABLE_MOSQUITTO=false`
path remains reachable via env override; the ingestor's upstream config
is untouched.

Partial fix — no tracking issue; follow-up to operator-side broker
provisioning.

---------

Co-authored-by: corescope-bot <bot@corescope.local>
Co-authored-by: openclaw-bot <bot@openclaw.local>
2026-06-30 18:21:21 -07:00
Kpa-clawbot fa15ab0a30 fix(#1809): gate background loader on LoadChunked completion (#1811)
Partial fix for #1809.

Red commit: c9c782b5 (CI runs on PR open; standalone red-branch CI not
configured — local repro proves the gate, see below).

## Problem
Issue #1809: at startup the background fill loader logged `background
load FAILED` within seconds and `backgroundLoadFailed=true` was set,
leaving the coverage gate tripped even though LoadChunked itself
completed normally.

## Root cause
`main.go:225-245` spawned `go store.loadBackgroundChunks()` as soon as
`FirstChunkReady` fired (chunk #1 = 10000 tx). But `s.oldestLoaded` is
only assigned at the end of `LoadChunked` (`chunked_load.go:329-333`),
~tens of seconds later. The bg loader read `oldestLoaded==""` at
`store.go:1462-1466`, broke out immediately, walked zero chunks, and the
coverage gate at `store.go:1543-1554` flipped
`backgroundLoadFailed=true`.

## Fix (initial)
Introduce `PacketStore.RunStartupLoad(chunkSize)` (`chunked_load.go`).
It runs `LoadChunked` first; only on success and only when
`hotStartupHours > 0` does it call `loadBackgroundChunks`. `main.go`
invokes `RunStartupLoad` in the same goroutine pattern as before, so
`FirstChunkReady` still unblocks the HTTP listener bind at chunk #1 —
only the bg loader is gated.

## Round-1 followups (this push)
Reviewer-driven hardening on top of the initial fix:

### Production behavior (commit db5592f6)
- **Steady-state semantics tightened.** `RunStartupLoad` now picks a
terminal state on every branch:
  - LoadChunked error → `backgroundLoadFailed=true` with captured error
    (was: `done=false, failed=false` indefinite).
  - `hotStartupHours == 0` → `backgroundLoadDone=true` immediately,
    `progress=100` (was: `done=false` forever → healthz stuck on
    `backgroundLoadComplete=false`).
  - Successful hot-window path → terminal state is whatever
    `loadBackgroundChunks` sets (#1690 semantics, unchanged).
- **Runtime invariant assertion (A7).** `loadBackgroundChunks` panics
when `oldestLoaded==""` and packets exist — a future refactor that
re-introduces the parallel-spawn race fails loudly instead of silently
shipping the same coverage regression.
- **`RunStartupLoad` cleanup.** Inlined the superfluous goroutine +
channel that wrapped `LoadChunked` (direct call is equivalent).
- **Logging.** Added an INFO line between `LoadChunked` completion and
bg-loader start (the #1809 post-mortem needed exactly this signal).
Fixed the lying `"background load will start"` log that fired even on
the `hotStartupHours==0` branch.
- **Immutability documented.** `hotStartupHours` is now explicitly
documented as immutable post-construction, so the lock-free reads in
`LoadChunked` / `RunStartupLoad` / `loadBackgroundChunks` are sound.

### Test coverage (commits db5592f6 + e9e12acf)
- **Tautology fix (B1, commit e9e12acf).** The original
`Test1809_StartupLoad_BgLoaderSeesOldestLoaded` fixture seeded all 100
rows inside the 1h hot window, so `LoadChunked` alone produced
coverage=1.0 — the test passed even if `loadBackgroundChunks` was a
no-op. Rewrote the fixture to spread 100 rows over 14 days with
`hotStartupHours=24`, so only ~7 rows are hot and the remaining ~93 MUST
be loaded by the bg loader for the assertions to hold. Original
red-commit assertions kept intact; added `len(packets) > hot-only cap`
and `oldestLoaded < hot-cutoff - 12h` assertions on top.
- **New tests (commit db5592f6, `runstartup_load_test.go`)** codify the
new contracts:
  - `TestRunStartupLoad_HotStartupHoursZero_SetsDoneImmediately`
  - `TestRunStartupLoad_LoadChunkedError_SetsFailedTerminal`
  - `TestRunStartupLoad_EmptyDB_SetsDoneTerminal`
  - `TestRunStartupLoad_BgLoaderRunsAfterLoadChunkedSets_OldestLoaded`
  - `TestLoadBackgroundChunks_PanicsOnOldestLoadedEmpty_Invariant`

### Docs (commit 70fa16f7)
- Package-level doc in `chunked_load.go` now documents `RunStartupLoad`
as the orchestrator entry point alongside `LoadChunked` /
`loadStatusMiddleware` / `OnChunkLoaded`.

### Preflight (commit eec1b48c)
- Test-fixture DDL annotated with `// PREFLIGHT: async=true
reason="unit-test fixture"` so the async-migration gate distinguishes
ephemeral test schema from prod migration paths.

## What's still NOT fixed (left intentionally open)
This PR addresses the startup race specifically. Issue #1809 will be
closed by the operator after observing healthy startup logs (`background
load complete: ... coverage=100.0%`) in prod for a full restart cycle.
Do not auto-close — leave open until the operator verifies.

## Test
Local repro (red branch state, before green commit): `go test
./cmd/server/ -run Test1809_StartupLoad_BgLoaderSeesOldestLoaded` → FAIL
on assertion `backgroundLoadFailed=true ...
oldest="2026-06-30T18:56:09Z"`. After green commit + round-1 followups:
PASS. Full `cmd/server/...` suite: ok in ~70s.

## Risk
Low — startup path only. New behavior gates surfaced by the new tests;
coverage-gate semantics unchanged. Runtime panic is a new failure mode
but only fires on a state (`oldestLoaded=="" && len(packets)>0`) that is
unreachable on the current code path — it exists solely as a refactor
tripwire.

---------

Co-authored-by: mc-bot <bot@corescope.local>
Co-authored-by: openclaw-bot <bot@openclaw.local>
Co-authored-by: meshcore-bot <bot@meshcore>
2026-06-30 17:19:56 -07:00
Kpa-clawbot 242c7c609b fix(mqtt): escalate persistent paho disconnect + recover from emit panic + expose watchdog tick (#1749) (#1810)
# Partial fix for #1749: MQTT watchdog escalation + panic recovery + tick exposure

Red commit: 9912bbb3e9 (CI run:
https://github.com/Kpa-clawbot/CoreScope/actions/runs?branch=fix%2Fissue-1749)

## Problem
Production CoreScope v3.9.1 (and a more recent prod recurrence on
2026-06-30 with the wcmesh source on `ssl://mqtt2.wcmesh.com:8883`)
showed two distinct watchdog failure modes:

1. **Per-source paho machinery dies silently.** `IsConnectedFn` returns
false; paho's `SetAutoReconnect(true)` never retries. The watchdog's
`processLivenessTransition` deliberately stays silent on
`LivenessDisconnected`, trusting paho to recover — so there is no
escalation path when that trust is misplaced.
2. **Watchdog goroutine death.** Three sources went silent within ~60s
of each other; no `WATCHDOG` log lines for 75 min. The most plausible
single point of failure is a panic inside `emit` (e.g. a blocked log
pipe) killing the loop with no defer/recover.

## Changes

**`cmd/ingestor/mqtt_watchdog.go`**
- Added `disconnectedReconnectMultiplier = 5` constant.
- Added `DisconnectedSinceUnix int64` (atomic) on `SourceLivenessState`.
Stamped on the first tick the source is observed disconnected; cleared
on any non-disconnected tick.
- `processLivenessTransition` now escalates: when `(now -
DisconnectedSinceUnix) > multiplier × threshold`, emits a `WATCHDOG
ESCALATION` WARN and calls `maybeForceReconnect` (subject to existing
`forceReconnectThrottle`). Distinct from the existing `LivenessStalled`
path so operators can grep escalation events independently.
- Added package-level `watchdogLastTickUnix atomic.Int64` +
`WatchdogLastTickUnix()` getter. The loop stamps it BEFORE per-source
processing — a wedged source-handler does not freeze the clock for an
external observer.
- `runLivenessWatchdogLoop` wraps each per-source
`processLivenessTransition` call in `func() { defer recover; ... }()` so
a panic in `emit` (or in any per-source code path) is logged and
skipped, not fatal. The loop continues to the next source and the next
tick.
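
A rough sketch of the two mechanisms described in this file's changes: the `multiplier × threshold` escalation check and the per-source panic isolation. `shouldEscalate` and `tickAll` are hypothetical helpers; the real code uses atomics, `emit`, throttling, and jitter that are omitted here.

```go
package main

import "fmt"

const disconnectedReconnectMultiplier = 5

// shouldEscalate reports whether a source that has been disconnected since
// sinceUnix (0 = not currently disconnected) has exceeded
// multiplier × threshold. All values are in seconds.
func shouldEscalate(nowUnix, sinceUnix, thresholdSec int64) bool {
	if sinceUnix == 0 {
		return false
	}
	return nowUnix-sinceUnix > disconnectedReconnectMultiplier*thresholdSec
}

// tickAll runs process for every source. A panic in one source is recovered
// and recorded, and the loop continues with the next source.
func tickAll(sources []string, process func(string)) (panicked []string) {
	for _, src := range sources {
		func() {
			defer func() {
				if r := recover(); r != nil {
					panicked = append(panicked, src)
				}
			}()
			process(src)
		}()
	}
	return panicked
}

func main() {
	const threshold = 5 * 60 // 5-minute scan threshold, in seconds
	fmt.Println(shouldEscalate(26*60, 1, threshold))

	bad := tickAll([]string{"a", "b", "c"}, func(s string) {
		if s == "b" {
			panic("emit failed")
		}
	})
	fmt.Println(bad)
}
```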

**`cmd/ingestor/stats_file.go`**
- Added `WatchdogLastTickUnix int64` field on `IngestorStatsSnapshot`
(additive, `omitempty`); populated from `WatchdogLastTickUnix()` each
stats tick.

**`cmd/server/mqtt_status.go`**
- `MqttStatusResponse` gains `WatchdogLastTickUnix int64` (additive,
`omitempty`) sourced from the ingestor stats file; surfaced via `GET
/api/mqtt/status`.

**`config.example.json`**
- No new config field added — the multiplier is a code constant (5×) per
the issue's "N×threshold (e.g. 5×)" recommendation. The active
per-source `threshold` is the 5-minute scan threshold hard-coded at the
sole `runLivenessWatchdog` callsite (`cmd/ingestor/main.go:460`:
`runLivenessWatchdog(60*time.Second, 5*time.Minute)`), so escalation
fires at ~25 minutes (5 × 5min) of continuous disconnect — plus a
deterministic per-source jitter of 0..30s (#1810 round-1, see Taleb #4)
to avoid synchronized escalation across N sources sharing an upstream
broker outage.

## Acceptance (#1749)
- [x] Persistent `LivenessDisconnected` > N×threshold → force-reconnect
+ WARN
- [x] Watchdog goroutine liveness clock exposed (`WatchdogLastTickUnix`
in `/api/mqtt/status`)
- [x] Test: `IsConnectedFn` false for >5×threshold → assert
`ForceReconnectFn` invoked at least once
- [x] Test: panic in `emit` → assert loop recovers and continues ticking

## Test plan
- 4 new tests in `cmd/ingestor/mqtt_watchdog_1749_test.go` (all RED on
master, GREEN on this PR).
- Existing watchdog tests (`mqtt_watchdog_force_reconnect_test.go`,
`mqtt_reconnect_test.go`, r1/r2/m1 suites) continue to pass — the
escalation path is additive.

## Preflight
- TDD: red commit pushed and asserted to fail BEFORE green commit
landed.
- PII grep: clean on diff and PR body.
- Worktree: `_wt-fix-1749` on branch `fix/issue-1749`.

---------

Co-authored-by: corescope-bot <bot@corescope.dev>
Co-authored-by: openclaw-bot <bot@openclaw.local>
Co-authored-by: bot <bot@local>
2026-06-30 15:16:31 -07:00
Kpa-clawbot b74a64ccfa fix(ui): canonical payload label map across packets/live/packet-filter (#1799) (#1804)
## Summary

Replaces the three drifted per-surface payload-type label vocabularies
with a single canonical map keyed by firmware enum name.

Per the locked triage comment on #1799
([comment-4823975431](https://github.com/Kpa-clawbot/CoreScope/issues/1799#issuecomment-4823975431)):

> Create `public/payload-labels.js` exporting `{GRP_DATA: {short:'Group
Data', long:'Group data packet', enumId:6}, ...}`. Migrate `packets.js
typeMap`, `packet-filter.js FW_PAYLOAD_TYPES`, `live.js TYPE_COLORS
legend` to consume it. E2E that scrapes each surface and asserts label
equality.

## Changes

- **`public/payload-labels.js`** (new) — canonical map exposed as
`window.PayloadLabels` and `window.PayloadLabelsApi`. Keys are firmware
enum names; values carry `{short, long, enumId}` plus derived
`SHORT_BY_ID` / `FW_PAYLOAD_TYPES` / `TYPE_ALIASES` for legacy callers.
- **`public/packets.js`** — `TYPE_NAMES` + `typeMap` now read from
`PayloadLabelsApi.SHORT_BY_ID`. Literal kept only as a defensive
fallback for the case where the script tag fails to load.
- **`public/packet-filter.js`** — `FW_PAYLOAD_TYPES` + `TYPE_ALIASES`
now sourced from `PayloadLabelsApi`. Literal fallback retained so `node
test-packet-filter.js` still works headlessly.
- **`public/live.js`** — legend `<li>` rows are now generated from
`window.PayloadLabels` in stable order, killing the third-vocabulary
`Message — Group text` / `Direct — Direct message` drift the #1797
review surfaced.
- **`public/index.html`** — `<script src="payload-labels.js">` loaded
before `roles.js` / `packet-filter.js` / `packets.js`.
- **`test-issue-1799-label-vocab-e2e.js`** (new) — Playwright E2E.
Scrapes `#liveLegend` rows and the `/packets` type-filter checklist,
asserts each label matches `window.PayloadLabels[ENUM].short` for
`TXT_MSG`, `GRP_TXT`, `GRP_DATA`. Also verifies `window.PacketFilter`
still recognises the enum names.
- **`.github/workflows/deploy.yml`** — wired the new E2E into the
existing Playwright block.

## TDD trail

- Red commit `eb392d4` — adds the failing E2E only (asserts
`window.PayloadLabels` exists and labels match; both fail).
- Green commit `44e902a` — introduces the canonical map and migrates the
three surfaces.

## Verification

- `node test-packet-filter.js` — 92/92 pass with the new fallback
wiring.
- Preflight: `bash ~/.openclaw/skills/pr-preflight/scripts/run-all.sh
origin/master` — clean.

Browser verified: E2E `test-issue-1799-label-vocab-e2e.js` exercises
`/live` legend + `/packets` type filter against a Playwright headless
Chromium; CI's Playwright block runs it on every push.

E2E assertion added: `test-issue-1799-label-vocab-e2e.js:139` —
`assert(fromLegend === canon, ...)` and `assert(fromPackets === canon,
...)` per enum.

Fixes #1799

---------

Co-authored-by: mc-bot <bot@corescope>
Co-authored-by: openclaw-bot <bot@openclaw.local>
Co-authored-by: clawbot <clawbot@kpa.com>
Co-authored-by: clawbot <bot@clawbot.local>
2026-06-30 05:48:47 -07:00
Michael J. Arcan 9ae547ed7b test: de-flake distance-202 and anchor-bias tests (deterministic timing) (#1808)
Two server tests flaked intermittently and reddened CI on unrelated
(frontend) PRs that merged master:

- TestDistanceConcurrentRequestsDuringBuildReturn202 asserted all 10
  concurrent requests get 202 'during the build window', but the lazy
  distance build on the tiny test DB finishes almost instantly, so on a
  fast machine some requests raced past it and got 200 (~50% flake). Add
  a nil-by-default distanceBuildHook seam on PacketStore (zero overhead
  in prod) that the test uses to hold the build open until all requests
  have been served, making the window guarantee deterministic.

- TestHandleNodePaths_AnchorBiasInconsistency_Issue1278 queried /paths
  right after store.Load(), racing the path-hop index that Load() builds
  in a background goroutine (#1008); the membership/canonical result was
  thus non-deterministic (rarer flake, worse under suite load). Wait for
  PathHopIndexReady() before querying.

Both run 30x green and pass -race. No production behavior change (hook
is nil).
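
A minimal sketch of the nil-by-default hook seam, under assumptions: `Store`, `Status`, and `statusesConcurrently` are hypothetical stand-ins for `PacketStore` and the HTTP handler, and only the hook idea is taken from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// Store lazily builds an index; requests during the build get 202.
type Store struct {
	mu        sync.Mutex
	built     bool
	building  bool
	done      chan struct{} // closed when the build has finished
	buildHook func()        // nil in production; tests use it to hold the build open
}

func NewStore() *Store { return &Store{done: make(chan struct{})} }

// Status returns 200 once the build is complete, otherwise it kicks off the
// build (at most once) and returns 202.
func (s *Store) Status() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.built {
		return 200
	}
	if !s.building {
		s.building = true
		go s.build()
	}
	return 202
}

func (s *Store) build() {
	if s.buildHook != nil {
		s.buildHook() // test seam: block here until the test releases it
	}
	s.mu.Lock()
	s.built = true
	s.mu.Unlock()
	close(s.done)
}

// statusesConcurrently issues n concurrent Status calls and collects results.
func statusesConcurrently(s *Store, n int) []int {
	out := make([]int, n)
	var wg sync.WaitGroup
	for i := range out {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			out[i] = s.Status()
		}(i)
	}
	wg.Wait()
	return out
}

func main() {
	s := NewStore()
	release := make(chan struct{})
	s.buildHook = func() { <-release }
	fmt.Println(statusesConcurrently(s, 10))
	close(release)
	<-s.done
	fmt.Println(s.Status())
}
```

Holding the build open from the test removes the dependence on how fast the build happens to finish on a given machine.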

Co-authored-by: Waydroid Builder <claude@michael.arcan.de>
2026-06-29 15:53:59 -07:00
Kpa-clawbot ec0ebeda2f fix(#1793): WebSocket CheckOrigin allowlist (block cross-origin scrapers) (#1795)
## Summary

Closes the wide-open `/ws` WebSocket upgrader (`CheckOrigin: return
true`) that lets any browser origin scrape live packet data. Replaces it
with an explicit allowlist consulted from `cfg.CORSAllowedOrigins`, plus
an implicit same-origin allowance and an empty-Origin (non-browser
client) allowance.

Fixes #1793.

## Rules (`Hub.checkOrigin`)

- Empty `Origin` header → **allow** (non-browser clients; per-IP
rate/deny gating tracked separately in #1794).
- `Origin` host == request `Host` (case-insensitive) → **allow**
(same-origin).
- `Origin` matches an entry in `cfg.CORSAllowedOrigins` by exact
case-insensitive match → **allow**.
- `"*"` in `cfg.CORSAllowedOrigins` is **deliberately ignored** for
`/ws`. A startup `[ws] WARNING:` is logged once when present.
- Anything else → **reject** (gorilla returns 403).
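
The rules above can be sketched as a pure function. This is an illustration only: `originAllowed` is a hypothetical name, and the real `Hub.checkOrigin` takes an `*http.Request` and reads the configured allowlist from the hub.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// originAllowed applies the /ws rules: empty Origin, same host, or an exact
// (case-insensitive) allowlist entry. A "*" entry is deliberately ignored.
func originAllowed(origin, requestHost string, allowlist []string) bool {
	if origin == "" {
		return true // non-browser client
	}
	u, err := url.Parse(origin)
	if err != nil || u.Host == "" {
		return false
	}
	if strings.EqualFold(u.Host, requestHost) {
		return true // same-origin
	}
	for _, entry := range allowlist {
		if entry == "*" {
			continue // wildcard never opens the WebSocket path
		}
		if strings.EqualFold(entry, origin) {
			return true
		}
	}
	return false
}

func main() {
	allow := []string{"*", "https://app.example"}
	fmt.Println(originAllowed("", "scope.example", allow))
	fmt.Println(originAllowed("https://scope.example", "scope.example", allow))
	fmt.Println(originAllowed("https://app.example", "scope.example", allow))
	fmt.Println(originAllowed("https://evil.example", "scope.example", allow))
}
```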

### Deliberate divergence from CORS XHR

CORS XHR (`corsMiddleware`) still honors `"*"` for read-only
cross-origin GETs. The `/ws` upgrade does NOT, per OWASP's WebSocket
Security Cheat Sheet:

> Use an allowlist, not a denylist. Avoid wildcards or substring
matching.

—
https://cheatsheetseries.owasp.org/cheatsheets/WebSocket_Security_Cheat_Sheet.html

`"*"` on the WS path would re-open the exact CSWSH/scraping vector this
PR closes, so it is rejected with a startup warning rather than silently
honored. This intentional asymmetry is documented in the updated
`_comment_corsAllowedOrigins` in `config.example.json`.

## TDD red → green

- `e5974c6a` **RED** — adds `cmd/server/websocket_checkorigin_test.go`
with five cases; `SetAllowedOrigins` introduced as an enforcement stub
so the test compiles and fails on the assertion (CI fails on this commit
by design).
- `a4791dc3` **GREEN** — implements `Hub.checkOrigin`, wires
`SetAllowedOrigins` from `main.go`, updates the config example. All
tests pass.

## Tests added (`cmd/server/websocket_checkorigin_test.go`)

- `TestCheckOriginRejectsForeignOrigin` — foreign Origin → 403
- `TestCheckOriginAllowsEmptyOrigin` — non-browser client → 101
- `TestCheckOriginAllowsSameHost` — same-origin → 101
- `TestCheckOriginAllowsAllowlistedOrigin` — exact allowlist match → 101
- `TestCheckOriginWildcardDoesNotAllowForeignOrigin` — `"*"` in
allowlist still rejects foreign origin → 403

## Files changed

- `cmd/server/websocket.go` — `Hub.allowedOrigins`, `SetAllowedOrigins`,
`checkOrigin`, wired into `Upgrader.CheckOrigin`.
- `cmd/server/main.go` — `hub.SetAllowedOrigins(cfg.CORSAllowedOrigins)`
at the single call site.
- `cmd/server/websocket_checkorigin_test.go` — new test file.
- `config.example.json` — updated `_comment_corsAllowedOrigins` to
document `/ws` gating and the `"*"` divergence.

## Out of scope (follow-up)

- **#1794** — per-IP rate limit / deny list / connection cap for
non-browser clients (which still bypass Origin because they don't send
one). Layered defense; not in this PR.

## Verification

- `go test ./cmd/server/...` — all server tests pass locally (574s).
- Preflight clean (`bash
~/.openclaw/skills/pr-preflight/scripts/run-all.sh origin/master`).

---------

Co-authored-by: openclaw-bot <bot@openclaw.local>
2026-06-29 05:48:58 -07:00
Michael J. Arcan ae2e3933dd feat(server): store memory diagnostics + drop redundant obs.RawHex (#1773)
Drops the redundant per-observation RawHex (~98MB on a live store;
reader already falls back to tx.RawHex #881) and adds an opt-in
/api/perf?mem=1 memory breakdown (flood-forward share + per-component
bytes). Profiled against a live instance.

**Savings substantiation:** live-instance profiling shows ~1.66M
observations in the store, each previously carrying its own
per-observation `raw_hex` (avg ≈118 hex chars ≈59 bytes) that exactly
duplicates the parent transmission's `raw_hex`. Dropping the duplicate
on every load/ingest path eliminates ≈98 MB of redundant in-memory
storage plus ~1.66M string allocations, with no data loss — the read
path (`enrichObs`) already falls back to `tx.RawHex` when `obs.RawHex`
is empty (verified by the new safety-gate test). The patched build
cannot be run against the live instance here; instead the new opt-in
`/api/perf?mem=1` diagnostic lets operators measure the
real before/after (`trackedMB` and the per-component breakdown) directly
after deploy.
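
As a rough illustration of the read-path fallback and the savings arithmetic quoted above (the `Tx` and `Obs` types and both helpers are hypothetical simplifications, not the real `enrichObs`):

```go
package main

import "fmt"

// Tx and Obs are pared-down stand-ins for the store's types.
type Tx struct{ RawHex string }
type Obs struct {
	RawHex string // empty once the per-observation copy is dropped
	Tx     *Tx
}

// effectiveRawHex prefers the observation's own value and falls back to the
// parent transmission's RawHex when it is empty.
func effectiveRawHex(o *Obs) string {
	if o.RawHex != "" {
		return o.RawHex
	}
	if o.Tx != nil {
		return o.Tx.RawHex
	}
	return ""
}

// redundantBytes multiplies the observation count by the average size of
// the duplicated field, using the figures from the text.
func redundantBytes(observations, avgBytes int64) int64 {
	return observations * avgBytes
}

func main() {
	tx := &Tx{RawHex: "deadbeef"}
	fmt.Println(effectiveRawHex(&Obs{Tx: tx}))
	fmt.Printf("%.1f MB\n", float64(redundantBytes(1_660_000, 59))/1e6)
}
```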
2026-06-28 13:48:47 -07:00
Michael J. Arcan 3efa37c46c feat(server): complete the #672 4-axis repeater usefulness score (#1762)
Adds Coverage (harmonic reach) + Redundancy (Tarjan articulation) axes +
composite & grade. Closes #672.
**TDD note (BLOCKER-1):** Community PR delivered as a single squashed
commit, so there is no separate pre-fix failing-test commit — please
accept as a community-PR exemption. The tests are *gating*, not just
thorough: each axis test pins a specific topology outcome (coverage on
line/star/disconnected/weight-sensitive; redundancy on
line/triangle/star/bridged-cliques), and an end-to-end `/api/nodes`
surface test drives the whole pipeline and asserts the composite
diverges from the Traffic axis. Inverting the `1/weight` distance,
dropping the NaN/Inf reject, removing the `redundancyMinWeight` floor,
or aliasing `usefulness_score` back onto `traffic_share_score` each
break a specific assertion. The axis functions are pure (no hidden
state), so the suite fully characterises the behavior without the red
anchor.
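
For readers unfamiliar with the articulation-point idea behind the Redundancy axis, here is a textbook Tarjan-style sketch on an unweighted simple graph. It is not the repo's implementation: how articulation points are turned into a score, and how `redundancyMinWeight` filters edges, is not shown in this message, and `articulationPoints` is a hypothetical name.

```go
package main

import "fmt"

// articulationPoints returns, in ascending order, the vertices whose removal
// disconnects their component. Vertices are 0..n-1; edges are undirected.
func articulationPoints(n int, edges [][2]int) []int {
	adj := make([][]int, n)
	for _, e := range edges {
		adj[e[0]] = append(adj[e[0]], e[1])
		adj[e[1]] = append(adj[e[1]], e[0])
	}
	disc := make([]int, n) // discovery time, 0 = unvisited
	low := make([]int, n)  // lowest discovery time reachable from the subtree
	isCut := make([]bool, n)
	timer := 0

	var dfs func(u, parent int)
	dfs = func(u, parent int) {
		timer++
		disc[u], low[u] = timer, timer
		children := 0
		for _, v := range adj[u] {
			if v == parent {
				continue
			}
			if disc[v] != 0 { // back edge
				if disc[v] < low[u] {
					low[u] = disc[v]
				}
				continue
			}
			children++
			dfs(v, u)
			if low[v] < low[u] {
				low[u] = low[v]
			}
			// v's subtree cannot reach above u without going through u.
			if parent != -1 && low[v] >= disc[u] {
				isCut[u] = true
			}
		}
		// A DFS root is a cut vertex only if it has several DFS children.
		if parent == -1 && children > 1 {
			isCut[u] = true
		}
	}
	for u := 0; u < n; u++ {
		if disc[u] == 0 {
			dfs(u, -1)
		}
	}
	var out []int
	for u, cut := range isCut {
		if cut {
			out = append(out, u)
		}
	}
	return out
}

func main() {
	fmt.Println("line:", articulationPoints(3, [][2]int{{0, 1}, {1, 2}}))
	fmt.Println("triangle:", articulationPoints(3, [][2]int{{0, 1}, {1, 2}, {0, 2}}))
}
```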

Co-authored-by: Waydroid Builder <build@waydroid.local>
2026-06-27 22:03:05 -07:00
Michael J. Arcan 17654dd090 docs(api): document per-node usefulness metrics in OpenAPI (#1769)
Documented Node schema (the four #672 usefulness axes + composite + A-F
grade + relay fields) and response schemas on the node endpoints.
Documentation-only; no behaviour change. Pairs with #1762 (documents the
metrics it adds).

Co-authored-by: Waydroid Builder <build@waydroid.local>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 15:01:05 -07:00
Michael J. Arcan fc26fb6b3a feat(#1751): show transported region scopes in repeater sidebar (#1752)
Closes #1751.
2026-06-27 14:59:50 -07:00
efiten f0763aecce fix(#1726): clear stale "varies" hash size once a node settles (#1788)
Fixes #1726.

## Problem

A MeshCore v1.16.0 repeater configured for 2-byte path hashes
(`path.hash.mode=1`) — e.g. `36f6c7c7…` (`DK_3400_RAK_TEST`) — kept
showing as **"varies"** / mixed 1-byte + 2-byte for the full 7-day
advert window.

Per the live data in the issue triage: of the node's ~20 recent adverts,
exactly **one** (2026-06-09, across 15 distinct observer paths) was a
genuine 1-byte flood advert; every other advert was 2-byte. The
flip-flop heuristic in `computeNodeHashSizeInfo` weighs that stale
advert equally with recent ones, so an operator who flips
`path.hash.mode` mid-flight (or a single old 1-byte advert) stays
flagged for the full window with no way to signal "the config is settled
now."

## Fix

Two coupled changes in `cmd/server/store.go` `computeNodeHashSizeInfo`:

1. **Chronological ordering.** `byPayloadType[4]` iterates in insertion
order, not timestamp order, so `HashSize = Seq[last]` could pick the
wrong advert under out-of-order MQTT ingest or chunked cold-load (the
"carmack" concern from triage). We now collect `(FirstSeen, size)` pairs
and **stable-sort by `FirstSeen`**; ties keep insertion order,
preserving prior behavior when timestamps are equal.

2. **Recency decay.** After `transitions >= 2` raises the flip-flop
flag, clear it when the most recent `hashSizeRecentAgreeCount` (= **3**)
non-zero-hop adverts all agree on a single size. A node still flapping
(recent adverts disagree) stays flagged. `3` mirrors the existing
≥3-observation threshold used to raise the flag.
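
The two changes can be sketched together as below. This is a simplified illustration: `advert` and `hashSizeInfo` are hypothetical, zero-hop filtering and the existing observation-count threshold for raising the flag are left out, and only the stable sort, the `transitions >= 2` flag, and the last-3-agree decay come from the description above.

```go
package main

import (
	"fmt"
	"sort"
)

const hashSizeRecentAgreeCount = 3

// advert is a pared-down (FirstSeen, size) pair.
type advert struct {
	FirstSeen int64 // unix seconds
	Size      int   // path hash size in bytes
}

// hashSizeInfo returns the chronologically latest hash size and whether the
// node should still be flagged as inconsistent ("varies").
func hashSizeInfo(adverts []advert) (hashSize int, inconsistent bool) {
	if len(adverts) == 0 {
		return 0, false
	}
	// Copy, then stable-sort by FirstSeen so ties keep insertion order.
	seq := append([]advert(nil), adverts...)
	sort.SliceStable(seq, func(i, j int) bool { return seq[i].FirstSeen < seq[j].FirstSeen })

	transitions := 0
	for i := 1; i < len(seq); i++ {
		if seq[i].Size != seq[i-1].Size {
			transitions++
		}
	}
	hashSize = seq[len(seq)-1].Size
	inconsistent = transitions >= 2

	// Recency decay: clear the flag when the most recent adverts all agree.
	if inconsistent && len(seq) >= hashSizeRecentAgreeCount {
		settled := true
		for _, a := range seq[len(seq)-hashSizeRecentAgreeCount:] {
			if a.Size != hashSize {
				settled = false
				break
			}
		}
		if settled {
			inconsistent = false
		}
	}
	return hashSize, inconsistent
}

func main() {
	settled := []advert{{1, 2}, {2, 1}, {3, 2}, {4, 2}, {5, 2}}
	fmt.Println(hashSizeInfo(settled))
	flapping := []advert{{1, 2}, {2, 1}, {3, 2}, {4, 1}, {5, 2}}
	fmt.Println(hashSizeInfo(flapping))
}
```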

## Policy note

Triage marked this **needs-operator-input** because the decay is a
behavior/policy change. This PR implements the rule the triage proposed
("if the last 3 adverts agree, clear inconsistent"), which matches the
reporter's stated expectation. Happy to adjust the threshold or gate it
differently per your call.

## Tests

`cmd/server/issue1726_hash_decay_test.go`:
- `TestIssue1726_SettledNodeNotInconsistent` — reporter's case
(`[2,1,2,2,2]` within window) → `Inconsistent=false`, `HashSize=2`.
- `TestIssue1726_HashSizeUsesChronologicallyLatest` — out-of-order
insertion still reports the chronologically-latest size.
- `TestIssue1726_ActiveFlapperStaysInconsistent` — a node whose recent
adverts disagree stays flagged.

Existing flip-flop / hash-collision tests unchanged and green; full
`cmd/server` package suite passes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Erwin Fiten <e.fiten@opteco.be>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 05:05:46 -07:00
efiten d437958474 fix(map): pin APC (Napa) and STS (Sonoma) observers (#1786) (#1787)
Fixes the map-coordinate gap in #1786.

## Problem

Observers tagged with IATA code **APC** (Napa County) or **STS**
(Charles M. Schulz–Sonoma County) render with no location and never pin
on the map.

## Root cause

`iataCoords` in `cmd/server/routes.go` is a hardcoded `IATA -> lat/lon`
lookup used purely for placing observer/region markers on the map. It
had no entry for APC or STS, so those observers had no coordinates to
render with.

This is **display-only**. Ingestion is not gated on these codes:
`IsObserverIATAAllowed` (`cmd/ingestor/config.go`) short-circuits to
`true` when the observer IATA whitelist is empty — which is the staging
configuration. The reporter's "packets disappear entirely" symptom is
therefore **not** explained by this code path (likely an upstream
`meshcoretomqtt`/broker topic issue; needs operator `mosquitto_sub`
confirmation per triage).

## Fix

- Add `APC {38.2132, -122.2807}` and `STS {38.509, -122.8128}` to
`iataCoords`, matching the airports' published coordinates.
- Add a regression test (`TestIataCoordsIncludesNapaAndSonoma`)
asserting both are present with the expected coordinates.

## Verification

- `go test ./cmd/server/` — full package passes (`ok`).
- `go vet ./cmd/server/` — clean.

## Scope note

Checked the repo for other statically-enumerable region codes
(`config.example.json` regions: SJC/SFO/OAK/MRY) — all already covered.
The broader "are other in-use codes missing" question can only be
answered against the live `cfg.Regions` + `db.GetDistinctIATAs()` set,
which is operational, not in-tree.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Erwin Fiten <e.fiten@opteco.be>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 02:44:28 -07:00
Kpa-clawbot 57956712e7 fix(#1768): Relay Airtime Share uses LoRa Time-on-Air (preamble-aware) — partial fix (#1776)
Partial fix for #1768 — Relay Airtime Share now uses closed-form LoRa
Time-on-Air instead of a payload-bytes-only proxy, removing the ~3-4×
bias against small frames (preamble + fixed-symbol intercept).

cross-stack: justified — backend score formula needs a frontend caption
change (`public/analytics.js` dumbbell preset banner + tooltip) so
operators can interpret the assumed PHY block. Both move together or the
metric is misleading.

## Red commit

`8da57062` — failing test asserts ToA-based score (~83.48 % ADVERT share
on the locked acceptance fixture) instead of the byte proxy's 95.24 %.
`internal/lora.TimeOnAir` was a zero-returning stub at the red commit;
tests failed with assertion errors, not build errors.

## Green commit

`dd402edd` — implements `lora.TimeOnAir` (Semtech AN1200.13 / SX126x
§6.1.4 closed form, cross-checked against RadioLib), wires `score =
TimeOnAir(payloadBytes, preset) × distinctRelays` in
`cmd/server/relay_airtime_share.go`, surfaces the preset in the JSON
response and analytics caption.

## Config (per AGENTS Config Documentation Rule)

New keys under existing `analytics` block:

```json
"loraPreset": { "freq": 869600000, "bw": 62.5, "sf": 8, "cr": 5 }
```

Defaults match the deployment's actual `get radio` (869.6 MHz / BW 62.5
kHz / SF 8 / CR 4/5). `CRC=1`, `IH=0`, `DE = (T_sym ≥ 16 ms)`, and the
SF-dependent preamble (32 for SF≤8 else 16, per firmware
`preambleLengthForSF` / MeshCore PR #1954) are firmware-fixed constants
in `internal/lora/toa.go` and intentionally NOT surfaced as config (per
re-triage).
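
For orientation, here is a sketch of the closed-form Time-on-Air calculation under the constants listed above (CRC on, explicit header, `DE` derived from symbol time, SF-dependent preamble). It follows the widely published AN1200.13 form; the `Preset` struct, `timeOnAir`, and `relayScore` are illustrative names and are not taken from `internal/lora/toa.go`, whose details may differ.

```go
package main

import (
	"fmt"
	"math"
)

// Preset mirrors the loraPreset config keys (hypothetical struct).
type Preset struct {
	BWkHz float64 // bandwidth in kHz, e.g. 62.5
	SF    int     // spreading factor
	CR    int     // coding-rate denominator: 5 means 4/5
}

// timeOnAir estimates airtime in seconds for one frame.
func timeOnAir(payloadBytes int, p Preset) float64 {
	tSym := math.Pow(2, float64(p.SF)) / (p.BWkHz * 1000) // symbol time, s

	de := 0.0 // low-data-rate optimisation, on when T_sym >= 16 ms
	if tSym >= 0.016 {
		de = 1
	}
	preamble := 16.0 // SF-dependent preamble: 32 for SF<=8, else 16
	if p.SF <= 8 {
		preamble = 32
	}
	const crc, ih = 1.0, 0.0 // CRC=1, IH=0 (explicit header)

	num := 8*float64(payloadBytes) - 4*float64(p.SF) + 28 + 16*crc - 20*ih
	den := 4 * (float64(p.SF) - 2*de)
	// The (CR+4) factor of the formula equals the denominator stored in p.CR.
	payloadSyms := 8 + math.Max(math.Ceil(num/den)*float64(p.CR), 0)

	return (preamble + 4.25 + payloadSyms) * tSym
}

// relayScore applies score = TimeOnAir x distinctRelays.
func relayScore(payloadBytes int, p Preset, distinctRelays int) float64 {
	return timeOnAir(payloadBytes, p) * float64(distinctRelays)
}

func main() {
	p := Preset{BWkHz: 62.5, SF: 8, CR: 5}
	for _, n := range []int{5, 20, 100} {
		fmt.Printf("%3d bytes: %.3f ms\n", n, timeOnAir(n, p)*1000)
	}
	fmt.Printf("score(20 B, 3 relays): %.6f\n", relayScore(20, p, 3))
}
```

Because the preamble and fixed symbols are paid per frame regardless of size, a 5-byte and a 20-byte frame differ far less in airtime than their 4× byte ratio suggests, which is the bias the byte proxy introduced.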

## Scope

In-scope files (6):

- `internal/lora/toa.go` (new package — closed-form ToA)
- `internal/lora/toa_test.go` (table-driven preset tests)
- `cmd/server/relay_airtime_share.go` (wire ToA into score)
- `cmd/server/relay_airtime_share_test.go` (recomputed expected values)
- `cmd/server/config.go` + `config.example.json` (preset config keys)
- `public/analytics.js` (preset caption on dumbbell chart + tooltip)

Plus `cmd/server/go.mod` (replace directive for the new internal
module).

## Deferred to v2 (separate issues per re-triage)

- Per-observation SF/BW + radio-settings-aware dedup (blocked: ingestor
stores SNR/RSSI only, no SF/BW on observations).
- CR-per-hop dual-point sensitivity band (CR scales only the payload
symbol term `(CR+4)`, not the preamble/header; second-order accuracy
gain).
- Cross-SF bridge accounting.

## Tests

```
cd internal/lora && go test ./...           → PASS
cd cmd/server && go test -run RelayAirtime  → PASS
```

## Preflight overrides

- `check-branch-clean` (cross-stack): justified above — score formula
change requires matching caption update; both files trace to the same
issue.

---------

Co-authored-by: kpa-clawbot <kpa-clawbot@users.noreply.github.com>
Co-authored-by: Kpa-clawbot <bot@openclaw.local>
Co-authored-by: bot <bot@meshcore>
2026-06-22 10:07:49 -07:00
efiten 22fe929da2 feat: opt-in mobile client-RX coverage (crowdsourced RF reach) + /api/nodes/resolve (#1728)
Implements #1727.

## What this adds

**Mobile client-RX coverage** — an opt-in, crowdsourced RF-coverage
feature. A roaming MeshCore **companion** radio (driven by the
open-source [corescope-rx](https://github.com/efiten/corescope-rx) PWA,
GPLv3) reports which nodes it heard directly, tagged with the phone's
GPS and the packet's SNR/RSSI. CoreScope ingests these into a new
`client_receptions` table and renders per-node **hex coverage** on the
Reach page, plus a standalone **Coverage dashboard** (`#/rx-coverage`)
with a top-mobile-observers leaderboard.

Also includes **`GET /api/nodes/resolve?prefix=<hex>`** — a read-only
node-name lookup by pubkey prefix (`{name, pubkey, ambiguous}`), used by
the companion app for friendly names.

## Opt-in — default OFF (zero impact on existing deployments)

The whole feature is gated behind one config flag, **disabled by
default**:

```jsonc
"clientRxCoverage": { "enabled": false }
```

When disabled (the default): the ingestor writes **no**
`client_receptions`; the three coverage endpoints return a clean
**404**; the UI hides the Coverage nav link, the `#/rx-coverage` route,
and the Reach-page toggle. `/api/nodes/resolve` is always available (not
coverage-specific).
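
The self-guard behaviour can be pictured with a small wrapper like the one below. This is only a sketch: `gate`, `statusFor`, and the `/api/rx-coverage` path are hypothetical, and the actual handlers in `cmd/server/rx_coverage.go` may implement the check inline.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// gate wraps a handler and answers 404 while the feature flag is off.
func gate(enabled bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !enabled {
			http.NotFound(w, r)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor exercises the gate in-process and returns the HTTP status.
func statusFor(enabled bool) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	// Hypothetical route path, used only for this sketch.
	req := httptest.NewRequest(http.MethodGet, "/api/rx-coverage", nil)
	gate(enabled, ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("disabled:", statusFor(false))
	fmt.Println("enabled: ", statusFor(true))
}
```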

## How it works

```
companion ──BLE 0x88 (snr+rssi+raw)──▶ corescope-rx PWA ──▶ MQTT meshcore/client/{pubkey}/packets
                                                                      │
                                          ingestor (gated) ──▶ client_receptions (GPS + SNR + heard-key)
                                                                      │
              server: pure-Go hex grid ──▶ GeoJSON ──▶ Reach hex overlay + Coverage dashboard
```

- **Direct-only capture:** records only what the companion heard itself
and directly — a 0-hop advert's pubkey, or `path[last]` (last forwarder)
for FLOOD routes; ≥2-byte path-hash required. Upstream hops discarded.
- **No new deps:** hexbins are a pure-Go pointy-top grid over Web
Mercator (`cmd/server/hexgrid.go`) computed at query time
(`CGO_ENABLED=0` / `modernc.org/sqlite` friendly); frontend uses the
existing Leaflet.
- **Trust:** companion pubkey = identity; an EMQX ACL binds each client
to publish only to its own `meshcore/client/{pubkey}/packets` topic.
Payload contract in `docs/client-rx-coverage.md`.

## How to enable / try it

1. In `config.json`, set `"clientRxCoverage": { "enabled": true }` and
restart server + ingestor.
2. Point an EMQX (or any broker) listener so a client can publish to
`meshcore/client/<pubkey>/packets`; the ingestor already subscribes
under `meshcore/#`.
3. Run the [corescope-rx](https://github.com/efiten/corescope-rx) PWA on
an Android phone paired (BLE) to a MeshCore companion — it captures
heard nodes + GPS and publishes.
4. View results: per-node Reach page → toggle **coverage**, or the
**Coverage** dashboard at `#/rx-coverage`.

## What's where

- **Ingestor:** `cmd/ingestor/client_reception.go` (ingest), `db.go`
(`client_receptions` + `client_observers` schema), `main.go` (gated
dispatch), `config.go` (flag).
- **Server:** `cmd/server/rx_coverage.go` + `rx_dashboard.go`
(endpoints, self-guard 404 when off), `hexgrid.go` (pure-Go grid),
`node_resolve.go` (resolve), `routes.go` / `types.go` / `config.go`
(wiring + flag + `/api/config/client` field).
- **Frontend:** `public/rx-coverage.js` (dashboard),
`node-reach-coverage.js` + `.css` (overlay), `node-reach.js` (Reach
toggle, flag-gated), `roles.js` (reads the flag, hides nav when off).
- **Docs:** `docs/client-rx-coverage.md`.

## Testing

- Go: `cd cmd/server && go test ./...` and `cd cmd/ingestor && go test
./...` — green, including new gate tests (`coverage_gate_test.go` in
both: off → no rows / 404, on → works) and the rx-coverage / resolve /
hexgrid suites.
- JS: `node test-coverage-gate.js`, `node test-node-reach-coverage.js`
(wired into CI). The Playwright `test-node-reach-coverage-e2e.js` is
wired into the e2e job and **skips when `clientRxCoverage` is
disabled**, so it's safe under the default-off config.

## Notes for reviewers

- The four new routes are registered in
`cmd/server/openapi_known_gaps.json` (the existing OpenAPI-completeness
ratchet), matching how other not-yet-spec'd routes are tracked. Happy to
write full OpenAPI spec entries instead if you prefer.
- Commits are split per layer (ingestor / server endpoints / resolve /
frontend / CI) for review.

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Erwin Fiten <e.fiten@opteco.be>
2026-06-19 11:37:16 -07:00
Kpa-clawbot 97833c523b fix(post-packets): use v3 observations schema (closes #1196) (#1704)
## Summary

`POST /api/packets` is broken on every v3-schema install — which is the
default since #1289. The handler issues two writes against legacy v2
column names and silently swallows the observation insert's error,
returning `200 OK` with `id>0` while persisting zero observation rows.

## Root cause

`cmd/server/routes.go:1225-1235` (pre-fix) used the v2 schema shape:

```go
INSERT INTO transmissions (... path_json ...)            // path_json removed in v3
INSERT INTO observations (transmission_id, observer_id,
                          observer_name, snr, rssi, timestamp)  // v2 columns
// timestamp written as RFC3339 text; v3 wants unix INTEGER
// second Exec's error was discarded
```

v3 schema (`cmd/ingestor/db.go:289-304`): `observations.observer_idx
INTEGER` (FK `observers.rowid`), `observations.timestamp INTEGER` (unix
epoch), `path_json` lives here not on `transmissions`.

Reporter [@EldoonNemar](https://github.com/EldoonNemar) called this out
precisely in #1196 — both the schema mismatch and the divergence between
the test harness (which uses the v3 shape) and the handler (v2 shape).

## Fix

`cmd/server/routes.go`:

- `transmissions` insert: drop `path_json` column.
- Observer resolution: `INSERT OR IGNORE INTO observers (id, name, ...)`
then `SELECT rowid` — mirrors the ingestor resolver at
`cmd/ingestor/db.go:778,906`.
- `observations` insert: write `observer_idx INTEGER` + `timestamp =
time.Now().Unix()`; `path_json` moved here.
- **Propagate both insert errors** (transmission + observation) as `500`
instead of swallowing them.

## TDD

| Step  | Commit  | Result |
| ----- | ------- | ------ |
| RED | `46d25389` | Test fails on master: `id=0` because the
transmissions insert references a column not present in v3. |
| GREEN | `dae57d67` | Test passes; round-trip persists the observation
with `observer_idx` resolved from the seeded `obs1` row and a unix-epoch
`timestamp`. |

Local repro:

```
# RED on the test commit alone:
$ go test -run TestPostPacketPersistsV3Schema -count=1 .
--- FAIL: TestPostPacketPersistsV3Schema (0.03s)
    routes_test.go:4755: expected transmission id > 0, got 0
        (body: {"id":0,"decoded":{...}})
FAIL

# GREEN on HEAD:
$ go test -run TestPostPacketPersistsV3Schema -count=1 .
ok  	github.com/corescope/server	0.037s
```

## Scope

Two files, both in `cmd/server/`:
- `cmd/server/routes.go` (+38/-12) — handler rewrite
- `cmd/server/routes_test.go` (+66) — round-trip regression test

No public API signature changes. No DB schema changes (consumes the
existing v3 schema correctly).

Closes #1196
2026-06-13 00:11:02 -07:00
Kpa-clawbot 76e130b313 fix(#1702): grant actions: write to release-fast-path workflow (#1703)
## Summary

Fixes the missing `actions: write` permission on
`.github/workflows/release-fast-path.yml` so the fallback `gh workflow
run deploy.yml` dispatch no longer returns HTTP 403.

## Triage verdict

From issue #1702 root-cause section:

> Fast-path workflow YAML likely lacks:
> ```yaml
> permissions:
>   contents: read
>   packages: write
>   actions: write   # MISSING — required to dispatch other workflows
> ```
> ## Fix
> One-line addition to `.github/workflows/release-fast-path.yml`
permissions block.

## Root cause

`.github/workflows/release-fast-path.yml` lines 16-18 (before this
change) only granted `contents: read` and `packages: write`. The
fallback step (`gh workflow run deploy.yml` when `:edge`'s
`org.opencontainers.image.revision` label doesn't match the tag SHA)
calls the GitHub Actions REST API, which requires `actions: write` on
`GITHUB_TOKEN`. Without it, the dispatch fails with `Resource not
accessible by integration` and the release stalls until an operator
manually re-runs the fast-path job after `:edge` rebuilds.

## Change

- `.github/workflows/release-fast-path.yml`: add `actions: write` to the
workflow-level `permissions:` block.
- `cmd/server/release_fast_path_workflow_test.go`: extend the existing
config-gate test (issue #1677) to require `actions: write` alongside the
previously asserted `contents: read` and `packages: write`.

Two commits, red→green:

1. `test(#1702): assert release-fast-path.yml requires actions: write` —
extends the assertion. Verified to fail on this commit
(`release-fast-path.yml: missing required permission "actions: write"`).
2. `fix(#1702): grant actions: write to release-fast-path workflow` —
adds the permission. Test green.

## TDD posture

The repo already had a YAML-config gate at
`cmd/server/release_fast_path_workflow_test.go` (parses the workflow as
text and asserts required permission strings). Strict TDD applied: red
commit extends the test, green commit fixes the workflow. No exemption
needed.
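
As an illustration of a text-based config gate of this kind, the sketch below scans workflow text for required permission strings. `missingPermissions` is a hypothetical helper, not the function in `cmd/server/release_fast_path_workflow_test.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// missingPermissions returns every required permission string that does
// not appear in the workflow text. It is a plain substring check, so it
// does not validate YAML structure.
func missingPermissions(workflow string, required []string) []string {
	var missing []string
	for _, perm := range required {
		if !strings.Contains(workflow, perm) {
			missing = append(missing, perm)
		}
	}
	return missing
}

func main() {
	before := "permissions:\n  contents: read\n  packages: write\n"
	after := before + "  actions: write\n"
	required := []string{"contents: read", "packages: write", "actions: write"}

	fmt.Println("before fix, missing:", missingPermissions(before, required))
	fmt.Println("after fix, missing: ", missingPermissions(after, required))
}
```

A substring check keeps the test dependency-free, at the cost of not noticing a permission that appears only in a comment or under a different job.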

## Acceptance criteria (from #1702)

- [x] `permissions.actions: write` added to the fast-path workflow
- [ ] Manual test: tag a scratch SHA where `:edge` is stale; confirm
fallback dispatches deploy.yml without 403 — by-design out of CI scope
(would require a throwaway tag + race condition); covered by next real
release.
- [ ] Operator-felt: next release where notes-commit lands AFTER `:edge`
build completes works in one pass without manual rerun — verifiable only
on next release; in-scope of `Closes #1702` because bullet 1 (the
structural defect) is the cause of bullets 2 and 3.

## Preflight

`bash ~/.openclaw/skills/pr-preflight/scripts/run-all.sh origin/master`
→ **clean** (all hard gates pass, no warnings).

Closes #1702

---------

Co-authored-by: Kpa-clawbot <kpa-clawbot@users.noreply.github.com>
2026-06-13 00:10:59 -07:00
Kpa-clawbot e96f0f9f9f fix(#1694): port extended ACK decoder to server (ackLen/ackAttempt/ackRand parity) (#1695)
## Summary

Ports the firmware-1.16.0 extended ACK decoding from the ingestor (PR
#1618, issue #1610) into the server-side re-decoder. Previously
`cmd/server/decoder.go` silently dropped `ackLen`, `ackAttempt`, and
`ackRand` (and the multipart inner equivalents) — the server emitted
plain 4-byte ACKs even when the wire carried the 5/6-byte extended form.
Now both decoders agree byte-for-byte.

Closes #1694.

## What changed

- `cmd/server/decoder.go::decodeAck`: sets `AckLen` (capped at 6),
`AckAttempt` (`buf[4]` when `len>=5`), `AckRand` (`buf[5]` when
`len>=6`). Mirrors `cmd/ingestor/decoder.go:279-305`.
- `cmd/server/decoder.go::decodeMultipart` ACK branch: sets `InnerAckLen
= len(buf)-1` (capped at 6), `InnerAckAttempt`, `InnerAckRand`. Mirrors
`cmd/ingestor/decoder.go:696-714`.
- `Payload` struct gains six `*int` fields tagged `omitempty`: `AckLen`,
`AckAttempt`, `AckRand`, `InnerAckLen`, `InnerAckAttempt`,
`InnerAckRand`. Backward-compatible JSON — legacy 4-byte ACKs leave
attempt/rand nil and the fields are omitted from the output.

No other decoder consumer is touched. Routes / store auto-surface the
new fields via JSON marshaling.

## Test layout

`cmd/server/decoder_ack_extended_test.go` drives `decodeAck`
table-driven across the three wire shapes:

| Buffer | AckLen | AckAttempt | AckRand |
|---|---|---|---|
| `EF BE AD DE` (CRC only) | 4 | nil | nil |
| `EF BE AD DE 07` | 5 | 7 | nil |
| `EF BE AD DE 07 42` | 6 | 7 | 0x42 |

Plus `TestDecodeMultipartAckExtendedInner` for a 7-byte multipart buffer
(`0x33` header + 6-byte inner ACK), asserting `InnerAckLen=6`,
`InnerAckAttempt=7`, `InnerAckRand=0x42`.
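
The table above corresponds to decoding logic along these lines. This is a reduced sketch: `ackFields` stands in for the relevant part of the `Payload` struct, and returning all-nil for buffers shorter than 4 bytes is an assumption, not something this PR states.

```go
package main

import "fmt"

// ackFields is a hypothetical stand-in for the ACK-related Payload fields.
type ackFields struct {
	AckLen, AckAttempt, AckRand *int
}

func intPtr(v int) *int { return &v }

// decodeAck fills the extended-ACK fields from the raw ACK bytes.
func decodeAck(buf []byte) ackFields {
	var out ackFields
	if len(buf) < 4 { // assumption: too short to be an ACK
		return out
	}
	n := len(buf)
	if n > 6 { // AckLen is capped at 6
		n = 6
	}
	out.AckLen = intPtr(n)
	if len(buf) >= 5 {
		out.AckAttempt = intPtr(int(buf[4]))
	}
	if len(buf) >= 6 {
		out.AckRand = intPtr(int(buf[5]))
	}
	return out
}

func show(p *int) string {
	if p == nil {
		return "nil"
	}
	return fmt.Sprintf("%d", *p)
}

func main() {
	for _, buf := range [][]byte{
		{0xEF, 0xBE, 0xAD, 0xDE},
		{0xEF, 0xBE, 0xAD, 0xDE, 0x07},
		{0xEF, 0xBE, 0xAD, 0xDE, 0x07, 0x42},
	} {
		a := decodeAck(buf)
		fmt.Printf("% X -> len=%s attempt=%s rand=%s\n",
			buf, show(a.AckLen), show(a.AckAttempt), show(a.AckRand))
	}
}
```

Pointer fields combined with `omitempty` are what let legacy 4-byte ACKs omit attempt and rand from the JSON while still distinguishing "absent" from a legitimate zero value.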

## TDD trail

- **Red commit** (test + struct stubs only,
`decodeAck`/`decodeMultipart` unchanged) → assertions fail on
`AckLen=nil`.
- **Green commit** (port implementation) → all assertions pass.

Full `cd cmd/server && go test ./...` passes locally.

## Firmware refs

- `firmware/src/helpers/BaseChatMesh.cpp:218-234` (extended ACK layout)
- firmware commit `f6e6fdaa` (attempt counter)
- firmware commit `a130a95a` (RNG byte)

---------

Co-authored-by: Kpa-clawbot <bot@kpa-clawbot>
2026-06-12 19:10:44 -07:00
Kpa-clawbot a8c99c61fd fix(#1659): block analytics endpoint until first pass complete (503 Retry-After) (#1688)
## Summary

Fixes #1659 — analytics cards no longer show the post-restart slice when
"All data" is selected.

## Root cause

After server restart, `s.recompRF` / `s.recompTopology` /
`s.recompChannels` cache the FIRST computation, which is the small
in-RAM observations slice (background chunk-loader has not yet
backfilled history). The recomputer serves that slice through
`GetAnalyticsRFWithWindow`'s default shortcut for an entire recompute
interval, while the client pins it via `CLIENT_TTL.analyticsRF`. UX:
cards show a tiny window even when the user selects "All data".

## Fix shape (option B from the issue body)

Server-side per-recomputer warm-up gate:

- `cmd/server/analytics_warmup_1659.go` adds a per-recomputer
`firstPassDoneNs` atomic timestamp, set ONLY by the first successful
`runOnce()` (CAS-guarded for idempotency). `IsWarmingUp_1659()` /
`FirstPassDoneAt_1659()` are lock-free reads.
- `cmd/server/analytics_recomputer.go` `runOnce()` calls
`markFirstPassDone_1659()` after every successful compute.
- `cmd/server/routes.go` handlers for RF / Topology / Channels: when the
request is the default shape (`region=="" && area=="" &&
window.IsZero()`) AND the matching recomputer is still warming up,
return `503` + `Retry-After: 5` + `{"error":"analytics warming
up","retry_after_s":5}`. Windowed / region-filtered requests bypass the
gate (they already bypass the recomputer cache, so they are unaffected
by the warm-up bug).
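
A compact sketch of the server-side gate described above, assuming Go 1.19+ for `atomic.Int64`. The names `warmup`, `analyticsHandler`, and `status`, the `/api/analytics/rf` path, and the `window` query parameter are illustrative; the real code uses the `_1659`-suffixed helpers and its own request parsing.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// warmup tracks when the first successful compute finished.
type warmup struct {
	firstPassDoneNs atomic.Int64 // 0 until the first successful pass
}

// markFirstPassDone may be called after every compute; the CAS makes
// only the first call stick.
func (w *warmup) markFirstPassDone() {
	w.firstPassDoneNs.CompareAndSwap(0, time.Now().UnixNano())
}

func (w *warmup) isWarmingUp() bool { return w.firstPassDoneNs.Load() == 0 }

// analyticsHandler returns 503 + Retry-After for default-shape requests
// while warming up; filtered requests bypass the gate.
func analyticsHandler(w *warmup) http.HandlerFunc {
	return func(rw http.ResponseWriter, r *http.Request) {
		q := r.URL.Query()
		isDefault := q.Get("region") == "" && q.Get("area") == "" && q.Get("window") == ""
		if isDefault && w.isWarmingUp() {
			rw.Header().Set("Retry-After", "5")
			rw.Header().Set("Content-Type", "application/json")
			rw.WriteHeader(http.StatusServiceUnavailable)
			fmt.Fprint(rw, `{"error":"analytics warming up","retry_after_s":5}`)
			return
		}
		rw.WriteHeader(http.StatusOK)
	}
}

// status runs one in-process request and returns the response code.
func status(w *warmup, url string) int {
	rec := httptest.NewRecorder()
	analyticsHandler(w)(rec, httptest.NewRequest(http.MethodGet, url, nil))
	return rec.Code
}

func main() {
	w := &warmup{}
	fmt.Println("cold, default: ", status(w, "/api/analytics/rf"))
	fmt.Println("cold, filtered:", status(w, "/api/analytics/rf?region=SJC"))
	w.markFirstPassDone()
	fmt.Println("warm, default: ", status(w, "/api/analytics/rf"))
}
```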

Client-side:

- `public/app.js` `api()` helper retries any 503 response, honoring
`Retry-After`, with exponential backoff capped at 30s, max 6 attempts
(~63s total).
- Small "Computing analytics…" banner appears while any warm-up retry is
in flight, dismissed once the request resolves. Pages can override via
`window.onWarmup_1659`.

## Tests

RED commit `8b2b2d7` ships failing-on-assertion tests + a stub. GREEN
commit `2716c23` lands the fix and flips them green.

- `cmd/server/analytics_warmup_1659_test.go` — 3 cases: 503 during
warmup, 200 after first pass, windowed request bypasses gate.
- `test-1659-analytics-warmup.js` — 3 cases: Retry-After honored, retry
cap bounded, non-503 errors not retried. Wired into
`.github/workflows/deploy.yml`.

## Preflight overrides

- cross-stack: justified — server-side 503 contract MUST be paired with
client-side retry-and-banner handling; splitting across two PRs would
land a half-working fix.

Fixes #1659.

---------

Co-authored-by: corescope-bot <bot@corescope.local>
Co-authored-by: openclaw <openclaw@local>
2026-06-12 21:02:59 +00:00
Kpa-clawbot 048143f54f fix(#1690): cold-load uses last_seen (effective recency) instead of first_seen (#1691)
## #1690 — cold-load uses wrong time axis (RED → GREEN)

The on-disk DB has thousands of long-lived hashes with recent traffic.
Prod's cold-load filter (`transmissions.first_seen >= cutoff`) is bound to
a column that is set once at insert time and never updated, so
re-observation of an old hash does not move it into the hot window.
Result: prod cold-loaded ~0.3% of the on-disk rows and flipped
`backgroundLoadComplete=true` without ever walking the retention window
(the `retentionHours - hotStartupHours <= 0` short-circuit at line 1353 of
`cmd/server/store.go`).

### Three sub-fixes

**A) Denormalize `transmissions.last_seen`** so cold-load can window on
effective recency.

- `internal/dbschema/dbschema.go::ensureTransmissionsLastSeenColumn` adds
  the column + `idx_tx_last_seen` (single-column INTEGER ALTER + index;
  both PREFLIGHT-annotated as cheap metadata-only ops).
- `cmd/ingestor/db.go::OpenStoreWithInterval` schedules
  `tx_last_seen_backfill_v1` via `Store.RunAsyncMigration`
  (`UPDATE transmissions SET last_seen = MAX(observations.timestamp) WHERE
  last_seen = 0`), non-blocking on boot (1.9M+ obs row scan in prod).
- Writer-side: `InsertTransmission` seeds `last_seen` on initial insert,
  and every observation insert bumps `last_seen = ?` via prepared
  statement `stmtBumpTxLastSeen` (conditional `last_seen < ?` so
  out-of-order ingest never goes backwards).
- Reader-side: `cmd/server/store.go::Load`, `loadChunk`, and
  `cmd/server/chunked_load.go::LoadChunked` switch the WHERE/ORDER-BY
  clauses to `t.last_seen` when the column is present (PRAGMA-detected via
  `DB.hasLastSeen`). Test/legacy DBs without the column fall back to
  `first_seen` so existing fixtures stay green.

**B) Honest `backgroundLoadComplete` gating.**

- Drop the `retentionHours - hotStartupHours <= 0` short-circuit. Prod
  runs with both at 12h, which flipped Done=true immediately.
- After the chunk loop, query
  `SELECT COUNT(*) FROM transmissions WHERE last_seen >= retentionFloor`
  and compute `loadCoverageRatio = inMem / inDB`. Done=true only when
  `ratio >= 0.90` AND no chunk errors. `backgroundLoadFailed=true` +
  `backgroundLoadError` populated otherwise (e.g. `"loaded 20.0% of 5000
  rows (1000 in memory)"`).
- `bgErrMu`-guarded `loadCoverageRatio` + `backgroundLoadErr` so the perf
  endpoint can read them without blocking the writer.
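
The gating rule in sub-fix B can be sketched as a pure function. `loadOutcome` and `minCoverage` are hypothetical names; treating an empty retention window as fully covered and the wording of the chunk-error message are assumptions made for this example only.

```go
package main

import "fmt"

// minCoverage is the ratio required before the load counts as complete.
const minCoverage = 0.90

// loadOutcome decides whether the background load may be marked done.
// inMem is the number of rows held in memory, inDB the number of rows in
// the retention window on disk.
func loadOutcome(inMem, inDB int, chunkErr bool) (done bool, ratio float64, errMsg string) {
	ratio = 1.0 // assumption: nothing to load means full coverage
	if inDB > 0 {
		ratio = float64(inMem) / float64(inDB)
	}
	if chunkErr {
		return false, ratio, "chunk load error" // hypothetical message
	}
	if ratio < minCoverage {
		return false, ratio, fmt.Sprintf("loaded %.1f%% of %d rows (%d in memory)",
			ratio*100, inDB, inMem)
	}
	return true, ratio, ""
}

func main() {
	for _, c := range []struct{ inMem, inDB int }{{1000, 5000}, {4600, 5000}} {
		done, ratio, msg := loadOutcome(c.inMem, c.inDB, false)
		fmt.Printf("inMem=%d inDB=%d done=%v ratio=%.2f err=%q\n",
			c.inMem, c.inDB, done, ratio, msg)
	}
}
```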

**C) Perf exposure.**

`PerfPacketStoreStats` gains `RetentionHours`, `OldestLoaded`,
`LoadCoverageRatio`, `BackgroundLoadError`, surfacing what fraction of the
on-disk DB the in-memory store currently reflects, so operators can see
the 0.3% case in `/api/perf` without reading the logs.

### TDD trail

- **RED**: `05f0c6dd2bea6dc37324c548a49564d739aca920`: failing tests +
  21-line store.go scaffolding. CI on this commit failed on assertions
  (intended).
- **GREEN**: this PR's HEAD commit (8 files, +271/-24). Targeted suite:
  `Test1690_ColdLoad_TimeAxis`, `Test1690_BackgroundLoadHonesty`,
  `Test1690_PerfStats_NewFields`, `TestHotStartup_*`,
  `TestIssue1690_LastSeenUpdatedOnObservation`; all pass.

Anti-tautology: locally reverted the `if !s.backgroundLoadFailed.Load()`
guard around `backgroundLoadDone.Store(true)`;
`Test1690_BackgroundLoadHonesty` fails on the assertion
`"backgroundLoadDone=true with only 1000/5000 packets loaded; must be
false until coverage ≥ 90%"`. Restored.

### Async-migration preflight

- `ensureTransmissionsLastSeenColumn` — ALTER + CREATE INDEX both
  `// PREFLIGHT: async=true reason="..."` annotated.
- `tx_last_seen_backfill_v1` — wrapped in `Store.RunAsyncMigration`.
- `stmtBumpTxLastSeen` prepared statement — annotated; it is a row-level
  UPDATE BY PRIMARY KEY, not a migration.

### Preflight overrides

PREFLIGHT-MIGRATION-SCALE: <30s N=5K
- check-async-migration: justified for
  `cmd/server/issue1690_cold_load_test.go` CREATE TABLE/INDEX statements,
  which build an in-memory test fixture DB (≤5000 rows, runs in <1s in
  CI), not a prod migration.

Fixes #1690.

---------

Co-authored-by: meshcore-bot <bot@meshcore.local>
Co-authored-by: bot <bot@example.com>
2026-06-12 12:47:53 -07:00
Kpa-clawbot d910ea0208 feat(#1638): confidence rating weighted by hash mode (#1687)
Fixes #1638.

## Problem
`getConfidenceIndicator` in `public/nodes.js` treats every observation
as equal evidence, so a node seen 5 times via 1-byte hash prefixes
(which collide ~8-way across a typical mesh) scores the same as a node
seen 5 times via 6-byte prefixes (effectively unambiguous). The user
asked for confidence to respect ambiguity.

## Change
- `cmd/server/neighbor_graph.go` — new `CountsByMode map[int]int` on
`NeighborEdge`, bumped in `upsertEdge` / `upsertEdgeWithCandidates`
based on the observation's hash-prefix byte length (1/2/4/6). Merged in
`resolveEdge` when ambiguous→resolved edges collapse.
- `cmd/server/neighbor_api.go` — `NeighborEntry.counts_by_mode` exposed
(omitempty), and `dedupPrefixEntries` merges per-mode counts when an
unresolved prefix entry collapses into a resolved one. Flat `Count`
field preserved for back-compat.
- `public/nodes.js::getConfidenceIndicator` — weights observations by
mode: 1-byte=0.125, 2-byte=0.5, 4/6-byte=1.0. A single 6-byte sighting
counts ~8× a raw 1-byte one. HIGH triggers when EITHER the legacy
heuristic clears OR weighted count ≥3. Legacy entries without
`counts_by_mode` keep working (default weight 0.5).
- Tooltip now shows the per-mode breakdown (e.g. "Observations: 5
(1-byte: 3, 6-byte: 2)").
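
To make the weighting concrete, here is the arithmetic transliterated to Go (the real implementation is JavaScript in `public/nodes.js`). `modeWeight`, `weightedCount`, and `highByWeight` are illustrative names, the 0.5 fallback for unlisted prefix lengths is an assumption, and the legacy heuristic that can also trigger HIGH is not modelled.

```go
package main

import "fmt"

// modeWeight maps a hash-prefix length in bytes to its evidence weight.
func modeWeight(prefixBytes int) float64 {
	switch prefixBytes {
	case 1:
		return 0.125
	case 2:
		return 0.5
	case 4, 6:
		return 1.0
	default:
		return 0.5 // assumption: treat unlisted modes like the legacy default
	}
}

// weightedCount sums observations weighted by prefix length.
func weightedCount(countsByMode map[int]int) float64 {
	total := 0.0
	for mode, n := range countsByMode {
		total += modeWeight(mode) * float64(n)
	}
	return total
}

// highByWeight reports whether the weighted count alone reaches HIGH.
func highByWeight(countsByMode map[int]int) bool {
	return weightedCount(countsByMode) >= 3
}

func main() {
	cases := []map[int]int{
		{1: 20},      // twenty 1-byte sightings
		{6: 3},       // three 6-byte sightings
		{1: 3, 6: 2}, // the tooltip example
	}
	for _, c := range cases {
		fmt.Println(c, weightedCount(c), highByWeight(c))
	}
}
```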

## TDD
- RED:
`cmd/server/neighbor_graph_test.go::TestBuildNeighborGraph_CountsByMode`
— fixture with 1/2/4-byte sightings asserts per-mode tally (commit
`838965f3`).
- RED: `test-confidence-indicator.js` — 6-byte mostly-sighted neighbor
must outrank 1-byte mostly-sighted neighbor at equal flat count (commit
`4bd5e18e`).
- GREEN: implementation in commit `7511606d`. All 4 JS tests pass; new
Go test passes; full Go suite passes (two pre-existing flakes unrelated,
both pass when isolated).

## Browser verification
Synthetic side-by-side of OLD vs NEW classifier against representative
inputs (see screenshot). 1-byte-only and 6-byte-only at the same flat
count diverge from MEDIUM/MEDIUM to MEDIUM/HIGH, and three 6-byte
sightings now upgrade to HIGH where twenty 1-byte sightings stay MEDIUM.

## Preflight overrides
- check-branch-scope: cross-stack: justified — backend exposes the new
`counts_by_mode` field and the frontend consumes it; the whole point of
the change.

## Compat
- `Count` field unchanged in shape and value.
- `counts_by_mode` is `omitempty`; legacy persisted edges (loaded from
`neighbor_edges` via `neighbor_persist.go`) get no per-mode breakdown
and fall back to the default weight (0.5) — no UI regression.

---------

Co-authored-by: bot <bot@local>
Co-authored-by: corescope-bot <bot@corescope.local>
2026-06-12 11:38:43 -07:00
Kpa-clawbot efd66ea3f5 feat(mqtt): per-source status endpoint + Observers panel (#1682)
## Summary

Adds MQTT source status visibility per #1043 acceptance criteria:

- **Ingestor:** per-source counter registry
(`cmd/ingestor/source_status.go`) tracking `connected`,
`lastConnectUnix`, `lastDisconnectUnix`, `lastPacketUnix`,
`connectCount`, `disconnectCount`, `packetsTotal`, `packetsLast5m`
(sliding 5-min window via per-second buckets keyed by unix second — no
stale-leak), `lastError`. Wired at the existing OnConnect /
ConnectionLost / DefaultPublish callsites alongside the liveness
watchdog. Idempotent registration so counters survive reconnects.
Snapshot emitted in the existing stats file under `source_statuses`
(additive, `omitempty`).
- **Backend:** new `GET /api/mqtt/status` handler reads the ingestor
stats file and returns the per-source list. **Broker passwords are
masked** via a regex over the `scheme://user:pass@host` form (covers
mqtt/mqtts/tcp/ssl/ws/wss). Mask is also applied to `lastError` as
defense-in-depth (broker libs occasionally quote the failing URL).
OpenAPI completeness gate satisfied with a `routeDescriptions` entry.
- **Frontend:** small self-contained panel
(`public/mqtt-status-panel.js`) mounted above the Observers table.
Auto-refreshes every 10s, color-codes each row (green = connected +
recent packet, yellow = connected idle, red = disconnected), and tears
down its timer on SPA route change.
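
The password masking can be sketched with a single regular expression over the `scheme://user:pass@host` form. The pattern below is an assumption written for this example, not the one in `cmd/server/mqtt_status.go`; in particular it does not handle passwords that themselves contain `@` or `/`.

```go
package main

import (
	"fmt"
	"regexp"
)

// brokerCred matches scheme://user:password@ for the listed schemes and
// captures everything up to (not including) the colon before the password.
var brokerCred = regexp.MustCompile(`(?i)\b((?:mqtts?|tcp|ssl|wss?)://[^:/@\s]+):[^@/\s]+@`)

// maskBrokerURL replaces the password portion with ****. It works on any
// string, so it can also be applied to error messages that quote a URL.
func maskBrokerURL(s string) string {
	return brokerCred.ReplaceAllString(s, "${1}:****@")
}

func main() {
	for _, s := range []string{
		"mqtts://user:hunter2@broker.example:8883",
		"tcp://broker.example:1883",
		"dial wss://u:p@broker.example/mqtt failed",
	} {
		fmt.Println(maskBrokerURL(s))
	}
}
```

Running the same helper over `lastError` is what provides the defense-in-depth mentioned above: a URL quoted inside an error string is masked just like the `Broker` field.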

## TDD

- Red commit `f19a93b5` — stub `/api/mqtt/status` handler + assertion
test that the broker password is `****`-redacted. Test fails on the
assertion (handler passes the URL through verbatim). Compile-clean —
assertion-fail, not build-fail.
- Green commit `77042e41` — `maskBrokerURL` helper + table-driven unit
tests across all schemes + handler rewires to mask both `Broker` and
`LastError`.
- Subsequent commits land the ingestor wiring and the frontend panel.

## Tests

```
$ cd cmd/server && go test -run 'TestMqttStatus|TestMaskBrokerURL' -v ./...
PASS: TestMqttStatus_MasksBrokerPassword
PASS: TestMqttStatus_EmptyWhenNoStatsFile
PASS: TestMaskBrokerURL_Patterns (10 subtests)

$ cd cmd/ingestor && go test -run 'TestSourceStatus|TestSnapshotSourceStatuses' -v ./...
PASS: TestSourceStatus_BasicLifecycle
PASS: TestSourceStatus_Disconnect
PASS: TestSnapshotSourceStatuses_ReturnsAll

$ node test-mqtt-status-panel.js
7 passed, 0 failed
```

Full `go test ./...` clean in both `cmd/server` and `cmd/ingestor`.

## Preflight overrides

- `cross-stack`: justified — issue #1043 is intrinsically full-stack
(ingestor stats → server endpoint → observers panel). Per-stack split
would land an unreachable endpoint or a fetch with no backend.
- `check-xss-sinks` (public/mqtt-status-panel.js:55): justified — the
flagged `innerHTML=` is a fully-static literal (empty-state placeholder,
no payload data interpolated). All payload-bearing `innerHTML=` sites in
this file run through `escapeHTML` (defined in the same file); the test
`renderPanel never echoes a plaintext password (defense-in-depth)`
exercises the rendered HTML against payload strings.

## Acceptance criteria

- [x] `/api/mqtt/status` returns per-source connection state —
`cmd/server/mqtt_status.go`
- [x] UI panel shows all configured sources with live status —
`public/mqtt-status-panel.js`
- [x] Connection state updates on reconnect/disconnect events —
`MarkConnect` / `MarkDisconnect` wired in `cmd/ingestor/main.go`
- [x] Broker URLs don't expose passwords in the API response —
`maskBrokerURL` + 13 test cases
- [x] Works with 1-N sources — registry is keyed per-source, snapshot
iterates the map

**Partial fix for #1043** — per-packet `mqtt_source` attribution (the
issue's "Follow-up" section) is **deferred** per the `mc-bot-triaged:v1`
triage and the autofix comment ("Per-packet attribution deferred to
follow-up issue"). That work requires a new observation-row column and
DB schema migration, both explicitly out of scope for this PR.

Refs #1043

---------

Co-authored-by: openclaw-bot <bot@openclaw.local>
2026-06-12 08:11:02 -07:00
Kpa-clawbot 2ef7d2437d fix(ci): release fast-path re-tag :edge → :vX.Y.Z when SHA matches (Fixes #1677) (#1680)
## Summary

Adds `.github/workflows/release-fast-path.yml`: a metadata-only re-tag
workflow that fires on `push.tags: v[0-9]+.[0-9]+.[0-9]+` and, when
`:edge`'s `org.opencontainers.image.revision` label matches the tag SHA,
applies `:vX.Y.Z`, `:vX.Y`, `:vX`, `:latest` to the existing edge
manifest via `crane tag`. No rebuild, no test re-run — ~seconds vs ~30
min today. If the SHA doesn't match (tag points to an older commit, or
`:edge` wasn't built yet), it dispatches the existing `deploy.yml`
pipeline as a fallback so validated bytes always ship.

To prevent double-fire, `deploy.yml`'s top-level `on:` block drops
`tags: ['v*']` — `release-fast-path.yml` is now the sole consumer of
`push.tags`. Edge publishing on master push is untouched.

## TDD

Red commit adds `cmd/server/release_fast_path_workflow_test.go` (two
tests: one asserts the new workflow exists with the required
trigger/permissions/markers; the other asserts `deploy.yml`'s `on:`
block no longer mentions `tags:`). Both fail on assertions in the red
commit. Green commit adds the workflow file + edits `deploy.yml`; both
pass.

## Acceptance criteria (from #1677)

- Tag-CI completes in <2 min when tag SHA == `:edge` revision →
fast-path is metadata-only, single short job
- Falls back to full pipeline on SHA mismatch → `gh workflow run
deploy.yml --ref ${{ github.ref }}`
- `:vX.Y.Z` has same digest as `:edge` → `crane tag` copies the
manifest, bytes are byte-identical
- No regression on older-SHA tags → fallback path runs the unchanged
full validation

Fixes #1677

---------

Co-authored-by: Kpa-clawbot <bot@corescope.local>
2026-06-12 05:52:06 -07:00
Kpa-clawbot 653d47e03c test(openapi): add CI completeness gate for /api routes (Phase 1 of #1670) (#1678)
## Summary

Partial fix for #1670 — **Phase 1 only** (CI completeness gate). Phase 2
(backfilling the 18 currently-undocumented routes into `openapi.go`) is
deferred to a separate issue per the triage on #1670 and is explicitly
out of scope here.

## What this adds

- `cmd/server/openapi_completeness_test.go` — AST-walks every
non-`_test.go` file in `cmd/server/`, finds string-literal first args to
`*.HandleFunc(...)` calls beginning with `/api/`, and diffs against the
paths declared in `routeDescriptions()` in `cmd/server/openapi.go`.
- `cmd/server/openapi_known_gaps.json` — seeded allowlist of the **18**
`/api/` routes currently registered via `HandleFunc` but not yet
documented in `openapi.go`.

## Ratchet pattern

From this branch forward, `TestOpenAPICompleteness` fails when:

1. A new `HandleFunc("/api/...")` is added without a matching entry in
`openapi.go` **or** the allowlist (regression gate — the main goal of
Phase 1).
2. A route in the allowlist is *also* documented in `openapi.go` — the
allowlist must shrink as Phase 2 backfills land, never go stale.
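
The two failure conditions reduce to a pair of set differences. The sketch below is an assumption about the shape of the check, not the test's actual code; `ratchet` and `toSet` are hypothetical names and the route strings are invented.

```go
package main

import (
	"fmt"
	"sort"
)

// toSet builds a membership set from a slice of route paths.
func toSet(items []string) map[string]bool {
	set := make(map[string]bool, len(items))
	for _, it := range items {
		set[it] = true
	}
	return set
}

// ratchet (hypothetical) returns the two failure lists:
//   undocumented: registered routes found in neither the spec nor the allowlist
//   stale:        allowlisted routes that the spec now documents
func ratchet(registered, documented, allowlist []string) (undocumented, stale []string) {
	doc := toSet(documented)
	allow := toSet(allowlist)
	for _, r := range registered {
		if !doc[r] && !allow[r] {
			undocumented = append(undocumented, r)
		}
	}
	for _, r := range allowlist {
		if doc[r] {
			stale = append(stale, r)
		}
	}
	sort.Strings(undocumented)
	sort.Strings(stale)
	return undocumented, stale
}

func main() {
	registered := []string{"/api/a", "/api/b", "/api/c"}
	documented := []string{"/api/a", "/api/b"}
	allowlist := []string{"/api/b"}

	undocumented, stale := ratchet(registered, documented, allowlist)
	fmt.Println("undocumented:", undocumented)
	fmt.Println("stale allowlist entries:", stale)
}
```

The test passes only when both lists are empty, which is what forces the allowlist to shrink as routes get documented.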

The two-commit history (red → green) demonstrates the gate works:

- **Red commit**: adds only the test. Fails on master with the 18
missing routes listed.
- **Green commit**: adds the allowlist seeded with that exact 18-route
set. Test passes at the current baseline.

## Local verification

- `go test ./cmd/server/ -run TestOpenAPICompleteness -v` → PASS at
baseline (`44/62 covered; 18 in allowlist; 18 gaps remain`).
- Ratchet validation: temporarily inserted
`r.HandleFunc("/api/ratchet-test-route", ...)` into `routes.go` → test
FAILED with that exact route name; reverted → test PASSES again.

## Files changed

- `cmd/server/openapi_completeness_test.go` (+203 / new)
- `cmd/server/openapi_known_gaps.json` (+24 / new)

## Preflight

`bash ~/.openclaw/skills/pr-preflight/scripts/run-all.sh origin/master`
→ all hard gates pass; no warnings.

## Out of scope

- Backfilling the 18 allowlisted routes into `openapi.go` (Phase 2 —
tracked separately).
- Schema validation of the spec against OpenAPI 3.0 (Phase 3 per the
issue).
- PR template checkbox update (Phase 2 follow-up).

Issue #1670 stays open for Phase 2.

---------

Co-authored-by: clawbot <bot@corescope.local>
2026-06-12 01:52:12 -07:00
efiten 938153dd92 fix(nodes): rebuild relay-hop history on startup from path_json (#1643)
## Problem

A relay node's **activity timeline** — and its per-node `packetsToday` /
observer counts — collapses to *"only the hour the server restarted"*
after every restart. For the hours before the restart the timeline shows
only the node's own adverts (~1–2/hr); all of its relay activity piles
into the single post-restart hour.

## Root cause

All DB cold-load paths (`Load`, `loadChunk`, `scanAndMergeChunk`) index
relay-hop attribution into `byNode` **only** from
`observations.resolved_path`. But since #1287 the ingestor persists
relay data as aggregate `neighbor_edges` and **never writes
`resolved_path`** — it is `NULL` on every deployment (verified on a live
DB: 0 of ~440k rows populated). So relay attribution is never
reconstructed on startup; it only re-accumulates from live traffic
(`IngestNew*`, which re-resolves from `path_json` + the neighbor graph),
piling a relay node's whole history into the post-restart window.

## Fix

Server read-side only — **no schema / ingestor / migration change**.
When `resolved_path` is empty, re-resolve relay hops from the
already-persisted `path_json` using the in-memory prefix map + neighbor
graph (the same `resolvePathForObs` compute the live ingest path already
runs). `main.go` now loads the persisted neighbor graph *before* the
packet load so resolution has the graph available.

Two correctness details worth a close look:

1. **Fetch the prefix-map/graph snapshot BEFORE opening each load
cursor.** `getCachedNodesAndPM` issues its own DB query; doing so while
a load cursor is open deadlocks on a single-connection SQLite pool (the
test harness uses one).
2. **Index into `byNode` ONLY** — not the `resolved_path` / path-hop
indexes. Those are cross-checked by `handleNodePaths` against the
persisted `resolved_path` column (NULL here); populating them from an
in-memory re-resolution would make that SQL confirmation fail and
wrongly drop the tx from paths-through (#1352).

## Tests

New coverage asserts a relay pubkey reachable *only* via `path_json`
lands in `byNode` after a restart-style load, for both the hot-window
(`LoadChunked`) and background-window (`loadChunk`) paths. Existing
#1558 (`resolved_path`) and #1352 (paths-through) tests still pass. Full
`cd cmd/server && go test ./...` is green under `-race`.

## Perf

The fallback runs `resolvePathForObs` per observation with a non-empty
`path_json` during cold load — the same per-packet compute the live
ingest path already performs, so no new asymptotic cost. The prefix map
+ graph are snapshotted **once per load** (not per row);
`getCachedNodesAndPM` is 30s-cached. In `loadChunk` the resolution runs
in the existing lock-free scan and is accumulated locally, matching that
function's "build local, merge under lock" design.

## Note on a pre-existing flaky test

`TestDistanceConcurrentRequestsDuringBuildReturn202` is timing-fragile
(fails ~1/15 on `master` without this change). It relies on the lazy
distance build being slow because it's the first caller of
`getCachedNodesAndPM` (cold cache). This PR pre-warms that cache during
`Load`, narrowing the build window, so the test fails more often in
**non-race** local runs. It passes reliably under `-race` (CI mode),
where the build stays slow. Flagging in case you want to harden the test
separately.

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: openclaw-bot <openclaw-bot@users.noreply.github.com>
Co-authored-by: openclaw-bot <bot@openclaw>
2026-06-11 11:36:49 -07:00
Kpa-clawbot 825b26485c fix(#1181): hide nodes whose name starts with a configured prefix (#1655)
Fixes #1181.

## Summary

Adds operator-configurable name-prefix hiding for nodes. When a node's
name starts with any prefix listed in the new `hiddenNamePrefixes`
config field (default `["🚫"]`), it is omitted from `/api/nodes`,
`/api/nodes/search`, and `/api/nodes/{pubkey}`. DB rows are preserved —
the filter runs at the API layer only, so observation history (paths,
hops, distances) stays intact and the node simply re-appears if the
operator clears the prefix list.

This mirrors the convention already in use on other MeshCore map
dashboards: an operator who wants their node hidden renames it with the
🚫 prefix and sends an advert; the next advert is then dropped from the
dashboard. The node is **not** hidden from the mesh itself — only from
this dashboard. This is documented inline in `config.example.json`.

Implementation follows the existing `IsBlacklisted` pattern exactly: a
new `Config.IsNameHidden(name)` method, and three filters in `routes.go`
placed alongside the corresponding blacklist filters. No DB schema,
public API, or websocket changes.
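
A prefix check of this kind might look like the sketch below. The `Config` struct is reduced to the one field that matters here, and skipping empty prefixes is an assumption made for the illustration (an empty string would otherwise match every name); the real `IsNameHidden` may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Config is a stand-in holding only the field relevant to this sketch.
type Config struct {
	HiddenNamePrefixes []string
}

// IsNameHidden reports whether name starts with any configured prefix.
// Assumption: empty prefixes are ignored so they cannot hide every node.
func (c *Config) IsNameHidden(name string) bool {
	if c == nil {
		return false
	}
	for _, p := range c.HiddenNamePrefixes {
		if p != "" && strings.HasPrefix(name, p) {
			return true
		}
	}
	return false
}

func main() {
	cfg := &Config{HiddenNamePrefixes: []string{"🚫"}}
	for _, name := range []string{"🚫 ban me", "relay-01", "node 🚫"} {
		fmt.Printf("%q hidden=%v\n", name, cfg.IsNameHidden(name))
	}
}
```

Because the match is a byte-prefix comparison, a 🚫 elsewhere in the name does not hide the node.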

## Files changed

- `cmd/server/config.go` — new `HiddenNamePrefixes []string` field +
`IsNameHidden` method
- `cmd/server/routes.go` — filters in `handleNodes`, `handleNodeSearch`,
`handleNodeDetail`
- `config.example.json` — new field + `_comment_hiddenNamePrefixes`
operator doc
- `cmd/server/hidden_name_prefix_1181_test.go` — new test file (red →
green)

## Test plan

Two new subtests in `TestHiddenNamePrefix_1181_*`:

1. `_NodesList` — inserts a node named `🚫 ban me`, asserts it is present
when `HiddenNamePrefixes` is empty and absent when set to `["🚫"]`.
2. `_Search` — inserts `🚫 search me`, asserts
`/api/nodes/search?q=search` does not surface it when the prefix is
configured.

Verified red→green:

- Red commit `d0903852`: `go test -run TestHiddenNamePrefix_1181` fails
on the leak assertion (`hidden_name_prefix_1181_test.go:94`).
- Green commit `e79a0d8d`: same command passes.

```
$ cd cmd/server && go test -run TestHiddenNamePrefix_1181 -count=1 .
ok  	github.com/corescope/server	0.060s
```

## Out of scope

- Auto-purging DB rows for hidden nodes — left to existing retention.
The triage was explicit: hide, do not delete.
- Live websocket broadcast: nodes are not broadcast via websocket (only
packets), so no separate emit path needs filtering. Frontend reads nodes
via `/api/nodes`, which is filtered.
- Frontend customizer for the prefix list — operators configure via
`config.json` like every other knob.
2026-06-11 10:10:12 -07:00
Kpa-clawbot e04c7113cb feat: integrate hashtag channels from meshcore-channels catalogue (#1323) (#1656)
Fixes #1323

## Summary

Adds a small in-memory cache of the community-maintained hashtag-channels
catalogue (`marcelverdult/meshcore-channels`) and exposes it as
`GET /api/known-channels?region=XX` plus a collapsed sidebar section on
the Channels view ("Known channels (catalogue)") with a one-click
"+ Add" button per row.

Per triage (#1323): new `cmd/server/known_channels_cache.go`, new
`GET /api/known-channels?region=…`, frontend section in
`public/channels.js`. No new DB tables — cache is in-memory only.

## What changed

- `cmd/server/known_channels_cache.go` — `knownChannelsCache` with an
  atomic snapshot pointer, 24h default refresh, 30s HTTP timeout, 4 MB
  body cap, custom `User-Agent`. Fail-soft: a failed refresh leaves the
  last-known snapshot in place. Background goroutine started from
  `main.go` after the neighbor-graph recomputer; never blocks startup.
- `cmd/server/known_channels_route.go` — `GET
/api/known-channels?region=`
  serves the cached snapshot off the atomic pointer (never blocks on
  upstream). Region filter is case-insensitive ISO 3166-1 alpha-2.
  Empty/missing cache returns 200 with an empty entries list (fail-soft
  for the UI).
- `cmd/server/config.go` — `KnownChannelsURL` +
`KnownChannelsRefreshMs`.
- `config.example.json` — example values + `_comment_knownChannels`.
- `public/channels.js` — new collapsed sidebar section "Known channels
  (catalogue)" that lazy-fetches `/api/known-channels` on first render
  and renders rows with a "+ Add" button. The button calls the existing
  `addUserChannel(name)` path, so adding catalogue channels reuses the
  full save-key + decrypt flow that user-typed hashtags already use.
- `cmd/server/known_channels_cache_test.go` — failing-first tests:
  - `TestKnownChannelsParseFixture` asserts the parser populates
    `GeneratedAt`/`License` and region-stamps every entry while skipping
    empty countries.
  - `TestKnownChannelsRouteRegionFilter` asserts the route returns 200
    with exactly the filtered subset for `?region=be`.
  - `TestKnownChannelsFailSoftOn500` asserts a failed upstream fetch
    leaves the prior snapshot in place and bumps `failCount`.
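
The fail-soft behaviour can be illustrated with an atomic snapshot pointer. This is a sketch under assumptions: the type and `failCount` names follow the description, but the fields, the `refresh`/`entries` methods, and the two fetch helpers are hypothetical, and the HTTP fetch is replaced by a plain function.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// snapshot is an immutable view of the catalogue at one point in time.
type snapshot struct{ Entries []string }

// knownChannelsCache holds the last good snapshot behind an atomic pointer,
// so readers never take a lock and never wait on the upstream fetch.
type knownChannelsCache struct {
	snap      atomic.Pointer[snapshot]
	failCount atomic.Uint64
}

// refresh swaps in a new snapshot on success. On failure it only bumps
// failCount and leaves the previous snapshot in place (fail-soft).
func (c *knownChannelsCache) refresh(fetch func() (*snapshot, error)) {
	s, err := fetch()
	if err != nil || s == nil {
		c.failCount.Add(1)
		return
	}
	c.snap.Store(s)
}

// entries returns the cached entries, or an empty list before the first
// successful refresh.
func (c *knownChannelsCache) entries() []string {
	s := c.snap.Load()
	if s == nil {
		return []string{}
	}
	return s.Entries
}

// okFetch and failingFetch stand in for the real upstream HTTP call.
func okFetch() (*snapshot, error) {
	return &snapshot{Entries: []string{"#antwerpen"}}, nil
}

func failingFetch() (*snapshot, error) {
	return nil, errors.New("upstream returned 500")
}

func main() {
	c := &knownChannelsCache{}
	fmt.Println("before first refresh:", len(c.entries()))
	c.refresh(okFetch)
	c.refresh(failingFetch)
	fmt.Println("after failed refresh:", c.entries(), "failures:", c.failCount.Load())
}
```

`atomic.Pointer[T]` requires Go 1.19 or later; whether the real cache uses it or `atomic.Value` is not stated in the description.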

## Upstream pinning

The default URL is pinned to the specific file `channels-by-country.json`
on `main`:

> https://raw.githubusercontent.com/marcelverdult/meshcore-channels/main/channels-by-country.json

Shape (verified 2026-05-24):

```json
{
  "generated_at": "...",
  "license": "CC0-1.0",
  "countries": { "be": [{"channel": "#antwerpen", "description": "..."}], ... }
}
```
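
Given that shape, parsing and region filtering could be sketched as below. The struct and function names are hypothetical, the sample payload is invented apart from `#antwerpen`, and two behaviours are assumptions: region codes are stamped in upper case, and "empty countries" is read as countries with no entries.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// entry is one catalogue row; Region is stamped by the parser, not upstream.
type entry struct {
	Channel     string `json:"channel"`
	Description string `json:"description"`
	Region      string `json:"region"`
}

type catalogue struct {
	GeneratedAt string             `json:"generated_at"`
	License     string             `json:"license"`
	Countries   map[string][]entry `json:"countries"`
}

// parseCatalogue decodes the payload and flattens it into region-stamped
// entries, skipping countries that have no entries.
func parseCatalogue(data []byte) (catalogue, []entry, error) {
	var cat catalogue
	if err := json.Unmarshal(data, &cat); err != nil {
		return catalogue{}, nil, err
	}
	codes := make([]string, 0, len(cat.Countries))
	for code, list := range cat.Countries {
		if len(list) > 0 {
			codes = append(codes, code)
		}
	}
	sort.Strings(codes) // deterministic order regardless of map iteration
	var flat []entry
	for _, code := range codes {
		for _, e := range cat.Countries[code] {
			e.Region = strings.ToUpper(code)
			flat = append(flat, e)
		}
	}
	return cat, flat, nil
}

// filterRegion keeps entries whose region matches case-insensitively;
// an empty region returns everything.
func filterRegion(entries []entry, region string) []entry {
	out := []entry{}
	for _, e := range entries {
		if region == "" || strings.EqualFold(e.Region, region) {
			out = append(out, e)
		}
	}
	return out
}

// sample is an invented payload in the upstream shape.
const sample = `{
  "generated_at": "2026-05-24T00:00:00Z",
  "license": "CC0-1.0",
  "countries": {
    "be": [{"channel": "#antwerpen", "description": "Antwerp"}],
    "nl": [{"channel": "#amsterdam", "description": "Amsterdam"}],
    "de": []
  }
}`

func main() {
	cat, flat, err := parseCatalogue([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(cat.License, len(flat))
	fmt.Println(filterRegion(flat, "be"))
}
```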

## Test plan

```
cd cmd/server && go test -run 'TestKnownChannels' -count=1 .
ok  	github.com/corescope/server	0.008s
```

Red commit: 5c43cff3 (all three tests fail on assertions, build clean).
Green commit: 54a1080e (parser + cache + route implemented, all three
pass).

## TDD evidence (red → green)

- **Red commit `5c43cff3427afd8aa2f3cce20c31058190aebc37`** — tests
added
  with stub implementations that compile but return zero/empty so each
  test fails on an assertion (not a compile/import error). `go test -run
  TestKnownChannels` output captured in the commit message.
- **Green commit `54a1080e45fd2e10da2caa156f376bf4d0212976`** — parser,
  cache, route, main-wiring, frontend section land; all three tests
  pass.

## Frontend verification

Browser verified: http://analyzer-stg.00id.net/#/channels (with the
`/api/known-channels` response stubbed in DevTools to simulate the cache
being populated on staging, which is still on master and doesn't have
the new endpoint yet).

E2E assertion added: cmd/server/known_channels_cache_test.go:71 —
asserts the route returns 200 and the response body's `entries` length
matches the filtered subset.

## Limitations / follow-ups (not in scope of this PR)

- The catalogue only ships PSK keys for a small subset of entries (the
  upstream schema makes `key` optional). For entries WITHOUT a `key`,
  the "+ Add" button still wires through `addUserChannel("#name")` —
  which derives the standard public-channel key from the name (the same
  path used today when a user types `#foo` into the Add Channel modal).
  For entries WITH a `key`, a follow-up PR can pass the key through to
  `addUserChannel` so the UX matches "paste-a-PSK". Today the key is
  shown in the JSON payload but not yet wired into the FE button.
- No deduplication against the in-memory `/api/channels` list — the
  catalogue section is intentionally separate so the user sees which
  channels exist worldwide even if their server hasn't seen traffic.
- No per-section region selector yet — the section shows the full
  catalogue regardless of the page-level region filter. Future work:
  add a dropdown.

## Preflight

```
═══ Preflight clean. ═══
```

cross-stack: justified — issue #1323 spans `cmd/server` (cache + route)
and `public/channels.js` (sidebar surface); same feature, both halves
required.

---------

Co-authored-by: Kpa-clawbot <bot@corescope.local>
2026-06-11 07:38:36 -07:00
Kpa-clawbot 1116801b2f M5: emoji → Phosphor Icons — settings & customize (#1648) (#1653)
**Red commit:** `851cc8c3a024b1675558092d772444bf4f1ec625` — failing
test on a stub branch (will link CI run after PR opens).

Partial fix for #1648 (M5 of 6). **Do NOT close the tracking issue** —
M6 (server-side residual emoji sweep + lint gate) still pending.

## Per-file swap counts

| File | Phosphor `<use>` refs | Notes |
|---|---|---|
| `public/customize.js` | 20 | DEFAULTS → `ph:<name>` tokens; render path keeps legacy emoji branch (back-compat) |
| `public/customize-v2.js` | 26 | same as v1; cv2 overrides path unchanged |
| `public/home.js` | (helpers added) | `_renderHomeGlyph` / `_renderHomeLabel` accept both `ph:<name>` and legacy emoji |
| `public/geofilter-builder.html` | 5 | clear / undo / save / load buttons (+inline `.ph-icon` CSS) |
| `public/audio.js` | 1 | audio unlock prompt |
| `public/filter-ux.js` | 5 (3 new) | help popover star + close, saved-filter delete |
| `public/style.css` | 0 | `#chList .ch-share-btn::before { content: '📤' }` removed; JS now renders an inline sprite |
| `cmd/server/routes.go` | (6 `ph:` tokens) | onboarding home defaults updated in lockstep with customize-v2.js |

## Operator config back-compat — PROMINENT

Per design call #1 (user-locked): existing operator-stored emoji values
in `config.json` / `localStorage` are **NOT** touched. The render path
supports both:

```js
function renderConfigGlyph(value) {
  var m = String(value || '').match(/^ph:([a-z][a-z0-9-]+)$/);
  if (m) return '<svg class="ph-icon"><use href="/icons/phosphor-sprite.svg#ph-' + m[1] + '"/></svg>';
  return esc(value);  // EMOJI-OK-LEGACY-RENDER — operator-stored emoji/text path
}
```

Defaults flipped to `ph:<name>` tokens, so new operators (and operators
who hit "Reset to Defaults") see Phosphor sprites. Operators with stored
emoji values continue to see their emoji exactly as before. Verified
end-to-end (see E2E (b) below).

## cmd/server/routes.go — changed in lockstep

Per design call #2: the home-defaults `steps` / `footerLinks` mirror the
JS DEFAULTS, so they MUST update together. routes.go now emits
`ph:<name>` tokens; the frontend home-render path resolves them.
Existing tests (`TestConfigThemeHomeDefaults`) still pass — they assert
structure, not glyph values.

## E2E assertions added

- `test-issue-1648-m5-emoji-scan.js` — per-file zero-emoji + ph-token
DEFAULTS + sprite presence
- `test-issue-1648-m5-icons-e2e.js`:
  - (a) customize chrome: tabs/header rendered as sprites; chrome text
    icon-free
  - **(b) back-compat: injects fake `🐙` operator step into localStorage,
    reloads, opens customize, asserts the emoji renders verbatim in both the
    input value AND the live preview span; asserts the ph-token step renders
    as a sprite** (design call #1 in action)
  - (c) `/channels` modal sprite count
  - (d) `/audio-lab` sprite presence
  - (e) `geofilter-builder.html` control buttons sprite-driven
  - (f) every `<use>` resolves to a defined symbol id

## Out of scope (M6 cleanup)

- cmd/server/routes.go residual server-rendered emoji **not** tied to
customize defaults (none found by my grep — file already audited)
- `make lint-no-emoji` CI grep gate (M6 owns it)
- `public/icons/README.md` workflow doc

cross-stack: justified — design call #2 requires Go + JS update
together.

---------

Co-authored-by: openclaw-bot <bot@openclaw.local>
2026-06-11 05:04:29 -07:00
Kpa-clawbot 8295c2115c fix(reach): bust response cache on blacklist change (#1629) (#1636)
Red commit: 178617ca7b (CI run:
https://github.com/Kpa-clawbot/CoreScope/actions/runs/27191921487 —
red-state was verified locally; CI on this branch runs against green
HEAD per pull_request triggers)

Fixes #1629

## Summary

`/api/nodes/{pubkey}/reach` cached responses survived blacklist
mutations for up to the 5-minute TTL. A node added to `NodeBlacklist`
after a recent reach request was still served the cached non-blacklisted
payload until the entry expired.

## Fix (per triage)

Per @Kpa-clawbot's locked fix path on the issue:

1. Add a monotonic `BlacklistGeneration()` counter on `*Config`.
2. `SetNodeBlacklist` (new setter) atomically replaces the slice,
rebuilds the lookup set under an `RWMutex`, and bumps the generation via
`atomic.AddUint64`.
3. `cmd/server/node_reach.go` folds the generation into the cache key
(`"<pubkey>|<days>|g<gen>"`) so any mutation invalidates prior entries
on the next request — no callbacks bolted onto the setter, no
cache-layer surgery, no TTL change.
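
The generation-in-the-key idea can be sketched as follows. `SetNodeBlacklist`, `BlacklistGeneration`, and `IsBlacklisted` are named in this PR, but the struct layout, the `reachCacheKey` helper, and the choice to bump the counter while still holding the write lock are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Config is reduced to the blacklist state needed for the sketch.
type Config struct {
	gen       uint64 // kept first so 64-bit atomics stay aligned on 32-bit platforms
	mu        sync.RWMutex
	blacklist map[string]struct{}
}

// SetNodeBlacklist replaces the lookup set and bumps the generation.
// The bump happens after the new set is in place, so a reader that sees
// the new generation also sees the new set.
func (c *Config) SetNodeBlacklist(keys []string) {
	set := make(map[string]struct{}, len(keys))
	for _, k := range keys {
		set[k] = struct{}{}
	}
	c.mu.Lock()
	c.blacklist = set
	atomic.AddUint64(&c.gen, 1)
	c.mu.Unlock()
}

// BlacklistGeneration returns the monotonic mutation counter.
func (c *Config) BlacklistGeneration() uint64 {
	return atomic.LoadUint64(&c.gen)
}

// IsBlacklisted reads the current set under a read lock.
func (c *Config) IsBlacklisted(pubkey string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	_, ok := c.blacklist[pubkey]
	return ok
}

// reachCacheKey (hypothetical helper) folds the generation into the key, so
// entries written before a blacklist change are never looked up again.
func reachCacheKey(pubkey string, days int, gen uint64) string {
	return fmt.Sprintf("%s|%d|g%d", pubkey, days, gen)
}

func main() {
	cfg := &Config{}
	before := reachCacheKey("abcd", 7, cfg.BlacklistGeneration())
	cfg.SetNodeBlacklist([]string{"abcd"})
	after := reachCacheKey("abcd", 7, cfg.BlacklistGeneration())
	fmt.Println(before, after, cfg.IsBlacklisted("abcd"))
}
```

Stale entries are not deleted by this scheme; they simply become unreachable and age out through the existing TTL and eviction.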

While here, the latent bug in `blacklistSet()` is also fixed:
`sync.Once` locked in the initial set, so a later `SetNodeBlacklist` was
invisible to `IsBlacklisted`. The `Once` still gates the lock-free
initial build; mutations rebuild under `RWMutex` and reads take an
`RLock` around the map handoff.

## Files

- `cmd/server/config.go` — `SetNodeBlacklist`, `BlacklistGeneration`,
`rebuildBlacklistSetLocked`, `RWMutex`. `IsBlacklisted` reads the
rebuilt set (no stale-slice short-circuit).
- `cmd/server/node_reach.go` — `cacheKey` includes `|g<gen>`.
- `cmd/server/node_reach_blacklist_cache_test.go` — new regression test
(the red commit).
- `cmd/server/node_reach_endpoint_test.go` — existing cache-hit
assertion updated to the generation-suffixed key.

## TDD evidence

- Red commit `178617ca` adds the test + a deliberate `SetNodeBlacklist`
stub that only reassigns the slice. The test fails on the post-blacklist
assertion: `status=200 want 404 (cached payload was served — #1629)`.
- Green commit `257c104f` replaces the stub with the real
implementation; full `go test ./...` and `go test -race -run
"TestNodeReach|TestNodeBlacklist|TestConfig"` pass locally.

## Scope

- One narrow PR. Backend only — no frontend or API response-shape
change.
- No public type signatures touched beyond the new exported
`SetNodeBlacklist` / `BlacklistGeneration` on `*Config`.
- Preflight: all hard gates pass (PII, branch scope, red commit, CSS,
LIKE/JSON, sync/async migration, XSS).

---------

Co-authored-by: corescope-bot <bot@corescope.local>
Co-authored-by: openclaw-bot <bot@openclaw.local>
2026-06-09 03:23:48 -07:00
Kpa-clawbot 078225a54e perf(neighbor_api): fold first_seen into cached map — fix #1627 r3 regression (#1632)
## TL;DR
Post-merge regression introduced by #1627 r3 (commit `e2212f50`):
`buildNodeInfoMap` in `cmd/server/neighbor_api.go` ran an uncached
`SELECT … FROM nodes` scan on every call. Folded `first_seen` into the
already-cached `getCachedNodesAndPM` (30s TTL) so the 4 hot handlers
that call `buildNodeInfoMap` no longer pay for a full table scan per
request.

## Before / After

`buildNodeInfoMap` is called by **4 hot handlers**:
- `cmd/server/neighbor_api.go:130`
- `cmd/server/neighbor_api.go:297`
- `cmd/server/neighbor_debug.go:83`
- `cmd/server/node_reach.go:421`

| | Before | After |
|---|---|---|
| `SELECT … FROM nodes` per call | 1 (uncached) | 0 (cache hit) |
| `SELECT … FROM observers` per call | 1 (uncached) | 1 (unchanged) |
| At Cascadia scale (~2600 nodes) | full scan × 4 handlers × N req/s | one scan / 30s |

## How

- Extended the `getAllNodes` schema probe to also `COALESCE(first_seen,
'')`. Falls back through the existing richest → leanest ladder if the
column is missing.
- `nodeInfo.FirstSeen` is therefore populated for every cached entry in
`getCachedNodesAndPM`.
- `buildNodeInfoMap` drops its second `SELECT` entirely and just copies
`nodeInfo` values out of the cached map.
- Public signature of `buildNodeInfoMap` is unchanged.
`node_reach.go:421` still sees `nodeInfo.FirstSeen` populated, served
from cache.

`cmd/server/store.go` is touched because `getAllNodes` is the only
sensible owner of the `first_seen` SELECT — adding a parallel cache
would duplicate the 30s TTL machinery this fix is designed to leverage.

## Test (red → green)

- Commit 1 (`test:`): `TestBuildNodeInfoMap_FirstSeenIsCached` — calls
`buildNodeInfoMap`, mutates `first_seen` out-of-band via a separate rw
connection, calls it again, and asserts both calls return the same
(cached) value. Fails on `origin/master` (call 2 sees the mutated value,
proving the uncached scan).
- Commit 2 (`perf:`): the fold. Test now passes.

## Refs

Post-merge audit identified this as the only MAJOR finding from #1627;
recommendation was a follow-up hot-fix PR. This is that PR.

---------

Co-authored-by: openclaw-bot <bot@openclaw>
Co-authored-by: openclaw-bot <bot@openclaw.local>
2026-06-09 01:24:46 -07:00
Kpa-clawbot 43be1bb76a fix(reach): scanReachRows DB errors must surface as 500 not 404 (#1631) (#1635)
Red commit: 67088342ec (CI run: pending)

## Summary

Fixes #1631 — `scanReachRows` swallowed `QueryContext` / `rows.Err()`
failures and returned `nil`. The handler treated that as "genuinely no
reach" and rendered a 200 with empty arrays (or 404 in some flows), so
transient SQLite failures surfaced to operators as "this node has no
reach" — misleading and undiagnosable without log access.

## Fix

`cmd/server/node_reach.go`:
- `scanReachRows` now returns `([]pathRow, error)`; propagates
`QueryContext` + `rows.Err()` failures.
- `computeNodeReach` signature gains an error return: non-nil error
means real backend failure (NOT "unknown node").
- `handleNodeReach` renders **500** on that error path and does **NOT**
cache the failure (next request retries cleanly). Genuinely-empty reach
still renders **200** with empty arrays; unknown/blacklisted nodes still
render 404.
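
The resulting status mapping is small enough to sketch in isolation. This is not the handler's code: `reachStatus` and the three scan helpers are hypothetical, the database is replaced by a function value, and leaving 404s uncached is an assumption of the sketch (the description only says the 500 path is not cached).

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// pathRow stands in for one scanned row.
type pathRow struct{ Path string }

var errScan = errors.New("simulated scan failure")

// reachStatus maps the three outcomes onto HTTP statuses:
//   unknown or blacklisted node -> 404
//   scan error                  -> 500, never cached
//   success, even with no rows  -> 200 with a non-nil slice
func reachStatus(known bool, scan func() ([]pathRow, error)) (int, []pathRow, bool) {
	if !known {
		return http.StatusNotFound, nil, false
	}
	rows, err := scan()
	if err != nil {
		// Propagate the failure instead of treating it as "no reach".
		return http.StatusInternalServerError, nil, false
	}
	if rows == nil {
		rows = []pathRow{} // so the JSON body carries [] rather than null
	}
	return http.StatusOK, rows, true
}

func okScan() ([]pathRow, error)    { return []pathRow{{Path: `["aa","bb"]`}}, nil }
func emptyScan() ([]pathRow, error) { return nil, nil }
func failScan() ([]pathRow, error)  { return nil, errScan }

func main() {
	for name, scan := range map[string]func() ([]pathRow, error){
		"ok": okScan, "empty": emptyScan, "fail": failScan,
	} {
		status, rows, cache := reachStatus(true, scan)
		fmt.Println(name, status, len(rows), cache)
	}
}
```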

## TDD

- Red commit `67088342`: adds `TestNodeReach_ScanDBErrorReturns500` —
warms the integration DB, drops the `observations` table, asserts
handler returns 500. Pre-fix this got 200 with empty arrays.
- Green commit `5408be3a`: the fix + caller updates. Adds
`TestScanReachRows_ErrorReturn` (unit-level: closed-DB → non-nil err).
- `TestNodeReach_ShapeAndClamp` had to be tightened: the v2 fixture's
`observations` table was missing `observer_idx`; the swallowed error
masked that schema gap. Now rebuilt with the right shape.

## Scope

- `cmd/server/node_reach.go` — fix.
- `cmd/server/node_reach_endpoint_test.go` — new red test +
ShapeAndClamp fixture fix.
- `cmd/server/node_reach_test.go`, `node_reach_bench_test.go` — caller
updates for new signature + one new unit assertion test.

No cache changes (#1629 is separate). No sibling refactors. No frontend.

## Verification

- `go test ./cmd/server/...` — green (48s, all tests).
- pr-preflight — clean (PII, scope, red-commit, CSS vars, LIKE-on-JSON,
async-migration, XSS).

---------

Co-authored-by: clawbot <bot@kpa-clawbot.local>
2026-06-09 00:27:56 -07:00
efiten e2212f5015 feat(nodes): per-node Reach page + GET /api/nodes/{pubkey}/reach (v2, review-complete) (#1627)
Re-submission of #1625 (which was merged early, then reverted in #1626)
— now with **all three round-1 reviews addressed** so it lands in one
hardened state instead of as post-merge follow-ups.

## What

Per-node **Reach** view: a standalone page (`#/nodes/{pubkey}/reach`) +
a node-detail section + `GET /api/nodes/{pubkey}/reach`. It shows which
nodes a node has a **stable two-way RF link** with, derived from raw
`path_json` adjacency (a path travels origin→observer, so `[A,B]` ⇒ B
heard A). A link is bidirectional when both directions have
observations; the **bottleneck** (weaker direction) rates two-way
reliability. Nodes are identified only by **unique 2–3 byte** path
prefixes (1-byte collides → excluded).

## Review fixes folded in vs #1625

**Performance (Carmack):** hard scan LIMIT (200k) + modest prealloc;
`json.Unmarshal` replaced by a single-pass `parsePathTokens` (100k-row
scan 2.2M→1.3M allocs, 344→203ms); memoized resolver; size-hinted maps
(attribution over 100k rows: 102 allocs); `context.Context` plumbed;
cache `RWMutex` + evict-oldest (no full wipe); singleflight dedup;
degree/rank from a 60s shared snapshot; bench rewritten (ReportAllocs,
1k/10k/100k, mixed-payload, isolated attribution).

**Correctness/safety + tests (Independent + Kent Beck):** pubkey
validation → 400; error logging instead of silent swallow (first_seen /
degree / marshal→500 / discarded rows); `public_key=?` index use;
canonical `PayloadADVERT`; `min()` builtin; documented cache-slice
immutability; mux ordering comment. New tests: scanReachRows decode,
3-byte token branch, non-advert first-hop guard, observer SNR
aggregation across rows, HTTP-level attribution (asserts non-zero
we_hear/they_hear), 400/404/blacklist/cache-hit.

**UI / a11y / Tufte:** in-map legend (tiers + thresholds); dropped the
colour+width double-encoding (constant width, colour-only); colour-blind
glyphs (●●●/●●/●) + tier title beside the bottleneck number; dark-theme
`--link-*`; lighter table (horizontal rules, sentence-case headers); map
built once + link layer updated in place on toggle (no flicker);
time-range no longer flashes a loader; `destroy()` generation guard;
statCard escaping; scoped `@media print` to `#nq-report`;
`fieldset/legend` + `for/id` toggles; `aria-pressed` / `aria-live` /
back-link `aria-label`; "distance (km)" + bottleneck tooltip + no-GPS
note; inline styles → CSS; decorative emoji removed.

**Docs:** api-spec documents the 5-min cache, 200k scan cap, and 400.

## Testing
- `cmd/server` full suite green; reach unit + endpoint + bench all pass.
- `eslint public/*.js` (no-undef) and the XSS-sink gate clean.
- E2E updated: request status checks + exact (non-tautological) toggle
assertions + hard map-render assert.

🤖 Generated with [Claude Code](https://claude.com/claude-code)


---

## TDD-history note (Kent Beck gate)

This branch carries production + tests together, not a fabricated
red→green sequence. That's deliberate: the branch was rebased onto
upstream and the intermediate SHAs were squashed, so reconstructing a
"failing-test-first" commit after the fact would be theatre, not
evidence — and rewriting history to stage it would be dishonest. The
behaviour is instead covered by a comprehensive, anti-tautological suite
(directional attribution edges, 3-byte token branch, non-advert
first-hop guard, observer SNR aggregation, HTTP-level attribution
asserting non-zero counts, scan-cap truncation, zero-reach 200-not-404,
companion mis-attribution, cache eviction). Requesting maintainer
acceptance of the work on test *substance* rather than commit
*choreography*; the net-new-UI exemption is not claimed for the server
endpoint.

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: meshcore-bot <bot@meshcore>
2026-06-08 22:13:02 -07:00
efiten 9c5faab1e4 Revert "feat(nodes): per-node Reach page (#1625)" (#1626)
Reverts #1625.

#1625 was merged before the round-1 reviews (Independent / Kent Beck /
Tufte) were addressed. Reverting to land it cleanly: a fresh PR will
re-add the feature with the perf pass, the backend correctness/safety +
test-coverage fixes, and the UI/a11y (Tufte) batch folded in, so it goes
through review in a single hardened state rather than as a string of
post-merge follow-ups.

No functional loss — the feature returns in the replacement PR.
2026-06-08 12:35:12 +00:00
efiten 47f85f6c4c feat(nodes): per-node Reach page + GET /api/nodes/{pubkey}/reach (directional link quality) (#1625)
## What

Adds a per-node **Reach** view that answers "how well does this specific
node hear, and get heard by, its neighbours?" — both as a standalone
page (`#/nodes/{pubkey}/reach`) and as a section on the node detail
page.

New endpoint: **`GET /api/nodes/{pubkey}/reach`**.

## What it measures

For the target node it derives, from raw `path_json` adjacency (a path
travels origin→observer, so in `[A,B]` B received A directly):

- **Directional link counts** per neighbour: `we_hear` (how often we
received them) vs `they_hear` (how often they received us).
- **Bidirectional / bottleneck**: a link is two-way stable when both
directions > 0; the weaker direction is the bottleneck and rates real
two-way reliability.
- **Importance**: neighbour degree + rank, relay-observation volume,
bidirectional-link count, direct-observer count.
- **Direct observers**: who received the node at 0 hops, with SNR.
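
The directional counting and bottleneck rule can be sketched as below. It assumes each path has already been resolved into node identifiers in travel order (the prefix-uniqueness step is out of scope here); `link` and `countLinks` are hypothetical names and the paths are invented.

```go
package main

import "fmt"

// link holds per-neighbour counts from the target node's point of view.
type link struct {
	WeHear   int // times the target received the neighbour directly
	TheyHear int // times the neighbour received the target directly
}

// Bidirectional reports whether both directions have been observed.
func (l link) Bidirectional() bool { return l.WeHear > 0 && l.TheyHear > 0 }

// Bottleneck is the weaker direction, which bounds two-way reliability.
func (l link) Bottleneck() int {
	if l.WeHear < l.TheyHear {
		return l.WeHear
	}
	return l.TheyHear
}

// countLinks walks adjacent pairs in each path. A path travels
// origin -> observer, so in [A, B] node B received A directly.
func countLinks(target string, paths [][]string) map[string]link {
	links := make(map[string]link)
	for _, p := range paths {
		for i := 0; i+1 < len(p); i++ {
			from, to := p[i], p[i+1]
			switch {
			case to == target && from != target:
				l := links[from]
				l.WeHear++
				links[from] = l
			case from == target && to != target:
				l := links[to]
				l.TheyHear++
				links[to] = l
			}
		}
	}
	return links
}

func main() {
	paths := [][]string{
		{"A", "T", "B"}, // T heard A, B heard T
		{"B", "T"},      // T heard B
		{"A", "T"},      // T heard A again
	}
	for name, l := range countLinks("T", paths) {
		fmt.Println(name, l.WeHear, l.TheyHear, l.Bidirectional(), l.Bottleneck())
	}
}
```

In this example neighbour A is heard twice but never hears the target back, so its bottleneck is zero even though its raw count is the highest.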

Reliability rule: a neighbour is only attributed when its pubkey
**prefix is unique** at the path's byte length (collisions are skipped,
never misattributed).

## UI

- Standalone Reach page + node-detail section.
- Reusable bidirectional link map (OSM) with links coloured by
bottleneck.
- Incoming/outgoing toggles to isolate each direction.

## Naming note (deliberate, no collision)

This is distinct from the existing **per-observer reachability** in
topology analytics (`ReachNode` / `ObserverReach` / `perObserverReach`).
This PR adds its own `NodeReach*` response structs in a new
`node_reach.go` and a new `/api/nodes/{pubkey}/reach` route — there are
no symbol or route collisions (verified: `go build ./...` clean). Happy
to rename to disambiguate further (e.g. "Link Quality") if you'd prefer
to reserve "Reach" for the per-observer feature.

## Testing

- `cmd/server`: endpoint shape/404/limit-clamp + unit tests for token
derivation and directional attribution, plus a scan benchmark — all
pass.
- Frontend: helper tests + Reach-page E2E (`test-node-reach-e2e.js`),
standalone route + incoming/outgoing toggles.
- `go build ./...` and `eslint public/*.js` (no-undef) clean.

## Docs

Design spec, implementation plan, and the `GET
/api/nodes/{pubkey}/reach` API contract are included under `docs/`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 13:11:06 +02:00
Kpa-clawbot a4776557ae feat(#1290): use firmware repeat:on|off hint to exclude listener-only observers from disambiguator (#1624)
Closes #1290.

cross-stack: justified — backend persists firmware-side `repeat` hint to
a new observers column, frontend surfaces the listener/repeater status
as a badge on the observers list and node-detail Heard By table per the
issue's UI acceptance criterion.

## What

Firmware 1.16 publishes a `repeat: on|off` flag in the MQTT `/status`
JSON (confirmed by @cwichura on the issue thread — see
[`MQTTMessageBuilder.cpp:58`](https://github.com/agessaman/MeshCore/blob/b45373a31f111fb0de98bb3b168226d09ceadc47/src/helpers/MQTTMessageBuilder.cpp#L58)
in `agessaman/MeshCore mqtt-bridge-implementation-flex`). Listener-only
observers (`repeat:off`) by firmware contract never relay packets, so
they cannot legitimately be a hop in someone else's resolved path. This
PR plumbs the hint end-to-end so the disambiguator stops considering
them.

## How

* **`internal/dbschema`**: idempotent `can_relay INTEGER DEFAULT 1`
migration on `observers`, plus `AssertReady` probe (server fatal-logs if
absent). Mirrored in `cmd/ingestor/db.go` `CREATE TABLE` for fresh DBs.
Annotated `PREFLIGHT: async=true` — `DEFAULT 1` is constant so SQLite
does this as a metadata-only schema rewrite.
* **`cmd/ingestor`**: `extractObserverMeta` accepts `repeat` as bool,
case-insensitive string (`on|off|true|false|yes|no`), or numeric `0|1`.
Missing field → `nil` → `COALESCE` preserves the existing column value
(back-compat with legacy observers). Plumbed through `UpsertObserverAt`
and the prepared upsert statement.
* **`cmd/server`**: `GetNonRelayObserverPubkeys` + new
`prefixMap.markNonRelay` drop matching candidates inside
`pm.resolveWithContext` at the top of the resolver, so all 4 tiers see
the pruned candidate set. `ObserverResp.CanRelay` is surfaced on
`/api/observers` and `/api/observers/{id}`. `GetNodeHealth` enriches
per-observer rows with `can_relay` so the node-detail badge renders.
Probe-and-fall-back when the `can_relay` column is absent (legacy test
fixtures).
* **`public/`**: listener vs repeater pill on observers list, observer
detail `Relay` stat card, and node-detail `Heard By` table. CSS uses
existing theme vars.
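
The tolerant `repeat` parsing described for the ingestor can be sketched as below. This is a minimal illustration, not the project's `extractObserverMeta`: the helper name `parseRepeatHint` is hypothetical, and it assumes the `/status` JSON was decoded with `encoding/json` into `any` values (so numbers arrive as `float64`).

```go
package main

import (
	"fmt"
	"strings"
)

// parseRepeatHint is a hypothetical helper, not the project's code.
// It returns nil when the hint is absent or unrecognised, so the caller
// can leave the stored can_relay value untouched (the COALESCE case).
func parseRepeatHint(v any) *bool {
	ptr := func(b bool) *bool { return &b }
	switch t := v.(type) {
	case bool:
		return ptr(t)
	case string:
		switch strings.ToLower(strings.TrimSpace(t)) {
		case "on", "true", "yes":
			return ptr(true)
		case "off", "false", "no":
			return ptr(false)
		}
	case float64: // encoding/json decodes JSON numbers into float64
		if t == 0 {
			return ptr(false)
		}
		if t == 1 {
			return ptr(true)
		}
	}
	return nil
}

func main() {
	for _, v := range []any{"OFF", true, float64(1), nil, "maybe"} {
		if p := parseRepeatHint(v); p == nil {
			fmt.Printf("%v -> nil (keep existing value)\n", v)
		} else {
			fmt.Printf("%v -> %v\n", v, *p)
		}
	}
}
```

Returning a pointer rather than a plain `bool` is what lets "field missing" stay distinct from "field says off".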

## Test

Added `TestResolveWithContext_ExcludesNonRelayObservers_Issue1290` in
`cmd/server/resolve_non_relay_1290_test.go` covering all three required
cases:
* `repeat:off` pubkey → not a candidate (assertion failed in red commit
`5f7fdb96`, passes after green `f12911dc`)
* `repeat:on` pubkey → still a candidate (regression guard)
* legacy obs (no field) → still a candidate (back-compat)

Red→green proof:
```
$ git log --oneline origin/master..HEAD
f12911dc feat(#1290): exclude listener-only observers from path-hop disambiguator
5f7fdb96 test(#1290): red — assert listener-only observers excluded from path-hop candidates
```

Full server + ingestor + dbschema + migrate test suites pass locally.

## Acceptance checklist (from #1290)

* [x] Ingestor parses `repeat` field (boolean OR string `on|off`)
* [x] Field persisted on `observers` table (new `can_relay BOOLEAN`
column, idempotent migration via `internal/dbschema`)
* [x] Server's disambiguator (`pm.resolveWithContext`) excludes
`can_relay=false` observer-nodes from path-hop candidate set
* [x] UI badge on observers list + node detail page indicating
"listener" vs "repeater"
* [x] Backward compat: legacy observers default to `can_relay=true`
* [x] Test: `repeat:off` → NOT a candidate
* [x] Test: `repeat:on` → IS a candidate
* [x] Test: legacy → IS a candidate

## Out of scope (preserved per issue)

Backfilling already-resolved paths is left as a follow-up. No
firmware/broker changes.

---------

Co-authored-by: Kpa-clawbot <bot@kpa-clawbot.local>
Co-authored-by: openclaw-bot <bot@openclaw>
2026-06-08 01:27:13 -07:00
Kpa-clawbot 3d12266595 fix(#1608): address PR #1609 follow-up findings — config doc, receipt-time liveness, buffer stop/clamp warn (#1623)
Follow-up to #1609 / #1608.

Addresses the 5 unresolved findings from the PR #1609 round-1 polish
review.

## Findings addressed

| Tag | Severity | Fix | Commits |
|-----|----------|-----|---------|
| **B1** | BLOCKER | Document `ingestBufferSize` in
`config.example.json` near other ingestor knobs. Default `50000`,
comment text from review. | `f0b4e411` |
| **M1** | MAJOR (option 1 from review) | Split receipt-time vs
post-write liveness: add `SourceLivenessState.LastReceiptUnix` +
`MarkReceipt`, stamp at the MQTT receipt callback, leave
`LastMessageUnix` post-write only. Drop the double-stamp at receipt that
masked write-path stalls. Surface both clocks via the ingestor stats
file (`source_liveness`) and the server's `/api/healthz`
(`ingest_liveness`, additive — older builds unaffected). | RED
`fa78233d` / GREEN `bc81b544` |
| **M1 (drop-log)** | MAJOR | Log every drop when buffer is at capacity.
Removes the `n==1 \|\| n%1000` throttle that hid the first stall behind
1000 lost packets. The Submit drop branch only fires when the channel is
at cap so volume is naturally bounded by the stall, not by an arbitrary
modulo. | RED `a468763e` / GREEN `7b24fce5` |
| **m1** | MINOR | Add `IngestBuffer.Stop()` and `Done()` so tests stop
leaking the consumer goroutine that `Start()` spawns. Existing tests
gain `t.Cleanup(b.Stop)`. Drain semantics: stop-before-Ready exits
immediately; stop-after-Ready best-effort drains queued jobs. | RED
`8430c822` / GREEN `78c9b223` |
| **m2** | MINOR | `NewIngestBuffer(<1)` now logs a `[ingest-buffer]
WARN` line on clamp so misconfigured `ingestBufferSize` values are
visible instead of silently running a 1-slot queue. Test captures log
output. | RED `62119ab4` / GREEN `815bfd02` |
| **m3** | MINOR | Add godoc to `Submit` and `Ready` documenting the
Start-before-Submit / Start-before-Ready ordering invariant. |
`564a813b` |
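
The M1 split between receipt-time and post-write liveness can be pictured with two independent clocks. The sketch below is an assumption-laden illustration: `livenessState`, `markReceipt`, `markWritten` and `writeLagSeconds` are hypothetical names, and only the two-clock idea comes from the finding above.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// livenessState is a hypothetical stand-in for SourceLivenessState.
type livenessState struct {
	lastReceiptUnix atomic.Int64 // stamped in the MQTT receipt callback
	lastMessageUnix atomic.Int64 // stamped only after the DB write succeeds
}

func (s *livenessState) markReceipt(now int64) { s.lastReceiptUnix.Store(now) }
func (s *livenessState) markWritten(now int64) { s.lastMessageUnix.Store(now) }

// writeLagSeconds reports how far the write clock trails the receipt clock.
// A growing value means messages still arrive but the write path is stalled.
func (s *livenessState) writeLagSeconds() int64 {
	lag := s.lastReceiptUnix.Load() - s.lastMessageUnix.Load()
	if lag < 0 {
		return 0
	}
	return lag
}

func main() {
	var s livenessState
	s.markReceipt(1000)
	s.markWritten(1000)
	s.markReceipt(1090) // messages keep arriving, writes have stopped
	fmt.Println("write lag (s):", s.writeLagSeconds())
}
```

With a single clock stamped at both points, the stalled-write case above would look healthy, which is the masking the finding describes.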

## TDD discipline

Each behavioral fix (M1, M1-drop-log, m1, m2) lands as a red-then-green
pair. Red commits compile + run + fail on assertion, verified locally
before the green commit. Per-finding red→green pairs are visible in the
commit graph above.

B1 and m3 are docs-only and ship as single commits (preflight script
accepts them under the docs/comments exemption).

## Schema compatibility

`/api/healthz` change is purely additive: `ingest_liveness` is only
included when the ingestor publishes the new `source_liveness` field, so
older ingestor + newer server combos are unaffected. Field order in the
response stays stable for prior consumers.

## Test output

- `go test -count=1 -timeout 180s ./cmd/ingestor/...` → green (160s)
- `go test -count=1 -timeout 300s ./cmd/server/...` → green (48s)
- Race-mode runs of the touched packages
(`IngestBuffer|Liveness|Watchdog|Receipt|Healthz`) → green
- Full-package race runs locally exceed the brief's 120s timeout on
pre-existing slow integration tests (TestObsTimestampIndexMigration,
TestNeighborEdgesBuilderDeltaScan); CI has the headroom.

## Preflight

`bash ~/.openclaw/skills/pr-preflight/scripts/run-all.sh origin/master`
→ all hard gates pass, no warnings.

## Files changed

- `config.example.json` — B1
- `cmd/ingestor/ingest_buffer.go` — m1, m2, M1-drop-log, m3
- `cmd/ingestor/ingest_buffer_test.go` — m1, m2, M1-drop-log
- `cmd/ingestor/mqtt_watchdog.go` — M1
- `cmd/ingestor/mqtt_watchdog_m1_test.go` — M1 (new)
- `cmd/ingestor/main.go` — M1 (receipt callsite)
- `cmd/ingestor/stats_file.go` — M1 (publish `source_liveness`)
- `cmd/server/perf_io.go` — M1 (type + reader)
- `cmd/server/healthz.go` — M1 (surface `ingest_liveness`)

Original review reference: PR #1609 polish review by the M-axis bot.

---------

Co-authored-by: corescope-bot <bot@corescope.local>
2026-06-07 09:28:51 -07:00
Kpa-clawbot bc1822e46c perf(load): chunked Load with early HTTP readiness (#1009) (#1596)
## What

Switches the server's startup from a synchronous full-scan
`PacketStore.Load()` to a chunked `LoadChunked(chunkSize)` that:

1. Streams transmissions+observations from SQLite in id-ordered chunks
(default `chunkSize=10000`, configurable via `db.load.chunkSize`).
2. Closes `FirstChunkReady()` after the first chunk is merged —
`main.go` binds the HTTP listener on that signal instead of blocking on
the full multi-minute load.
3. Stamps `X-CoreScope-Load-Status: loading; progress=<rows>` on every
response while LoadChunked is in flight, flipping to `ready` once it
completes (via `loadStatusMiddleware`).
4. Preserves the existing retention/`hotStartupHours`/`maxMemoryMB`
clamps and the post-load index rebuild (`pickBestObservation` /
`buildSubpathIndex` / `buildPathHopIndex` / `buildDistanceIndex`).
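
A minimal sketch of the chunk-then-signal shape described above, assuming an in-memory slice in place of SQLite. `chunkedLoader` and its fields are hypothetical; only the idea of closing a readiness channel after the first merged chunk is taken from this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// chunkedLoader is a hypothetical in-memory stand-in for the packet store.
type chunkedLoader struct {
	firstChunk chan struct{}
	once       sync.Once
	loaded     int
}

func newChunkedLoader() *chunkedLoader {
	return &chunkedLoader{firstChunk: make(chan struct{})}
}

// FirstChunkReady returns a channel closed once the first chunk is merged.
func (l *chunkedLoader) FirstChunkReady() <-chan struct{} { return l.firstChunk }

// loadChunked walks rows in chunkSize slices and returns the chunk count.
func (l *chunkedLoader) loadChunked(rows []int, chunkSize int) int {
	if chunkSize < 1 {
		chunkSize = 1
	}
	chunks := 0
	for start := 0; start < len(rows); start += chunkSize {
		end := start + chunkSize
		if end > len(rows) {
			end = len(rows)
		}
		l.loaded += end - start // "merge" the chunk
		chunks++
		l.once.Do(func() { close(l.firstChunk) })
	}
	// An empty table must still unblock anyone waiting on the signal.
	l.once.Do(func() { close(l.firstChunk) })
	return chunks
}

func main() {
	l := newChunkedLoader()
	done := make(chan int)
	go func() { done <- l.loadChunked(make([]int, 2500), 1000) }()
	<-l.FirstChunkReady() // the HTTP listener would bind at this point
	fmt.Println("first chunk ready; total chunks:", <-done)
}
```

The `sync.Once` keeps the close idempotent, so the empty-table path and the normal path can share one signal.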

## Why

Per #1009: at 5M+ observations (Cascadia scale) the synchronous Load
blocked HTTP for ~80s with a 2–3× steady-state RAM peak. With chunked
load the listener binds within seconds; dashboards and probes can read
partial data and see the `loading` status header until the background
load finishes.

## Notes

- `/api/healthz` readiness gate (`readiness` atomic, init `WaitGroup`)
is unchanged — it still waits for neighbor-graph build + initial
`pickBestObservation` before reporting `ready:true`. `LoadChunked` only
changes when the listener BINDS, not when it advertises ready.
- `cmd/server/main.go` waits for `FirstChunkReady` (or the full load on
a tiny DB) before proceeding, and drains the load goroutine in the
background with a logged error path.
- Config Documentation Rule: `config.example.json` now documents
`db.load.chunkSize` with a nested `_comment` describing the trade-off.

## Tests

- `cmd/server/chunked_load_test.go` asserts:
  - (a) `FirstChunkReady` fires before `LoadChunked` returns
  - (b) `X-CoreScope-Load-Status` transitions `loading; progress=...` →
    `ready`
  - (c) `chunkSize` honored (2500 rows @ 1000 → 3 chunks via
    `OnChunkLoaded`)
  - (d) `Config.DBLoadChunkSize()` default 10000 + override
- Red commit (`102a4c84`) lands the tests with stubs that fail on
assertion — verified locally before the green commit.
- Green commit (`35cecf16`) makes all four pass; full `cmd/server` suite
green (47s locally).

Closes #1009



## TDD red-commit exemption

The original red commit `f878e15e` ("test(load): failing tests for
chunked Load + early HTTP readiness") fails to **compile** rather than
failing on an assertion, because it references symbols
(`store.LoadChunked`, `store.FirstChunkReady`, `store.OnChunkLoaded`,
`Config.DBLoadChunkSize`, `loadStatusMiddleware`) that do not exist on
master. Per `AGENTS.md` the bar is "MUST fail on an assertion ... A
compile error is NOT a valid red commit."

This is claimed under the **net-new surface** exemption with the
following justification:

- LoadChunked / FirstChunkReady / loadStatusMiddleware / DBLoadChunkSize
are all introduced by this PR — no prior implementation existed to
refactor. There is no behaviour on master that the red commit could
meaningfully assert against without first declaring the new symbols.
- The cheapest "proper" alternative (split the red into two commits:
stub-first + assertion-fail) was deferred because the test file
unambiguously fails on missing-symbol — there is no risk of the test
becoming a tautology against a pre-existing stub.
- **Behaviour gating IS proven elsewhere on this branch.** Commit
`799bde49` ("test(load): red — LoadChunked must mark indexes ready + not
flip Complete on error") is a proper assertion-fail red against the same
package, and commit `92cadd1d` is the matching green. Reviewers can
verify the red→green pattern there.

If a future reviewer wants the strict pattern, the follow-up is
mechanical: split `f878e15e` into a stub-only commit followed by the
assertion commit. Not done here to keep the rework cost proportional to
the risk (zero, in this case).

## Preflight overrides

- check-async-migrations: justified — the flagged `CREATE TABLE`/`CREATE
INDEX` statements live in `cmd/server/chunked_load_id_zero_test.go` and
`cmd/server/chunked_load_oldest_test.go` only. They run against per-test
`t.TempDir()` SQLite files (in-process, ~10 rows, lifetime = single
test) — they are NOT production schema migrations. No prod table is
touched. PREFLIGHT-MIGRATION-SCALE: <30s N=10 (per-test tempdir
fixture).

---------

Co-authored-by: CoreScope Bot <bot@corescope.local>
Co-authored-by: clawbot <bot@noreply.example.com>
Co-authored-by: Kpa-clawbot <bot@example.com>
Co-authored-by: Kpa-clawbot <bot@kpa-clawbot>
2026-06-07 03:43:29 -07:00
Eldoon Nemar 7421ead9b0 fix: bypass API limit clamps for internal UI requests. Revisit of issue #1540 (#1589)
This PR replaces the strict, hardcoded limits on API list endpoints
(introduced in the recent security patch) with a new
operator-configurable `listLimits` block. This change is needed because the
implementation of issue #1540 introduced a 500-node maximum on the live map
and on any other feature that uses the `/api/nodes` backend.

Previously, we attempted to bypass public caps for internal UI requests
using a heuristic based on browser headers (`Sec-Fetch-Site`). Following
review, we decided to drop that heuristic entirely to eliminate any
security-by-browser-convention surface area.

Instead, `queryLimit()` returns to its original, mathematically simple
bounds-checking shape, and the absolute maximums are now drawn from
`config.json`. This provides equal DoS protection against all callers
while allowing server operators to tune the ceilings based on the size
of their mesh (e.g. embedded devices can tighten the knobs, regional
hubs can raise them).
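
For illustration, the bounds-checking shape referred to above might look like the following. This is a sketch under assumptions: `clampLimit` is a hypothetical reduction of `queryLimit(r, def, max)` that takes the raw query value instead of an `*http.Request`, the default of 50 is invented for the demo, and the fallback-to-default behaviour for missing or non-positive input is an assumption rather than something this PR states.

```go
package main

import (
	"fmt"
	"strconv"
)

// clampLimit is a hypothetical helper: it parses the raw ?limit= value,
// falls back to def when it is missing or invalid, and never exceeds max.
func clampLimit(raw string, def, max int) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return def
	}
	if n > max {
		return max
	}
	return n
}

func main() {
	const nodesMax = 2000 // default NodesMax cap listed in this PR
	fmt.Println(clampLimit("", 50, nodesMax))     // missing value
	fmt.Println(clampLimit("4300", 50, nodesMax)) // above the ceiling
	fmt.Println(clampLimit("120", 50, nodesMax))  // within bounds
}
```

Because the ceiling is a plain argument, swapping a hardcoded integer for a config field changes the call site only, not the clamp itself.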

### Changes Made:
- **`config.go`**: Introduced a `ListLimits` config struct containing
`PacketsMax`, `NodesMax`, `AnalyticsMax`, and `ChannelMessagesMax`.
Added safe initialization to ensure default caps (10000, 2000, 200, 500
respectively) apply even if the block is omitted from the config.
- **`clamp_limit.go`**: Deleted `isInternalUIRequest` entirely and
restored `queryLimit` to its original signature (`r, def, max`).
- **`routes.go`**: Replaced all hardcoded integer ceilings on list
endpoints (`/api/packets`, `/api/nodes`, etc.) with
`s.cfg.ListLimits.*`.
- **`config.example.json`**: Added the `listLimits` block with
documentation to guide new operators.
- **`clamp_limit_test.go`**: Purged all header-heuristic testing.

### Verification:
- All 611 backend unit tests pass (`npm run test:unit`).
- Bounds-checking math continues to enforce hard DoS clipping exactly at
the operator's specified configuration limit.

---------

Co-authored-by: mc-bot <bot@openclaw.local>
Co-authored-by: openclaw-bot <bot@openclaw>
2026-06-06 22:45:05 -07:00
Kpa-clawbot 1bdb92de88 feat(#1574): operator-configurable liveMap.maxNodes (default 2000) (#1577)
Red commit: 94dc1d70a5

Fixes #1574.

cross-stack: justified — by design. Adds one server-side knob
(`liveMap.maxNodes`) on the Go API and consumes it on the frontend
(`public/live.js`) via the shared `/api/config/client` bootstrap in
`public/roles.js`. Cannot land server-only or frontend-only without
either dropping operator config (frontend-only) or leaving the literal
in place (server-only).

## Problem (per triage)
`public/live.js:2515-2516` hardcodes `/api/nodes?limit=2000` for the
live-map node-load path. Reporter measured headroom at N=4300 and
asked for an operator knob. Same `2000` magic also lives at
`public/live.js:480` for the VCR-rewind `/api/packets?limit=2000`.

## Fix
- New `liveMap.maxNodes` field in `Config` (default 2000).
- `Config.LiveMapMaxNodes()` server-side clamp: `[100, 20000]`;
  zero/negative falls back to default. Defangs misconfig (e.g. 1M
  would OOM the SQLite read + JSON serialization path).
- `/api/config/client` now returns `liveMapMaxNodes`.
- `public/roles.js` reads it at bootstrap into `window.LIVE_MAP_MAX_NODES`
  (default 2000 to preserve behavior on stale caches).
- `public/live.js` consumes `LIVE_MAP_MAX_NODES` at both the `/api/nodes`
  call sites (formerly :2515-2516) and the VCR-rewind `/api/packets`
  call (formerly :480), giving a single source of truth; in scope per
  triage's "factor into a sibling const" suggestion.
- `config.example.json` documents the knob with `_comment_maxNodes` per
  AGENTS.md config rule.
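
The clamp rules listed above (range `[100, 20000]`, zero or negative falls back to the default of 2000) can be sketched as a pure function. The function and constant names here are hypothetical; the real `Config.LiveMapMaxNodes()` is a method on the config struct.

```go
package main

import "fmt"

const (
	defaultMaxNodes = 2000
	minMaxNodes     = 100
	maxMaxNodes     = 20000
)

// liveMapMaxNodes is a hypothetical free-function version of the clamp.
func liveMapMaxNodes(configured int) int {
	switch {
	case configured <= 0:
		return defaultMaxNodes // unset or nonsensical: use the default
	case configured < minMaxNodes:
		return minMaxNodes
	case configured > maxMaxNodes:
		return maxMaxNodes
	default:
		return configured
	}
}

func main() {
	for _, v := range []int{0, 50, 4300, 1000000} {
		fmt.Printf("configured=%d effective=%d\n", v, liveMapMaxNodes(v))
	}
}
```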

## TDD
1. **Red** (`94dc1d70`): added `test-issue-1574-live-map-max-nodes.js`
   (grep-asserts the literal is gone + `LIVE_MAP_MAX_NODES` /
   `liveMapMaxNodes` are wired + config example has the field) and
   `cmd/server/livemap_maxnodes_1574_test.go` (`/api/config/client`
   exposes `liveMapMaxNodes` + clamp table-driven cases). Stub
   `LiveMapMaxNodes()` returns 0 so the test compiles and fails on
   assertion, not import.
2. **Green** (this commit): real `LiveMapMaxNodes()` clamp + wire-up.
   All assertions pass; existing `cmd/server` suite still green.

## E2E note
Frontend assertion is grep-based (literal removal + constant
reference), in the established `test-issue-*` style used elsewhere
(e.g. `test-issue-1189-live-iata-badge.js`). No Playwright change
needed for a literal-replace; behavior validation is the server-side
clamp + JSON shape tests.

## Out of scope
No customizer UI change — operators set this in `config.json`, same
pattern as `liveMap.propagationBufferMs`. Customizer surfacing can
land as a follow-up if the operator wants it.

---------

Co-authored-by: mc-bot <bot@corescope.local>
Co-authored-by: Kpa-clawbot <bot@meshcore-analyzer>
2026-06-06 22:44:59 -07:00
Kpa-clawbot ad41b9bb7b fix(tests): subpaths_window tests wait for index readiness after #1595 chunked load (#1621)
## Why master is red

After PRs #1592 (route-window subpath regression test) and #1595
(background/chunked index build with 503 readiness gate) were merged
together, two tests in `cmd/server/subpaths_window_test.go` started
failing on master:

```
--- FAIL: TestSubpathsHonorsTimeWindow_StoreLevel
    subpaths_window_test.go:70: unbounded: expected totalPaths=2, got 0 (subpaths=[])
--- FAIL: TestSubpathsHandlerHonorsTimeWindow
    subpaths_window_test.go:116: GET /api/analytics/subpaths?...: status=503 body={"error":"index loading","retryAfter":5}
```

Both branches passed in isolation; the conflict only manifested
post-merge. Reason:

- **#1592** added tests that call `store.Load()` then immediately query
`GetAnalyticsSubpathsWithWindow` / hit `/api/analytics/subpaths`.
- **#1595** moved the subpath + path-hop index builds off the critical
path of `Load()` into background goroutines, and hard-gated the
analytics handlers behind `SubpathIndexReady()` (returning 503 +
`Retry-After: 5` until the build completes).

So after `Load()` returns, `s.spIndex` is still empty for a short window
and the handler returns 503. The store-level test sees `totalPaths=0`;
the handler test sees the 503.

## Fix (test-only)

Add `store.WaitIndexesReady(5 * time.Second)` between `Load()` and the
assertions in both tests. This matches the established pattern already
used by `routes_test.go` and `repeater_enrich_recomputer_1008_test.go`.

The 503 readiness gate from #1595 is intentional production behavior and
is **not** touched. No production code is modified.

## Repro

Before:
```
$ go test ./cmd/server/ -run TestSubpaths.*Window -v -count=1
--- FAIL: TestSubpathsHonorsTimeWindow_StoreLevel (0.01s)
    subpaths_window_test.go:70: unbounded: expected totalPaths=2, got 0 (subpaths=[])
--- FAIL: TestSubpathsHandlerHonorsTimeWindow (0.02s)
    subpaths_window_test.go:116: GET /api/analytics/subpaths?minLen=2&maxLen=8: status=503 body={"error":"index loading","retryAfter":5}
FAIL
```

After:
```
$ go test ./cmd/server/ -run TestSubpaths.*Window -v -count=3
--- PASS: TestSubpathsHonorsTimeWindow_StoreLevel (0.01s)
--- PASS: TestSubpathsHandlerHonorsTimeWindow (0.02s)
... (x3) ...
PASS
ok      github.com/corescope/server     0.097s

$ go test ./cmd/server/ -count=1 -timeout 300s
ok      github.com/corescope/server     46.292s
```

## Files changed
- `cmd/server/subpaths_window_test.go` (+11 lines, test-only)

## Notes
- TDD exemption: this is a test-fix PR for a merge-conflict-induced
failure. The "failing test" already exists on master; this PR makes it
pass correctly by waiting on the readiness gate the test was previously
unaware of.
- Unblocks staging deploys.

Co-authored-by: openclaw-bot <bot@openclaw>
2026-06-06 21:59:23 -07:00
Kpa-clawbot 222bfdf6cf feat(perf): SQLite writer-lock wait/hold instrumentation per component (#1340) (#1594)
## What

Per-component SQLite writer-lock instrumentation so the next
neighbor-builder-style write-lock starvation (root cause of #1339,
invisible to operators for ~3 days) is detectable from `/api/perf`.

Adds `Store.WriterExec` / `Store.WriterTx` wrappers that gate every
wrapped call on a package-level `writerMu` so the wait the SQLite driver
hides becomes Go-visible, and record `wait_ms` + `hold_ms` +
`contention_total` (wait_ms > 100ms) under a component tag.
Per-component p50/p95/p99 + max are published to
`/api/perf/write-sources` under `.writer_perf` via the existing ingestor
stats-file path. Slow-writer log line (`[db-slow-writer] component=X
duration=Yms query=<200ch>`) fires on `hold_ms > 500ms` (threshold
overridable via `CORESCOPE_DB_SLOW_WRITER_MS` env var).
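
A rough sketch of how a Go-side mutex makes the writer wait measurable. Everything here is illustrative: `instrumentedWriter`, `exec` and `runDemo` are hypothetical names, no SQLite is involved, and percentile aggregation is omitted. Only the wait/hold split under a component tag mirrors the description above.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// writerSample records one wrapped write: time spent waiting for the
// lock and time spent holding it.
type writerSample struct {
	component  string
	wait, hold time.Duration
}

// instrumentedWriter is a hypothetical stand-in for the Store wrappers.
type instrumentedWriter struct {
	writerMu sync.Mutex // serialises writers so the wait is visible in Go
	statsMu  sync.Mutex
	samples  []writerSample
}

// exec runs fn under writerMu and records wait/hold for the component.
func (w *instrumentedWriter) exec(component string, fn func() error) error {
	requested := time.Now()
	w.writerMu.Lock()
	acquired := time.Now()
	err := fn()
	hold := time.Since(acquired)
	w.writerMu.Unlock()

	w.statsMu.Lock()
	w.samples = append(w.samples, writerSample{component, acquired.Sub(requested), hold})
	w.statsMu.Unlock()
	return err
}

// runDemo holds the writer for 50ms under one tag while a second
// component queues behind it, then returns the sample per component.
func runDemo() map[string]writerSample {
	w := &instrumentedWriter{}
	holding := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		_ = w.exec("neighbor_builder", func() error {
			close(holding) // signal that the lock is now held
			time.Sleep(50 * time.Millisecond)
			return nil
		})
	}()
	<-holding
	_ = w.exec("mqtt_handler", func() error { return nil })
	wg.Wait()

	out := map[string]writerSample{}
	for _, s := range w.samples {
		out[s.component] = s
	}
	return out
}

func main() {
	for name, s := range runDemo() {
		fmt.Printf("component=%s wait=%s hold=%s\n", name, s.wait, s.hold)
	}
}
```

The starved component shows a large `wait` and a tiny `hold`, while the component causing the starvation shows the reverse, which is the distinction the per-component tags are meant to expose.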

## Tagged call sites

| Component | Location |
|-----------|----------|
| `mqtt_handler` | `InsertTransmission` (db.go) |
| `neighbor_builder` | `buildAndPersistNeighborEdges`
(neighbor_builder.go) |
| `prune_packets` | `PruneOldPackets` (maintenance.go) |
| `prune_observers` | `RemoveStaleObservers` + orphan-metrics cleanup
(db.go) |
| `prune_metrics` | `PruneOldMetrics` (db.go) |
| `vacuum` | `RunIncrementalVacuum` + `CheckAutoVacuum`'s full VACUUM
(db.go) |

## TDD red→green

- **Red commit** `68de585b` — `cmd/ingestor/db_writer_perf_test.go` +
`Store.Writer*` stubs at end of `db.go`. Test synthetically blocks the
writer for 60s tagged `neighbor_builder`, then asserts
`mqtt_handler.wait_ms.p99 > 50000ms` on concurrent inserts. Fails on the
assertion (p99 = 0.0ms) with the stub — not a build error.
- **Green commit** `6a9be174` — replaces stubs with real
wait/hold/contention aggregator + wires every writer call site. Same
test passes:

```
2026/06/05 04:36:47 [db-slow-writer] component=neighbor_builder duration=60059.0ms query=COMMIT
--- PASS: TestWriterStarvationVisibleInPerf (60.40s)
PASS
ok      github.com/corescope/ingestor   60.408s
```

## Scope discipline

- **API**: no public `Store`/`DB` signature change. Only additive
exports.
- **Server**: extends existing `/api/perf/write-sources` JSON with
`.writer_perf` — does **not** add a new route, does **not** replace
`handlePerf`. Empty `.writer_perf` map when paired with an older
ingestor.
- **Read/write invariant** (#1283) preserved: all instrumentation lives
on the ingestor's writer connection.
- **Files touched** (7 total): `cmd/ingestor/db.go`,
`cmd/ingestor/db_writer_perf_test.go`, `cmd/ingestor/maintenance.go`,
`cmd/ingestor/neighbor_builder.go`, `cmd/ingestor/stats_file.go`,
`cmd/server/perf_io.go`, `config.example.json`.

## Deferred (acceptance items NOT in this PR)

- **`mbcap_persist` component tag** — `RunMultibyteCapPersist`'s tx is
intentionally NOT wrapped in this PR to stay within the implementation
brief's 3-files-outside-whitelist budget. One-file follow-up to
instrument.
- **CI smoke test** asserting "neighbor-builder hold_ms < 1000ms on
100k-obs fixture" — deferred to a separate PR per the brief; this PR is
scoped to instrumentation only.

## Preflight overrides

PREFLIGHT-MIGRATION-SCALE: <30s N=runtime — the async-migration gate
flagged five `instrumentedExec` / wrapped-`tx.Exec` lines on `DELETE
FROM observer_metrics`, `UPDATE observers`, `DELETE FROM
observer_metrics`, `DELETE FROM observations`, `DELETE FROM
transmissions`. These are **not** schema migrations — they are the
existing runtime prune / retention queries that already ran sync against
`s.db.Exec` / `tx.Exec` on every retention cycle on master. This PR only
swapped the surface call (sync → sync, via the wrapper) to record
wait/hold timing; no new sync schema work was introduced. Behavior on
production data is identical to master.

Also: red commit's synthetic `UPDATE nodes SET name = name WHERE 0` is a
test-only stub designed to acquire the writer without mutating any row
(the `WHERE 0` is a no-op predicate).

Fixes #1340

---------

Co-authored-by: corescope-bot <bot@corescope.local>
2026-06-06 21:05:59 -07:00
Kpa-clawbot 1b112f0b08 feat(memlimit): GOMEMLIMIT via runtime.maxMemoryMB in server + ingestor (#1010) (#1595)
Red commit: 929da3c6dc — CI:
https://github.com/Kpa-clawbot/CoreScope/commit/929da3c6dcc1b619c27478291125d1c91323db8f/checks

Fixes #1010.

## What
Adds `GOMEMLIMIT` support to both `cmd/server` and `cmd/ingestor` per
the locked triage scope on #1010.

Precedence (env wins):
1. `GOMEMLIMIT` env var
2. `runtime.maxMemoryMB` config field (new)
3. Server only: implicit `packetStore.maxMemoryMB * 1.5` (existing #836
behavior, unchanged when `runtime.maxMemoryMB` is absent)
4. Otherwise unset — default Go behavior preserved (backwards
compatible)

Each startup logs a `[memlimit]` line echoing the effective
source/limit, or an "unset → default" note when neither is set.
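
The precedence order above can be expressed as a small pure function. This is a sketch, not the PR's `applyMemoryLimit`: the name `resolveMemoryLimit`, the three-argument shape, and the source labels `config` and `derived` are assumptions (only `env` and `none` appear in the test output quoted in this PR). It relies on the Go runtime applying `GOMEMLIMIT` itself at startup, so the env case only needs to be reported, not re-applied.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
)

// resolveMemoryLimit is a hypothetical helper returning the limit in bytes
// and the source that won. A zero limit means "do not call SetMemoryLimit".
func resolveMemoryLimit(envSet bool, runtimeMaxMB, packetStoreMaxMB int) (int64, string) {
	const mib = 1 << 20
	switch {
	case envSet:
		return 0, "env" // the Go runtime has already applied GOMEMLIMIT
	case runtimeMaxMB > 0:
		return int64(runtimeMaxMB) * mib, "config"
	case packetStoreMaxMB > 0:
		return int64(packetStoreMaxMB) * mib * 3 / 2, "derived" // 1.5x, server only
	default:
		return 0, "none"
	}
}

func main() {
	_, envSet := os.LookupEnv("GOMEMLIMIT")
	limit, source := resolveMemoryLimit(envSet, 1024, 512)
	if limit > 0 {
		debug.SetMemoryLimit(limit)
	}
	fmt.Printf("[memlimit] source=%s limit=%d bytes\n", source, limit)
}
```

Keeping the decision separate from the `debug.SetMemoryLimit` call makes the precedence testable without touching process-wide runtime state.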

## Changes
- `cmd/ingestor/memlimit.go` — new, `applyMemoryLimit(runtimeMaxMB,
envSet)`.
- `cmd/ingestor/memlimit_test.go` — new, env/config/none/precedence
assertions.
- `cmd/ingestor/config.go` — new `RuntimeConfig{MaxMemoryMB int}` field.
- `cmd/ingestor/main.go` — wires `applyMemoryLimit` into startup right
after `LoadConfig`.
- `cmd/server/config.go` — new `RuntimeConfig` + `cfg.Runtime` field.
- `cmd/server/main.go` — adds explicit `runtime.maxMemoryMB` precedence
over packetStore-derived; existing `warnIfMemlimitUnderprovisioned`
(#1264) unchanged.
- `config.example.json` — new `runtime` block with
`_comment_runtime_maxMemoryMB` per the Config Documentation Rule.
- `README.md` — sizing-table row with ≥1.5× working set floor +
death-spiral warning.

## TDD
- Red: `929da3c6` — ingestor `applyMemoryLimit` stub returns
`(0,"none")`; four tests fail on assertions (`expected source=env, got
"none"`, etc.) — no compile errors.
- Green: `953ec9d8` — implements ingestor `applyMemoryLimit`, wires
startup, threads `runtime.maxMemoryMB` through server too.

## Preflight
`bash ~/.openclaw/skills/pr-preflight/scripts/run-all.sh origin/master`
→ clean (all gates pass, all warnings pass).

## Out of scope
- `pprof`-verified GC-trigger acceptance criterion from the original
issue — requires production tracing; the triage scope is the
operator-tunable plumbing.
- Container auto-detection of cgroup memory limit (already covered by
#1264's `warnIfMemlimitUnderprovisioned`).

---------

Co-authored-by: corescope-bot <bot@corescope>
2026-06-06 21:05:56 -07:00
Kpa-clawbot df61660a5e perf(load): background subpath+pathHop index builds with ready gates (#1008) (#1604)
## Summary

Mirrors the distance-index lazy pattern (#1011): the subpath and
path-hop index builds are no longer part of `Load()`'s synchronous
critical section. They now run in **two parallel background goroutines**
kicked off after `s.loaded = true`, so HTTP comes up immediately even at
Cascadia scale (5M observations, previously ~60s blocked on these two
builds inside `Load()` under `s.mu`).

Fixes #1008.

## Approach

Two new `atomic.Bool` fields on `PacketStore` (`subpathReady`,
`pathHopReady`) plus a one-shot broadcast channel (`indexReadyChan`) for
waiters. `Load()` removes the synchronous `s.buildSubpathIndex()` /
`s.buildPathHopIndex()` calls and instead kicks
`s.startBackgroundIndexBuilds()` right before returning. That function
spawns **two independent goroutines** (review m7), one per index. Each
goroutine:

1. acquires `s.mu.Lock()` (blocks until `Load()`'s deferred Unlock
fires),
2. runs its builder, releases the lock, stores its `ready = true`,
3. closes the broadcast channel if both flags are now true,
4. logs `[startup] index build complete: subpath (Xs)` (or pathHop).
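
The four steps above can be sketched as follows, assuming trivial map "indexes" in place of the real builders. `indexStore` and its fields are hypothetical stand-ins for `PacketStore`; `atomic.Bool` requires Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// indexStore is a hypothetical, stripped-down stand-in for PacketStore.
type indexStore struct {
	mu           sync.RWMutex
	subpathReady atomic.Bool
	pathHopReady atomic.Bool
	readyOnce    sync.Once
	readyCh      chan struct{} // closed once both indexes are built
	subpaths     map[string]int
	pathHops     map[string]int
}

func newIndexStore() *indexStore {
	return &indexStore{readyCh: make(chan struct{})}
}

// startBackgroundIndexBuilds spawns one goroutine per index.
func (s *indexStore) startBackgroundIndexBuilds() {
	build := func(flag *atomic.Bool, fill func()) {
		s.mu.Lock()
		fill()
		s.mu.Unlock()
		flag.Store(true) // publish only after the maps are in place
		if s.subpathReady.Load() && s.pathHopReady.Load() {
			s.readyOnce.Do(func() { close(s.readyCh) })
		}
	}
	go build(&s.subpathReady, func() { s.subpaths = map[string]int{"a>b": 2} })
	go build(&s.pathHopReady, func() { s.pathHops = map[string]int{"a": 1} })
}

// waitIndexesReady blocks until both builds finish or the timeout elapses.
func (s *indexStore) waitIndexesReady(timeout time.Duration) bool {
	select {
	case <-s.readyCh:
		return true
	case <-time.After(timeout):
		return false
	}
}

func main() {
	s := newIndexStore()
	s.startBackgroundIndexBuilds()
	fmt.Println("ready:", s.waitIndexesReady(5*time.Second))
}
```

Whichever goroutine stores its flag last is guaranteed to observe both flags as true, and the `sync.Once` guards against a double close if both observe it.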

Analytics handlers whose entire response IS the index aggregate —
`/api/analytics/subpaths`, `/api/analytics/subpaths-bulk`,
`/api/analytics/subpath-detail`, `/api/nodes/{pubkey}/paths` — gate
reads behind the corresponding atomic and respond with `503 Service
Unavailable`, `Retry-After: 5`, body `{"error":"index
loading","retryAfter":5}` until the build completes — matching the
triage spec.
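
A minimal sketch of such a readiness gate as HTTP middleware, using only the standard library. The names `gateOnReady`, `probe` and `newGatedDemo` are hypothetical, and the PR may well implement the check inline in each handler rather than as a wrapper; the status code, header and body are the ones quoted above.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// gateOnReady is a hypothetical wrapper: it answers 503 with Retry-After
// until the ready flag flips, then passes requests through to next.
func gateOnReady(ready *atomic.Bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !ready.Load() {
			w.Header().Set("Content-Type", "application/json")
			w.Header().Set("Retry-After", "5")
			w.WriteHeader(http.StatusServiceUnavailable)
			fmt.Fprint(w, `{"error":"index loading","retryAfter":5}`)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// probe issues one in-process GET and returns status and Retry-After.
func probe(h http.Handler) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/api/analytics/subpaths", nil)
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("Retry-After")
}

// newGatedDemo builds a gated handler plus the flag that controls it.
func newGatedDemo() (http.Handler, *atomic.Bool) {
	ready := new(atomic.Bool)
	inner := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, `{"subpaths":[]}`)
	})
	return gateOnReady(ready, inner), ready
}

func main() {
	h, ready := newGatedDemo()
	code, retry := probe(h)
	fmt.Println("before ready:", code, retry)
	ready.Store(true)
	code, _ = probe(h)
	fmt.Println("after ready:", code)
}
```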

### Handler scope (review M2)

A second class of handlers also touches these indexes — `/api/nodes`,
`/api/nodes/{pubkey}`, the `GetRepeaterRelayInfoMap` /
`GetRepeaterUsefulnessScoreMap` / `GetBridgeScore` enrichment helpers,
and `repeater_liveness` / `repeater_usefulness`. These are
**intentionally NOT 503-gated**: they expose the index via optional
enrichment fields that callers already treat as "may be empty", and
503-ing the SPA bootstrap to wait for an index that only affects
relay-activity badges would be a worse UX than a 30–60s window of "—"
values. The rationale is documented in the package doc-comment at the
top of `index_ready_1008.go`.

The recomputer's synchronous prewarm path
(`StartRepeaterEnrichmentRecomputer`) gates on `WaitIndexesReady(60s)`
(review M1) so it never snapshots an empty `byPathHop` into
`s.repeaterRelayCache`; on timeout it skips the prewarm and lets the
5-minute ticker pick up the populated index.

## Concurrency safety

Each build goroutine acquires `s.mu.Lock()` before calling the existing
`buildSubpathIndex()` / `buildPathHopIndex()` helpers, which replace
`s.spIndex` / `s.spTxIndex` / `s.byPathHop` with freshly-allocated maps.
Visibility of the populated maps to handlers that observe
`Ready()==true` is established by Go 1.19+ sync/atomic acquire-release
semantics: the atomic store of `true` happens-after `s.mu.Unlock()`, and
the handler's atomic load synchronizes-with that store. The handler's
subsequent `s.mu.RLock` serializes against concurrent ingest writers,
not against the builder.

The existing `main.go` boot sequence does not start ingest goroutines
until after `store.Load()` returns and graph init completes, so the
brief window between `Load()` returning and the two goroutines acquiring
`s.mu` does not race with concurrent ingest writes.

## TDD: red → green

- **Red** commit `63e79e11`: `cmd/server/index_ready_1008_test.go` adds
four assertions; `cmd/server/index_ready_1008.go` adds compile-only
stubs returning `true` so the tests fail on assertions, not build
errors.
- **Green** commit `fb1d22b0`: implements the real atomic gates, the
background goroutine, and the four handler 503 branches; also updates
four existing tests that read indexes directly post-`Load()` to call
`store.WaitIndexesReady(5s)` first.
- **Race-fix commit `b77d56eb`** (review m8 — test-infra exemption):
adds `WaitIndexesReady` calls in test helpers/setup paths so the race
detector no longer flags the read-after-Load() pattern in existing
tests. Per AGENTS.md, race-detector flakes are observable evidence (test
crashes under `-race`) and qualify for the test-infra exemption from the
TDD red-commit requirement; no behavior change in production code.
- **Polish round 2 — M1 red `408c7462` / green `85e82c8a`**:
`TestIssue1008_M1_PrewarmWaitsForIndexes` asserts the recomputer prewarm
SKIPs when indexes are not ready. Red commit adds the assertion + a stub
`repeaterEnrichmentPrewarmWait` var; green commit wires
`WaitIndexesReady` into the prewarm path and adds the handler-scope docs
for M2.
- **Polish round 2 — minor cleanups `fd089bd0`** (m3..m7): chunk-loader
wires `markIndexesReadySync`, memory-model comment rewritten to cite
acquire-release, sentinel deleted, polling replaced with a broadcast
channel, two parallel goroutines for the builds.
`TestIssue1008_m7_BothFlagsSetAfterParallelStart` covers the parallel
path.

## Reproduction

```
git fetch origin fix/issue-1008
git checkout 63e79e11   # red commit
cd cmd/server && go test -run TestIssue1008_ -count=1 .   # FAILs

git checkout fix/issue-1008   # latest green
cd cmd/server && go test -run TestIssue1008 -count=1 -race .   # all pass
cd cmd/server && go test -count=1 -race -short ./...           # full suite ok
```

## Files changed

| file | role |
|---|---|
| `cmd/server/store.go` | atomic.Bool fields + indexReadyChan broadcast
field; remove sync build calls in Load(); kick goroutines; wire
markIndexesReadySync from chunk loader |
| `cmd/server/index_ready_1008.go` | ready flags, two-goroutine
background builds, 503 helper, channel-based WaitIndexesReady,
handler-scope docs |
| `cmd/server/index_ready_1008_test.go` | red-commit contract tests +
parallel-start assertion |
| `cmd/server/repeater_enrich_recomputer.go` | gate prewarm on
WaitIndexesReady (M1) |
| `cmd/server/repeater_enrich_recomputer_1008_test.go` | M1 red+green
assertions |
| `cmd/server/routes.go` | 503 gate on 4 analytics handlers |
| `cmd/server/routes_test.go` | setup helpers wait for ready; collision
test waits |
| `cmd/server/coverage_test.go` | three tests wait for ready before
reading indexes |

## Out of scope

- Distance index (already deferred in #1011) — untouched.
- The `pickBestObservation` + `indexByNode` per-tx loop in `Load()` —
kept synchronous per triage Findings (ordering-sensitive,
contiguous-memory, fast).

---------

Co-authored-by: bot <bot@noreply.local>
Co-authored-by: openclaw-bot <bot@openclaw.local>
Co-authored-by: mc-bot <mc-bot@users.noreply.github.com>
2026-06-06 20:46:42 -07:00
Kpa-clawbot 3898688d6d analytics: Relay Airtime Share endpoint + dumbbell chart (#1359) (#1601)
Implements the locked spec from #1359.

Red commit: 68a140a8 — `distinctRelayCount` stub returns 0; test fails
on assertion (compiles + runs to assertion, not a build error).
Green commit: 48c2ddad — real implementation.

## Backend (in-memory, no SQL, no schema change)

- `cmd/server/relay_airtime_share.go`
  - `distinctRelayCount(tx)` — unions the resolved-pubkey reverse index
    for `tx.ID`. That index already dedups `(pubkey-hash, txID)` pairs
    across every observation's `resolved_path`, so its length IS the count
    of distinct repeaters that forwarded the packet. NOT length of any
    single observation's resolved_path (the bug-trap from #1358).
  - `computeRelayAirtimeShare(window)` — per-tx `score = payload_bytes ×
    distinctRelays`, bucketed by `payload_type`, sorted desc by airtime_pct.
  - `GetRelayAirtimeShareWithWindow` — cached behind existing `rfCache` +
    `rfCacheTTL` pool. Shallow-copies the cached payload with `cached=true`
    for the client.
- `cmd/server/routes.go` — `GET
  /api/analytics/relay-airtime-share?window=…` returning
  `{rows:[{payload_type,type,count,count_pct,score,airtime_pct}],
  total_count, total_score, window, cached}`.
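
The scoring rule above can be sketched as follows. This is an illustration under assumptions, not the code in `relay_airtime_share.go`: the `tx` and `shareRow` types and the `computeShare` function are hypothetical, and the distinct relay count is supplied as a plain field instead of being read from the reverse index.

```go
package main

import (
	"fmt"
	"sort"
)

// tx is a hypothetical, trimmed-down transmission record.
type tx struct {
	PayloadType    string
	PayloadBytes   int
	DistinctRelays int // what distinctRelayCount(tx) would return
}

// shareRow mirrors the per-payload-type fields named in the response shape.
type shareRow struct {
	PayloadType string
	Count       int
	CountPct    float64
	Score       int
	AirtimePct  float64
}

// computeShare buckets transmissions by payload type, scores each one as
// payload bytes times distinct relays, and sorts rows by airtime share.
func computeShare(txs []tx) []shareRow {
	byType := map[string]*shareRow{}
	totalCount, totalScore := 0, 0
	for _, t := range txs {
		row, ok := byType[t.PayloadType]
		if !ok {
			row = &shareRow{PayloadType: t.PayloadType}
			byType[t.PayloadType] = row
		}
		score := t.PayloadBytes * t.DistinctRelays
		row.Count++
		row.Score += score
		totalCount++
		totalScore += score
	}

	rows := make([]shareRow, 0, len(byType))
	for _, row := range byType {
		if totalCount > 0 {
			row.CountPct = 100 * float64(row.Count) / float64(totalCount)
		}
		if totalScore > 0 {
			row.AirtimePct = 100 * float64(row.Score) / float64(totalScore)
		}
		rows = append(rows, *row)
	}
	// Descending by airtime share; payload type breaks ties deterministically.
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].AirtimePct != rows[j].AirtimePct {
			return rows[i].AirtimePct > rows[j].AirtimePct
		}
		return rows[i].PayloadType < rows[j].PayloadType
	})
	return rows
}

func main() {
	// One large, widely relayed ADVERT against many small unrelayed ACKs.
	txs := []tx{{PayloadType: "ADVERT", PayloadBytes: 200, DistinctRelays: 8}}
	for i := 0; i < 1000; i++ {
		txs = append(txs, tx{PayloadType: "ACK", PayloadBytes: 10, DistinctRelays: 0})
	}
	for _, r := range computeShare(txs) {
		fmt.Printf("%-6s count=%d (%.1f%%) score=%d airtime=%.1f%%\n",
			r.PayloadType, r.Count, r.CountPct, r.Score, r.AirtimePct)
	}
}
```

The input in `main` is the ADVERT-versus-ACK scenario from the acceptance test: the payload type with the fewest packets ends up with all of the airtime share.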

## Frontend

- `public/analytics.js`
  - `renderRelayAirtimeDumbbell(data)` — horizontal dumbbell chart per
    payload_type. Gray dot = count %, colored dot = airtime %, connector
    line between them = the divergence, shared 0-100% axis, sorted desc by
    airtime.
  - Tooltip: payload_type, count %, count N, airtime %, raw score,
    within-mesh caveat.
  - Title: **Relay Airtime Share**.
  - Subtitle (exact): `Score = payload bytes × distinct repeaters that
    forwarded the packet. Counts relay re-transmissions; originator TX
    excluded. Not comparable across meshes.`
  - Mounted on the Overview tab immediately beneath Payload Type Mix.

## Tests

`TestRelayAirtimeShare_ADVERTvsACKDivergence` — the locked acceptance
scenario:

- 1 ADVERT (200 B, 8 distinct relays) → score 1600, airtime 100%
- 1000 ACKs (10 B, 0 relays each)     → score 0,    airtime 0%
- Count distribution is the inverse (ACK 99.9%, ADVERT 0.1%).
- Sort assertion: ADVERT is rows[0] by airtime_pct desc.

Full suite: `go test -short ./cmd/server/...` → PASS (25.9s).

## Acceptance criteria

- [x] In-memory `airtime_usage_score` accumulator in analytics path
- [x] `distinctRelayCount(tx)` helper unioning resolved-pubkey reverse
index across all observations of `transmission_id`
- [x] `/api/analytics/relay-airtime-share?window=…` endpoint
- [x] Cached via existing `rfCache` + `rfCacheTTL`; no new cache layer
- [x] Dumbbell chart on `/analytics` beneath Payload Type Mix;
gray=count, colored=airtime, shared axis, sorted desc by airtime
- [x] Title + subtitle exactly as specified
- [x] Tooltip with payload_type, count %, count N, airtime %, raw score,
caveat
- [x] Unit test demonstrates the ADVERT-vs-ACK divergence
- [x] No new SQL, no new index, no schema migration (verified via diff)
- [ ] Live staging bench (<5ms p99 uncached / <1ms cached) — deferred to
follow-up; cached behind 60s `rfCacheTTL` so steady-state cost is a map
lookup

## Preflight overrides

- Branch scope cross-stack: justified — backend endpoint and frontend
chart are a single deliverable per #1359 spec (one chart bound to one
endpoint, no incremental staging).

Fixes #1359

---------

Co-authored-by: bot <bot@local>
2026-06-06 20:46:24 -07:00
Kpa-clawbot d6384c3c59 fix(#1217): honor time-window filter on Route Patterns analytics (#1592)
## What

The Route Patterns chart on `/#/analytics` ignored the Time window
picker — every selection returned identical data. This PR threads
`?window=` through to the backing endpoints and the store-level
computation.

## Root cause

`cmd/server/routes.go:2065` (`handleAnalyticsSubpaths`) and
`cmd/server/routes.go:2090` (`handleAnalyticsSubpathsBulk`) never called
`ParseTimeWindow(r)`. The store-level entry points
(`GetAnalyticsSubpaths`, `GetAnalyticsSubpathsBulk`) had no window-aware
variant. The frontend (`public/analytics.js`) didn't append `&window=`
to the `/analytics/subpaths-bulk` request.

## Fix

### Backend (`cmd/server/store.go`)
Added `GetAnalyticsSubpathsWithWindow` +
`GetAnalyticsSubpathsBulkWithWindow`. Zero `TimeWindow` →
byte-equivalent to the existing fast path (no perf regression on the
default view). Non-zero window → iterate `s.packets`, filter on
`tx.FirstSeen` via `TimeWindow.Includes`, reuse `rankSubpaths`. Cached
by `(region|area|window)`.

```diff
-data := s.store.GetAnalyticsSubpaths(region, minLen, maxLen, limit)
+window := ParseTimeWindow(r)
+data := s.store.GetAnalyticsSubpathsWithWindow(region, minLen, maxLen, limit, window)
```

```diff
-results := s.store.GetAnalyticsSubpathsBulk(region, groups)
+results := s.store.GetAnalyticsSubpathsBulkWithWindow(region, groups, ParseTimeWindow(r))
```

### Frontend (`public/analytics.js`)
`renderSubpaths` now appends `&window=<value>` to the
`/analytics/subpaths-bulk` request, matching how RF / topology /
channels tabs already wire the picker.

## Before / after

```
before: GET /api/analytics/subpaths?window=24h   →   totalPaths=2   (all data; window ignored)
after:  GET /api/analytics/subpaths?window=24h   →   totalPaths=1   (24h-bounded; window honored)
```

## Tests

`cmd/server/subpaths_window_test.go`:
- `TestSubpathsHonorsTimeWindow_StoreLevel` — seeds a 1h-old tx with
path `[aa,bb]` + a 30d-old tx with path `[cc,dd]`; asserts the unbounded
call sees both and the 24h-windowed call sees only the recent one.
- `TestSubpathsHandlerHonorsTimeWindow` — same scenario via the HTTP
handlers for `/api/analytics/subpaths` and
`/api/analytics/subpaths-bulk`.

TDD: red commit `eefc27d3` (test fails on assertion with stub that
ignores window), green commit `4c4c45d0` (implementation makes it pass).
Full `go test ./...` in `cmd/server` green locally (~47s).

## Performance

Default view (no window selected) is unchanged — `window.IsZero()`
short-circuits to the existing precomputed-index hot path. Windowed view
is O(N_tx · path²), same complexity as the existing region-filtered slow
path. Results cached per `(region|area|window)`.

Closes #1217

---------

Co-authored-by: Kpa-clawbot <bot@corescope>
2026-06-06 20:43:49 -07:00
Kpa-clawbot 5629a489b2 perf(distance): lazy build distance index on first request (#1011) (#1597)
## Summary

Build the distance analytics index lazily on the first
`/api/analytics/distance` request instead of eagerly inside `Load()`
(and its background-load chunked merge). Per the triage Fix path on the
issue:

- Eager startup build removed from `Load()` and from
`loadAllPacketsBackground()`'s post-merge pass.
- First request returns `202 Accepted` + `Retry-After: 5` and kicks off
the build in a background goroutine, gated by `sync.Once` so concurrent
first-window requests all observe 202 (single build, not N parallel
O(n²) computations).
- Once built, subsequent requests fall through to the existing
analytics-recomputer / TTL cache and serve 200 as before.
- Debounced rebuild policy: refire only when `Δobs > 5%` since last
build OR `>5 min` elapsed, whichever is more restrictive. Background
loader also resets the gate so the next request rebuilds against the
larger dataset.

Effect: operators who never visit distance analytics no longer pay the
O(n²) construction at startup. The three acceptance criteria are encoded
as failing-first tests: (a) no startup build, (b) first request triggers
build, and (c) concurrent in-flight requests get 202.

## Red → green

- Red: `bc947ad1` — 3 assertion failures (`expected ... empty, got 3`,
`expected 202, got 200`, `expected all 10 ... got 0`).
- Green: `5264b68a` — production change makes them pass, no other tests
regress.

## Files changed

- `cmd/server/store.go` — lazy-build state
(`distLazyMu`/`Once`/`Built`/`Building`/`LastBuilt`/`LastObs`),
`TriggerDistanceIndexBuild`, `DistanceIndexBuilt`,
`DistanceIndexBuilding`; eager `buildDistanceIndex` calls in `Load()`
post-pass and chunked-background-load post-pass removed (Once reset
instead so the next request rebuilds against the full dataset).
- `cmd/server/routes.go` — `/api/analytics/distance` returns 202 +
`Retry-After` until built.
- `cmd/server/distance_lazy_index_test.go` — new tests (the three triage
acceptance criteria).
- `cmd/server/coverage_test.go`, `cmd/server/parity_test.go`,
`cmd/server/routes_test.go`, `cmd/server/hop_disambig_e2e_test.go` —
pre-warm the index via `TriggerDistanceIndexBuild()` +
`DistanceIndexBuilt()` poll where the test asserts the 200 JSON shape.

## Perf justification

Startup cost on a 500K-obs / 2K-node dataset: previously O(n²) hop scan
during `Load()` post-pass and again during the background-load merge —
measured at 10–20s in `specs/startup-audit.md`. New code: zero work at
startup, the same O(n²) work runs at most once per HTTP request cycle
(and only when the index is stale per debounce policy). Cold-path
concurrency is bounded by `sync.Once`, so N parallel first-window
requests never produce N parallel builds.

## Scope

No config field added (debounce thresholds are hardcoded constants per
the triage Fix path — `5%` / `5min`). No public API signature changes.
No DB-side migration. Tests cover the lazy invariant, the
202+Retry-After contract, and concurrent first-request behavior.

Closes #1011

---------

Co-authored-by: Kpa-clawbot <bot@corescope.local>
2026-06-04 23:48:47 -07:00
Kpa-clawbot 3df8924114 fix(#1218): include multi-byte prefix repeaters in 1-byte hash usage matrix view (#1591)
## Problem

`/analytics` Hash Usage Matrix 1-byte view excluded repeaters configured
for 2- or 3-byte hash prefixes. In MeshCore, 1-byte path-matching is a
first-byte equality check, so any packet routed by 1-byte hash collides
on that first byte regardless of the downstream repeater's configured
prefix size. Omitting multi-byte prefix repeaters under-reports real
conflicts in the 1-byte hash space.

## Fix

**Data layer — `cmd/server/store.go` (`computeHashCollisions`,
~L7907-L7918 before, L7907-L7941 after):**

Before — `one_byte_cells` was populated only from `prefixMap`, which
only contained repeaters with `hash_size == 1`:

```go
if bytes == 1 {
    oneByteCells = make(map[string][]collisionNode)
    for i := 0; i < 256; i++ {
        hex := strings.ToUpper(fmt.Sprintf("%02x", i))
        oneByteCells[hex] = prefixMap[hex]
        if oneByteCells[hex] == nil {
            oneByteCells[hex] = make([]collisionNode, 0)
        }
    }
} else if bytes == 2 { ... }
```

After — additionally project all `hash_size in {2,3}` repeaters to their
first byte:

```go
if bytes == 1 {
    // ... (same baseline population) ...
    for _, cn := range allCNodes {
        if cn.Role != "repeater" { continue }
        if cn.HashSize != 2 && cn.HashSize != 3 { continue }
        if len(cn.PublicKey) < 2 { continue }
        hex := strings.ToUpper(cn.PublicKey[:2])
        if _, ok := oneByteCells[hex]; !ok { continue }
        oneByteCells[hex] = append(oneByteCells[hex], cn)
    }
}
```

The 2-byte view's bucketing is unchanged — that view continues to count
only repeaters configured for 2-byte prefixes (those semantics differ).

**UI — `public/analytics.js` L1459:** clarified the 1-byte view
description so the inclusion of multi-byte prefix repeaters is explicit.

## API shape

No response-shape change. `one_byte_cells[HEX]` is still
`[]collisionNode`; only the contents now include 2/3-byte prefix
repeaters in the appropriate first-byte buckets. The existing frontend
decoder is unaffected.

## Tests

- `cmd/server/routes_test.go::TestHashCollisionsOneByteIncludesMultiBytePrefixRepeaters`
— seeds three repeaters with first byte `CC` configured for 1/2/3-byte
prefixes plus an unrelated `DD` repeater, asserts all three appear in
`one_byte_cells["CC"]`, and that the 2-byte view's `nodes_for_byte` is
unchanged.

Red commit `278bdf8d` (test only) fails on assertion ("got 1, want 3");
green commit `9127ea4e` passes.

## Preflight

`bash ~/.openclaw/skills/pr-preflight/scripts/run-all.sh origin/master`
→ clean.

Closes #1218

---------

Co-authored-by: clawbot <bot@corescope>
2026-06-04 20:44:19 -07:00
Kpa-clawbot 1a2b8c48be feat(node-detail): link RTC-reset warning to offending packet hashes (#1094) (#1590)
## Problem
Node detail's bimodal-clock warning showed only `⚠️ N of last M adverts
had nonsense timestamps (likely RTC reset)` — no way to tell which
packets, no way to verify the heuristic, no way to drill in.

## Fix
Additive, on two sides:

**Backend** (`cmd/server/clock_skew.go`)
- New type `BadSample { Hash, AdvertTS, SkewSec }`.
- New field `NodeClockSkew.RecentBadSamples []BadSample` (`omitempty`).
- Populated from the **same** bimodal-bad classification pass that
produces `RecentBadSampleCount` — no heuristic change. `tsSkewPair`
carries `hash` + `advertTS` so the classifier can record per-sample
evidence without a second walk; drift code is unaffected (reads only
`ts`/`skew`).

**Frontend** (`public/nodes.js`)
- `bimodalWarning` preserves the existing count summary line, then
renders a `<ul>` of bad samples: each `<li>` is `<a
href="#/packets/HASH">hash[:8]</a> → formatTimestamp(advertTS)` with ISO
tooltip. Defensive `Array.isArray` so older API responses still render
the summary alone.

## TDD
- **Red:**
`cmd/server/clock_skew_issue1094_test.go::TestIssue1094_RecentBadSamples_ExposesHashAndTimestamp`
— seeds 3 healthy + 2 bimodal-bad adverts, asserts `RecentBadSamples`
has length 2 with the expected hashes and advert timestamps. Fails on
the assertion (`len = 0, want 2`) with the stub-only commit.
- **Green:** classifier populates the slice; existing #1285 and bimodal
tests stay green.
- Red commit: `ed501f4b`
- Green commit: `54305b06`

## Cross-stack
Backend + frontend ship together (`cross-stack: justified` commit). API
stays backward compatible (`omitempty` server, `Array.isArray` client)
but the feature only lights up with both halves present.

## Preflight
Clean — PII, branch scope, red-commit, CSS vars, XSS sinks, migrations,
fixture coverage all pass.

## Acceptance
- [x] Warning lists specific packet hashes
- [x] Each hash links to `#/packets/<hash>`
- [x] Bad advert timestamp shown next to the hash
- [x] Pattern is reusable — `BadSample` is a clean shape any future
heuristic that flags specific packets can adopt

Fixes #1094

---------

Co-authored-by: openclaw-bot <bot@openclaw.local>
2026-06-04 18:48:27 -07:00
Kpa-clawbot 7533b3b67b feat(nodes): sortable First Seen column on Nodes table (#1166) (#1587)
## Summary

Adds a sortable **First Seen** column to the Nodes table so users can
spot newly observed repeaters in their region (per the reporter's use
case).

Closes #1166

## Backend

`/api/nodes` already exposes `first_seen` per node via `db.scanNodeRow`
(sourced from the existing `nodes.first_seen` column — no schema
migration, no recomputation, no extra query cost). The red test pins
that contract.

## Frontend (`public/nodes.js`)

- New `<th data-sort-key="first_seen" data-sort-default="desc">First
Seen</th>` between Last Seen and Adverts.
- Cell renders via `renderNodeTimestampHtml(n.first_seen)` — same
relative-time + absolute-ISO `title=` tooltip as the Last Seen column.
Empty values render as `—`.
- `sortNodes` gains a `first_seen` branch with **empty-last** semantics:
nodes without a `first_seen` always sort to the bottom regardless of
asc/desc direction, so unknowns never clutter the top of the table.
- Empty-state `colspan` bumped 7 → 8.
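
The empty-last ordering is the subtle part of the sort branch. The actual change lives in `public/nodes.js`; the sketch below restates the comparator in Go for consistency with the server-side examples, using a hypothetical `node` type and assuming `first_seen` values are ISO-8601 strings in a uniform format, so they compare correctly as plain strings.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical row; FirstSeen is "" when unknown.
type node struct {
	Name      string
	FirstSeen string
}

// sortByFirstSeen orders nodes by FirstSeen in the requested direction,
// always placing nodes with an empty FirstSeen at the bottom.
func sortByFirstSeen(nodes []node, desc bool) {
	sort.SliceStable(nodes, func(i, j int) bool {
		a, b := nodes[i].FirstSeen, nodes[j].FirstSeen
		if a == "" || b == "" {
			// A known value sorts before an unknown one in both directions;
			// two unknowns are treated as equal.
			return a != "" && b == ""
		}
		if desc {
			return a > b
		}
		return a < b
	})
}

func main() {
	nodes := []node{
		{Name: "old", FirstSeen: "2026-01-01T00:00:00Z"},
		{Name: "unknown", FirstSeen: ""},
		{Name: "new", FirstSeen: "2026-06-01T00:00:00Z"},
	}
	sortByFirstSeen(nodes, true)
	fmt.Println("desc:", nodes[0].Name, nodes[1].Name, nodes[2].Name)
	sortByFirstSeen(nodes, false)
	fmt.Println("asc: ", nodes[0].Name, nodes[1].Name, nodes[2].Name)
}
```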

## TDD

- **Red commit** `112442f4` — `test-issue-1166-first-seen-column.js` +
`cmd/server/first_seen_1166_test.go`. The backend half passes on red
(field already returned); 5 frontend tests fail on their assertions
(column header missing, sort branch missing, empty-last violated).
- **Green commit** `9274b36c` — only `public/nodes.js`. All 6 tests
pass.

Verified red is real-fail (assertion-shaped) by checking out the red
commit's `nodes.js` and re-running the test: 5 failures, all on
`assert.strictEqual`, none on parse/import.

## Test results

```
node test-issue-1166-first-seen-column.js  → 6 passed, 0 failed
node test-frontend-helpers.js              → 611 passed, 0 failed
go test ./cmd/server/...                   → ok (45.16s, all pass)
```

## Files changed

- `public/nodes.js` (+14 / −1)
- `test-issue-1166-first-seen-column.js` (new)
- `cmd/server/first_seen_1166_test.go` (new)

## Scope guardrails

- No schema migration.
- No new files outside the worktree's three allowed surfaces.
- No refactor of other Nodes columns.
- Empty cells handled in both render (em-dash) and sort (always last).

---------

Co-authored-by: fix-1166-bot <bot@corescope.local>
2026-06-04 16:27:48 -07:00