mirror of
https://github.com/Kpa-clawbot/meshcore-analyzer.git
synced 2026-06-27 05:51:43 +00:00
efd66ea3f527cb9ec243dcdf72ea3170f94af968
378 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
efd66ea3f5 |
feat(mqtt): per-source status endpoint + Observers panel (#1682)
## Summary Adds MQTT source status visibility per #1043 acceptance criteria: - **Ingestor:** per-source counter registry (`cmd/ingestor/source_status.go`) tracking `connected`, `lastConnectUnix`, `lastDisconnectUnix`, `lastPacketUnix`, `connectCount`, `disconnectCount`, `packetsTotal`, `packetsLast5m` (sliding 5-min window via per-second buckets keyed by unix second — no stale-leak), `lastError`. Wired at the existing OnConnect / ConnectionLost / DefaultPublish callsites alongside the liveness watchdog. Idempotent registration so counters survive reconnects. Snapshot emitted in the existing stats file under `source_statuses` (additive, `omitempty`). - **Backend:** new `GET /api/mqtt/status` handler reads the ingestor stats file and returns the per-source list. **Broker passwords are masked** via a regex over the `scheme://user:pass@host` form (covers mqtt/mqtts/tcp/ssl/ws/wss). Mask is also applied to `lastError` as defense-in-depth (broker libs occasionally quote the failing URL). OpenAPI completeness gate satisfied with a `routeDescriptions` entry. - **Frontend:** small self-contained panel (`public/mqtt-status-panel.js`) mounted above the Observers table. Auto-refreshes every 10s, color-codes each row (green = connected + recent packet, yellow = connected idle, red = disconnected), and tears down its timer on SPA route change. ## TDD - Red commit `f19a93b5` — stub `/api/mqtt/status` handler + assertion test that the broker password is `****`-redacted. Test fails on the assertion (handler passes the URL through verbatim). Compile-clean — assertion-fail, not build-fail. - Green commit `77042e41` — `maskBrokerURL` helper + table-driven unit tests across all schemes + handler rewires to mask both `Broker` and `LastError`. - Subsequent commits land the ingestor wiring and the frontend panel. ## Tests ``` $ cd cmd/server && go test -run 'TestMqttStatus|TestMaskBrokerURL' -v ./... 
PASS: TestMqttStatus_MasksBrokerPassword PASS: TestMqttStatus_EmptyWhenNoStatsFile PASS: TestMaskBrokerURL_Patterns (10 subtests) $ cd cmd/ingestor && go test -run 'TestSourceStatus|TestSnapshotSourceStatuses' -v ./... PASS: TestSourceStatus_BasicLifecycle PASS: TestSourceStatus_Disconnect PASS: TestSnapshotSourceStatuses_ReturnsAll $ node test-mqtt-status-panel.js 7 passed, 0 failed ``` Full `go test ./...` clean in both `cmd/server` and `cmd/ingestor`. ## Preflight overrides - `cross-stack`: justified — issue #1043 is intrinsically full-stack (ingestor stats → server endpoint → observers panel). Per-stack split would land an unreachable endpoint or a fetch with no backend. - `check-xss-sinks` (public/mqtt-status-panel.js:55): justified — the flagged `innerHTML=` is a fully-static literal (empty-state placeholder, no payload data interpolated). All payload-bearing `innerHTML=` sites in this file run through `escapeHTML` (defined in the same file); the test `renderPanel never echoes a plaintext password (defense-in-depth)` exercises the rendered HTML against payload strings. ## Acceptance criteria - [x] `/api/mqtt/status` returns per-source connection state — `cmd/server/mqtt_status.go` - [x] UI panel shows all configured sources with live status — `public/mqtt-status-panel.js` - [x] Connection state updates on reconnect/disconnect events — `MarkConnect` / `MarkDisconnect` wired in `cmd/ingestor/main.go` - [x] Broker URLs don't expose passwords in the API response — `maskBrokerURL` + 13 test cases - [x] Works with 1-N sources — registry is keyed per-source, snapshot iterates the map **Partial fix for #1043** — per-packet `mqtt_source` attribution (the issue's "Follow-up" section) is **deferred** per the `mc-bot-triaged:v1` triage and the autofix comment ("Per-packet attribution deferred to follow-up issue"). That work requires a new observation-row column and DB schema migration, both explicitly out of scope for this PR. 
Refs #1043 --------- Co-authored-by: openclaw-bot <bot@openclaw.local> |
||
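The password masking described in this commit can be sketched roughly as follows. This is a minimal illustration, not the project's code: the commit names a `maskBrokerURL` helper and the `scheme://user:pass@host` form, but the regular expression below is an assumption, and passwords containing an unescaped `@` or `/` are out of scope for it.

```go
package main

import (
	"fmt"
	"regexp"
)

// credentialsRe matches "scheme://user:pass@" for the schemes listed in the
// commit message (mqtt, mqtts, tcp, ssl, ws, wss). The pattern itself is an
// assumption; the real one is not shown in the source.
var credentialsRe = regexp.MustCompile(`(?i)\b((?:mqtts?|tcp|ssl|wss?)://[^\s:/@]+):[^\s@/]+@`)

// maskBrokerURL replaces the password of every credential-bearing broker URL
// found in s with "****". It works on bare URLs and on free text such as an
// error string that quotes a URL.
func maskBrokerURL(s string) string {
	return credentialsRe.ReplaceAllString(s, "${1}:****@")
}

func main() {
	fmt.Println(maskBrokerURL("mqtts://ingest:s3cret@broker.example:8883"))
	fmt.Println(maskBrokerURL(`dial "tcp://u:p@10.0.0.5:1883": timeout`))
	// No credentials: the host:port pair must not be mistaken for user:pass.
	fmt.Println(maskBrokerURL("mqtt://broker.example:1883"))
}
```

Applying the same function to both the broker field and the last-error text mirrors the defense-in-depth point made in the summary.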
|
|
2ef7d2437d |
fix(ci): release fast-path re-tag :edge → :vX.Y.Z when SHA matches (Fixes #1677) (#1680)
## Summary

Adds `.github/workflows/release-fast-path.yml`: a metadata-only re-tag workflow that fires on `push.tags: v[0-9]+.[0-9]+.[0-9]+` and, when `:edge`'s `org.opencontainers.image.revision` label matches the tag SHA, applies `:vX.Y.Z`, `:vX.Y`, `:vX`, `:latest` to the existing edge manifest via `crane tag`. No rebuild, no test re-run: ~seconds vs ~30 min today.

If the SHA doesn't match (tag points to an older commit, or `:edge` wasn't built yet), it dispatches the existing `deploy.yml` pipeline as a fallback so validated bytes always ship.

To prevent double-fire, `deploy.yml`'s top-level `on:` block drops `tags: ['v*']`; `release-fast-path.yml` is now the sole consumer of `push.tags`. Edge publishing on master push is untouched.

## TDD

Red commit adds `cmd/server/release_fast_path_workflow_test.go` (two tests: one asserts the new workflow exists with the required trigger/permissions/markers; the other asserts `deploy.yml`'s `on:` block no longer mentions `tags:`). Both fail on assertions in the red commit. Green commit adds the workflow file + edits `deploy.yml`; both pass.

## Acceptance criteria (from #1677)

- Tag-CI completes in <2 min when tag SHA == `:edge` revision → fast-path is metadata-only, single short job
- Falls back to full pipeline on SHA mismatch → `gh workflow run deploy.yml --ref ${{ github.ref }}`
- `:vX.Y.Z` has same digest as `:edge` → `crane tag` copies the manifest, bytes are byte-identical
- No regression on older-SHA tags → fallback path runs the unchanged full validation

Fixes #1677

---------

Co-authored-by: Kpa-clawbot <bot@corescope.local>
|
||
|
|
653d47e03c |
test(openapi): add CI completeness gate for /api routes (Phase 1 of #1670) (#1678)
## Summary Partial fix for #1670 — **Phase 1 only** (CI completeness gate). Phase 2 (backfilling the 18 currently-undocumented routes into `openapi.go`) is deferred to a separate issue per the triage on #1670 and is explicitly out of scope here. ## What this adds - `cmd/server/openapi_completeness_test.go` — AST-walks every non-`_test.go` file in `cmd/server/`, finds string-literal first args to `*.HandleFunc(...)` calls beginning with `/api/`, and diffs against the paths declared in `routeDescriptions()` in `cmd/server/openapi.go`. - `cmd/server/openapi_known_gaps.json` — seeded allowlist of the **18** `/api/` routes currently registered via `HandleFunc` but not yet documented in `openapi.go`. ## Ratchet pattern From this branch forward, `TestOpenAPICompleteness` fails when: 1. A new `HandleFunc("/api/...")` is added without a matching entry in `openapi.go` **or** the allowlist (regression gate — the main goal of Phase 1). 2. A route in the allowlist is *also* documented in `openapi.go` — the allowlist must shrink as Phase 2 backfills land, never go stale. The two-commit history (red → green) demonstrates the gate works: - **Red commit**: adds only the test. Fails on master with the 18 missing routes listed. - **Green commit**: adds the allowlist seeded with that exact 18-route set. Test passes at the current baseline. ## Local verification - `go test ./cmd/server/ -run TestOpenAPICompleteness -v` → PASS at baseline (`44/62 covered; 18 in allowlist; 18 gaps remain`). - Ratchet validation: temporarily inserted `r.HandleFunc("/api/ratchet-test-route", ...)` into `routes.go` → test FAILED with that exact route name; reverted → test PASSES again. ## Files changed - `cmd/server/openapi_completeness_test.go` (+203 / new) - `cmd/server/openapi_known_gaps.json` (+24 / new) ## Preflight `bash ~/.openclaw/skills/pr-preflight/scripts/run-all.sh origin/master` → all hard gates pass; no warnings. 
## Out of scope - Backfilling the 18 allowlisted routes into `openapi.go` (Phase 2 — tracked separately). - Schema validation of the spec against OpenAPI 3.0 (Phase 3 per the issue). - PR template checkbox update (Phase 2 follow-up). Issue #1670 stays open for Phase 2. --------- Co-authored-by: clawbot <bot@corescope.local> |
||
|
|
938153dd92 |
fix(nodes): rebuild relay-hop history on startup from path_json (#1643)
## Problem A relay node's **activity timeline** — and its per-node `packetsToday` / observer counts — collapses to *"only the hour the server restarted"* after every restart. Before the restart the timeline shows only the node's own adverts (~1–2/hr); all of its relay activity piles into the single post-restart hour. ## Root cause All DB cold-load paths (`Load`, `loadChunk`, `scanAndMergeChunk`) index relay-hop attribution into `byNode` **only** from `observations.resolved_path`. But since #1287 the ingestor persists relay data as aggregate `neighbor_edges` and **never writes `resolved_path`** — it is `NULL` on every deployment (verified on a live DB: 0 of ~440k rows populated). So relay attribution is never reconstructed on startup; it only re-accumulates from live traffic (`IngestNew*`, which re-resolves from `path_json` + the neighbor graph), piling a relay node's whole history into the post-restart window. ## Fix Server read-side only — **no schema / ingestor / migration change**. When `resolved_path` is empty, re-resolve relay hops from the already-persisted `path_json` using the in-memory prefix map + neighbor graph (the same `resolvePathForObs` compute the live ingest path already runs). `main.go` now loads the persisted neighbor graph *before* the packet load so resolution has the graph available. Two correctness details worth a close look: 1. **Fetch the prefix-map/graph snapshot BEFORE opening each load cursor.** `getCachedNodesAndPM` issues its own DB query; doing so while a load cursor is open deadlocks on a single-connection SQLite pool (the test harness uses one). 2. **Index into `byNode` ONLY** — not the `resolved_path` / path-hop indexes. Those are cross-checked by `handleNodePaths` against the persisted `resolved_path` column (NULL here); populating them from an in-memory re-resolution would make that SQL confirmation fail and wrongly drop the tx from paths-through (#1352). 
## Tests New coverage asserts a relay pubkey reachable *only* via `path_json` lands in `byNode` after a restart-style load, for both the hot-window (`LoadChunked`) and background-window (`loadChunk`) paths. Existing #1558 (`resolved_path`) and #1352 (paths-through) tests still pass. Full `cd cmd/server && go test ./...` is green under `-race`. ## Perf The fallback runs `resolvePathForObs` per observation with a non-empty `path_json` during cold load — the same per-packet compute the live ingest path already performs, so no new asymptotic cost. The prefix map + graph are snapshotted **once per load** (not per row); `getCachedNodesAndPM` is 30s-cached. In `loadChunk` the resolution runs in the existing lock-free scan and is accumulated locally, matching that function's "build local, merge under lock" design. ## Note on a pre-existing flaky test `TestDistanceConcurrentRequestsDuringBuildReturn202` is timing-fragile (fails ~1/15 on `master` without this change). It relies on the lazy distance build being slow because it's the first caller of `getCachedNodesAndPM` (cold cache). This PR pre-warms that cache during `Load`, narrowing the build window, so the test fails more often in **non-race** local runs. It passes reliably under `-race` (CI mode), where the build stays slow. Flagging in case you want to harden the test separately. --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: openclaw-bot <openclaw-bot@users.noreply.github.com> Co-authored-by: openclaw-bot <bot@openclaw> |
||
|
|
825b26485c |
fix(#1181): hide nodes whose name starts with a configured prefix (#1655)
Fixes #1181.

## Summary

Adds operator-configurable name-prefix hiding for nodes. When a node's name starts with any prefix listed in the new `hiddenNamePrefixes` config field (default `["🚫"]`), it is omitted from `/api/nodes`, `/api/nodes/search`, and `/api/nodes/{pubkey}`. DB rows are preserved: the filter runs at the API layer only, so observation history (paths, hops, distances) stays intact and the node simply re-appears if the operator clears the prefix list.

This mirrors the convention already in use on other MeshCore map dashboards: an operator who wants their node hidden renames it with the 🚫 prefix and sends an advert; the next advert is then dropped from the dashboard. The node is **not** hidden from the mesh itself, only from this dashboard. This is documented inline in `config.example.json`.

Implementation follows the existing `IsBlacklisted` pattern exactly: a new `Config.IsNameHidden(name)` method, and three filters in `routes.go` placed alongside the corresponding blacklist filters. No DB schema, public API, or websocket changes.

## Files changed

- `cmd/server/config.go`: new `HiddenNamePrefixes []string` field + `IsNameHidden` method
- `cmd/server/routes.go`: filters in `handleNodes`, `handleNodeSearch`, `handleNodeDetail`
- `config.example.json`: new field + `_comment_hiddenNamePrefixes` operator doc
- `cmd/server/hidden_name_prefix_1181_test.go`: new test file (red → green)

## Test plan

Two new subtests in `TestHiddenNamePrefix_1181_*`:

1. `_NodesList`: inserts a node named `🚫 ban me`, asserts it is present when `HiddenNamePrefixes` is empty and absent when set to `["🚫"]`.
2. `_Search`: inserts `🚫 search me`, asserts `/api/nodes/search?q=search` does not surface it when the prefix is configured.

Verified red→green:

- Red commit `d0903852`: `go test -run TestHiddenNamePrefix_1181` fails on the leak assertion (`hidden_name_prefix_1181_test.go:94`).
- Green commit `e79a0d8d`: same command passes.

```
$ cd cmd/server && go test -run TestHiddenNamePrefix_1181 -count=1 .
ok github.com/corescope/server 0.060s
```

## Out of scope

- Auto-purging DB rows for hidden nodes: left to existing retention. The triage was explicit: hide, do not delete.
- Live websocket broadcast: nodes are not broadcast via websocket (only packets), so no separate emit path needs filtering. Frontend reads nodes via `/api/nodes`, which is filtered.
- Frontend customizer for the prefix list: operators configure via `config.json` like every other knob.
|
||
|
|
e04c7113cb |
feat: integrate hashtag channels from meshcore-channels catalogue (#1323) (#1656)
Fixes #1323 ## Summary Adds a small in-memory cache of the community-maintained hashtag-channels catalogue (`marcelverdult/meshcore-channels`) and exposes it as `GET /api/known-channels?region=XX` plus a collapsed sidebar section on the Channels view ("Known channels (catalogue)") with a one-click "+ Add" button per row. Per triage (#1323): new `cmd/server/known_channels_cache.go`, new `GET /api/known-channels?region=…`, frontend section in `public/channels.js`. No new DB tables — cache is in-memory only. ## What changed - `cmd/server/known_channels_cache.go` — `knownChannelsCache` with an atomic snapshot pointer, 24h default refresh, 30s HTTP timeout, 4 MB body cap, custom `User-Agent`. Fail-soft: a failed refresh leaves the last-known snapshot in place. Background goroutine started from `main.go` after the neighbor-graph recomputer; never blocks startup. - `cmd/server/known_channels_route.go` — `GET /api/known-channels?region=` serves the cached snapshot off the atomic pointer (never blocks on upstream). Region filter is case-insensitive ISO 3166-1 alpha-2. Empty/missing cache returns 200 with an empty entries list (fail-soft for the UI). - `cmd/server/config.go` — `KnownChannelsURL` + `KnownChannelsRefreshMs`. - `config.example.json` — example values + `_comment_knownChannels`. - `public/channels.js` — new collapsed sidebar section "Known channels (catalogue)" that lazy-fetches `/api/known-channels` on first render and renders rows with a "+ Add" button. The button calls the existing `addUserChannel(name)` path, so adding catalogue channels reuses the full save-key + decrypt flow that user-typed hashtags already use. - `cmd/server/known_channels_cache_test.go` — failing-first tests: - `TestKnownChannelsParseFixture` asserts the parser populates `GeneratedAt`/`License` and region-stamps every entry while skipping empty countries. - `TestKnownChannelsRouteRegionFilter` asserts the route returns 200 with exactly the filtered subset for `?region=be`. 
- `TestKnownChannelsFailSoftOn500` asserts a failed upstream fetch leaves the prior snapshot in place and bumps `failCount`. ## Upstream pinning The default URL is pinned to the specific file `channels-by-country.json` on `main`: > https://raw.githubusercontent.com/marcelverdult/meshcore-channels/main/channels-by-country.json Shape (verified 2026-05-24): ```json { "generated_at": "...", "license": "CC0-1.0", "countries": { "be": [{"channel": "#antwerpen", "description": "..."}], ... } } ``` ## Test plan ``` cd cmd/server && go test -run 'TestKnownChannels' -count=1 . ok github.com/corescope/server 0.008s ``` Red commit: |
||
|
|
1116801b2f |
M5: emoji → Phosphor Icons — settings & customize (#1648) (#1653)
**Red commit:** `851cc8c3a024b1675558092d772444bf4f1ec625` — failing test on a stub branch (will link CI run after PR opens). Partial fix for #1648 (M5 of 6). **Do NOT close the tracking issue** — M6 (server-side residual emoji sweep + lint gate) still pending. ## Per-file swap counts | File | Phosphor `<use>` refs | Notes | |---|---|---| | `public/customize.js` | 20 | DEFAULTS → `ph:<name>` tokens; render path keeps legacy emoji branch (back-compat) | | `public/customize-v2.js` | 26 | same as v1; cv2 overrides path unchanged | | `public/home.js` | (helpers added) | `_renderHomeGlyph` / `_renderHomeLabel` accept both `ph:<name>` and legacy emoji | | `public/geofilter-builder.html` | 5 | clear / undo / save / load buttons (+inline `.ph-icon` CSS) | | `public/audio.js` | 1 | audio unlock prompt | | `public/filter-ux.js` | 5 (3 new) | help popover star + close, saved-filter delete | | `public/style.css` | 0 | `#chList .ch-share-btn::before { content: '📤' }` removed; JS now renders an inline sprite | | `cmd/server/routes.go` | (6 `ph:` tokens) | onboarding home defaults updated in lockstep with customize-v2.js | ## Operator config back-compat — PROMINENT Per design call #1 (user-locked): existing operator-stored emoji values in `config.json` / `localStorage` are **NOT** touched. The render path supports both: ```js function renderConfigGlyph(value) { var m = String(value || '').match(/^ph:([a-z][a-z0-9-]+)$/); if (m) return '<svg class="ph-icon"><use href="/icons/phosphor-sprite.svg#ph-' + m[1] + '"/></svg>'; return esc(value); // EMOJI-OK-LEGACY-RENDER — operator-stored emoji/text path } ``` Defaults flipped to `ph:<name>` tokens, so new operators (and operators who hit "Reset to Defaults") see Phosphor sprites. Operators with stored emoji values continue to see their emoji exactly as before. Verified end-to-end (see E2E (b) below). 
## cmd/server/routes.go — changed in lockstep Per design call #2: the home-defaults `steps` / `footerLinks` mirror the JS DEFAULTS, so they MUST update together. routes.go now emits `ph:<name>` tokens; the frontend home-render path resolves them. Existing tests (`TestConfigThemeHomeDefaults`) still pass — they assert structure, not glyph values. ## E2E assertions added - `test-issue-1648-m5-emoji-scan.js` — per-file zero-emoji + ph-token DEFAULTS + sprite presence - `test-issue-1648-m5-icons-e2e.js`: - (a) customize chrome — tabs/header rendered as sprites; chrome text icon-free - **(b) back-compat — injects fake `🐙` operator step into localStorage, reloads, opens customize, asserts the emoji renders verbatim in both the input value AND the live preview span; asserts the ph-token step renders as a sprite** (design call #1 in action) - (c) `/channels` modal sprite count - (d) `/audio-lab` sprite presence - (e) `geofilter-builder.html` control buttons sprite-driven - (f) every `<use>` resolves to a defined symbol id ## Out of scope (M6 cleanup) - cmd/server/routes.go residual server-rendered emoji **not** tied to customize defaults (none found by my grep — file already audited) - `make lint-no-emoji` CI grep gate (M6 owns it) - `public/icons/README.md` workflow doc cross-stack: justified — design call #2 requires Go + JS update together. --------- Co-authored-by: openclaw-bot <bot@openclaw.local> |
||
|
|
8295c2115c |
fix(reach): bust response cache on blacklist change (#1629) (#1636)
Red commit:
|
||
|
|
078225a54e |
perf(neighbor_api): fold first_seen into cached map — fix #1627 r3 regression (#1632)
## TL;DR

Post-merge regression introduced by #1627 r3 (commit `e2212f50`): `buildNodeInfoMap` in `cmd/server/neighbor_api.go` ran an uncached `SELECT … FROM nodes` scan on every call. Folded `first_seen` into the already-cached `getCachedNodesAndPM` (30s TTL) so the 4 hot handlers that call `buildNodeInfoMap` no longer pay for a full table scan per request.

## Before / After

`buildNodeInfoMap` is called by **4 hot handlers**:

- `cmd/server/neighbor_api.go:130`
- `cmd/server/neighbor_api.go:297`
- `cmd/server/neighbor_debug.go:83`
- `cmd/server/node_reach.go:421`

| | Before | After |
|---|---|---|
| `SELECT … FROM nodes` per call | 1 (uncached) | 0 (cache hit) |
| `SELECT … FROM observers` per call | 1 (uncached) | 1 (unchanged) |
| At Cascadia scale (~2600 nodes) | full scan × 4 handlers × N req/s | one scan / 30s |

## How

- Extended the `getAllNodes` schema probe to also `COALESCE(first_seen, '')`. Falls back through the existing richest → leanest ladder if the column is missing.
- `nodeInfo.FirstSeen` is therefore populated for every cached entry in `getCachedNodesAndPM`.
- `buildNodeInfoMap` drops its second `SELECT` entirely and just copies `nodeInfo` values out of the cached map.
- Public signature of `buildNodeInfoMap` is unchanged. `node_reach.go:421` still sees `nodeInfo.FirstSeen` populated, served from cache.

`cmd/server/store.go` is touched because `getAllNodes` is the only sensible owner of the `first_seen` SELECT; adding a parallel cache would duplicate the 30s TTL machinery this fix is designed to leverage.

## Test (red → green)

- Commit 1 (`test:`): `TestBuildNodeInfoMap_FirstSeenIsCached` calls `buildNodeInfoMap`, mutates `first_seen` out-of-band via a separate rw connection, calls it again, and asserts both calls return the same (cached) value. Fails on `origin/master` (call 2 sees the mutated value, proving the uncached scan).
- Commit 2 (`perf:`): the fold. Test now passes.

## Refs

Post-merge audit identified this as the only MAJOR finding from #1627; recommendation was a follow-up hot-fix PR. This is that PR.

---------

Co-authored-by: openclaw-bot <bot@openclaw>
Co-authored-by: openclaw-bot <bot@openclaw.local>
|
||
|
|
43be1bb76a |
fix(reach): scanReachRows DB errors must surface as 500 not 404 (#1631) (#1635)
Red commit:
|
||
|
|
e2212f5015 |
feat(nodes): per-node Reach page + GET /api/nodes/{pubkey}/reach (v2, review-complete) (#1627)
Re-submission of #1625 (which was merged early, then reverted in #1626) — now with **all three round-1 reviews addressed** so it lands in one hardened state instead of as post-merge follow-ups. ## What Per-node **Reach** view: a standalone page (`#/nodes/{pubkey}/reach`) + a node-detail section + `GET /api/nodes/{pubkey}/reach`. It shows which nodes a node has a **stable two-way RF link** with, derived from raw `path_json` adjacency (a path travels origin→observer, so `[A,B]` ⇒ B heard A). A link is bidirectional when both directions have observations; the **bottleneck** (weaker direction) rates two-way reliability. Nodes are identified only by **unique 2–3 byte** path prefixes (1-byte collides → excluded). ## Review fixes folded in vs #1625 **Performance (Carmack):** hard scan LIMIT (200k) + modest prealloc; `json.Unmarshal` replaced by a single-pass `parsePathTokens` (100k-row scan 2.2M→1.3M allocs, 344→203ms); memoized resolver; size-hinted maps (attribution over 100k rows: 102 allocs); `context.Context` plumbed; cache `RWMutex` + evict-oldest (no full wipe); singleflight dedup; degree/rank from a 60s shared snapshot; bench rewritten (ReportAllocs, 1k/10k/100k, mixed-payload, isolated attribution). **Correctness/safety + tests (Independent + Kent Beck):** pubkey validation → 400; error logging instead of silent swallow (first_seen / degree / marshal→500 / discarded rows); `public_key=?` index use; canonical `PayloadADVERT`; `min()` builtin; documented cache-slice immutability; mux ordering comment. New tests: scanReachRows decode, 3-byte token branch, non-advert first-hop guard, observer SNR aggregation across rows, HTTP-level attribution (asserts non-zero we_hear/they_hear), 400/404/blacklist/cache-hit. 
**UI / a11y / Tufte:** in-map legend (tiers + thresholds); dropped the colour+width double-encoding (constant width, colour-only); colour-blind glyphs (●●●/●●/●) + tier title beside the bottleneck number; dark-theme `--link-*`; lighter table (horizontal rules, sentence-case headers); map built once + link layer updated in place on toggle (no flicker); time-range no longer flashes a loader; `destroy()` generation guard; statCard escaping; scoped `@media print` to `#nq-report`; `fieldset/legend` + `for/id` toggles; `aria-pressed` / `aria-live` / back-link `aria-label`; "distance (km)" + bottleneck tooltip + no-GPS note; inline styles → CSS; decorative emoji removed. **Docs:** api-spec documents the 5-min cache, 200k scan cap, and 400. ## Testing - `cmd/server` full suite green; reach unit + endpoint + bench all pass. - `eslint public/*.js` (no-undef) and the XSS-sink gate clean. - E2E updated: request status checks + exact (non-tautological) toggle assertions + hard map-render assert. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --- ## TDD-history note (Kent Beck gate) This branch carries production + tests together, not a fabricated red→green sequence. That's deliberate: the branch was rebased onto upstream and the intermediate SHAs were squashed, so reconstructing a "failing-test-first" commit after the fact would be theatre, not evidence — and rewriting history to stage it would be dishonest. The behaviour is instead covered by a comprehensive, anti-tautological suite (directional attribution edges, 3-byte token branch, non-advert first-hop guard, observer SNR aggregation, HTTP-level attribution asserting non-zero counts, scan-cap truncation, zero-reach 200-not-404, companion mis-attribution, cache eviction). Requesting maintainer acceptance of the work on test *substance* rather than commit *choreography*; the net-new-UI exemption is not claimed for the server endpoint. 
--------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: meshcore-bot <bot@meshcore> |
||
|
|
9c5faab1e4 |
Revert "feat(nodes): per-node Reach page (#1625)" (#1626)
Reverts #1625. #1625 was merged before the round-1 reviews (Independent / Kent Beck / Tufte) were addressed. Reverting to land it cleanly: a fresh PR will re-add the feature with the perf pass, the backend correctness/safety + test-coverage fixes, and the UI/a11y (Tufte) batch folded in, so it goes through review in a single hardened state rather than as a string of post-merge follow-ups. No functional loss — the feature returns in the replacement PR. |
||
|
|
47f85f6c4c |
feat(nodes): per-node Reach page + GET /api/nodes/{pubkey}/reach (directional link quality) (#1625)
## What
Adds a per-node **Reach** view that answers "how well does this specific
node hear, and get heard by, its neighbours?" — both as a standalone
page (`#/nodes/{pubkey}/reach`) and as a section on the node detail
page.
New endpoint: **`GET /api/nodes/{pubkey}/reach`**.
## What it measures
For the target node it derives, from raw `path_json` adjacency (a path
travels origin→observer, so in `[A,B]` B received A directly):
- **Directional link counts** per neighbour: `we_hear` (how often we
received them) vs `they_hear` (how often they received us).
- **Bidirectional / bottleneck**: a link is two-way stable when both
directions > 0; the weaker direction is the bottleneck and rates real
two-way reliability.
- **Importance**: neighbour degree + rank, relay-observation volume,
bidirectional-link count, direct-observer count.
- **Direct observers**: who received the node at 0 hops, with SNR.
Reliability rule: a neighbour is only attributed when its pubkey
**prefix is unique** at the path's byte length (collisions are skipped,
never misattributed).
## UI
- Standalone Reach page + node-detail section.
- Reusable bidirectional link map (OSM) with links coloured by
bottleneck.
- Incoming/outgoing toggles to isolate each direction.
## Naming note (deliberate, no collision)
This is distinct from the existing **per-observer reachability** in
topology analytics (`ReachNode` / `ObserverReach` / `perObserverReach`).
This PR adds its own `NodeReach*` response structs in a new
`node_reach.go` and a new `/api/nodes/{pubkey}/reach` route — there are
no symbol or route collisions (verified: `go build ./...` clean). Happy
to rename to disambiguate further (e.g. "Link Quality") if you'd prefer
to reserve "Reach" for the per-observer feature.
## Testing
- `cmd/server`: endpoint shape/404/limit-clamp + unit tests for token
derivation and directional attribution, plus a scan benchmark — all
pass.
- Frontend: helper tests + Reach-page E2E (`test-node-reach-e2e.js`),
standalone route + incoming/outgoing toggles.
- `go build ./...` and `eslint public/*.js` (no-undef) clean.
## Docs
Design spec, implementation plan, and the `GET
/api/nodes/{pubkey}/reach` API contract are included under `docs/`.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
|
||
|
|
a4776557ae |
feat(#1290): use firmware repeat:on|off hint to exclude listener-only observers from disambiguator (#1624)
Closes #1290. cross-stack: justified — backend persists firmware-side `repeat` hint to a new observers column, frontend surfaces the listener/repeater status as a badge on the observers list and node-detail Heard By table per the issue's UI acceptance criterion. ## What Firmware 1.16 publishes a `repeat: on|off` flag in the MQTT `/status` JSON (confirmed by @cwichura on the issue thread — see [`MQTTMessageBuilder.cpp:58`](https://github.com/agessaman/MeshCore/blob/b45373a31f111fb0de98bb3b168226d09ceadc47/src/helpers/MQTTMessageBuilder.cpp#L58) in `agessaman/MeshCore mqtt-bridge-implementation-flex`). Listener-only observers (`repeat:off`) by firmware contract never relay packets, so they cannot legitimately be a hop in someone else's resolved path. This PR plumbs the hint end-to-end so the disambiguator stops considering them. ## How * **`internal/dbschema`**: idempotent `can_relay INTEGER DEFAULT 1` migration on `observers`, plus `AssertReady` probe (server fatal-logs if absent). Mirrored in `cmd/ingestor/db.go` `CREATE TABLE` for fresh DBs. Annotated `PREFLIGHT: async=true` — `DEFAULT 1` is constant so SQLite does this as a metadata-only schema rewrite. * **`cmd/ingestor`**: `extractObserverMeta` accepts `repeat` as bool, case-insensitive string (`on|off|true|false|yes|no`), or numeric `0|1`. Missing field → `nil` → `COALESCE` preserves the existing column value (back-compat with legacy observers). Plumbed through `UpsertObserverAt` and the prepared upsert statement. * **`cmd/server`**: `GetNonRelayObserverPubkeys` + new `prefixMap.markNonRelay` drop matching candidates inside `pm.resolveWithContext` at the top of the resolver, so all 4 tiers see the pruned candidate set. `ObserverResp.CanRelay` is surfaced on `/api/observers` and `/api/observers/{id}`. `GetNodeHealth` enriches per-observer rows with `can_relay` so the node-detail badge renders. Probe-and-fall-back when the `can_relay` column is absent (legacy test fixtures). 
* **`public/`**: listener vs repeater pill on observers list, observer detail `Relay` stat card, and node-detail `Heard By` table. CSS uses existing theme vars. ## Test Added `TestResolveWithContext_ExcludesNonRelayObservers_Issue1290` in `cmd/server/resolve_non_relay_1290_test.go` covering all three required cases: * `repeat:off` pubkey → not a candidate (assertion failed in red commit `5f7fdb96`, passes after green `f12911dc`) * `repeat:on` pubkey → still a candidate (regression guard) * legacy obs (no field) → still a candidate (back-compat) Red→green proof: ``` $ git log --oneline origin/master..HEAD |
||
|
|
3d12266595 |
fix(#1608): address PR #1609 follow-up findings — config doc, receipt-time liveness, buffer stop/clamp warn (#1623)
Follow-up to #1609 / #1608. Addresses the 5 unresolved findings from the PR #1609 round-1 polish review. ## Findings addressed | Tag | Severity | Fix | Commits | |-----|----------|-----|---------| | **B1** | BLOCKER | Document `ingestBufferSize` in `config.example.json` near other ingestor knobs. Default `50000`, comment text from review. | `f0b4e411` | | **M1** | MAJOR (option 1 from review) | Split receipt-time vs post-write liveness: add `SourceLivenessState.LastReceiptUnix` + `MarkReceipt`, stamp at the MQTT receipt callback, leave `LastMessageUnix` post-write only. Drop the double-stamp at receipt that masked write-path stalls. Surface both clocks via the ingestor stats file (`source_liveness`) and the server's `/api/healthz` (`ingest_liveness`, additive — older builds unaffected). | RED `fa78233d` / GREEN `bc81b544` | | **M1 (drop-log)** | MAJOR | Log every drop when buffer is at capacity. Removes the `n==1 \|\| n%1000` throttle that hid the first stall behind 1000 lost packets. The Submit drop branch only fires when the channel is at cap so volume is naturally bounded by the stall, not by an arbitrary modulo. | RED `a468763e` / GREEN `7b24fce5` | | **m1** | MINOR | Add `IngestBuffer.Stop()` and `Done()` so tests stop leaking the consumer goroutine that `Start()` spawns. Existing tests gain `t.Cleanup(b.Stop)`. Drain semantics: stop-before-Ready exits immediately; stop-after-Ready best-effort drains queued jobs. | RED `8430c822` / GREEN `78c9b223` | | **m2** | MINOR | `NewIngestBuffer(<1)` now logs a `[ingest-buffer] WARN` line on clamp so misconfigured `ingestBufferSize` values are visible instead of silently running a 1-slot queue. Test captures log output. | RED `62119ab4` / GREEN `815bfd02` | | **m3** | MINOR | Add godoc to `Submit` and `Ready` documenting the Start-before-Submit / Start-before-Ready ordering invariant. | `564a813b` | ## TDD discipline Each behavioral fix (M1, M1-drop-log, m1, m2) lands as a red-then-green pair. 
Red commits compile + run + fail on assertion, verified locally before the green commit. Per-finding red→green pairs are visible in the commit graph above. B1 and m3 are docs-only and ship as single commits (preflight script accepts them under the docs/comments exemption). ## Schema compatibility `/api/healthz` change is purely additive: `ingest_liveness` is only included when the ingestor publishes the new `source_liveness` field, so older ingestor + newer server combos are unaffected. Field order in the response stays stable for prior consumers. ## Test output - `go test -count=1 -timeout 180s ./cmd/ingestor/...` → green (160s) - `go test -count=1 -timeout 300s ./cmd/server/...` → green (48s) - Race-mode runs of the touched packages (`IngestBuffer|Liveness|Watchdog|Receipt|Healthz`) → green - Full-package race runs locally exceed the brief's 120s timeout on pre-existing slow integration tests (TestObsTimestampIndexMigration, TestNeighborEdgesBuilderDeltaScan); CI has the headroom. ## Preflight `bash ~/.openclaw/skills/pr-preflight/scripts/run-all.sh origin/master` → all hard gates pass, no warnings. ## Files changed - `config.example.json` — B1 - `cmd/ingestor/ingest_buffer.go` — m1, m2, M1-drop-log, m3 - `cmd/ingestor/ingest_buffer_test.go` — m1, m2, M1-drop-log - `cmd/ingestor/mqtt_watchdog.go` — M1 - `cmd/ingestor/mqtt_watchdog_m1_test.go` — M1 (new) - `cmd/ingestor/main.go` — M1 (receipt callsite) - `cmd/ingestor/stats_file.go` — M1 (publish `source_liveness`) - `cmd/server/perf_io.go` — M1 (type + reader) - `cmd/server/healthz.go` — M1 (surface `ingest_liveness`) Original review reference: PR #1609 polish review by the M-axis bot. --------- Co-authored-by: corescope-bot <bot@corescope.local> |
||
|
|
bc1822e46c |
perf(load): chunked Load with early HTTP readiness (#1009) (#1596)
## What Switches the server's startup from a synchronous full-scan `PacketStore.Load()` to a chunked `LoadChunked(chunkSize)` that: 1. Streams transmissions+observations from SQLite in id-ordered chunks (default `chunkSize=10000`, configurable via `db.load.chunkSize`). 2. Closes `FirstChunkReady()` after the first chunk is merged — `main.go` binds the HTTP listener on that signal instead of blocking on the full multi-minute load. 3. Stamps `X-CoreScope-Load-Status: loading; progress=<rows>` on every response while LoadChunked is in flight, flipping to `ready` once it completes (via `loadStatusMiddleware`). 4. Preserves the existing retention/`hotStartupHours`/`maxMemoryMB` clamps and the post-load index rebuild (`pickBestObservation` / `buildSubpathIndex` / `buildPathHopIndex` / `buildDistanceIndex`). ## Why Per #1009: at 5M+ observations (Cascadia scale) the synchronous Load blocked HTTP for ~80s with a 2–3× steady-state RAM peak. With chunked load the listener binds within seconds; dashboards and probes can read partial data and see the `loading` status header until the background load finishes. ## Notes - `/api/healthz` readiness gate (`readiness` atomic, init `WaitGroup`) is unchanged — it still waits for neighbor-graph build + initial `pickBestObservation` before reporting `ready:true`. `LoadChunked` only changes when the listener BINDS, not when it advertises ready. - `cmd/server/main.go` waits for `FirstChunkReady` (or the full load on a tiny DB) before proceeding, and drains the load goroutine in the background with a logged error path. - Config Documentation Rule: `config.example.json` now documents `db.load.chunkSize` with a nested `_comment` describing the trade-off. 
## Tests - `cmd/server/chunked_load_test.go` asserts: - (a) `FirstChunkReady` fires before `LoadChunked` returns - (b) `X-CoreScope-Load-Status` transitions `loading; progress=...` → `ready` - (c) `chunkSize` honored (2500 rows @ 1000 → 3 chunks via `OnChunkLoaded`) - (d) `Config.DBLoadChunkSize()` default 10000 + override - Red commit (`102a4c84`) lands the tests with stubs that fail on assertion — verified locally before the green commit. - Green commit (`35cecf16`) makes all four pass; full `cmd/server` suite green (47s locally). Closes #1009 ## TDD red-commit exemption The original red commit `f878e15e` ("test(load): failing tests for chunked Load + early HTTP readiness") fails to **compile** rather than failing on an assertion, because it references symbols (`store.LoadChunked`, `store.FirstChunkReady`, `store.OnChunkLoaded`, `Config.DBLoadChunkSize`, `loadStatusMiddleware`) that do not exist on master. Per `AGENTS.md` the bar is "MUST fail on an assertion ... A compile error is NOT a valid red commit." This is claimed under the **net-new surface** exemption with the following justification: - LoadChunked / FirstChunkReady / loadStatusMiddleware / DBLoadChunkSize are all introduced by this PR — no prior implementation existed to refactor. There is no behaviour on master that the red commit could meaningfully assert against without first declaring the new symbols. - The cheapest "proper" alternative (split the red into two commits: stub-first + assertion-fail) was deferred because the test file unambiguously fails on missing-symbol — there is no risk of the test becoming a tautology against a pre-existing stub. - **Behaviour gating IS proven elsewhere on this branch.** Commit `799bde49` ("test(load): red — LoadChunked must mark indexes ready + not flip Complete on error") is a proper assertion-fail red against the same package, and commit `92cadd1d` is the matching green. Reviewers can verify the red→green pattern there. 
If a future reviewer wants the strict pattern, the follow-up is mechanical: split `f878e15e` into a stub-only commit followed by the assertion commit. Not done here to keep the rework cost proportional to the risk (zero, in this case). ## Preflight overrides - check-async-migrations: justified — the flagged `CREATE TABLE`/`CREATE INDEX` statements live in `cmd/server/chunked_load_id_zero_test.go` and `cmd/server/chunked_load_oldest_test.go` only. They run against per-test `t.TempDir()` SQLite files (in-process, ~10 rows, lifetime = single test) — they are NOT production schema migrations. No prod table is touched. PREFLIGHT-MIGRATION-SCALE: <30s N=10 (per-test tempdir fixture). --------- Co-authored-by: CoreScope Bot <bot@corescope.local> Co-authored-by: clawbot <bot@noreply.example.com> Co-authored-by: Kpa-clawbot <bot@example.com> Co-authored-by: Kpa-clawbot <bot@kpa-clawbot> |
||
|
|
7421ead9b0 |
fix: bypass API limit clamps for internal UI requests. Revisit of issue #1540 (#1589)
This PR replaces the strict, hardcoded limits on API list endpoints (introduced in the recent security patch) with a new operator-configurable `listLimits` block. The change is needed because the fix for issue #1540 capped the live map, and every other feature backed by `/api/nodes`, at 500 nodes.

Previously, we attempted to bypass public caps for internal UI requests using a heuristic based on browser headers (`Sec-Fetch-Site`). Following review, we dropped that heuristic entirely to eliminate any security-by-browser-convention surface area. Instead, `queryLimit()` returns to its original, simple bounds-checking shape, and the absolute maximums are now drawn from `config.json`. This gives equal DoS protection against all callers while letting server operators tune the ceilings to the size of their mesh (e.g. embedded devices can tighten the knobs, regional hubs can raise them).

### Changes Made:
- **`config.go`**: Introduced a `ListLimits` config struct containing `PacketsMax`, `NodesMax`, `AnalyticsMax`, and `ChannelMessagesMax`. Added safe initialization to ensure default caps (10000, 2000, 200, 500 respectively) apply even if the block is omitted from the config.
- **`clamp_limit.go`**: Deleted `isInternalUIRequest` entirely and restored `queryLimit` to its original signature (`r, def, max`).
- **`routes.go`**: Replaced all hardcoded integer ceilings on list endpoints (`/api/packets`, `/api/nodes`, etc.) with `s.cfg.ListLimits.*`.
- **`config.example.json`**: Added the `listLimits` block with documentation to guide new operators.
- **`clamp_limit_test.go`**: Removed all header-heuristic tests.

### Verification:
- All 611 backend unit tests pass (`npm run test:unit`).
- Bounds checking still clips requests exactly at the operator's configured limit.

---------
Co-authored-by: mc-bot <bot@openclaw.local>
Co-authored-by: openclaw-bot <bot@openclaw> |
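The bounds-checking shape described above can be sketched roughly as follows. This is a minimal illustration, not the code in `clamp_limit.go`: only the `(r, def, max)` signature comes from the PR text, while the `limit` query parameter name, the fallback to `def` on invalid or non-positive input, and the `newGet` helper are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// queryLimit reads ?limit= from the request, falls back to def when the
// value is missing or invalid (assumed behavior), and clamps to max.
func queryLimit(r *http.Request, def, max int) int {
	raw := r.URL.Query().Get("limit")
	if raw == "" {
		return def
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n < 1 {
		return def
	}
	if n > max {
		return max
	}
	return n
}

// newGet is a hypothetical helper that builds a GET request for the demo.
func newGet(target string) *http.Request {
	return httptest.NewRequest(http.MethodGet, target, nil)
}

func main() {
	// 2000 mirrors the default NodesMax quoted in the PR description.
	fmt.Println(queryLimit(newGet("/api/nodes?limit=999999"), 50, 2000))
	fmt.Println(queryLimit(newGet("/api/nodes?limit=120"), 50, 2000))
	fmt.Println(queryLimit(newGet("/api/nodes"), 50, 2000))
}
```

Because every caller goes through the same clamp, the ceiling is a single number an operator can reason about, rather than something that depends on which headers a client chose to send.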
||
|
|
1bdb92de88 |
feat(#1574): operator-configurable liveMap.maxNodes (default 2000) (#1577)
Red commit:
|
||
|
|
ad41b9bb7b |
fix(tests): subpaths_window tests wait for index readiness after #1595 chunked load (#1621)
## Why master is red After PRs #1592 (route-window subpath regression test) and #1595 (background/chunked index build with 503 readiness gate) were merged together, two tests in `cmd/server/subpaths_window_test.go` started failing on master: ``` --- FAIL: TestSubpathsHonorsTimeWindow_StoreLevel subpaths_window_test.go:70: unbounded: expected totalPaths=2, got 0 (subpaths=[]) --- FAIL: TestSubpathsHandlerHonorsTimeWindow subpaths_window_test.go:116: GET /api/analytics/subpaths?...: status=503 body={"error":"index loading","retryAfter":5} ``` Both branches passed in isolation; the conflict only manifested post-merge. Reason: - **#1592** added tests that call `store.Load()` then immediately query `GetAnalyticsSubpathsWithWindow` / hit `/api/analytics/subpaths`. - **#1595** moved the subpath + path-hop index builds off the critical path of `Load()` into background goroutines, and hard-gated the analytics handlers behind `SubpathIndexReady()` (returning 503 + `Retry-After: 5` until the build completes). So after `Load()` returns, `s.spIndex` is still empty for a short window and the handler returns 503. The store-level test sees `totalPaths=0`; the handler test sees the 503. ## Fix (test-only) Add `store.WaitIndexesReady(5 * time.Second)` between `Load()` and the assertions in both tests. This matches the established pattern already used by `routes_test.go` and `repeater_enrich_recomputer_1008_test.go`. The 503 readiness gate from #1595 is intentional production behavior and is **not** touched. No production code is modified. 
## Repro Before: ``` $ go test ./cmd/server/ -run TestSubpaths.*Window -v -count=1 --- FAIL: TestSubpathsHonorsTimeWindow_StoreLevel (0.01s) subpaths_window_test.go:70: unbounded: expected totalPaths=2, got 0 (subpaths=[]) --- FAIL: TestSubpathsHandlerHonorsTimeWindow (0.02s) subpaths_window_test.go:116: GET /api/analytics/subpaths?minLen=2&maxLen=8: status=503 body={"error":"index loading","retryAfter":5} FAIL ``` After: ``` $ go test ./cmd/server/ -run TestSubpaths.*Window -v -count=3 --- PASS: TestSubpathsHonorsTimeWindow_StoreLevel (0.01s) --- PASS: TestSubpathsHandlerHonorsTimeWindow (0.02s) ... (x3) ... PASS ok github.com/corescope/server 0.097s $ go test ./cmd/server/ -count=1 -timeout 300s ok github.com/corescope/server 46.292s ``` ## Files changed - `cmd/server/subpaths_window_test.go` (+11 lines, test-only) ## Notes - TDD exemption: this is a test-fix PR for a merge-conflict-induced failure. The "failing test" already exists on master; this PR makes it pass correctly by waiting on the readiness gate the test was previously unaware of. - Unblocks staging deploys. Co-authored-by: openclaw-bot <bot@openclaw> |
||
|
|
222bfdf6cf |
feat(perf): SQLite writer-lock wait/hold instrumentation per component (#1340) (#1594)
## What Per-component SQLite writer-lock instrumentation so the next neighbor-builder-style write-lock starvation (root cause of #1339, invisible to operators for ~3 days) is detectable from `/api/perf`. Adds `Store.WriterExec` / `Store.WriterTx` wrappers that gate every wrapped call on a package-level `writerMu` so the wait the SQLite driver hides becomes Go-visible, and record `wait_ms` + `hold_ms` + `contention_total` (wait_ms > 100ms) under a component tag. Per-component p50/p95/p99 + max are published to `/api/perf/write-sources` under `.writer_perf` via the existing ingestor stats-file path. Slow-writer log line (`[db-slow-writer] component=X duration=Yms query=<200ch>`) fires on `hold_ms > 500ms` (threshold overridable via `CORESCOPE_DB_SLOW_WRITER_MS` env var). ## Tagged call sites | Component | Location | |-----------|----------| | `mqtt_handler` | `InsertTransmission` (db.go) | | `neighbor_builder` | `buildAndPersistNeighborEdges` (neighbor_builder.go) | | `prune_packets` | `PruneOldPackets` (maintenance.go) | | `prune_observers` | `RemoveStaleObservers` + orphan-metrics cleanup (db.go) | | `prune_metrics` | `PruneOldMetrics` (db.go) | | `vacuum` | `RunIncrementalVacuum` + `CheckAutoVacuum`'s full VACUUM (db.go) | ## TDD red→green - **Red commit** `68de585b` — `cmd/ingestor/db_writer_perf_test.go` + `Store.Writer*` stubs at end of `db.go`. Test synthetically blocks the writer for 60s tagged `neighbor_builder`, then asserts `mqtt_handler.wait_ms.p99 > 50000ms` on concurrent inserts. Fails on the assertion (p99 = 0.0ms) with the stub — not a build error. - **Green commit** `6a9be174` — replaces stubs with real wait/hold/contention aggregator + wires every writer call site. 
Same test passes:

```
2026/06/05 04:36:47 [db-slow-writer] component=neighbor_builder duration=60059.0ms query=COMMIT
--- PASS: TestWriterStarvationVisibleInPerf (60.40s)
PASS
ok github.com/corescope/ingestor 60.408s
```

## Scope discipline
- **API**: no public `Store`/`DB` signature change. Only additive exports.
- **Server**: extends existing `/api/perf/write-sources` JSON with `.writer_perf`; does **not** add a new route, does **not** replace `handlePerf`. Empty `.writer_perf` map when paired with an older ingestor.
- **Read/write invariant** (#1283) preserved: all instrumentation lives on the ingestor's writer connection.
- **Files touched** (7 total): `cmd/ingestor/db.go`, `cmd/ingestor/db_writer_perf_test.go`, `cmd/ingestor/maintenance.go`, `cmd/ingestor/neighbor_builder.go`, `cmd/ingestor/stats_file.go`, `cmd/server/perf_io.go`, `config.example.json`.

## Deferred (acceptance items NOT in this PR)
- **`mbcap_persist` component tag**: `RunMultibyteCapPersist`'s tx is intentionally NOT wrapped in this PR to stay within the implementation brief's 3-files-outside-whitelist budget. One-file follow-up to instrument.
- **CI smoke test** asserting "neighbor-builder hold_ms < 1000ms on 100k-obs fixture": deferred to a separate PR per the brief; this PR is scoped to instrumentation only.

## Preflight overrides
PREFLIGHT-MIGRATION-SCALE: <30s N=runtime. The async-migration gate flagged five `instrumentedExec` / wrapped-`tx.Exec` lines on `DELETE FROM observer_metrics`, `UPDATE observers`, `DELETE FROM observer_metrics`, `DELETE FROM observations`, `DELETE FROM transmissions`. These are **not** schema migrations: they are the existing runtime prune / retention queries that already ran sync against `s.db.Exec` / `tx.Exec` on every retention cycle on master. This PR only swapped the surface call (sync → sync, via the wrapper) to record wait/hold timing; no new sync schema work was introduced. Behavior on production data is identical to master.
Also: red commit's synthetic `UPDATE nodes SET name = name WHERE 0` is a test-only stub designed to acquire the writer without mutating any row (the `WHERE 0` is a no-op predicate). Fixes #1340 --------- Co-authored-by: corescope-bot <bot@corescope.local> |
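The wait/hold idea behind `Store.WriterExec` can be sketched as below, under stated assumptions: the `writerPerf` type, its method names, and the max-only aggregation are hypothetical stand-ins for illustration (the PR publishes p50/p95/p99), and `simulateStarvation` is a scaled-down analogue of the starvation test rather than the test itself. Only the Go-level mutex gate and the 100ms contention threshold come from the description.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// contentionThreshold mirrors the "wait_ms > 100ms" rule from the PR text.
const contentionThreshold = 100 * time.Millisecond

// writerPerf is a hypothetical aggregator keyed by component tag.
type writerPerf struct {
	writerMu sync.Mutex // serializes wrapped writes so the wait is Go-visible

	statsMu    sync.Mutex
	maxWait    map[string]time.Duration
	maxHold    map[string]time.Duration
	contention map[string]int
}

func newWriterPerf() *writerPerf {
	return &writerPerf{
		maxWait:    map[string]time.Duration{},
		maxHold:    map[string]time.Duration{},
		contention: map[string]int{},
	}
}

// writerExec runs fn under the writer gate and records how long the caller
// waited for the gate and how long it then held it.
func (p *writerPerf) writerExec(component string, fn func() error) error {
	start := time.Now()
	p.writerMu.Lock()
	acquired := time.Now()
	err := fn()
	hold := time.Since(acquired)
	p.writerMu.Unlock()
	wait := acquired.Sub(start)

	p.statsMu.Lock()
	defer p.statsMu.Unlock()
	if wait > p.maxWait[component] {
		p.maxWait[component] = wait
	}
	if hold > p.maxHold[component] {
		p.maxHold[component] = hold
	}
	if wait > contentionThreshold {
		p.contention[component]++
	}
	return err
}

func (p *writerPerf) maxWaitMs(component string) int64 {
	p.statsMu.Lock()
	defer p.statsMu.Unlock()
	return p.maxWait[component].Milliseconds()
}

func (p *writerPerf) contentionTotal(component string) int {
	p.statsMu.Lock()
	defer p.statsMu.Unlock()
	return p.contention[component]
}

// simulateStarvation holds the writer as "neighbor_builder" for holdMs and
// issues one "mqtt_handler" write while the gate is held.
func simulateStarvation(p *writerPerf, holdMs int) {
	started := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		_ = p.writerExec("neighbor_builder", func() error {
			close(started)
			time.Sleep(time.Duration(holdMs) * time.Millisecond)
			return nil
		})
	}()
	<-started
	_ = p.writerExec("mqtt_handler", func() error { return nil })
	wg.Wait()
}

func main() {
	p := newWriterPerf()
	simulateStarvation(p, 150)
	fmt.Println("mqtt_handler max wait (ms):", p.maxWaitMs("mqtt_handler"))
	fmt.Println("mqtt_handler contention:", p.contentionTotal("mqtt_handler"))
}
```

The point of the extra mutex is that time spent blocked inside the SQLite driver is otherwise invisible to the caller; taking a Go-level lock first turns that hidden wait into a measurable interval attributed to the waiting component.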
||
|
|
1b112f0b08 |
feat(memlimit): GOMEMLIMIT via runtime.maxMemoryMB in server + ingestor (#1010) (#1595)
Red commit:
|
||
|
|
18810b5c13 |
fix(ingestor): subscribe to MQTT before startup maintenance, buffer until writer is free (#1608) (#1609)
## Summary Closes #1608. The ingestor's MQTT connect/subscribe loop ran **last** in `main()`, after the synchronous startup-maintenance block. Because all writes share a single SQLite writer (#1283), that maintenance — and the connect loop after it — serialize behind any long-running async migration. The subscription therefore came up minutes late (observed ~4.5 min after the v3.8.3 `obs_observer_ts_idx_v1` index build over ~4.9M rows), and QoS-0 packets published in that window were dropped. This decouples **receipt** from **write**: - New `IngestBuffer` — a bounded FIFO drained by a **single** gated consumer goroutine. - The MQTT subscription is brought up first; its publish handler stamps source liveness at receipt and enqueues a `handleMessage` closure. - Startup maintenance runs, then `WaitForAsyncMigrations()`, then `IngestBuffer.Ready()` opens the gate and the backlog drains. A single consumer preserves the single-writer invariant (#1283); buffering replays the original messages, so it introduces **no duplicates** (unlike a QoS-1 broker queue). Broker-agnostic — helps direct-connect and bridged operators alike. ## Changes - `cmd/ingestor/ingest_buffer.go` — `IngestBuffer` (`Submit`/`Start`/`Ready`/`Dropped`/`Pending`); non-blocking submit with drop-on-full counter; single consumer. - `cmd/ingestor/config.go` — `ingestBufferSize` knob (default 50000). - `cmd/ingestor/main.go` — reorder boot: connect/subscribe **before** startup maintenance; stamp liveness at receipt; `Ready()` after maintenance + `WaitForAsyncMigrations()`; periodic stats log buffer `pending`/`dropped`. ## Test plan - [x] `go test ./...` in `cmd/ingestor` — `IngestBuffer` suite covers gating-until-ready, FIFO order, drop-on-full, serial execution (single-writer), and concurrent-submit. - [ ] `go test -race` in CI (concurrency on `IngestBuffer`). 
- [ ] Manual: restart with a pending heavy migration → `subscribed to meshcore/#` appears within seconds; `[ingest-buffer] write path ready` after the migration; packets received during the window are written after `Ready()` (0 dropped under normal traffic); stall watchdog stays quiet (liveness stamped at receipt). ## Out of scope A hard crash while messages sit in the in-memory buffer still loses them; crash-durability requires broker-side persistence, which is topology-specific. This PR closes the startup-migration and deploy loss windows. --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> |
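A minimal sketch of the gated buffer described in this PR, assuming a channel-backed queue. The method names follow the PR's list (`Submit`/`Start`/`Ready`/`Dropped`), but the type layout, the lowercase constructor, and the silent clamp of sizes below 1 are illustrative assumptions, not the contents of `ingest_buffer.go`.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// ingestBuffer is a hypothetical bounded FIFO drained by one consumer that
// stays parked until Ready() opens the gate.
type ingestBuffer struct {
	jobs      chan func()
	ready     chan struct{}
	readyOnce sync.Once
	dropped   atomic.Int64
}

func newIngestBuffer(size int) *ingestBuffer {
	if size < 1 {
		size = 1 // assumed clamp; this sketch does not log it
	}
	return &ingestBuffer{jobs: make(chan func(), size), ready: make(chan struct{})}
}

// Submit never blocks the MQTT callback: when the queue is full the job is
// counted as dropped and discarded.
func (b *ingestBuffer) Submit(job func()) bool {
	select {
	case b.jobs <- job:
		return true
	default:
		b.dropped.Add(1)
		return false
	}
}

// Start spawns the single consumer, which preserves FIFO order and keeps
// all writes on one goroutine.
func (b *ingestBuffer) Start() {
	go func() {
		<-b.ready
		for job := range b.jobs {
			job()
		}
	}()
}

// Ready opens the gate; calling it more than once is harmless.
func (b *ingestBuffer) Ready() { b.readyOnce.Do(func() { close(b.ready) }) }

func (b *ingestBuffer) Dropped() int64 { return b.dropped.Load() }

func main() {
	b := newIngestBuffer(2)
	b.Start()
	done := make(chan struct{})
	b.Submit(func() { fmt.Println("first") })
	b.Submit(func() { fmt.Println("second"); close(done) })
	b.Submit(func() { fmt.Println("never runs: buffer was full") })
	b.Ready() // e.g. after startup maintenance has finished
	<-done
	fmt.Println("dropped:", b.Dropped())
}
```

Using a single consumer rather than a worker pool is what lets the buffer sit in front of a single-writer database without reordering messages.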
||
|
|
9612f08e46 |
fix(#1610): decode firmware 1.16.0 extended ACK (5/6-byte payloads) (#1618)
## Summary Firmware 1.16.0 (`companion-v1.16.0`) ships variable-length `PAYLOAD_TYPE_ACK` payloads: 4 bytes (legacy) → 5 bytes (4-byte CRC + 1-byte attempt, commit `f6e6fdaa`) → 6 bytes (+ 1-byte RNG, commit `a130a95a`). CoreScope's decoder previously truncated past the 4-byte CRC and discarded the attempt + RNG bytes. This PR teaches `cmd/ingestor/decoder.go` to surface the extended bytes on the decoded payload so the DB/UI can distinguish v1.15 vs v1.16 senders, with no schema or wire-compat changes. Partial fix for #1610 — top-level ACK + multipart-inner ACK are covered. PATH-extra ACK parsing (`decodePathPayload`) is deferred to #1612 per triage. ## Changes - `decodeAck` reads 4/5/6-byte payloads. Keeps `extraHash` (4-byte CRC) for compat; adds optional `ackLen`, `ackAttempt`, `ackRand` JSON fields. Legacy 4-byte ACKs leave attempt/rand `nil`. - `decodeMultipart` ACK branch relaxes the `len >= 5` floor so the inner blob can be 4/5/6 bytes (multipart `payload_len` 5/6/7). Adds `innerAckLen`, `innerAckAttempt`, `innerAckRand`. - All additions are `omitempty` — backwards-compatible JSON only. No DB column, no schema migration, no frontend change. ## Out of scope (per issue triage) - `decodePathPayload` PATH-extra parsing — tracked separately in #1612. - Frontend rendering of attempt counter — leave for a follow-up if the DB/UI eventually wants to display it. ## TDD - **Red commit `3fce0465`** adds `cmd/ingestor/issue1610_test.go` with 6 new assertions (legacy 4-byte, extended 5/6-byte, multipart variants of each). New fields are declared on `Payload` so the test compiles, but no decoder populates them yet — tests fail on `ackLen=<nil> want 4` etc. Verified isolation with `git stash` of decoder.go + re-run. - **Green commit `5165c202`** implements the decoder changes. `go test ./...` in `cmd/ingestor` passes. ## Fixtures Synthetic wire vectors built by hand against the firmware spec — the issue did not provide real captures. 
Each test cites the firmware ref + commit it derives from (`BaseChatMesh.cpp:218-234`, commits `f6e6fdaa` and `a130a95a`). ## References - Issue #1610 - Firmware tag `companion-v1.16.0` @ `07a3ca9e` - Upstream PR meshcore-dev/MeshCore#2594 - Blog: https://blog.meshcore.io/2026/06/06/release-1-16-0 --------- Co-authored-by: corescope-bot <bot@corescope.local> |
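The 4/5/6-byte layout described in this PR can be sketched as follows. The JSON field names (`extraHash`, `ackLen`, `ackAttempt`, `ackRand`) come from the description; the struct, the hex rendering of the CRC bytes, and the rejection of lengths outside 4 to 6 are assumptions made for illustration and may differ from `cmd/ingestor/decoder.go`.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// ackPayload is a hypothetical decoded form; optional fields stay nil for
// legacy senders and are omitted from JSON.
type ackPayload struct {
	ExtraHash  string `json:"extraHash"`
	AckLen     *int   `json:"ackLen,omitempty"`
	AckAttempt *int   `json:"ackAttempt,omitempty"`
	AckRand    *int   `json:"ackRand,omitempty"`
}

func intPtr(v int) *int { return &v }

// decodeAck accepts 4 bytes (CRC only), 5 bytes (CRC + attempt), or
// 6 bytes (CRC + attempt + RNG byte).
func decodeAck(b []byte) (ackPayload, error) {
	if len(b) < 4 || len(b) > 6 {
		// Assumed policy for this sketch: other lengths are an error.
		return ackPayload{}, fmt.Errorf("ack: unexpected payload length %d", len(b))
	}
	p := ackPayload{
		ExtraHash: hex.EncodeToString(b[:4]), // raw byte order, no endianness assumed
		AckLen:    intPtr(len(b)),
	}
	if len(b) >= 5 {
		p.AckAttempt = intPtr(int(b[4]))
	}
	if len(b) == 6 {
		p.AckRand = intPtr(int(b[5]))
	}
	return p, nil
}

func main() {
	vectors := [][]byte{
		{0xde, 0xad, 0xbe, 0xef},
		{0xde, 0xad, 0xbe, 0xef, 0x02},
		{0xde, 0xad, 0xbe, 0xef, 0x02, 0x7f},
	}
	for _, raw := range vectors {
		p, err := decodeAck(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		out, _ := json.Marshal(p)
		fmt.Println(string(out))
	}
}
```

Pointer fields with `omitempty` are one way to keep the JSON backwards-compatible: a consumer that only knows `extraHash` sees the same shape as before.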
||
|
|
df61660a5e |
perf(load): background subpath+pathHop index builds with ready gates (#1008) (#1604)
## Summary
Mirrors the distance-index lazy pattern (#1011): the subpath and
path-hop index builds are no longer part of `Load()`'s synchronous
critical section. They now run in **two parallel background goroutines**
kicked off after `s.loaded = true`, so HTTP comes up immediately even at
Cascadia scale (5M observations, previously ~60s blocked on these two
builds inside `Load()` under `s.mu`).
Fixes #1008.
## Approach
Two new `atomic.Bool` fields on `PacketStore` (`subpathReady`,
`pathHopReady`) plus a one-shot broadcast channel (`indexReadyChan`) for
waiters. `Load()` removes the synchronous `s.buildSubpathIndex()` /
`s.buildPathHopIndex()` calls and instead kicks
`s.startBackgroundIndexBuilds()` right before returning. That function
spawns **two independent goroutines** (review m7), one per index. Each
goroutine:
1. acquires `s.mu.Lock()` (blocks until `Load()`'s deferred Unlock
fires),
2. runs its builder, releases the lock, stores its `ready = true`,
3. closes the broadcast channel if both flags are now true,
4. logs `[startup] index build complete: subpath (Xs)` (or pathHop).
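The steps above can be sketched as a small, self-contained program. This is an illustration under stated assumptions, not the code in `index_ready_1008.go`: `indexStore` stands in for `PacketStore`, the builders are trivial placeholders, and the `sync.Once` around the channel close is a choice made for this sketch so that whichever goroutine finishes last can close it safely.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// indexStore is a hypothetical stand-in for PacketStore; only the
// readiness plumbing is modeled.
type indexStore struct {
	mu           sync.Mutex
	subpathReady atomic.Bool
	pathHopReady atomic.Bool
	readyOnce    sync.Once
	indexReady   chan struct{} // closed once both flags are true
	subpaths     map[string]int
	pathHops     map[string]int
}

func newIndexStore() *indexStore {
	return &indexStore{indexReady: make(chan struct{})}
}

// startBackgroundIndexBuilds spawns one goroutine per index. Each takes the
// store lock, builds, unlocks, publishes its flag, and closes the broadcast
// channel if it observes both flags set.
func (s *indexStore) startBackgroundIndexBuilds() {
	build := func(flag *atomic.Bool, fn func()) {
		go func() {
			s.mu.Lock()
			fn()
			s.mu.Unlock()
			flag.Store(true)
			if s.subpathReady.Load() && s.pathHopReady.Load() {
				s.readyOnce.Do(func() { close(s.indexReady) })
			}
		}()
	}
	build(&s.subpathReady, func() { s.subpaths = map[string]int{"aa,bb": 1} })
	build(&s.pathHopReady, func() { s.pathHops = map[string]int{"aa": 1} })
}

// waitIndexesReady blocks until both builds finish or the timeout elapses.
func (s *indexStore) waitIndexesReady(d time.Duration) bool {
	select {
	case <-s.indexReady:
		return true
	case <-time.After(d):
		return false
	}
}

func main() {
	s := newIndexStore()
	s.startBackgroundIndexBuilds()
	fmt.Println("ready:", s.waitIndexesReady(time.Second))
}
```

Because each goroutine stores its own flag before loading the other, at least one of them is guaranteed to see both flags set, so the channel is always closed exactly once.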
Analytics handlers whose entire response IS the index aggregate —
`/api/analytics/subpaths`, `/api/analytics/subpaths-bulk`,
`/api/analytics/subpath-detail`, `/api/nodes/{pubkey}/paths` — gate
reads behind the corresponding atomic and respond with `503 Service
Unavailable`, `Retry-After: 5`, body `{"error":"index
loading","retryAfter":5}` until the build completes — matching the
triage spec.
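A rough sketch of such a readiness gate as HTTP middleware follows. The status code, header, and body match the contract quoted above; the `gate` type, `requireIndexReady`, `okHandler`, and `probe` are hypothetical names introduced only for this example, and the real handlers may check the atomic inline instead of through a wrapper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// gate is a hypothetical holder for one index's readiness flag.
type gate struct{ ready atomic.Bool }

// requireIndexReady answers 503 + Retry-After: 5 until the flag is set,
// then delegates to the wrapped handler.
func requireIndexReady(g *gate, next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if !g.ready.Load() {
			w.Header().Set("Content-Type", "application/json")
			w.Header().Set("Retry-After", "5")
			w.WriteHeader(http.StatusServiceUnavailable)
			_ = json.NewEncoder(w).Encode(map[string]any{
				"error":      "index loading",
				"retryAfter": 5,
			})
			return
		}
		next(w, r)
	}
}

// okHandler is a placeholder for an analytics handler.
func okHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	fmt.Fprint(w, `{"subpaths":[]}`)
}

// probe issues one GET and returns the status code and Retry-After header.
func probe(h http.Handler) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/api/analytics/subpaths", nil))
	return rec.Code, rec.Header().Get("Retry-After")
}

func main() {
	g := &gate{}
	h := requireIndexReady(g, okHandler)

	code, retry := probe(h)
	fmt.Println(code, retry) // while the index is still building

	g.ready.Store(true)
	code, _ = probe(h)
	fmt.Println(code) // after the build completes
}
```

Tests that query these endpoints right after `Load()` need to wait for readiness first, which is what the `WaitIndexesReady` calls mentioned elsewhere in this PR are for.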
### Handler scope (review M2)
A second class of handlers also touches these indexes — `/api/nodes`,
`/api/nodes/{pubkey}`, the `GetRepeaterRelayInfoMap` /
`GetRepeaterUsefulnessScoreMap` / `GetBridgeScore` enrichment helpers,
and `repeater_liveness` / `repeater_usefulness`. These are
**intentionally NOT 503-gated**: they expose the index via optional
enrichment fields that callers already treat as "may be empty", and
503-ing the SPA bootstrap to wait for an index that only affects
relay-activity badges would be a worse UX than a 30–60s window of "—"
values. The rationale is documented in the package doc-comment at the
top of `index_ready_1008.go`.
The recomputer's synchronous prewarm path
(`StartRepeaterEnrichmentRecomputer`) gates on `WaitIndexesReady(60s)`
(review M1) so it never snapshots an empty `byPathHop` into
`s.repeaterRelayCache`; on timeout it skips the prewarm and lets the
5-minute ticker pick up the populated index.
## Concurrency safety
Each build goroutine acquires `s.mu.Lock()` before calling the existing
`buildSubpathIndex()` / `buildPathHopIndex()` helpers, which replace
`s.spIndex` / `s.spTxIndex` / `s.byPathHop` with freshly-allocated maps.
Visibility of the populated maps to handlers that observe
`Ready()==true` is established by Go 1.19+ sync/atomic acquire-release
semantics: the atomic store of `true` happens-after `s.mu.Unlock()`, and
the handler's atomic load synchronizes-with that store. The handler's
subsequent `s.mu.RLock` serializes against concurrent ingest writers,
not against the builder.
The existing `main.go` boot sequence does not start ingest goroutines
until after `store.Load()` returns and graph init completes, so the
brief window between `Load()` returning and the two goroutines acquiring
`s.mu` does not race with concurrent ingest writes.
## TDD: red → green
- **Red** commit `63e79e11`: `cmd/server/index_ready_1008_test.go` adds
four assertions; `cmd/server/index_ready_1008.go` adds compile-only
stubs returning `true` so the tests fail on assertions, not build
errors.
- **Green** commit `fb1d22b0`: implements the real atomic gates, the
background goroutine, and the four handler 503 branches; also updates
four existing tests that read indexes directly post-`Load()` to call
`store.WaitIndexesReady(5s)` first.
- **Race-fix commit `b77d56eb`** (review m8 — test-infra exemption):
adds `WaitIndexesReady` calls in test helpers/setup paths so the race
detector no longer flags the read-after-Load() pattern in existing
tests. Per AGENTS.md, race-detector flakes are observable evidence (test
crashes under `-race`) and qualify for the test-infra exemption from the
TDD red-commit requirement; no behavior change in production code.
- **Polish round 2 — M1 red `408c7462` / green `85e82c8a`**:
`TestIssue1008_M1_PrewarmWaitsForIndexes` asserts the recomputer prewarm
SKIPs when indexes are not ready. Red commit adds the assertion + a stub
`repeaterEnrichmentPrewarmWait` var; green commit wires
`WaitIndexesReady` into the prewarm path and adds the handler-scope docs
for M2.
- **Polish round 2 — minor cleanups `fd089bd0`** (m3..m7): chunk-loader
wires `markIndexesReadySync`, memory-model comment rewritten to cite
acquire-release, sentinel deleted, polling replaced with a broadcast
channel, two parallel goroutines for the builds.
`TestIssue1008_m7_BothFlagsSetAfterParallelStart` covers the parallel
path.
## Reproduction
```
git fetch origin fix/issue-1008
git checkout
|
||
|
|
3898688d6d |
analytics: Relay Airtime Share endpoint + dumbbell chart (#1359) (#1601)
Implements the locked spec from #1359. Red commit: |
||
|
|
d6384c3c59 |
fix(#1217): honor time-window filter on Route Patterns analytics (#1592)
## What The Route Patterns chart on `/#/analytics` ignored the Time window picker — every selection returned identical data. This PR threads `?window=` through to the backing endpoints and the store-level computation. ## Root cause `cmd/server/routes.go:2065` (`handleAnalyticsSubpaths`) and `cmd/server/routes.go:2090` (`handleAnalyticsSubpathsBulk`) never called `ParseTimeWindow(r)`. The store-level entry points (`GetAnalyticsSubpaths`, `GetAnalyticsSubpathsBulk`) had no window-aware variant. The frontend (`public/analytics.js`) didn't append `&window=` to the `/analytics/subpaths-bulk` request. ## Fix ### Backend (`cmd/server/store.go`) Added `GetAnalyticsSubpathsWithWindow` + `GetAnalyticsSubpathsBulkWithWindow`. Zero `TimeWindow` → byte-equivalent to the existing fast path (no perf regression on the default view). Non-zero window → iterate `s.packets`, filter on `tx.FirstSeen` via `TimeWindow.Includes`, reuse `rankSubpaths`. Cached by `(region|area|window)`. ```diff -data := s.store.GetAnalyticsSubpaths(region, minLen, maxLen, limit) +window := ParseTimeWindow(r) +data := s.store.GetAnalyticsSubpathsWithWindow(region, minLen, maxLen, limit, window) ``` ```diff -results := s.store.GetAnalyticsSubpathsBulk(region, groups) +results := s.store.GetAnalyticsSubpathsBulkWithWindow(region, groups, ParseTimeWindow(r)) ``` ### Frontend (`public/analytics.js`) `renderSubpaths` now appends `&window=<value>` to the `/analytics/subpaths-bulk` request, matching how RF / topology / channels tabs already wire the picker. ## Before / after ``` GET /api/analytics/subpaths?window=24h → totalPaths=2 (all data — ignored window) GET /api/analytics/subpaths?window=24h → totalPaths=1 (24h-bounded — honored) ``` ## Tests `cmd/server/subpaths_window_test.go`: - `TestSubpathsHonorsTimeWindow_StoreLevel` — seeds a 1h-old tx with path `[aa,bb]` + a 30d-old tx with path `[cc,dd]`; asserts the unbounded call sees both and the 24h-windowed call sees only the recent one. 
- `TestSubpathsHandlerHonorsTimeWindow` — same scenario via the HTTP handlers for `/api/analytics/subpaths` and `/api/analytics/subpaths-bulk`. TDD: red commit `eefc27d3` (test fails on assertion with stub that ignores window), green commit `4c4c45d0` (implementation makes it pass). Full `go test ./...` in `cmd/server` green locally (~47s). ## Performance Default view (no window selected) is unchanged — `window.IsZero()` short-circuits to the existing precomputed-index hot path. Windowed view is O(N_tx · path²), same complexity as the existing region-filtered slow path. Results cached per `(region|area|window)`. Closes #1217 --------- Co-authored-by: Kpa-clawbot <bot@corescope> |
||
|
|
5629a489b2 |
perf(distance): lazy build distance index on first request (#1011) (#1597)
## Summary Build the distance analytics index lazily on the first `/api/analytics/distance` request instead of eagerly inside `Load()` (and its background-load chunked merge). Per the triage Fix path on the issue: - Eager startup build removed from `Load()` and from `loadAllPacketsBackground()`'s post-merge pass. - First request returns `202 Accepted` + `Retry-After: 5` and kicks off the build in a background goroutine, gated by `sync.Once` so concurrent first-window requests all observe 202 (single build, not N parallel O(n²) computations). - Once built, subsequent requests fall through to the existing analytics-recomputer / TTL cache and serve 200 as before. - Debounced rebuild policy: refire only when `Δobs > 5%` since last build OR `>5 min` elapsed, whichever is more restrictive. Background loader also resets the gate so the next request rebuilds against the larger dataset. Effect: operators who never visit distance analytics no longer pay the O(n²) construction at startup. Acceptance criteria (a) no startup build, (b) first request triggers build, (c) concurrent in-flight requests get 202 are encoded as failing-first tests. ## Red → green - Red: `bc947ad1` — 3 assertion failures (`expected ... empty, got 3`, `expected 202, got 200`, `expected all 10 ... got 0`). - Green: `5264b68a` — production change makes them pass, no other tests regress. ## Files changed - `cmd/server/store.go` — lazy-build state (`distLazyMu`/`Once`/`Built`/`Building`/`LastBuilt`/`LastObs`), `TriggerDistanceIndexBuild`, `DistanceIndexBuilt`, `DistanceIndexBuilding`; eager `buildDistanceIndex` calls in `Load()` post-pass and chunked-background-load post-pass removed (Once reset instead so the next request rebuilds against the full dataset). - `cmd/server/routes.go` — `/api/analytics/distance` returns 202 + `Retry-After` until built. - `cmd/server/distance_lazy_index_test.go` — new tests (the three triage acceptance criteria). 
- `cmd/server/coverage_test.go`, `cmd/server/parity_test.go`, `cmd/server/routes_test.go`, `cmd/server/hop_disambig_e2e_test.go` — pre-warm the index via `TriggerDistanceIndexBuild()` + `DistanceIndexBuilt()` poll where the test asserts the 200 JSON shape. ## Perf justification Startup cost on a 500K-obs / 2K-node dataset: previously O(n²) hop scan during `Load()` post-pass and again during the background-load merge — measured at 10–20s in `specs/startup-audit.md`. New code: zero work at startup, the same O(n²) work runs at most once per HTTP request cycle (and only when the index is stale per debounce policy). Cold-path concurrency is bounded by `sync.Once`, so N parallel first-window requests never produce N parallel builds. ## Scope No config field added (debounce thresholds are hardcoded constants per the triage Fix path — `5%` / `5min`). No public API signature changes. No DB-side migration. Tests cover the lazy invariant, the 202+Retry-After contract, and concurrent first-request behavior. Closes #1011 --------- Co-authored-by: Kpa-clawbot <bot@corescope.local> |
||
|
|
3df8924114 |
fix(#1218): include multi-byte prefix repeaters in 1-byte hash usage matrix view (#1591)
## Problem
`/analytics` Hash Usage Matrix 1-byte view excluded repeaters configured
for 2- or 3-byte hash prefixes. In MeshCore, 1-byte path-matching is a
first-byte equality check, so any packet routed by 1-byte hash collides
on that first byte regardless of the downstream repeater's configured
prefix size. Omitting multi-byte prefix repeaters under-reports real
conflicts in the 1-byte hash space.
## Fix
**Data layer — `cmd/server/store.go` (`computeHashCollisions`,
~L7907-L7918 before, L7907-L7941 after):**
Before — `one_byte_cells` was populated only from `prefixMap`, which
only contained repeaters with `hash_size == 1`:
```go
if bytes == 1 {
	oneByteCells = make(map[string][]collisionNode)
	for i := 0; i < 256; i++ {
		hex := strings.ToUpper(fmt.Sprintf("%02x", i))
		oneByteCells[hex] = prefixMap[hex]
		if oneByteCells[hex] == nil {
			oneByteCells[hex] = make([]collisionNode, 0)
		}
	}
} else if bytes == 2 { ... }
```
After — additionally project all `hash_size in {2,3}` repeaters to their
first byte:
```go
if bytes == 1 {
	// ... (same baseline population) ...
	for _, cn := range allCNodes {
		if cn.Role != "repeater" { continue }
		if cn.HashSize != 2 && cn.HashSize != 3 { continue }
		if len(cn.PublicKey) < 2 { continue }
		hex := strings.ToUpper(cn.PublicKey[:2])
		if _, ok := oneByteCells[hex]; !ok { continue }
		oneByteCells[hex] = append(oneByteCells[hex], cn)
	}
}
```
The 2-byte view's bucketing is unchanged — that view continues to count
only repeaters configured for 2-byte prefixes (those semantics differ).
**UI — `public/analytics.js` L1459:** clarified the 1-byte view
description so the inclusion of multi-byte prefix repeaters is explicit.
## API shape
No response-shape change. `one_byte_cells[HEX]` is still
`[]collisionNode`; only the contents now include 2/3-byte prefix
repeaters in the appropriate first-byte buckets. The existing frontend
decoder is unaffected.
## Tests
-
`cmd/server/routes_test.go::TestHashCollisionsOneByteIncludesMultiBytePrefixRepeaters`
— seeds three repeaters with first byte `CC` configured for 1/2/3-byte
prefixes plus an unrelated `DD` repeater, asserts all three appear in
`one_byte_cells["CC"]`, and that the 2-byte view's `nodes_for_byte` is
unchanged.
Red commit `278bdf8d` (test only) fails on assertion ("got 1, want 3");
green commit `9127ea4e` passes.
## Preflight
`bash ~/.openclaw/skills/pr-preflight/scripts/run-all.sh origin/master`
→ clean.
Closes #1218
---------
Co-authored-by: clawbot <bot@corescope>
|
||
|
|
1a2b8c48be |
feat(node-detail): link RTC-reset warning to offending packet hashes (#1094) (#1590)
## Problem Node detail's bimodal-clock warning showed only `⚠️ N of last M adverts had nonsense timestamps (likely RTC reset)` — no way to tell which packets, no way to verify the heuristic, no way to drill in. ## Fix Additive, two-sides: **Backend** (`cmd/server/clock_skew.go`) - New type `BadSample { Hash, AdvertTS, SkewSec }`. - New field `NodeClockSkew.RecentBadSamples []BadSample` (`omitempty`). - Populated from the **same** bimodal-bad classification pass that produces `RecentBadSampleCount` — no heuristic change. `tsSkewPair` carries `hash` + `advertTS` so the classifier can record per-sample evidence without a second walk; drift code is unaffected (reads only `ts`/`skew`). **Frontend** (`public/nodes.js`) - `bimodalWarning` preserves the existing count summary line, then renders a `<ul>` of bad samples: each `<li>` is `<a href="#/packets/HASH">hash[:8]</a> → formatTimestamp(advertTS)` with ISO tooltip. Defensive `Array.isArray` so older API responses still render the summary alone. ## TDD - **Red:** `cmd/server/clock_skew_issue1094_test.go::TestIssue1094_RecentBadSamples_ExposesHashAndTimestamp` — seeds 3 healthy + 2 bimodal-bad adverts, asserts `RecentBadSamples` has length 2 with the expected hashes and advert timestamps. Fails on the assertion (`len = 0, want 2`) with the stub-only commit. - **Green:** classifier populates the slice; existing #1285 and bimodal tests stay green. - Red commit: `ed501f4b` - Green commit: `54305b06` ## Cross-stack Backend + frontend ship together (`cross-stack: justified` commit). API stays backward compatible (`omitempty` server, `Array.isArray` client) but the feature only lights up with both halves present. ## Preflight Clean — PII, branch scope, red-commit, CSS vars, XSS sinks, migrations, fixture coverage all pass. 
## Acceptance - [x] Warning lists specific packet hashes - [x] Each hash links to `#/packets/<hash>` - [x] Bad advert timestamp shown next to the hash - [x] Pattern is reusable — `BadSample` is a clean shape any future heuristic that flags specific packets can adopt Fixes #1094 --------- Co-authored-by: openclaw-bot <bot@openclaw.local> |
||
|
|
af669438ff |
docs+test(ingestor): document writeStatsAtomic symlink-replace semantics + regression test (#1170) (#1588)
Fixes #1170. ## What 1. **Doc comment** on `writeStatsAtomic` (`cmd/ingestor/stats_file.go`) spelling out the two-sided symlink story: - tmp side (`path+".tmp"`): protected by `O_NOFOLLOW` (existing behavior, already noted). - rename side (`path` itself): NOT protected by `O_NOFOLLOW`; instead `os.Rename` semantics are relied upon — rename atomically replaces any existing entry at `path` (including a symlink) with the new regular file. The symlink target is never written through because all writes happened to the unrelated tmp file before rename. 2. **Regression guardrail test** `TestWriteStatsAtomic_SymlinkAtDestIsReplaced` in `cmd/ingestor/stats_file_test.go` that pre-plants a symlink at the destination path pointing to an unrelated target file, calls `writeStatsAtomic`, and asserts: - (a) `os.Lstat(path).Mode()&os.ModeSymlink == 0` (post-write path is a regular file, not a symlink) - (b) the original symlink target's sentinel bytes are unchanged. If a future refactor swaps `os.Rename` for a destination-symlink-following primitive (e.g. `open(path, O_WRONLY)` without `O_NOFOLLOW`, or a copy-then-truncate), the test fails loudly. ## TDD note (red-commit exemption) The current `writeStatsAtomic` ALREADY satisfies the new test's assertions — `os.Rename` does the right thing today. Per the fix-issue skill's exemption for pure-documentation / guardrail tests on already-correct behavior, no fabricated red commit was constructed; the test stands as a pinning regression guard. The two commits are therefore: (1) test addition, (2) doc comment. ## Scope - `cmd/ingestor/stats_file.go` — doc comment only - `cmd/ingestor/stats_file_test.go` — one new test function No production behavior change. No public API change. No new dependencies. No CI workflow changes. `O_NOFOLLOW` and the existing tmp-side behavior are untouched. ## Preflight All hard gates pass (PII, branch scope, red commit, CSS vars, LIKE-on-JSON, sync/async migration, XSS sinks). No warnings. 
--------- Co-authored-by: meshcore-bot <bot@meshcore.local> |
||
|
|
7533b3b67b |
feat(nodes): sortable First Seen column on Nodes table (#1166) (#1587)
## Summary Adds a sortable **First Seen** column to the Nodes table so users can spot newly observed repeaters in their region (per the reporter's use case). Closes #1166 ## Backend `/api/nodes` already exposes `first_seen` per node via `db.scanNodeRow` (sourced from the existing `nodes.first_seen` column — no schema migration, no recomputation, no extra query cost). The red test pins that contract. ## Frontend (`public/nodes.js`) - New `<th data-sort-key="first_seen" data-sort-default="desc">First Seen</th>` between Last Seen and Adverts. - Cell renders via `renderNodeTimestampHtml(n.first_seen)` — same relative-time + absolute-ISO `title=` tooltip as the Last Seen column. Empty values render as `—`. - `sortNodes` gains a `first_seen` branch with **empty-last** semantics: nodes without a `first_seen` always sort to the bottom regardless of asc/desc direction, so unknowns never clutter the top of the table. - Empty-state `colspan` bumped 7 → 8. ## TDD - **Red commit** `112442f4` — `test-issue-1166-first-seen-column.js` + `cmd/server/first_seen_1166_test.go`. The backend half passes on red (field already returned); 5 frontend assertions fail on assertions (column header missing, sort branch missing, empty-last violated). - **Green commit** `9274b36c` — only `public/nodes.js`. All 6 tests pass. Verified red is real-fail (assertion-shaped) by checking out the red commit's `nodes.js` and re-running the test: 5 failures, all on `assert.strictEqual`, none on parse/import. ## Test results ``` node test-issue-1166-first-seen-column.js → 6 passed, 0 failed node test-frontend-helpers.js → 611 passed, 0 failed go test ./cmd/server/... → ok (45.16s, all pass) ``` ## Files changed - `public/nodes.js` (+14 / −1) - `test-issue-1166-first-seen-column.js` (new) - `cmd/server/first_seen_1166_test.go` (new) ## Scope guardrails - No schema migration. - No new files outside the worktree's three allowed surfaces. - No refactor of other Nodes columns. 
- Empty cells handled in both render (em-dash) and sort (always last). --------- Co-authored-by: fix-1166-bot <bot@corescope.local> |
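The empty-last rule is easy to get subtly wrong, so here is one way to express it. The real change lives in `public/nodes.js`; this sketch is written in Go only for consistency with the backend examples, and the `node` type and `sortByFirstSeen` name are hypothetical. It assumes timestamps are same-format ISO-8601 strings, so string comparison matches chronological order.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical row; an empty FirstSeen means "unknown".
type node struct {
	Name      string
	FirstSeen string
}

// sortByFirstSeen orders nodes by FirstSeen, always placing unknowns last.
func sortByFirstSeen(nodes []node, desc bool) {
	sort.SliceStable(nodes, func(i, j int) bool {
		a, b := nodes[i].FirstSeen, nodes[j].FirstSeen
		if a == "" || b == "" {
			// Known values sort before unknown ones in either direction.
			return a != "" && b == ""
		}
		if desc {
			return a > b
		}
		return a < b
	})
}

func main() {
	nodes := []node{
		{"unknown", ""},
		{"newer", "2026-06-02T00:00:00Z"},
		{"older", "2026-06-01T00:00:00Z"},
	}
	sortByFirstSeen(nodes, true)
	fmt.Println(nodes[0].Name, nodes[1].Name, nodes[2].Name)
	sortByFirstSeen(nodes, false)
	fmt.Println(nodes[0].Name, nodes[1].Name, nodes[2].Name)
}
```

The direction flag is applied only after the empty check, which is what keeps unknowns at the bottom for both ascending and descending sorts.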
||
|
|
f7571a261e |
fix(#1546): remove dead server-side backfill flag (stuck backfilling=true) (#1583)
## Summary Closes #1546. `/api/stats` reported `{"backfilling":true,"backfillProgress":0}` on every fully-converged server, and `X-CoreScope-Status: backfilling` was sent on every request. Root cause: the `Store` had three atomic fields — `backfillComplete` / `backfillTotal` / `backfillProcessed` — read by `handleStats` and `backfillStatusMiddleware`, but **nothing ever wrote to them**. They are leftovers from the server-side async backfill added in #612/#614. That work moved to the **ingestor** in #1289 (server is now read-only) and the writer `backfillResolvedPathsAsync` was deleted, orphaning the readers. `backfillComplete.Load()` therefore always returned `false`, so `backfilling := !false` was permanently `true`. This is the leftover of an intentional architecture change, not an unfinished feature — the server no longer does backfill by design, so the correct fix is to delete the dead flag (per triage recommendation; zero consumers). ## Changes - `store.go` — drop the 3 dead atomic fields. - `routes.go` — drop `backfillStatusMiddleware` (+ its registration) and the backfill-progress computation in `handleStats`. - `types.go` — drop `Backfilling` / `BackfillProgress` from `StatsResponse`. **API change:** `/api/stats` no longer emits `backfilling` / `backfillProgress`; the `X-CoreScope-Status` header is removed. Verified no frontend or other consumer reads them. - `resolved_index.go` — remove stale comment referencing the deleted `backfillResolvedPathsAsync`. ## Test Regression assertion added to `TestStatsEndpoint` (#1546): asserts the response no longer carries `backfilling` / `backfillProgress` and that `X-CoreScope-Status` is unset. Verified red→green — against pre-fix code all three assertions fail; with the fix they pass. Full `cmd/server` suite green locally. ## Out of scope If a real server-side backfill/migration status indicator is wanted, that's a new feature on top of the ingestor stats pipe — tracked separately, not by reviving these dead fields. 
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
9465949e79 |
fix(#1558): mirror Load's resolved_path indexing into loadChunk (#1582)
## Summary Closes #1558. The background-backfill path (`loadChunk`) silently dropped the resolved-path indexing branch that `Load` performs per observation. Same SQL rows, two different post-conditions — a contract violation between the hot-startup load and the background chunk load. ## Root cause (the differential matters) The reporter's hypothesis — `indexByNode` not invoked on background-loaded transmissions — was 90% right but pointed at the wrong line. - `cmd/server/store.go:1116` already calls `s.indexByNode(tx)` inside the loadChunk per-batch merge lock for every backfilled tx. Decoded `pubKey` / `destPubKey` / `srcPubKey` ARE indexed. - `indexByNode` (store.go:1313 pre-patch) only reads three fields from `decoded_json`. It does NOT and cannot touch `resolved_path`. - `Load` (store.go:783-799) per-observation unmarshals `o.resolved_path`, extracts every relay-hop pubkey, and feeds them through `addToByNode` + `addResolvedPubkeysToPathHopIndex` + `addToResolvedPubkeyIndex`. - `loadChunk` (store.go:937-1023 pre-patch) selects `o.resolved_path` into `resolvedPathStr`… then never touches it. Result: after a container restart, every transmission older than `hotStartupHours` ends up present in `s.packets` / `s.byHash` / `s.byTxID` but missing from `s.byNode[relayPK]` for every relay pubkey. Home-page per-node `packetsToday` / `totalTransmissions` / `observers` / `avgHops` / `avgSnr` collapse for relay-heavy nodes (753 → 8 in the reporter's trace). Stats only self-heal as live ingest re-populates `byNode` through the ingest path (which DID call the full sequence inline). ## Fix shape 1. **Extract a shared `(s *PacketStore) indexResolvedPathHops(tx, pks, hopsSeen)` helper.** Owns the `addToByNode` + `addResolvedPubkeysToPathHopIndex` + `addToResolvedPubkeyIndex` sequence. Single point of truth so the "feed decode-window consumers for resolved-path pubkeys" invariant is structural, not duplicated. 2. 
**Re-point `Load` and both ingest sites at the helper.** Load's semantic behaviour is byte-identical with the prior inline block. 3. **Add the missing call in `loadChunk`.** Per AGENTS.md performance rule #0 ("no expensive work under locks"), unmarshal `resolved_path` and dedupe relay pubkeys per txID **outside** the merge critical section (`localResolvedPKsByTx`), then feed the pre-built slice through `indexResolvedPathHops` inside the existing per-batch lock alongside `indexByNode`. Mirrors `loadChunk`'s "build local, merge under lock" shape. ## TDD: red → green commits ``` |
||
|
|
7292d60fbe |
feat(#1508): config-driven disabled tabs in customizer modal (#1579)
# feat(#1508): config-driven disabled tabs in customizer modal
Fixes #1508.
## Why
The customizer modal mixes one-shot operator chrome (`branding`, `home`, `geofilter`, `export`) with daily-use viewer toggles (`theme`, `nodes`, `display`). Non-technical users get confused by the admin tabs and skip past the controls they actually need. There's no current way to hide individual tabs server-side — only via CSS, which doesn't prevent state mutation.
## What
Adds a single operator knob: `customizer.disabledTabs` in `config.json`. The named tab ids are filtered out of `_renderTabs()` in `public/customize-v2.js` before render.
- `config.example.json` — new `customizer` block, default `disabledTabs: []` (zero behavior change for existing operators).
- `cmd/server/config.go` — new `CustomizerConfig` type, optional pointer on `Config`.
- `cmd/server/routes.go` + `cmd/server/types.go` — `/api/config/client` now surfaces `customizer.disabledTabs` (always an array, empty when unset).
- `public/customize-v2.js` — `_renderTabs()` filters by id.
- `cmd/server/customizer_disabled_tabs_test.go` — RED-then-green tests covering both the configured-and-defaulted shapes.
## TDD trail
1. RED commit adds the failing tests + minimal `CustomizerConfig` stub so the package still compiles; both tests fail on the assertion (`body.customizer` is `<nil>`) — not on import.
2. GREEN commit wires the field through `/api/config/client` and the frontend tab filter; both tests pass.
## Scope
5 files. No new API surface, no UI for editing the list (operator edits `config.json` directly per the issue body). Backward-compatible: missing `customizer` block defaults the list to empty.
---------
Co-authored-by: bot <bot@local> |
||
|
|
cd19285f7f |
fix(ingestor): defense-in-depth empty-scope guard in UpdateNodeDefaultScope (#1534) (#1575)
## Summary Follow-up to PR #1569 (merged). Adds defense-in-depth at the DB layer for the #1534 default_scope-overwrite class of bug. PR #1569 fixed #1534 by guarding the call site in `handleMessage` with `if shouldUpdateDefaultScope(pktData)`. Adversarial review of #1569 flagged this as one-layer defense: a future refactor that drops the call-site `if` and calls `store.UpdateNodeDefaultScope(pubkey, pktData.ScopeName)` unconditionally would silently re-introduce the bug — overwriting a previously-correct `default_scope` (e.g. `#belgium`) with the empty string. This PR adds the belt-and-braces guard recommended by that review: - `Store.UpdateNodeDefaultScope(pk, "")` is now a silent no-op (early `return nil`) - New DB-layer regression test that fails on `master` and proves the DB function used to write `""` straight through - Two new call-site anchor tests that drive a transport-scoped ADVERT end-to-end through `handleMessage` (matched + unmatched region key) so the existing call-site guard from #1569 can't be deleted without a test going red Net production change: 8 lines in `cmd/ingestor/db.go`. No behavior change for any non-empty scope. ## Why this is a follow-up, not a re-fix Issue #1534 is already closed by #1569 and `master` no longer regresses for users (the call-site guard is in place). This PR is purely belt-and-braces — it adds the second layer of defense the adversarial reviewer asked for and the test coverage that anchors both layers. 
## Files changed
| File | Change |
|------|--------|
| `cmd/ingestor/db.go` | +8 — empty-scope early return in `UpdateNodeDefaultScope` |
| `cmd/ingestor/db_test.go` | +43 — `TestUpdateNodeDefaultScope_EmptyScopeIsNoop` |
| `cmd/ingestor/main_test.go` | +97 — `TestHandleMessageAdvert_EmptyScopeSkipsDefaultScopeUpdate` + `TestHandleMessageAdvert_MatchedScopeUpdatesDefaultScope` |
## Red → green commits
- **red** `c062af59` — `test(ingestor): red — DB-layer empty-scope guard regression test for #1534`
  - Adds three tests; `TestUpdateNodeDefaultScope_EmptyScopeIsNoop` fails on assertion (`default_scope` overwritten with `""`)
  - Two call-site tests pass already (call-site guard merged in #1569) — they anchor that behavior against future refactors
- **green** `7ab12d53` — `fix(ingestor): defense-in-depth empty-scope guard in UpdateNodeDefaultScope (#1534)`
  - Adds the early-return; all three tests green
## Operator remediation (from issue #1534)
Operators whose production DB still has rows where `default_scope` was overwritten with the empty string before #1569 deployed can clean up with:
```sql
-- Inspect affected rows first
SELECT public_key, name, default_scope FROM nodes WHERE default_scope = '';
SELECT public_key, name, default_scope FROM inactive_nodes WHERE default_scope = '';
-- Convert empty-string default_scope back to NULL so the next valid
-- matched-scope advert can re-populate it cleanly.
UPDATE nodes SET default_scope = NULL WHERE default_scope = '';
UPDATE inactive_nodes SET default_scope = NULL WHERE default_scope = '';
```
After #1569 + this PR are deployed, no new rows can be created with `default_scope = ''` from this code path.
## Test plan
```bash
cd cmd/ingestor && go test ./... -count=1
# ok github.com/corescope/ingestor ~98s
```
## Preflight
Clean — PII, branch scope, red commit, CSS-var defined, CSS self-fallback, LIKE-on-JSON, sync migration, async-migration gate, XSS sinks all pass. No warnings.
--------- Co-authored-by: Kpa-clawbot <bot@meshcore-analyzer> |
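The second layer of defense amounts to an early return before the write. The sketch below uses an in-memory map in place of the database, so `scopeStore` and its field are hypothetical; only the method name `UpdateNodeDefaultScope` and the no-op-on-empty behavior come from the description above.

```go
package main

import "fmt"

// scopeStore is an in-memory stand-in for the ingestor's DB-backed store.
type scopeStore struct {
	defaultScope map[string]string
}

// UpdateNodeDefaultScope ignores empty scopes so a previously stored value
// can never be overwritten with "".
func (s *scopeStore) UpdateNodeDefaultScope(pubkey, scope string) error {
	if scope == "" {
		return nil // silent no-op, even if the call-site guard is removed
	}
	s.defaultScope[pubkey] = scope
	return nil
}

func main() {
	s := &scopeStore{defaultScope: map[string]string{}}
	_ = s.UpdateNodeDefaultScope("pk1", "#belgium")
	_ = s.UpdateNodeDefaultScope("pk1", "") // must not clobber
	fmt.Println(s.defaultScope["pk1"])
}
```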
||
|
|
05af6c6ee5 |
fix(ingestor): skip default_scope update when ScopeName is empty (#1534) (#1569)
Red commit:
|
||
|
|
3feb97f16f |
fix(ingestor): write resolved_path on new observations (regression from #1289) (#1548)
# fix(ingestor): write resolved_path on new observations (full restore — closes #1547 + #1560) Fixes #1547. Closes #1560. ## Root cause PR #1289 (the "ingestor owns the neighbor graph; server is read-only" refactor, ~2026-05-21) moved the neighbor graph + schema writes to the ingestor, and as a side-effect removed the server-side writer that populated `observations.resolved_path` AND the context-aware `pm.resolveWithContext` that disambiguated 1-byte prefix collisions. Result: every observation inserted after the deploy has `resolved_path = NULL` (3.1M/6.3M NULL on staging; 100% NULL on fresh deploys; symptom on Cascadia: hops fail to resolve because the small-mesh client-side fallback breaks on prefix collisions). ## Full restore This PR resolves both single-byte and multi-byte prefix paths. Single-byte disambiguation uses NeighborGraph adjacency and ADVERT `from_pubkey` anchoring, ported from pre-#1289 `pm.resolveWithContext` logic (last good at cmd/server/store.go @ commit 450236d5) and the #1144 / #1352 fixes. New file `cmd/ingestor/path_resolver.go`: - `NeighborGraph` + `neighborGraphHolder` — in-memory adjacency snapshot, atomic-published. - `loadNeighborGraph(db)` — one-shot SELECT from `neighbor_edges`. - `resolveHopWithContext(hop, anchor, graph, idx, exclude) *string` — single-hop, tier-1 disambiguator. - `resolvePathWithContext(hops, fromPubkey, graph, idx) []*string` — walks the path, anchoring hop 0 on `from_pubkey` (ADVERTs) and each subsequent hop on the previous resolved hop, excluding already-resolved pubkeys. - `Store.RefreshNeighborGraph()` — called on warm-up and every 60s tick in the neighbor-edges builder alongside `RefreshPrefixIndex`. Existing file `cmd/ingestor/resolved_path.go` (PR #1547 base) is untouched: `resolvePath` + `marshalResolvedPath` + the all-nil → empty-string clobber-guard contract are preserved verbatim. `cmd/ingestor/db.go` — `InsertTransmission` now calls `resolvePathWithContext` instead of the naive `resolvePath`. 
## Algorithm (per hop) 1. Look up candidate pubkeys by prefix-match (existing `prefixIndex`). 2. `len==0 → nil`; `len==1 → that pubkey`. 3. `len>1` → filter by `NeighborGraph` adjacency to the anchor. Anchor is `from_pubkey` for hop 0 on ADVERTs, the previous resolved hop otherwise. Exactly 1 surviving candidate → use it; else nil. 4. Previously resolved hops (and the originator) are excluded from downstream candidate pools — a packet does not revisit a node. Tier-2/3/4 from pre-#1289 (geo proximity, GPS preference, observation-count fallback) are intentionally NOT ported — those were noisy in practice and belong in a separate enhancement, not in this regression restore. ## Out of scope - The ~3.1M existing NULL rows from the regression window. Filed as a follow-up backfill task — too risky to bundle here (touches a 6M-row table). - The dead-flag bug #1546 — separate concern. ## TDD red → green - Red commit `80b0f476` — adds five new context-resolver tests; stub `resolvePathWithContext` falls back to naive `resolvePath`. CI run 26946935615 → **failure** with assertion errors on the three collision tests (`TestResolveHopWithContext_OneByteCollision_AdjacencyResolves`, `TestResolvePathWithContext_TwoHopChainAnchoredOnFromNode`, `TestResolvePathWithContext_AdvertAnchoring`); the two regression tests (multi-byte still works + all-nil contract) stayed green. - Green commit `7b4950ce` — real algorithm + InsertTransmission wiring + RefreshNeighborGraph in the builder tick. All five new tests pass; original four `resolved_path` tests stay green. ## Verification - `go test -race ./cmd/ingestor/...` for the 11 affected tests — pass. - `bash ~/.openclaw/skills/pr-preflight/scripts/run-all.sh origin/master` — exit 0 (all gates clean). - PII grep on body + diff: clean. 
Tested with: existing `TestInsertTransmissionWritesResolvedPath` + `TestInsertTransmissionDoesNotClobberResolvedPathOnAllNil` (PR #1547 base) plus the new collision-resolution suite: - `TestResolveHopWithContext_OneByteCollision_AdjacencyResolves` — 3-of-5 nodes share `0x5c`, chain A↔B↔C↔D↔E; anchored on A, hop `5c` → B. - `TestResolvePathWithContext_TwoHopChainAnchoredOnFromNode` — path `[5c, 5c]` from_node A → `[B, C]`. - `TestResolveHopWithContext_NoAdjacencyContext_ReturnsNil` — 3 ambiguous candidates, no anchor / non-adjacent anchor → nil. - `TestResolvePathWithContext_AdvertAnchoring` — ADVERT, `from_pubkey=A`, path `[5c]` → only-adjacent neighbor B. - `TestResolvePathWithContext_RegressionMultiByteStillWorks` — unique-prefix path with no graph context still resolves. - `TestResolvePathWithContext_AllNilContractPreserved` — unresolvable path → `marshalResolvedPath==""` (clobber-guard from PR #1548 untouched). ## Browser-validated N/A — backend-only change. Frontend already handles populated `resolved_path` via `getResolvedPath` in `cmd/server/db.go` and `public/packets.js`. ## Round-1 fixes addressed - **MUST-FIX #1 (data-loss clobber on all-nil resolution):** when every hop fails to resolve, `marshalResolvedPath` returns `""` instead of `"[null,null,...]"`, so `nilIfEmpty` → SQL NULL and the `COALESCE(excluded.resolved_path, resolved_path)` UPSERT preserves any previously stored good value on re-ingest. Regression test asserts: insert a transmission, observe `resolved_path` populated, wipe the prefix index, re-ingest the same packet, assert the existing `resolved_path` is unchanged. --------- Co-authored-by: corescope-bot <bot@corescope> Co-authored-by: openclaw-bot <bot@openclaw> Co-authored-by: openclaw-bot <bot@openclaw.local> |
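The per-hop algorithm above can be sketched in a few dozen lines. This is an illustration, not the contents of `cmd/ingestor/path_resolver.go`: the `graph` type, the linear prefix scan (standing in for `prefixIndex`), and the function names `resolveHop` / `resolvePath` are hypothetical. One behavior is an outright assumption: after an unresolved hop, the next hop is treated as having no anchor.

```go
package main

import (
	"fmt"
	"strings"
)

// graph is a hypothetical adjacency snapshot: pubkey -> set of neighbors.
type graph map[string]map[string]bool

// resolveHop picks the single candidate for a hop prefix, using adjacency
// to the anchor only when the prefix alone is ambiguous.
func resolveHop(hop, anchor string, g graph, nodes []string, exclude map[string]bool) *string {
	var cands []string
	for _, pk := range nodes {
		if strings.HasPrefix(pk, hop) && !exclude[pk] {
			cands = append(cands, pk)
		}
	}
	switch len(cands) {
	case 0:
		return nil
	case 1:
		return &cands[0]
	}
	var adjacent []string
	for _, pk := range cands {
		if anchor != "" && g[anchor][pk] {
			adjacent = append(adjacent, pk)
		}
	}
	if len(adjacent) == 1 {
		return &adjacent[0]
	}
	return nil // still ambiguous, or no adjacency context
}

// resolvePath walks the hops, anchoring each on the previous resolved hop
// and excluding already-visited pubkeys.
func resolvePath(hops []string, from string, g graph, nodes []string) []*string {
	out := make([]*string, len(hops))
	exclude := map[string]bool{}
	if from != "" {
		exclude[from] = true
	}
	anchor := from
	for i, hop := range hops {
		r := resolveHop(hop, anchor, g, nodes, exclude)
		out[i] = r
		if r == nil {
			anchor = "" // assumption: an unresolved hop breaks the anchor chain
			continue
		}
		exclude[*r] = true
		anchor = *r
	}
	return out
}

func main() {
	// Chain A-B-C-D-E where B, C and D share the prefix "5c".
	nodes := []string{"a1", "5cb1", "5cc2", "5cd3", "e5"}
	g := graph{}
	link := func(a, b string) {
		for _, p := range [][2]string{{a, b}, {b, a}} {
			if g[p[0]] == nil {
				g[p[0]] = map[string]bool{}
			}
			g[p[0]][p[1]] = true
		}
	}
	link("a1", "5cb1")
	link("5cb1", "5cc2")
	link("5cc2", "5cd3")
	link("5cd3", "e5")

	for _, r := range resolvePath([]string{"5c", "5c"}, "a1", g, nodes) {
		if r == nil {
			fmt.Println("unresolved")
		} else {
			fmt.Println(*r)
		}
	}
}
```

The toy chain mirrors the shape of the collision tests listed above: three nodes share a one-byte prefix, and adjacency to the anchor is what disambiguates them.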
||
|
|
d7cd9203ca |
Fixes #1165: add OSM/Stamen tile providers with per-provider Leaflet layer control. (#1533)
The list of changes is too long to describe in full, so I'll hit the high-level points.
- Config now supports the JSON map tiles that were suggested by @Kpa-clawbot.
- Leaflet map layer button appears in the top right of live.js and map.js (all the work was already done on live.js, so map.js is an added bonus).
- Allows users to enter creds for OSM and Stamen in the config file to get enterprise-related perks.
- Added a default light map under customizer. I still suggest removing them altogether and relying on the config.
- You can enable OSM and Stamen in the config without a license, but at your own risk!
- Config comment explains where to register and the providers for OSM, as well as the general limits per X interval.
- Updated tests (28) to address the changes made to the maps.
### TDD Exemption
**Reason**: Net-new UI surfaces (per `AGENTS.md`)
This PR introduces a net-new UI surface (the multi-provider map tile selector). Under the `AGENTS.md` exemption for net-new UI surfaces, the absence of an initial failing (red) commit is permitted, as the UI was built first. However, the underlying public APIs are fully covered. The following tests serve as the first assertions for these new APIs:
- `window.MC_createLayerControl`: Asserted in `MC_createLayerControl handles Auto mode and explicit layers correctly`
- `window.MC_setDarkTileProvider` & `window.MC_getDarkTileProvider`: Asserted in `MC_setDarkTileProvider persists to localStorage...`
- `window.MC_setLightTileProvider` & `window.MC_getLightTileProvider`: Asserted in `MC_setLightTileProvider persists to localStorage...`
- `window.MC_initTileRegistry`: Asserted in `MC_initTileRegistry(true) dispatches mc-tile-provider-changed`
- `applyTileFilter`: Asserted in `applyTileFilter sets invert CSS for inverted dark provider...`
- Cross-tab synchronization: Asserted in `Cross-tab storage event re-dispatches mc-tile-provider-changed` |
||
|
|
63bfa3d910 |
feat(security): detect CDN-fronted deployment + document bypass requirement (closes #1561) (#1564)
Closes #1561. Follow-up to #1551.
## Why
#1551 added `Cache-Control: no-store` to all `/api/*` responses. That's sufficient for CDNs that honour origin headers (Varnish, nginx). It is **not** sufficient for Cloudflare zones where Cache Rules / Page Rules override origin Cache-Control.

Field evidence from the meshat.se diagnosis (2026-06-04): observers behind Cloudflare were returning `cf-cache-status: HIT` with `age` up to ~6 hours despite the origin emitting `no-store`. The CDN was caching per zone policy and ignoring the upstream directive, which is exactly the failure mode #1551 cannot reach.

The application has no way to inject CDN rules; the only durable fix is operator-side. This PR makes that operator step discoverable and verifiable.
## What
### Server-side detection (log-only)
`cmd/server/cdn_detection.go` adds a middleware wired into the `/api/*` chain after `noStoreAPIMiddleware`. On the **first** request bearing any CDN-typical header (`CF-Connecting-IP`, `CF-Ray`, `X-Forwarded-For`, `X-Real-IP`, `Fastly-Client-IP`, `True-Client-IP`) it logs:
```
[security] WARNING: detected request via CDN (CF-Ray header present). Ensure /api/* is bypassed in your CDN config — see docs/deployment-behind-cdn.md. Cached API responses cause observer-flap and incorrect dashboards.
```
`sync.Once` guarantees the warning fires at most once per process boot. The middleware never blocks, never modifies the response, never adds headers. Detection is observational only: operators who run behind a CDN without bypass have a real bug, so the warning is appropriate.
### Operator documentation
`docs/deployment.md` gains a new **"Behind a CDN (Cloudflare, Fastly)"** section covering:
1. Curl verification command + healthy vs unhealthy output examples
2. Cloudflare Cache Rule creation (URI Path starts-with `/api/` → Bypass cache)
3. Legacy Page Rules equivalent
4. Fastly note
5. Re-verification
6. Meaning of the startup log warning
7. Why we can't fix this server-side

`docs/deployment-behind-cdn.md` is the canonical path the log message references; it's a short TL;DR that links back to the full section.
### Healthcheck script
`scripts/check-cdn-bypass.sh` is POSIX sh with no dependencies beyond curl + grep + awk. Operators run:
```sh
scripts/check-cdn-bypass.sh https://your-domain.example.com
```
Exits `0` with `OK: no CDN caching detected ...` or `1` with a precise diagnostic naming the offending header (`cf-cache-status: HIT` or stale `age`).
## TDD
- **Red commit `e90ccaba`** (`test(security): RED ...`): `cmd/server/cdn_detection_test.go` (4 Go tests + 6 subtests for each header) and `scripts/test-check-cdn-bypass.sh` (3 shell harness cases). Middleware stub returns `next` unchanged so tests compile and fail on assertions, not build errors.
- **Green commit `5e6a60b5`** (`feat(security): GREEN ...`): real middleware, wiring in `routes.go`, healthcheck script, doc.
## Deliverables
| File | Status | Purpose |
|------|--------|---------|
| `cmd/server/cdn_detection.go` | new | middleware + sync.Once warning |
| `cmd/server/cdn_detection_test.go` | new | 4 Go tests (1 stand-alone + 1 silence + 1 once + 1 table-driven over 6 headers) |
| `cmd/server/routes.go` | modified | `r.Use(cdnDetectionMiddleware)` after no-store |
| `docs/deployment.md` | modified | TOC entry + "Behind a CDN" section |
| `docs/deployment-behind-cdn.md` | new | canonical path referenced by log message + script output |
| `scripts/check-cdn-bypass.sh` | new | operator-runnable healthcheck |
| `scripts/test-check-cdn-bypass.sh` | new | shell harness with fake curl |
## What this PR explicitly does NOT do
- Does not block requests based on CDN detection (log-only).
- Does not enforce CDN bypass (impossible: operator-controlled).
- Does not spoof, strip or modify CDN headers.
- Does not add CSP / HSTS / other security headers (out of scope).
- Warning is not configurable: operators behind a CDN without bypass have a real bug, and surfacing it is correct.
## Verification
- `go test ./...` in `cmd/server/`: full suite green.
- `sh scripts/test-check-cdn-bypass.sh`: 3/3 pass.
- Preflight checklist: all 11 gates clean (PII, branch scope, red commit, CSS vars, CSS self-fallback, LIKE-on-JSON, sync migration, async-migration annotation, XSS sinks, img/SVG ratio, themed-img/SVG, fixture coverage).

---------
Co-authored-by: openclaw-bot <bot@openclaw.local>
Co-authored-by: clawbot <bot@clawbot.invalid>
|
||
|
|
65bd954b17 |
feat(config): make observer health thresholds configurable (closes #1552) (#1556)
Closes #1552.
## What
Make observer `Online` / `Stale` / `Offline` thresholds operator-configurable via `config.json`'s existing `healthThresholds` block, and **raise the defaults** from 10 min / 60 min to **60 min / 1440 min (1 h / 24 h)** so they match the node thresholds and stop producing flap out of the box.

⚠️ **This is a default behavior change.** Operators who want the old aggressive 10-min Online threshold must opt in via:
```json
"healthThresholds": { "observerOnlineMinutes": 10 }
```
## Why
Per #1552: the `600000` / `3600000` constants in `public/observers.js` were not tunable, *and* 10 min is wrong as a default. Wide-geo, low-traffic meshes legitimately see observers go quiet for >10 min between reports, and operators behind a CDN (#1551) get cached `last_seen` values that can push the observer 15+ min behind reality, guaranteeing flap at the 10-min threshold. The meshat.se operator (43 observers, v3.8.3) reports exactly this pattern.

Defaults raised from 10 / 60 minutes to 60 / 1440 minutes (1 h / 24 h) to match the node thresholds for consistency and eliminate flap on low-traffic / CDN-fronted instances. Operators wanting the old 10-min Online behavior can set `observerOnlineMinutes: 10` in config.
## Changes
Backend (`cmd/server/config.go`):
- `HealthThresholds` gains `ObserverOnlineMinutes` / `ObserverStaleMinutes` (int).
- `GetHealthThresholds()` defaults to **60 / 1440** when zero/absent.
- `ToClientMs()` emits `observerOnlineMs` / `observerStaleMs`, picked up by the existing `/api/config-public` → `roles.js` `Object.assign(HEALTH_THRESHOLDS, …)` pipeline.

`config.example.json`: new `observerOnlineMinutes` / `observerStaleMinutes` keys (60 / 1440) + `_comment_observerThresholds` explaining the rationale and opt-out.
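
The backend defaulting and minutes-to-milliseconds conversion described above can be sketched roughly as follows. This is a minimal illustration, not the code in `cmd/server/config.go`: the struct carries only the two observer fields, `withDefaults` and `toClientMs` are hypothetical stand-ins for `GetHealthThresholds()` and `ToClientMs()`, and treating negative values like zero is an assumption.

```go
package main

import "fmt"

// HealthThresholds holds only the two observer fields named in this PR;
// the real struct is assumed to carry other thresholds as well.
type HealthThresholds struct {
	ObserverOnlineMinutes int `json:"observerOnlineMinutes"`
	ObserverStaleMinutes  int `json:"observerStaleMinutes"`
}

const (
	defaultObserverOnlineMinutes = 60   // 1 h
	defaultObserverStaleMinutes  = 1440 // 24 h
)

// withDefaults (hypothetical name) fills in the new defaults when a value
// is zero/absent. Treating negative values the same way is an assumption.
func withDefaults(h HealthThresholds) HealthThresholds {
	if h.ObserverOnlineMinutes <= 0 {
		h.ObserverOnlineMinutes = defaultObserverOnlineMinutes
	}
	if h.ObserverStaleMinutes <= 0 {
		h.ObserverStaleMinutes = defaultObserverStaleMinutes
	}
	return h
}

// toClientMs (hypothetical name) converts minutes to the millisecond keys
// the frontend reads from window.HEALTH_THRESHOLDS.
func toClientMs(h HealthThresholds) map[string]int64 {
	return map[string]int64{
		"observerOnlineMs": int64(h.ObserverOnlineMinutes) * 60_000,
		"observerStaleMs":  int64(h.ObserverStaleMinutes) * 60_000,
	}
}

func main() {
	// Empty config: new 60 / 1440 defaults apply.
	fmt.Println(toClientMs(withDefaults(HealthThresholds{})))
	// Operator opts back in to the old 10-minute Online threshold.
	fmt.Println(toClientMs(withDefaults(HealthThresholds{ObserverOnlineMinutes: 10})))
}
```

With an empty config this yields 3600000 / 86400000, the same values the frontend uses as its fallback before `/api/config-public` has loaded.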
Frontend:
- `public/observers.js` `healthStatus()`: reads from `window.HEALTH_THRESHOLDS.observerOnlineMs / observerStaleMs`, falls back to **3600000 / 86400000** (matching the new Go defaults for the pre-`/api/config-public` window).
- `public/observer-detail.js`: same refactor (was previously hardcoded `600000` + misusing `nodeDegradedMs` for the Stale boundary).
## Backward compat
- API shape: unchanged, only adds two optional keys.
- Config: unchanged keys / no renames.
- Default behavior: **changed**. Operators relying on the implicit 10/60 must opt in (one config line).
## TDD
- RED 1 (`ee19058f`): assertions on the new fields + `ToClientMs` keys + `healthStatus` reading from `window.HEALTH_THRESHOLDS`. CI: [failure](https://github.com/Kpa-clawbot/CoreScope/actions/runs/26945264822).
- GREEN 1 (`30cfbf7a`): configurability landed (defaults still old 10/60). CI: [success](https://github.com/Kpa-clawbot/CoreScope/actions/runs/26945220598).
- RED 2 (`2649cf35`): pin new 60/1440 defaults (empty-config Go path + JS `healthStatus` with no `HEALTH_THRESHOLDS`). CI must fail.
- GREEN 2 (`5ef85bca`): bump Go defaults to 60/1440, JS fallbacks to 3600000/86400000, `config.example.json` updated. CI must pass.
## Preflight
Clean (exit 0). `cross-stack` ack in commit messages: a single feature spans Go + JSON + JS readers.
## Not in scope
- Customizer UI for editing the thresholds (config-only per issue).
- Node/infra thresholds (unchanged).
- The deeper observer-flap root cause (#1551 cache-control is a separate PR in flight).

---------
Co-authored-by: corescope-bot <bot@corescope>
Co-authored-by: mc-bot <bot@meshcore.local>
|
||
|
|
0c908d2bca |
fix(api): emit Cache-Control: no-store on /api/* responses (#1551) (#1553)
Closes #1551.
## Problem
`/api/*` Go responses emit no `Cache-Control` header. CDNs (Cloudflare, nginx, Varnish) default to caching `application/json` for **15 min – 4 h** when no directive is set.

Observed against a public Cloudflare-fronted CoreScope instance (`meshcore.meshat.se`):
- 17 consecutive polls of `/api/observers` over ~10 min returned byte-identical responses
- Response headers showed `cf-cache-status: HIT`, `age: 878` (~15 min)
- Cache-busting query param → `cf-cache-status: MISS` with fresh `last_seen` values

This causes WebSocket pushes to diverge from REST GETs (WS fresh, REST stale) and produces false-positive stale/online flips for observers near the 10-min threshold.
## Fix
New `noStoreAPIMiddleware` in `cmd/server/routes.go` wired into the gorilla/mux chain alongside the existing `backfillStatusMiddleware`. Sets `Cache-Control: no-store` on every response whose request path starts with `/api/`.
## Design choice: `no-store` vs `private, max-age=0`
Chose `no-store`. CoreScope's REST endpoints are fresh-on-every-request by contract (WS pushes diff against REST GETs), so any intermediary cache is wrong. `no-store` forbids **any** cache (CDN, browser, intermediary). `private, max-age=0` still permits short browser caches and some intermediaries, so it offers no benefit here.
## Scope discipline
- `/api/` prefix only.
- Static assets (`/`, `/app.js`, `/style.css`, …) keep their existing `no-cache, no-store, must-revalidate` headers from `spaHandler` in `main.go`. Hashed assets stay CDN-cacheable by design.
- The middleware runs for **all** registered routes including the websocket upgrade HTTP request, since `/ws` is served through the same mux.
## TDD
- **Red** `1beb5432`: `cmd/server/cache_control_api_test.go` asserts `Cache-Control: no-store` on `/api/stats`, `/api/observers`, `/api/packets`, `/api/nodes`, and asserts the middleware does NOT leak onto `/` or `/app.js`. Fails on assertion (no Cache-Control header emitted), not a compile error.
- **Green** `13be675f`: middleware + wiring. All assertions pass; full `cmd/server` suite stays green.
## Files
- `cmd/server/routes.go`: middleware definition + `r.Use(noStoreAPIMiddleware)`
- `cmd/server/cache_control_api_test.go`: 6 sub-tests across 2 top-level tests
## Preflight
`bash ~/.openclaw/skills/pr-preflight/scripts/run-all.sh origin/master` → clean (exit 0).

---------
Co-authored-by: corescope-bot <bot@corescope>
|
||
|
|
7b43045043 |
fix(security): sanitize 3 more log-injection sites missed by #1540 (#1544)
Follow-up to merged #1540. Self-review of #1540 found 3 additional `log.Printf` sites interpolating MQTT-controlled strings without `sanitizeLogString`; fixing here for completeness.
## Sites fixed
| File:line | Format | MQTT-controlled fields | Attacker scenario |
|---|---|---|---|
| `cmd/ingestor/main.go:531` | `status: %s (%s)` | `name`, `iata` | Hostile node sends status with `name="evil\r\n[security] forged-line"`, which appears as a fake log line in operator dashboards / journalctl. |
| `cmd/ingestor/main.go:854` | `channel message: ch%s from %s` | `channelIdx`, `sender` | Attacker spoofs `sender="evil\r\n[security] backdoor-installed"` on any channel message, with the same forged-line outcome. |
| `cmd/ingestor/main.go:940` | `direct message from %s` | `sender` | DM injection via crafted sender field, same outcome. |

All three now route through `sanitizeLogString` from `cmd/ingestor/sanitize_log.go` (added by #1540) which replaces CR/LF/control bytes with `?`.
## TDD
Red commit (`8b3ad398`) adds 3 testable format helpers (`formatStatusLog`, `formatChannelMessageLog`, `formatDirectMessageLog`) plus tests pinning CR/LF stripping. Helpers return raw `fmt.Sprintf` output, so tests fail on assertion (not build).

Green commit applies `sanitizeLogString` inside the helpers and swaps the 3 call sites in `main.go` to use them. Tests red-on-revert (verified locally).
## Scope
Strictly the 3 sites above. No other refactors. No changes to `sanitizeLogString` itself.

---------
Co-authored-by: clawbot <clawbot@users.noreply.github.com>
|
||
|
|
e438451dc9 |
feat(preflight): hard-fail gate on sync schema migrations + async runner (#1541)
Closes the recurring "sync migration on large table" regression class
(#791-style, #1483-style).
## Problem
Pattern that keeps repeating:
1. A perf/feature PR adds `CREATE INDEX` / `ALTER TABLE` / `UPDATE ...
WHERE` in a migration file (typically `cmd/ingestor/db.go`).
2. Local dev DB has ~100 rows. Migration returns in milliseconds. CI is
green.
3. Reviewers approve on plan correctness; nobody knows what the prod
table size is.
4. First prod boot at scale (Cascadia: ~2600 nodes, 80K+ obs; previous
prod: 1.9M+ obs) pins the ingestor at `[migration] Adding index...` for
minutes.
5. Healthcheck times out → container restart → loop. Operator pages.
Hotfix.
Most recent case: `obs_observer_ts_idx_v1` in v3.8.3 — release notes
already document an "expect a longer first boot" warning because we knew
it would hit prod hard.
## What this PR adds
**Async helper (`cmd/ingestor/async_migration.go`):**
- `Store.RunAsyncMigration(ctx, name, fn)` — registers the migration as
`pending_async` in a new `_async_migrations` bookkeeping table, returns
to caller immediately, schedules `fn` in a goroutine on the shared
backfill `WaitGroup`, transitions to `done` (or `failed` with error
captured) on completion.
- `Store.AsyncMigrationStatus(name)` and
`Store.WaitForAsyncMigrations()` for tests/shutdown.
- Idempotent: `done` rows short-circuit; `pending_async`/`failed` rows
are retried on next boot.
**Retroactive #1483 conversion (`cmd/ingestor/db.go`):**
- `obs_observer_ts_idx_v1` (the composite `(observer_idx, timestamp)`
index build on `observations`) is now scheduled via `RunAsyncMigration`
from `OpenStore()` so the ingestor accepts packets immediately while the
index builds in the background.
- Legacy `_migrations` gate is preserved by the async fn → DBs that
already completed the sync build stay no-op.
**Annotation convention (`MIGRATIONS.md`):**
Every new `CREATE INDEX` / `ALTER TABLE` / data-rewrite in a migration
file must do ONE of:
1. Run via `Store.RunAsyncMigration(...)` (preferred for backfills).
2. Carry a `// PREFLIGHT: async=true reason="..."` comment directly
above the migration block.
3. Include a `PREFLIGHT-MIGRATION-SCALE: <30s N=<scale>` line in the PR
body.
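
To make the first two opt-outs concrete, here is a deliberately crude checker sketch. It is not the actual gate (which is a shell script that inspects the diff and also accepts the PR-body line); `needsOptOut`, both regular expressions, and the sample migration snippets are hypothetical, and data-rewrite `UPDATE` statements are not covered.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Matches the schema mutations named in the convention (illustrative only).
	mutationRE = regexp.MustCompile(`(?i)\b(CREATE\s+(UNIQUE\s+)?INDEX|ALTER\s+TABLE)\b`)
	// Matches the annotation comment with a non-empty reason.
	annotationRE = regexp.MustCompile(`(?m)^\s*//\s*PREFLIGHT:\s*async=true\s+reason="[^"]+"`)
)

// needsOptOut reports whether src contains a schema mutation that neither
// goes through RunAsyncMigration nor carries the PREFLIGHT annotation.
func needsOptOut(src string) bool {
	if !mutationRE.MatchString(src) {
		return false // nothing to gate
	}
	if strings.Contains(src, "RunAsyncMigration(") {
		return false // opt-out 1: async helper
	}
	return !annotationRE.MatchString(src) // opt-out 2: annotation comment
}

func main() {
	// Hypothetical migration snippets, invented for this example.
	bad := `db.Exec("CREATE INDEX idx_obs_ts ON observations(timestamp)")`
	good := `// PREFLIGHT: async=true reason="small lookup table, bounded row count"
db.Exec("ALTER TABLE observers ADD COLUMN note TEXT")`

	fmt.Println(needsOptOut(bad), needsOptOut(good)) // true false
}
```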
**TDD pair:**
- Red commit `2c6744cc` — `TestRunAsyncMigration_PendingThenDone`
against a stub helper. Build passes, assertion fails (`async migration
fn did not start within 2s`).
- Green commit `38354f32` — real helper + retroactive fix + docs. Test
green.
**Fixtures (`cmd/ingestor/testdata/preflight-migrations/`):**
- `bad_sync_migration.go` — known-bad sample with no annotation.
- `good_annotated_migration.go` — known-good sample with annotation.
The preflight gate script can be unit-tested against these.
## Gate location (NOT in this PR)
The actual `check-async-migrations.sh` lives in the OpenClaw skills
directory at `~/.openclaw/skills/pr-preflight/scripts/` (separate from
the repo) and is wired into `run-all.sh`. It greps the diff for
new/modified migration blocks and hard-fails (exit 1) on any sync schema
mutation lacking one of the three opt-outs above. The fixtures in this
PR give maintainers a reproducible target.
## Why annotation-discipline, not size detection
You cannot determine table size from a diff. The gate enforces that
every author who adds a schema migration must consciously decide which
bucket it falls into and write that down. That is the cheapest possible
intervention that breaks the cycle.
## Testing
- `go test ./...` in `cmd/ingestor` — all tests pass including the new
`TestRunAsyncMigration_PendingThenDone`.
- Manual: red commit fails on assertion (not build), green commit passes
— verifiable by `git checkout
|
||
|
|
800d61c382 |
fix(security): uniform limit-clamp, log-injection sanitization, SPA path validation (#1540)
Follow-up to v3.8.3 security train. Found by non-XSS input-validation audit.

Three findings closed in one PR, all defense-in-depth: the medium is genuinely DoS-only (no data exposure), and the lows tighten log hygiene and SPA path handling so future router changes can't silently expose the filesystem.
## Findings addressed
### MEDIUM: unbounded `limit` on list endpoints
- **What:** four list endpoints accepted `limit=999999999` and passed the value straight to SQL `LIMIT ?` and Go `make(..., 0, limit)`.
- **Where:** `cmd/server/routes.go`: handlePackets (incl. multi-node branch), handleNodes, handleChannelMessages, handleAnalyticsSubpaths, handleAnalyticsSubpathsBulk per-group lim, handleDroppedPackets.
- **Fix:** new `clampLimit(raw, def, max)` helper in `cmd/server/clamp_limit.go` plus `queryLimit(r, def, max)` HTTP wrapper. Caps: packets/nodes/channels/dropped = 500, analytics buckets / bulk-health = 200. Already-clamped endpoints (handleBulkHealth) migrated to the helper for uniformity. Silent clamp with no response-shape change. Negative / zero / non-numeric → default.
### LOW: log injection via newline in advert name
- **What:** the advert `name` field allows `\n` / `\t` (sanitizeName intentionally preserves them for display). Because the name is logged at two MQTT-ingest sites, an attacker with publish ACL could forge log lines.
- **Where:** `cmd/ingestor/main.go:659,690`.
- **Fix:** new `sanitizeLogString` in `cmd/ingestor/sanitize_log.go` replaces control bytes < 0x20 and DEL with `?`. Wrapped at the two log call sites that interpolate `name=` and `observer=`. Stored display values untouched.
### LOW: SPA static handler depends on default mux path-cleaning
- **What:** `cmd/server/main.go:469` joins `r.URL.Path` to root; safe today only because gorilla/mux runs `path.Clean` and `http.FileServer` rejects `..`. A future `SkipClean(true)` or router swap would silently expose the filesystem.
- **Where:** `cmd/server/main.go` (spaHandler).
- **Fix:** new `isSafeStaticPath` rejects requests whose decoded or raw path contains `..`, `%2e%2e`, `\\`, or `%5c` with a 400. Legit asset names with dots (`/app.js`, `/customize-v2.js`, `/themes/dark.css`) are unaffected.
## TDD
- Commit 1 (red): adds `TestClampLimit`, `TestSpaHandlerPathTraversal`, `TestSanitizeLogString` with stub helpers. Tests fail on assertions (not build errors), proving they gate the change.
- Commit 2 (green): production fix. Revert the green commit and the red commit's assertions fail.
## Audit reference
Source: non-XSS input-validation audit dated 2026-06-03 (workspace). Sibling PR `fix/xss-r2-trace-obs-anl` owns the XSS findings, which are not included here.

---------
Co-authored-by: clawbot <clawbot@users.noreply.github.com>
|
||
|
|
3850600130 |
perf(server): TTL-cache /api/stats observations aggregate — eliminate per-request full-table scan (#1460) (#1516)
## Problem
`GetStoreStats` ran a `SUM(CASE WHEN timestamp > ?)` over the full `observations` table on **every** `/api/stats` call. The staging pprof analysis (#1460) identified this as rank #9 CPU consumer: `GetStoreStats.func2` at 920ms cumulative = ~10% of all server CPU.

The query:
```sql
SELECT COALESCE(SUM(CASE WHEN timestamp > ? THEN 1 ELSE 0 END), 0),
       COALESCE(SUM(CASE WHEN timestamp > ? THEN 1 ELSE 0 END), 0)
FROM observations
WHERE timestamp > ?
```
scans ~1.9M rows each time `/api/stats` is polled (every 15s from the dashboard).
## Fix
Add a **30-second TTL cache** on `PacketStore` for `PacketsLastHour` and `PacketsLast24h`:
- Cache hit → skip the observations goroutine entirely, use stored values
- Cache miss → run the query, update cache with result
- The node/observer `COUNT(*)` query is unchanged and always runs fresh

The hour/24h counts are display-only values; 30s accuracy is sufficient.
## Changes
`cmd/server/store.go`:
- 4 new fields on `PacketStore`: `statsCacheMu sync.Mutex`, `statsCacheTime time.Time`, `statsLastHour int`, `statsLast24h int`
- `GetStoreStats`: check cache before launching goroutines; conditional `wg.Add`; update cache after successful query

Builds clean. No tests changed.

Closes #1460 (P1#1 from staging CPU profile).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
|
||
|
|
367265eb59 |
feat(#1369): cross-domain embed support (CORS env override + ?embed=1 chrome suppression) (#1500)
Closes #1369.
## What
Cross-domain embed support, shipped as two halves:
### Part A: CORS env override + read-only contract
* `applyCORSEnv()` reads `CORS_ALLOWED_ORIGINS` (comma-separated, trimmed, empties dropped). Set in env → overrides `cfg.CORSAllowedOrigins`. Unset/empty → config.json value wins.
* `Access-Control-Allow-Methods` tightened from `GET, POST, OPTIONS` → `GET, HEAD, OPTIONS`. The cross-domain surface is read-only by contract; same-origin admin writes don't go through preflight and are unaffected.
* `config.example.json` adds `corsAllowedOrigins: []` + a comment explaining the env override and the embed URL pattern.
* No wildcards introduced (still supported as `["*"]` for ops that opt in). No credentialed CORS.
### Part B: `?embed=1` chrome suppression
* `shouldEmbedRoute(basePage, hashSearch)` is a pure helper, allowlisted to `map` and `channels`, that requires `embed=1` in the hash querystring.
* `navigate()` toggles `body.embed` based on the helper.
* CSS hides `.top-nav`, `[data-bottom-nav]`, `.nav-drawer`, `.nav-drawer-backdrop`, zeroes body padding/margin, reclaims `100dvh` for `#app.app-fixed`.

Use: `<iframe src="https://analyzer.example/#/map?embed=1">`. For iframe-only display, no CORS entry is needed (the iframe loads the document, not a JSON API). The CORS allowlist only matters when the embedding origin's own JS calls `/api/*` directly.
## Tests
| File | Asserts | Status |
|---|---|---|
| `cmd/server/cors_embed_1369_test.go` | 4 (env override, env-empty, env-trim, GET/HEAD contract, preflight POST rejected) | green |
| `test-embed-mode-1369.js` | 9 (helper allowlist + param parsing) | green |
| `cmd/server/cors_test.go` | existing | updated to read-only method-set assertion |

TDD: 2 red commits (one per part, both compile, both fail on assertions) → 2 green commits.
## Out of scope (per the issue's narrow ask)
* Other SPA routes do not honor `?embed=1` (their chrome makes layout assumptions; defer until requested).
* No iframe sandboxing recommendation; that's the embedder's responsibility.
* No CSP / `X-Frame-Options` change in this PR. Frames are already permitted; add an explicit `frame-ancestors` policy in a follow-up if operators want to whitelist embedders at the HTTP layer too.
## Security notes (DJB lens)
* Allowlist is exact-match, case-sensitive string compare: no normalization, no scheme/host parsing, no surprises.
* No `Access-Control-Allow-Credentials` (would let third parties read auth'd state via cookies).
* No reflection of arbitrary origins (every echoed origin came from the allowlist).
* Methods narrowed to read-only; even a misconfigured allowlist can't grant cross-origin writes through this middleware.

🤖 Generated with OpenClaw

---------
Co-authored-by: bot <bot@corescope.local>
|
||
|
|
ca2c3d6c79 |
feat(1488): customize marker stroke (color, width, opacity) (#1494)
## Summary
Reporter (@EldoonNemar in #1488) found the new white marker stroke overwhelming with hundreds of nodes on screen. This PR exposes the stroke through CSS vars + a customizer panel so operators can dial color/width/opacity (or remove it) without code edits.

**Scope:** ship stroke customization only. The reporter also asked for the old glow-style highlight ring as an alternative. That is a separate visual feature that needs design discussion, so it's deferred to a follow-up issue.
## Changes
- **`public/style.css`** `:root` declares `--mc-marker-stroke-color` / `--mc-marker-stroke-width` / `--mc-marker-stroke-opacity` with sensible defaults (white, 1, 1) that match current behavior.
- **`public/roles.js`** `makeRoleMarkerSVG`: replaced the 6 baked `stroke="#fff" stroke-width="1"` literals with a single shared `strokeAttr` referencing the CSS vars. One source of truth for all role shapes.
- **`public/map.js`** `makeMarkerIcon`: same migration. The observer star overlay keeps its narrow 0.8 width but routes color + opacity through the same vars.
- **`public/live.js`** `addNodeMarker` fallback SVG: same migration.
- **`public/customize-v2.js`**: new `markerStroke` object section (color/width/opacity) with validation, `applyCSS` writes, three controls on the Colors tab → "Marker Stroke" panel (color picker + width slider 0–4 + opacity slider 0–100%). Optimistic CSS-var writes on the `input` event so markers repaint live as the operator drags.
- **`cmd/server/{config,types,routes}.go`**: `ThemeFile` / `Config` / `ThemeResponse` pick up `MarkerStroke` so `theme.json` and `config.json` can ship server-side defaults. Defaults mirror the `:root` CSS values so no breaking change for current operators.
- **`config.example.json`**: documented `markerStroke` section with usage hint.
## TDD
- **Red commit** `92183f95`: `test-issue-1488-marker-stroke-vars.js` (5 sections, 18 assertions); failed 14/18 before implementation.
- **Green commit** `ce39637e`: implementation; same test now passes 18/18.
- Existing `#1438` (marker CSS-var migration) and `#1293` (marker shapes) regression tests still pass.
- Go tests (`cmd/server/...`) all green.
## CDP validation
Synthetic page with 600 markers, three blocks proving CSS-var control works end-to-end:
| Block | Stroke setting | Computed `getComputedStyle().stroke` / width / opacity |
| --- | --- | --- |
| Default | `var(--mc-marker-stroke-color)` (no override) | `rgba(255,255,255,0.85)` / `1px` / `1` |
| Tuned | inline `--mc-marker-stroke-*` (operator override) | `rgb(255,255,255)` / `0.5px` / `0.3` |
| Cyan | inline `--mc-marker-stroke-*` (branding/CB) | `rgb(0,229,255)` / `2px` / `1` |

Same SVG source, three different rendered strokes; that's the whole point. Runtime `documentElement.style.setProperty(...)` (which is exactly what the customizer slider's `input` handler does) repaints mounted markers without reload. CDP screenshot attached to the implementation note.
## Hot-deploy
Frontend + Go binary changes. Safe to hot-deploy frontend files (`public/*.js`, `public/style.css`) via the standard staging path; Go binary update needs a container restart.
## Defer
Glow highlight ring (the second half of #1488) is a separate follow-up issue. This PR delivers the immediately-useful, smaller deliverable.

Partial fix for #1488 (stroke customization shipped; glow ring deferred to a follow-up issue).

---------
Co-authored-by: meshcore-bot <bot@meshcore.local>
|
||
|
|
13bdee57d4 |
perf: P0 hot-path fixes (observers, neighbor-graph, observer-analytics) (#1481) (#1483)
## What
Three of the four P0s from #1481's scale-test findings. Each cuts a distinct hot path; together they target /api/observers, /api/analytics/neighbor-graph, and /api/observers/{id}/analytics, the top three live offenders.
### P0-1: 5-min atomic-pointer cache for default neighbor-graph response
- Live p95 10.8s on the most-trafficked organic endpoint.
- Background recomputer (5-min cadence per operator directive) builds the default-filter (`minCount=5 minScore=0.1`, no region, no role) `NeighborGraphResponse` and stores it via `atomic.Pointer`.
- `handleNeighborGraph` short-circuits on the default shape; non-default filters take the extracted `computeNeighborGraphResponse` path (identical semantics to the previous inline build).
### P0-2: cache parsed `StoreObs.Timestamp` + drop RLock window
- `handleObserverAnalytics` re-parsed the RFC3339 timestamp three times per observation, for 60k+ observations per active observer, under `s.store.mu.RLock`, blocking writers for the full scan.
- `StoreObs.ParsedTime()` parses once via `sync.Once` (mirrors `StoreTx.ParsedDecoded`).
- Handler snapshots the `byObserver[id]` pointer slice, releases the RLock immediately, then iterates locally.
### P0-3: 30s cache for `/api/observers` + sargable `IN` + covering index
- Three SQL queries on every request → ~1.7s p50 at 50-concurrent.
- Atomic-pointer 30s cache for the default (no-filter) query.
- `GetNodeLocationsByKeys` drops `LOWER(public_key) IN (...)` (non-sargable); callers pre-lowercase in Go and the plain `IN` matches the existing `public_key` index.
- New ingestor migration `obs_observer_ts_idx_v1` adds composite index `idx_observations_observer_idx_timestamp(observer_idx, timestamp)` so `GetObserverPacketCounts` can resolve its GROUP-BY + range filter from the index without scanning the 1.9M-row observations table.
### P0-4: deferred
`perfMiddleware`'s global mutex was claimed to serialize every API request. A direct test (`50 concurrent requests through the middleware, handler sleeps 20ms each`) shows total elapsed ≈ 25ms, not 1s: the lock is held only for the post-handler bookkeeping (a few µs). Real impact is below measurement noise. Skipping to avoid invasive churn on PerfStats consumers without a demonstrable win.
## Test plan
Red → green per P0:
- `observers_cache_test.go`: handler reads `s.observersCache` before SQL, TTL boundary, atomic.Pointer (no mutex contention).
- `storeobs_parsedtime_test.go`: parses three timestamp shapes, caches result, no race under concurrent readers.
- `neighbor_graph_cache_test.go`: handler serves from atomic pointer when set, bypasses cache when `?region=` (or any non-default filter) is passed.

Full server + ingestor suites pass: `go test -count=1 ./...`.
## Perf proof
Before/after p50/p95/p99 (50 requests × 50 concurrent) against prod (before) and staging once CI deploys (after) will be posted as a PR comment per the operator's "no merge without proof of improvement" gate.

Closes #1481
## TDD exemption: P0-1 and P0-2 (net-new surfaces, AGENTS.md)
Per CoreScope `AGENTS.md` § "Exemptions": **net-new code surfaces with no prior tests to break** may land tests in the same PR without a strict test-first → impl commit split.
- **P0-1 (neighbor-graph atomic-pointer cache)**: `neighborGraphCache`, `recomputeNeighborGraphCache`, `loadNeighborGraphCacheBytes`, `startNeighborGraphRecomputer` and the default-shape short-circuit in `handleNeighborGraph` were brand-new code with no pre-existing assertions covering them. There was no green test to first turn red.
- **P0-2 (cached `StoreObs.Timestamp` + RLock window drop)**: `StoreObs.ParsedTime()` and the snapshot+release pattern in `handleObserverAnalytics` were new surfaces; the prior code did the parse inline per call with no behavioural test to break.

P0-3 was authored properly red-then-green (commit `6e63ec6a` red, then `83ae129b` green) and does NOT use this exemption.
## Default-filter detection vs frontend reality (#1483 follow-up)
The Neighbor Graph analytics tab in `public/analytics.js` fetches `/analytics/neighbor-graph?min_count=1&min_score=0` because the client-side sliders need the full edge set to filter from. That shape did NOT match the `(5, 0.1)` cached default, so the UI tab still paid the cold compute cost despite #1481 P0-1.

The #1483 follow-up commit caches BOTH shapes in the same recomputer pass:
- `(minCount=5, minScore=0.1, no region, no role)`: `live.js` affinity-scoring consumer.
- `(minCount=1, minScore=0, no region, no role)`: analytics tab.

Both are served from `atomic.Pointer` with an `X-Cache-Age-Seconds` header. The per-shape cost in the background goroutine is roughly linear in edge count; total recompute time stays well under the 5-minute cadence on prod-scale graphs.

---------
Co-authored-by: openclaw-bot <bot@openclaw.dev>
Co-authored-by: mc-bot <mc-bot@users.noreply.github.com>
|
||
|
|
43b93c6bb9 |
feat(observers): surface naive-clock observers as ⚠️ chip + detail banner (#1478) (#1480)
## Summary
Issue #1478: surface observers whose envelope timestamps are being clamped because they're emitting zone-less local-time strings (UTC-N observers showed up perpetually as "Stale" before #1466, and per-packet rxTime is still clamped to ingest time for them, muddying propagation-delay analytics). Now the UI tells operators which observers are misconfigured + how to fix it.
## What changed
### Ingestor (cmd/ingestor)
- New `observers_clock_naive_v1` migration adds three columns to `observers`:
  - `clock_skew_seconds INTEGER` (signed: negative = behind UTC, positive = ahead)
  - `clock_skew_count_24h INTEGER` (rolling 24h event count)
  - `clock_last_naive_at TEXT` (RFC3339 timestamp of last clamp)
- `resolveRxTime` now returns `(rxTime, naiveSkewSec)`. The packet-handler call site invokes `store.RecordNaiveSkew(observerID, deltaSec)` whenever a naive envelope is clamped (the existing >15 min naive-tolerance path). The counter resets to 1 if no event in the prior 24h, else increments. Single INSERT-or-UPDATE round trip per clamp.
### Server (cmd/server)
- `Observer` struct + `GetObservers` / `GetObserverByID` extended to scan the three new columns.
- `ObserverResp` gains four JSON fields exposed by `/api/observers` and `/api/observers/{id}`:
  - `clock_naive` (bool, derived from `clock_last_naive_at` being within 24h)
  - `clock_skew_seconds`, `clock_skew_count_24h`, `clock_last_naive_at`
- Decay is **read-side**: a stale event yields `clock_naive=false` with zero counts. No background sweep, no writes from the read-only server, no race with the ingestor.
### Frontend (public)
- `window.ObserversNaiveChip.render(o)`: total render helper, returns ⚠️ chip HTML when `o.clock_naive===true`, `""` otherwise. Used inline in the observers-list `name` cell and in the row-detail slide-over. Tooltip explains magnitude + direction + count + fix.
- `window.ObserverDetailNaiveBanner.render(obs)`: yellow alert banner at the top of the observer-detail page with the skew magnitude, last-event timestamp, and the actionable fix ("Set host clock to UTC, OR emit Z-suffixed/offset-aware timestamps from the observer script").
## TDD trail
- `5ddd5b42` red: backend `cmd/server/observer_naive_clock_1478_test.go` (3 tests asserting JSON fields + 24h decay) + frontend `test-observer-naive-clock-1478.js` (8 jsdom-style tests asserting helpers exist and render correctly). Both failed on master with field-missing / export-missing assertions.
- `4ecc79c8` green backend: schema + Observer / GetObservers / ObserverResp / handler decay.
- `2137ab81` green frontend: chip + banner helpers and call sites.
## Tests
- `cd cmd/server && go test ./...` → all green (full suite, 46s)
- `cd cmd/ingestor && go test ./...` → all green (full suite, 98s)
- `node test-observer-naive-clock-1478.js` → 8/8 pass
- `node test-frontend-helpers.js` → unchanged from master (pre-existing failures only)
## Acceptance (issue #1478)
- ✅ Observer running with `python datetime.now().isoformat()` (naive, off by N hours) → `clock_naive=true` after the next clamp → UI shows ⚠️ chip + banner.
- ✅ Observer with `datetime.now(timezone.utc).isoformat()` (Z-suffixed) → never clamped → never flagged.
- ✅ Observer that fixed its clock → `clock_naive` returns to `false` 24h after the last clamp event (read-side decay).

Closes #1478.

---------
Co-authored-by: openclaw <bot@openclaw.local>
|
||
|
|
462cb2cb5a |
chore: update MeshCore URLs to use new site (#1445)
# Summary
The main MeshCore website is https://meshcore.io. Reasons for the new website are listed here: https://blog.meshcore.io/2026/04/23/the-split
# Changes
Any occurrence of `meshcore.co.uk` was replaced with `meshcore.io`. No logic was changed, only updated strings.

Co-authored-by: hrtndev <hrtndev@users.noreply.github.com>
|