## Summary
**PR 3/3 of #1034** — wires the existing `window.ChannelQR` module (PR2
#1035) into the existing channel modal placeholders (PR1 #1037).
### Changes
**`public/channels.js`**
- **Generate handler** (`#chGenerateBtn`): replaced the "QR coming in
next update" placeholder text with a real call to
`window.ChannelQR.generate(label || channelName, keyHex, qrOut)`.
Renders QR canvas + `meshcore://channel/add?...` URL + Copy Key inline
into `#qr-output`.
- **Scan handler** (`#scan-qr-btn`): removed `disabled` attribute,
refreshed title, and added a click handler that calls
`window.ChannelQR.scan()`. On success it populates `#chPskKey` (from
`result.secret`) and `#chPskName` (from `result.name`); on cancel it's a
no-op; on error it surfaces the message via `#chPskError`.
The Share button on sidebar entries was already wired to
`ChannelQR.generate` in PR1 (no change needed).
### TDD
1. **Red commit** (`178020b`): `test-channel-qr-wiring.js` — 12
assertions, 7 failed against the placeholder code (Generate handler
still printed "coming in next update", scan button still disabled).
2. **Green commit** (`e708f3f`): wiring added → all 12 assertions pass.
### E2E (rule 18)
`test-e2e-playwright.js` gains 3 Playwright tests (run against the live
Go server with fixture DB in CI):
- Generate → asserts `#qr-output canvas` and the
`meshcore://channel/add` URL appear after the click.
- Scan button is enabled (no `disabled` attribute).
- Stubs `ChannelQR.scan` to return `{name, secret}`, clicks the button,
asserts `#chPskKey` + `#chPskName` are populated.
### CI registration
Added `node test-channel-qr-wiring.js` and `node
test-channel-modal-ux.js` to the JS unit-test step in
`.github/workflows/deploy.yml` (and `test-all.sh`).
### Closes
Closes#1034 (final PR in the redesign series).
---------
Co-authored-by: OpenClaw Bot <bot@openclaw.local>
## Summary
Implements the full filter-input UX upgrade from #966 — Wireshark-style
help, autocomplete, right-click-to-filter, and saved filters.
Closes#966.
## Surfaces
### A. Help popover (ⓘ button next to filter input)
Auto-generated from `PacketFilter.FIELDS` / `OPERATORS` so it stays in
sync with the parser. Includes:
- Syntax overview (boolean ops, parens, case-insensitivity,
URL-shareable filters)
- Full field reference (27 entries: top-level + `payload.*`)
- Full operator reference with one example per op
- 10 ready-to-paste examples
- Tips (right-click, autocomplete, save)
### B. Autocomplete dropdown
- Type partial field name → field suggestions (top-level + dynamic
`payload.*` keys discovered from visible packets)
- Type `field` → operator suggestions
- Type `type ==` → list of canonical type values (`ADVERT`, `GRP_TXT`,
…)
- Type `route ==` → list of route values (`FLOOD`, `DIRECT`,
`TRANSPORT_FLOOD`, …)
- Keyboard nav: ↑/↓, Tab/Enter to accept, Esc to dismiss
### C. Right-click → filter by this value
Right-click any of these cells in the packet table:
- `hash`, `size`, `type`, `observer`
Context menu offers `==`, `!=`, `contains`. Click → clause appended to
filter input (with `&&` if expression already present).
### D. Saved filters
- ★ Saved ▾ dropdown next to the input
- 7 starter defaults (Adverts only, Channel traffic, Direct messages,
Strong signal SNR > 5, Multi-hop, Repeater adverts, Recent < 5m)
- "+ Save current expression" prompts for a name and persists to
`localStorage` under `corescope_saved_filters_v1`
- User filters can be deleted (✕); defaults cannot
- User filters with the same name as a default override it
## Implementation
**`public/packet-filter.js`** — exposes `FIELDS`, `OPERATORS`,
`TYPE_VALUES`, `ROUTE_VALUES`, and a new `suggest(input, cursor, opts)`
function that returns ranked autocomplete suggestions with
replace-range. Pure function — no DOM, fully unit-tested.
**NEW `public/filter-ux.js`** — `window.FilterUX` IIFE owning the help
popover, autocomplete dropdown, context menu, and saved-filters store.
`init()` is idempotent, called once after the filter input renders.
**`public/packets.js`** — calls `FilterUX.init()` after the filter input
IIFE; row builders gain `data-filter-field` / `data-filter-value` attrs
on hash/size/type/observer cells. `filter-group` wrapper now `position:
relative` so dropdowns anchor correctly.
**`public/style.css`** — scoped `.fux-*` styles using existing CSS
variables (no new theme tokens).
## Tests
- `test-packet-filter-ux.js` (19 unit tests, wired into `test-all.sh`):
- Metadata exposure (FIELDS / OPERATORS / TYPE_VALUES / ROUTE_VALUES)
- `suggest()` for empty input, prefix match, after `==`, dynamic
`payload.*` keys
- `SavedFilters.list/save/delete` — defaults, persistence, override,
dedup
- `buildCellFilterClause()` and `appendClauseToExpr()` quoting +
appending
- `test-filter-ux-e2e.js` (Playwright, wired into `deploy.yml`):
- Navigate /packets → metadata exposed
- Help popover opens with field reference, operators, examples
- Autocomplete shows on focus, filters by prefix, accepts on Enter
- Saved-filter dropdown lists defaults, click populates input
- Right-click on TYPE cell → context menu → click appends clause
- Save current expression persists to localStorage
TDD red commit (`bddf1c1`) — assertion failures only, no import errors.
Green commit (`0d3f381`) — all 19 unit tests pass.
## Browser validation
Spawned local server on :39966 against the e2e fixture DB and exercised
every UX surface via the openclaw browser tool. Confirmed:
- `window.PacketFilter.FIELDS.length === 27`, `suggest()` available
- `FilterUX.SavedFilters.list().length === 7` (defaults seeded)
- Help popover renders with `payload.name`, `contains`, `ADVERT` text
content
- Right-click on a `data-filter-field="type"` /
`data-filter-value="Response"` cell → context menu showed three options
→ clicking == populated the input with `type == "Response"` (and the
existing alias resolver matched it to `payload_type === 1`)
- Autocomplete on `pay` returned `payload_bytes`, `payload_hex`,
`payload.name`, `payload.lat`, `payload.lon`, `payload.text`
## Out of scope (deferred per the issue)
- Server-synced saved filters (cross-device)
- Visual filter builder
- Custom field expressions
## Acceptance criteria
- [x] Help icon (ⓘ) next to filter input opens documentation popover
- [x] Field reference table + operator reference + 6+ examples in
popover
- [x] Autocomplete dropdown on field names (top-level + `payload.*`)
- [x] Autocomplete dropdown on values for `type` / `route` operators
- [x] Right-click on packet cell → "Filter ==" / "Filter !=" / "Filter
contains"
- [x] Right-click context menu hides when clicking elsewhere / Esc
- [x] Saved-filters dropdown with at least 5 default examples (7
shipped)
- [x] User-saved filters persist in localStorage
- [x] Real-time match count next to filter input (already shipped
pre-PR; preserved)
- [ ] Improved error messages with token + position — partial: existing
parse errors already cite position; not a regression
- [x] No regression in existing filter behavior
(`test-packet-filter.js`: 69/69 pass)
---------
Co-authored-by: meshcore-bot <bot@meshcore.local>
Closes#1045.
## What
Adds an optional region dropdown to the **Live** page that filters
incoming packets by observer IATA. When a user selects one or more
regions, only packets observed by repeaters in those regions render in
the feed/animation/audio.
## How
- New `liveRegionFilter` container in the live header toggles row,
initialised via the shared `RegionFilter` component in `dropdown` mode
(matches packets/nodes/observers pages).
- On page init, fetches `/api/observers` once and builds an `observer_id
→ IATA` map.
- `packetMatchesRegion(packets, obsMap, selected)` (pure helper, OR
across observations, case-insensitive) gates `renderPacketTree` next to
the existing favorite + node filters.
- Selection persists in localStorage via the existing `RegionFilter`
machinery — no per-page key needed.
- Listener cleanup hooked into the existing live-page teardown.
## TDD
- Red commit `55097ce`: `test-live-region-filter.js` asserts
`_livePacketMatchesRegion` exists and behaves correctly across 9 cases
(no-selection passthrough, single match, no-match, OR across
observations, multi-region selection, unknown observer, missing
observer_id, case-insensitivity, observer-map override). Fails with
`_livePacketMatchesRegion must be exposed` against master.
- Green commit `fdec7bf`: implements helper + UI wiring + CSS; test
passes.
Test wired into `.github/workflows/deploy.yml` JS unit-test step.
## Notes
- Server-side WS broadcast is unchanged — filtering is purely
client-side, as the issue requests ("something a user can activate
themselves, and not something that would be server wide").
- Pre-existing `test-live.js` / `test-live-dedup.js` failures on master
are not introduced or affected by this PR (verified by running both on
master HEAD).
---------
Co-authored-by: meshcore-bot <bot@openclaw.local>
Fixes#289.
Adds Wireshark-style timestamp predicates to the client-side packet
filter
engine (`public/packet-filter.js`).
## New syntax
| Form | Meaning |
| --- | --- |
| `time after "2024-01-01"` | packets with timestamp strictly after the
given datetime |
| `time before "2024-12-31T23:59:59Z"` | packets strictly before |
| `time between "2024-01-01" "2024-02-01"` | inclusive range
(order-insensitive) |
| `age < 1h` | packets newer than 1 hour |
| `age > 24h` | packets older than 24 hours |
| `age < 7d && type == ADVERT` | composes with existing predicates |
Duration units: `s` / `m` / `h` / `d` / `w`. Datetime values use
`Date.parse`
(ISO 8601 + bare `YYYY-MM-DD`). `time` is also accepted as `timestamp`.
## Implementation
- `OP_WORDS` extended with `after`, `before`, `between`.
- New `TK.DURATION` token: lexer recognises `<number><unit>` and
pre-converts
to seconds at lex time (no per-evaluation parsing cost).
- `between` is a two-value op handled in `parseComparison`.
- Field resolver:
- `time` / `timestamp` → epoch-ms; falls back to `first_seen` then
`latest`
so grouped rows from `/api/packets?groupByHash=true` work.
- `age` → seconds since `Date.now()`.
- Parse-time validation rejects invalid datetimes and unknown duration
units
(silent-fail would have been a footgun — every packet would just
disappear).
- Null/missing timestamps → predicate returns `false`, consistent with
the
existing null-field behaviour for `snr` / `rssi`.
## Open questions from the issue
- **UTC vs local**: defaults to whatever `Date.parse` returns. Bare
dates like
`"2024-01-01"` are interpreted as UTC midnight by the spec. Tying this
to
the #286 timestamp display setting can be a follow-up.
- **URL query string**: out of scope for this PR.
## Tests
- New `test-packet-filter-time.js`: 20 tests covering
`after`/`before`/`between`,
ISO datetimes, all duration units, composition with `&&`, null-timestamp
safety,
invalid-datetime / invalid-unit errors, and `first_seen` fallback.
- Wired into `.github/workflows/deploy.yml` JS unit-test step.
- Existing `test-packet-filter.js` (69 tests) and inline self-tests
still pass.
## Commits
- Red: `5ccfad3` — failing tests + lexer-only stub (compiles, asserts
fail)
- Green: `976d50f` — implementation
---------
Co-authored-by: OpenClaw Bot <bot@openclaw.local>
## P0: PSK channel decryption silently failed on HTTP origins
User reported PSK key `372a9c93260507adcbf36a84bec0f33d` "still doesn't
work" after PRs #1021 (AES-ECB pure-JS) and #1024 (PSK UX) merged.
Reproduced end-to-end and found the actual remaining bug.
### Root cause
PR #1021 fixed the AES-ECB path by vendoring a pure-JS core, but
**SHA-256 and HMAC-SHA256 in `public/channel-decrypt.js` are still
pinned to `crypto.subtle`**. `SubtleCrypto` is exposed **only in secure
contexts** (HTTPS / localhost); when CoreScope is served over plain HTTP
— common for self-hosted instances — `crypto.subtle` is `undefined`,
and:
- `computeChannelHash(key)` → `Cannot read properties of undefined
(reading 'digest')`
- `verifyMAC(...)` → `Cannot read properties of undefined (reading
'importKey')`
Both throws are swallowed by `addUserChannel`'s `try/catch`, so the only
user-visible signal is the toast `"Failed to decrypt"` with no
console-friendly explanation. Verdict: PR #1021 only fixed half of the
crypto-in-insecure-context problem.
### Reproduction (no browser required)
`test-channel-decrypt-insecure-context.js` loads the production
`public/channel-decrypt.js` in a `vm` sandbox where `crypto.subtle` is
undefined (mirrors HTTP browser). Pre-fix it failed 8/8 with the exact
error above; post-fix it passes 8/8.
### Fix
- New `public/vendor/sha256-hmac.js`: minimal pure-JS SHA-256 +
HMAC-SHA256 (FIPS-180-4 + RFC 2104, ~120 LOC, MIT). Verified against
Node `crypto` for SHA-256 (empty / "abc" / 1000 bytes) and RFC 4231
HMAC-SHA256 TC1.
- `public/channel-decrypt.js`: `hasSubtle()` guard. `deriveKey`,
`computeChannelHash`, and `verifyMAC` use `crypto.subtle` when available
and fall back to `window.PureCrypto` otherwise. Same API, same return
types, same async signatures.
- `public/index.html`: load `vendor/sha256-hmac.js` immediately before
`channel-decrypt.js` (mirrors the `vendor/aes-ecb.js` wiring from
#1021).
### TDD
- **Red** (`8075b55`): `test-channel-decrypt-insecure-context.js` — runs
the **unmodified** prod module in a no-`subtle` sandbox, asserts on the
known PSK key (hash byte `0xb7`) and synthetic encrypted packet
round-trip. Compiles, runs, **fails 8/8 on assertions** (not on import
errors).
- **Green** (`232add6`): vendor + delegate. Test passes 8/8.
- Wired into `test-all.sh` and `.github/workflows/deploy.yml` so CI
gates the regression.
### Validation (all green post-fix)
| Test | Result |
|---|---|
| `test-channel-decrypt-insecure-context.js` | 8/8 |
| `test-channel-decrypt-ecb.js` (#1021 KAT) | 7/7 |
| `test-channel-decrypt-m345.js` (existing) | 24/24 |
| `test-channel-psk-ux.js` (#1024) | 19/19 |
| `test-packet-filter.js` | 69/69 |
### Files changed
- `public/vendor/sha256-hmac.js` — **new** (~150 LOC, MIT, decrypt-side
only)
- `public/channel-decrypt.js` — `hasSubtle()` guard + fallback in
`deriveKey`/`computeChannelHash`/`verifyMAC`
- `public/index.html` — script tag for `vendor/sha256-hmac.js`
- `test-channel-decrypt-insecure-context.js` — **new** (8 assertions,
pure Node, no browser)
- `test-all.sh` + `.github/workflows/deploy.yml` — wire the test
### Risk / scope
- Frontend-only, decrypt-side only. No server, schema, or config changes
(Config Documentation Rule N/A).
- Secure-context behaviour unchanged (still uses Web Crypto when
present).
- HMAC `secret` building, MAC truncation (2 bytes), and AES-ECB
delegation untouched.
- Hash vector for the user's PSK key matches:
`SHA-256(372a9c93260507adcbf36a84bec0f33d) = b7ce04…`, channel hash byte
`0xb7` (183) — confirmed against Node `crypto` and against the new
pure-JS path.
### Note on the FIPS test data in the new test
The PSK `372a9c93260507adcbf36a84bec0f33d` is shared test data from the
bug report, not a real channel secret.
---------
Co-authored-by: openclaw-bot <bot@openclaw.local>
## Summary
Adds Wireshark-style filter support for transport route type to the
packets-page filter engine, per #339.
## New filter syntax
| Filter | Matches |
|---|---|
| `transport == true` | route_type 0 (TRANSPORT_FLOOD) or 3
(TRANSPORT_DIRECT) |
| `transport == false` | route_type 1 (FLOOD) or 2 (DIRECT) |
| `transport` | bare truthy — same as `transport == true` |
| `route == T_FLOOD` | alias for `route == TRANSPORT_FLOOD` |
| `route == T_DIRECT` | alias for `route == TRANSPORT_DIRECT` |
| `route == TRANSPORT_FLOOD` / `TRANSPORT_DIRECT` | already worked —
canonical names |
Aliases are case-insensitive (`route == t_flood` works).
## Implementation
- `public/packet-filter.js`: new `transport` virtual boolean field
driven by `isTransportRouteType(rt)` which returns `rt === 0 || rt ===
3`, mirroring `isTransportRoute()` in `cmd/server/decoder.go`.
- `ROUTE_ALIASES = { t_flood: 'TRANSPORT_FLOOD', t_direct:
'TRANSPORT_DIRECT' }` resolved in the equality comparator, same pattern
as the existing `TYPE_ALIASES`.
- All client-side; no backend changes (issue noted this).
## Tests / TDD
Red commit: `9d8fdf0` — five new assertion-failing test cases + wires
`test-packet-filter.js` into CI (it existed but wasn't being executed).
Green commit: `c67612b` — implementation makes all 69 tests pass.
The CI wiring is part of the red commit on purpose: previously
`test-packet-filter.js` was never run by CI, so a frontend filter
regression couldn't fail the build. Now it can.
## CI gating proof
Run `git revert c67612b` locally → `node test-packet-filter.js` reports
5 assertion failures (not build/import errors). Re-applying the green
commit returns all tests to passing.
Fixes#339
---------
Co-authored-by: openclaw-bot <bot@openclaw.local>
## Problem
The E2E fixture DB (`test-fixtures/e2e-fixture.db`) has static
timestamps from March 29, 2026. The map page applies a default
`lastHeard=30d` filter, so once the fixture ages past 30 days all nodes
are excluded from `/api/nodes?lastHeard=30d` — causing the "Map page
loads with markers" test to fail deterministically.
This started blocking all CI on ~April 28, 2026 (30 days after March
29).
Closes#955 (RCA #1: time-based fixture rot)
## Fix
Added `tools/freshen-fixture.sh` — a small script that shifts all
`last_seen`/`first_seen` timestamps forward so the newest is near
`now()`, preserving relative ordering between nodes. Runs in CI before
the Go server starts. Does **not** modify the checked-in fixture (no
binary blob churn).
## Verification
```
$ cp test-fixtures/e2e-fixture.db /tmp/fix4.db
$ bash tools/freshen-fixture.sh /tmp/fix4.db
Fixture timestamps freshened in /tmp/fix4.db
nodes: min=2026-05-01T07:10:00Z max=2026-05-01T14:51:33Z
$ ./corescope-server -port 13585 -db /tmp/fix4.db -public public &
$ curl -s "http://localhost:13585/api/nodes?limit=200&lastHeard=30d" | jq '{total, count: (.nodes | length)}'
{
"total": 200,
"count": 200
}
```
All 200 nodes returned with the 30-day filter after freshening (vs 0
without the fix).
Co-authored-by: you <you@example.com>
## Summary
Fixes RCA #2 from #955: the HTTP listener and `/api/stats` go live
before background goroutines (pickBestObservation, neighbor graph build)
finish, causing CI readiness checks to pass prematurely.
## Changes
1. **`cmd/server/healthz.go`** — New `GET /api/healthz` endpoint:
- Returns `503 {"ready":false,"reason":"loading"}` while background init
is running
- Returns `200 {"ready":true,"loadedTx":N,"loadedObs":N}` once ready
2. **`cmd/server/main.go`** — Added `sync.WaitGroup` tracking
pickBestObservation and neighbor graph build goroutines. A coordinator
goroutine sets `readiness.Store(1)` when all complete.
`backfillResolvedPathsAsync` is NOT gated (async by design, can take 20+
min).
3. **`cmd/server/routes.go`** — Wired `/api/healthz` before system
endpoints.
4. **`.github/workflows/deploy.yml`** — CI wait-for-ready loop now polls
`/api/healthz` instead of `/api/stats`.
5. **`cmd/server/healthz_test.go`** — Tests for 503-before-ready,
200-after-ready, JSON shape, and anti-tautology gate.
## Rule 18 Verification
Built and ran against `test-fixtures/e2e-fixture.db` (499 tx):
- With the small fixture DB, init completes in <300ms so both immediate
and delayed curls return 200
- Unit tests confirm 503 behavior when `readiness=0` (simulating slow
init)
- On production DBs with 100K+ txs, the 503 window would be 5-15s
(pickBestObservation processes in 5000-tx chunks with 10ms yields)
## Test Results
```
=== RUN TestHealthzNotReady --- PASS
=== RUN TestHealthzReady --- PASS
=== RUN TestHealthzAntiTautology --- PASS
ok github.com/corescope/server 19.662s (full suite)
```
Co-authored-by: you <you@example.com>
Reverts the `if: false` guard from #908.
## Why
- Azure subscription was blocked, staging VM `meshcore-runner-2`
deallocated.
- Subscription unblocked, VM started, runner online, smoke CI [run
#25117292530](https://github.com/Kpa-clawbot/CoreScope/actions/runs/25117292530)
passed.
- Time to resume automatic staging deploys on master pushes.
## Changes
- `deploy` job: `if: false` → `if: github.event_name == 'push'`
(original condition from before #908).
- `publish` job: `needs: [build-and-publish]` → `needs: [deploy]`
(original wiring restored).
## Verify after merge
- Next master push triggers the full chain: go-test → e2e-test →
build-and-publish → deploy → publish.
- `docker ps` on staging VM shows `corescope-staging-go` updated to the
new commit.
Co-authored-by: you <you@example.com>
## Problem
`docker pull` on ARM devices fails because the published image is
amd64-only.
## Fix
Enable multi-arch Docker builds via `docker buildx`. **Builder stage
uses native Go cross-compilation; only the runtime-stage `RUN` steps use
QEMU emulation.**
### Changes
| File | Change |
|------|--------|
| `Dockerfile` | Pin builder stage to `--platform=$BUILDPLATFORM`
(always native), accept `ARG TARGETOS`/`ARG TARGETARCH` from buildx, set
`GOOS=$TARGETOS GOARCH=$TARGETARCH CGO_ENABLED=0` on every `go build` |
| `.github/workflows/deploy.yml` | Add `docker/setup-buildx-action@v3` +
`docker/setup-qemu-action@v3` (latter needed only for runtime-stage
RUNs), set `platforms: linux/amd64,linux/arm64` |
### Build architecture
- **Builder stage** (`FROM --platform=$BUILDPLATFORM
golang:1.22-alpine`) — runs natively on amd64. Go toolchain
cross-compiles the binaries to `$TARGETARCH` via `GOOS/GOARCH`. No
emulation, ~10× faster than emulated builds. Works because
`modernc.org/sqlite` is pure Go (no CGO).
- **Runtime stage** (`FROM alpine:3.20`) — buildx pulls the per-arch
base. RUN steps (`apk add`, `mkdir/chown`, `chmod`) execute inside the
target-arch image, so QEMU is required to interpret arm64 binaries on
the amd64 host. Only a handful of short shell commands run under
emulation, so the QEMU cost is small.
### Verify
After merge, on an ARM device:
```bash
docker pull ghcr.io/kpa-clawbot/corescope:edge
docker inspect ghcr.io/kpa-clawbot/corescope:edge --format '{{.Architecture}}'
# → arm64
```
> First arm64 image appears on the next push to master after this
merges.
Closes#768
---------
Co-authored-by: you <you@example.com>
Co-authored-by: Kpa-clawbot <agent@corescope.local>
## Problem
The "Deploy Staging" job's Smoke Test always fails with `Staging
/api/stats did not return engine field`.
Root cause: the step hardcodes `http://localhost:82/api/stats`, but
`docker-compose.staging.yml:21` publishes the container on
`${STAGING_GO_HTTP_PORT:-80}:80`. Default is port 80, not 82. curl gets
ECONNREFUSED, `-sf` swallows the error, `grep -q engine` sees empty
input → failure.
Verified on staging VM: `ss -lntp` shows only `:80` listening; `docker
ps` confirms `0.0.0.0:80->80/tcp`. A `curl http://localhost:82` returns
connection-refused.
## Fix
Read `STAGING_GO_HTTP_PORT` (same default as compose) so the smoke test
tracks the port the container was actually launched on. Failure message
now includes the resolved port to make future port mismatches
self-diagnosing.
## Tested
Logic only — the curl + grep pattern is unchanged. If any CI env
override sets `STAGING_GO_HTTP_PORT`, the smoke test now follows it.
Co-authored-by: Kpa-clawbot <agent@corescope.local>
The deploy step used only 'docker compose down' which can't remove
containers created via 'docker run'. Now explicitly stops+removes
the named container first, then runs compose down as cleanup.
Permanent fix for the recurring CI deploy failure.
Removes linux/arm64 from multi-platform build and drops QEMU setup.
All infra (prod + staging) is x86. QEMU emulation was adding ~12min
to every CI run for an unused architecture.
## Consolidate CI Pipeline — Build + Publish to GHCR + Deploy Staging
### What
Merges the separate `publish.yml` workflow into `deploy.yml`, creating a
single CI/CD pipeline:
**`go-test → e2e-test → build-and-publish → deploy → publish-badges`**
### Why
- Two workflows doing overlapping builds was wasteful and error-prone
- `publish.yml` had a bug: `BUILD_TIME=$(date ...)` in a `with:` block
never executed (literal string)
- The old build job had duplicate/conflicting `APP_VERSION` assignments
### Changes
- **`build-and-publish` job** replaces old `build` job — builds locally
for staging, then does multi-arch GHCR push (gated to push events only,
PRs skip)
- **Build metadata** computed in a dedicated step, passed via
`GITHUB_OUTPUT` — no more shell expansion bugs
- **`APP_VERSION`** is `v1.2.3` on tag push, `edge` on master push
- **Deploy** now pulls the `edge` image from GHCR and tags for compose
compatibility, with fallback to local build
- **`publish.yml` deleted** — no duplicate workflow
- **Top-level `permissions`** block with `packages:write` for GHCR auth
- **Triggers** now include `tags: ['v*']` for release publishing
### Status
- ✅ Rebased onto master
- ✅ Self-reviewed (all checklist items pass)
- ✅ Ready for merge
Co-authored-by: you <you@example.com>
## Summary
Fixes#472
The Docker build job on the self-hosted runner fails with `no space left
on device` because Docker build cache and Go module downloads accumulate
between runs. The existing cleanup (line ~330) runs in the **deploy**
step *after* the build — too late to help.
## Changes
- Added a "Free disk space" step at the start of the build job,
**before** "Build Go Docker image":
- `docker system prune -af` — removes all unused images, containers,
networks
- `docker builder prune -af` — clears the build cache
- `df -h /` — logs available disk space for visibility
- Kept the existing post-deploy cleanup as belt-and-suspenders
---------
Co-authored-by: you <you@example.com>
Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com>
## Problem
The self-hosted runner (`meshcore-runner-2`) filled its 29GB disk to
100%, blocking all CI runs:
```
Filesystem Size Used Avail Use%
/dev/root 29G 29G 2.3M 100%
Docker Images: 67 total, 2 active, 18.83GB reclaimable (99%)
```
Root cause: no Docker image cleanup after builds. Each CI run builds a
new image but never prunes old ones.
## Fix
### 1. Docker image cleanup after deploy (`deploy` job)
- Runs with `if: always()` so it executes even if deploy fails
- `docker image prune -af --filter "until=24h"` — removes images older
than 24h (safe: current build is minutes old)
- `docker builder prune -f --keep-storage=1GB` — caps build cache
- Logs before/after `docker system df` for visibility
### 2. Runner log cleanup at start of E2E job
- Prunes runner diagnostic logs older than 3 days (was 53MB and growing)
- Reports `df -h` for disk visibility in CI output
## Impact
After manual cleanup today, disk went from 100% → 35% (19GB free). This
PR prevents recurrence.
## Test plan
- [x] Manual cleanup verified on runner via `az vm run-command`
- [ ] Next CI run should show cleanup step output in deploy job logs
Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Removes the separate config.json file bind mount from both compose
files. The data directory mount already covers it, and the Go server
searches /app/data/config.json via LoadConfig.
- Entrypoint symlinks /app/data/config.json for ingestor compatibility
- manage.sh setup creates config in data dir, prompts admin if missing
- manage.sh start checks config exists before starting, offers to create
- deploy.yml simplified — no more sudo rm or directory cleanup
- Backup/restore updated to use data dir path
Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Removes the separate config.json file bind mount from both compose
files. The data directory mount already covers it, and the Go server
searches /app/data/config.json via LoadConfig.
- Entrypoint symlinks /app/data/config.json for ingestor compatibility
- manage.sh setup creates config in data dir, prompts admin if missing
- manage.sh start checks config exists before starting, offers to create
- deploy.yml simplified — no more sudo rm or directory cleanup
- Backup/restore updated to use data dir path
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Root causes from CI logs:
1. 'read /app/config.json: is a directory' — Docker creates a directory
when bind-mounting a non-existent file. The entrypoint now detects
and removes directory config.json before falling back to example.
2. 'unable to open database file: out of memory (14)' — old container
(3GB) not fully exited when new one starts. Deploy now uses
'docker compose down' with timeout and waits for memory reclaim.
3. Supervisor gave up after 3 fast retries (FATAL in ~6s). Increased
startretries to 10 and startsecs to 2 for server and ingestor.
Additional:
- Deploy step ensures staging config.json exists before starting
- Healthcheck: added start_period=60s, increased timeout and retries
- No longer uses manage.sh (CI working dir != repo checkout dir)
## Summary
Complete CI pipeline restructure. Sequential fail-fast chain, E2E tests
against Go server with real staging data, all deprecated Node.js server
tests removed.
### Pipeline (PR):
1. **Go unit tests** — fail-fast, coverage + badges
2. **Playwright E2E** — against Go server with fixture DB, frontend
coverage, fail-fast on first failure
3. **Docker build** — verify containers build
### Pipeline (master merge):
Same chain + deploy to staging + badge publishing
### Removed:
- All Node.js server-side unit tests (deprecated JS server)
- `npm ci` / `npm run test` steps
- JS server coverage collection (`COVERAGE=1 node server.js`)
- Changed-files detection logic
- Docs-only CI skip logic
- Cancel-workflow API hacks
### Added:
- `test-fixtures/e2e-fixture.db` — real data from staging (200 nodes, 31
observers, 500 packets)
- `scripts/capture-fixture.sh` — refresh fixture from staging API
- Go server launches with `-port 13581 -db test-fixtures/e2e-fixture.db
-public public-instrumented`
---------
Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com>
Co-authored-by: you <you@example.com>
The Playwright E2E tests were starting `node server.js` (the deprecated
JS server) instead of the Go server, meaning E2E tests weren't testing
the production backend at all.
Changes:
- Add Go 1.22 setup and build steps to the node-test job
- Build the Go server binary before E2E tests run
- Replace `node server.js` with `./corescope-server` in both the
instrumented (coverage) and quick (no-coverage) E2E server starts
- Use `-port 13581` and `-public` flags to configure the Go server
- For coverage runs, serve from `public-instrumented/` directory
The Go server serves the same static files and exposes compatible
/api/* routes (stats, packets, health, perf) that the E2E tests hit.
* docs: remove letsmesh.net reference from README
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* ci: remove paths-ignore from pull_request trigger
PR #233 only touches .md files, which were excluded by paths-ignore,
causing CI to be skipped entirely. Remove paths-ignore from the
pull_request trigger so all PRs get validated. Keep paths-ignore on
push to avoid unnecessary deploys for docs-only changes to master.
* ci: skip heavy CI jobs for docs-only PRs
Instead of using paths-ignore (which skips the entire workflow and
blocks required status checks), detect docs-only changes at the start
of each job and skip heavy steps while still reporting success.
This allows doc-only PRs to merge without waiting for Go builds,
Node.js tests, or Playwright E2E runs.
Reverts the approach from 7546ece (removing paths-ignore entirely)
in favor of a proper conditional skip within the jobs themselves.
* fix: update engine tests to match engine-badge HTML format
Tests expected [go]/[node] text but formatVersionBadge now renders
<span class="engine-badge">go</span>. Updated 6 assertions to
check for engine-badge class and engine name in HTML output.
---------
Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com>
Co-authored-by: you <you@example.com>
Three optimizations to the CI frontend test pipeline:
1. Run E2E tests and coverage collection concurrently
- Previously sequential (E2E ~1.5min, then coverage ~5.75min)
- Now both run in parallel against the same instrumented server
- Expected savings: ~5 min (coverage runs alongside E2E instead of after)
2. Replace networkidle with domcontentloaded in coverage collector
- SPA uses hash routing — networkidle waits 500ms for network silence
on every navigation, adding ~10-15s of dead time across 23 navigations
- domcontentloaded fires immediately once HTML is parsed; JS initializes
the route handler synchronously
- For in-page hash changes, use 200ms setTimeout instead of
waitForLoadState (which would never re-fire for same-document nav)
3. Extract coverage from E2E tests too
- E2E tests already exercise the app against the instrumented server
- Now writes window.__coverage__ to .nyc_output/e2e-coverage.json
- nyc merges both coverage files for higher total coverage
Also:
- Split Playwright install into browser + deps steps (deps skip if present)
- Replace sleep 5 with health-check poll in quick E2E path
The Windows self-hosted runner picks up jobs and fails because bash
scripts run in PowerShell. Node.js tests need Chromium/Playwright
(Linux-only), and build/deploy/publish use Docker (Linux-only).
Changes:
- node-test: runs-on: [self-hosted, Linux]
- build: runs-on: [self-hosted, Linux]
- deploy: runs-on: [self-hosted, Linux]
- publish: runs-on: [self-hosted, Linux]
- go-test: unchanged (ubuntu-latest)