Commit Graph

107 Commits

Author SHA1 Message Date
Kpa-clawbot 78dabd5bda feat(filter): timestamp predicates (after/before/between/age) — #289 (#1070)
Fixes #289.

Adds Wireshark-style timestamp predicates to the client-side packet filter engine (`public/packet-filter.js`).

## New syntax

| Form | Meaning |
| --- | --- |
| `time after "2024-01-01"` | packets with timestamp strictly after the given datetime |
| `time before "2024-12-31T23:59:59Z"` | packets strictly before |
| `time between "2024-01-01" "2024-02-01"` | inclusive range (order-insensitive) |
| `age < 1h` | packets newer than 1 hour |
| `age > 24h` | packets older than 24 hours |
| `age < 7d && type == ADVERT` | composes with existing predicates |

Duration units: `s` / `m` / `h` / `d` / `w`. Datetime values are parsed with `Date.parse` (ISO 8601, plus bare `YYYY-MM-DD`). `timestamp` is accepted as an alias for `time`.
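
The lex-time duration conversion can be sketched as follows; `UNIT_SECONDS` and `lexDuration` are illustrative names, not the engine's actual internals.

```javascript
// Hypothetical sketch of the lex-time duration conversion; names are
// illustrative, not the actual packet-filter.js internals.
const UNIT_SECONDS = { s: 1, m: 60, h: 3600, d: 86400, w: 604800 };

function lexDuration(text) {
  const m = /^(\d+)([smhdw])$/.exec(text);
  if (!m) throw new Error('invalid duration: ' + text); // parse-time rejection
  // Convert once at lex time so each evaluation is a plain numeric compare.
  return { type: 'DURATION', seconds: Number(m[1]) * UNIT_SECONDS[m[2]] };
}
```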

## Implementation

- `OP_WORDS` extended with `after`, `before`, `between`.
- New `TK.DURATION` token: lexer recognises `<number><unit>` and pre-converts to seconds at lex time (no per-evaluation parsing cost).
- `between` is a two-value op handled in `parseComparison`.
- Field resolver:
  - `time` / `timestamp` → epoch-ms; falls back to `first_seen` then `latest` so grouped rows from `/api/packets?groupByHash=true` work.
  - `age` → seconds since `Date.now()`.
- Parse-time validation rejects invalid datetimes and unknown duration units (silent-fail would have been a footgun — every packet would just disappear).
- Null/missing timestamps → predicate returns `false`, consistent with the existing null-field behaviour for `snr` / `rssi`.
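
The resolver fallback and null handling described above can be sketched like this, assuming rows carry ISO-8601 timestamp strings; the function names are hypothetical, not the actual module internals.

```javascript
// Hypothetical sketch of the field resolver and null handling; assumes rows
// carry ISO-8601 timestamp strings.
function resolveTime(row) {
  const v = row.time ?? row.first_seen ?? row.latest; // grouped-row fallback chain
  return v == null ? null : Date.parse(v);
}

function evalAfter(row, cutoffMs) {
  const t = resolveTime(row);
  if (t == null || Number.isNaN(t)) return false; // missing timestamp never matches
  return t > cutoffMs; // strictly after
}
```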

## Open questions from the issue

- **UTC vs local**: defaults to whatever `Date.parse` returns. Bare dates like `"2024-01-01"` are interpreted as UTC midnight by the spec. Tying this to the #286 timestamp display setting can be a follow-up.
- **URL query string**: out of scope for this PR.

## Tests

- New `test-packet-filter-time.js`: 20 tests covering `after`/`before`/`between`, ISO datetimes, all duration units, composition with `&&`, null-timestamp safety, invalid-datetime / invalid-unit errors, and the `first_seen` fallback.
- Wired into the `.github/workflows/deploy.yml` JS unit-test step.
- Existing `test-packet-filter.js` (69 tests) and inline self-tests still pass.

## Commits

- Red: `5ccfad3` — failing tests + lexer-only stub (compiles, asserts fail)
- Green: `976d50f` — implementation

---------

Co-authored-by: OpenClaw Bot <bot@openclaw.local>
2026-05-05 01:13:48 -07:00
Kpa-clawbot 3aaa21bbc0 fix(channel-decrypt): pure-JS SHA-256/HMAC fallback for HTTP context (P0 follow-up to #1021) (#1027)
## P0: PSK channel decryption silently failed on HTTP origins

User reported PSK key `372a9c93260507adcbf36a84bec0f33d` "still doesn't
work" after PRs #1021 (AES-ECB pure-JS) and #1024 (PSK UX) merged.
Reproduced end-to-end and found the actual remaining bug.

### Root cause

PR #1021 fixed the AES-ECB path by vendoring a pure-JS core, but
**SHA-256 and HMAC-SHA256 in `public/channel-decrypt.js` are still
pinned to `crypto.subtle`**. `SubtleCrypto` is exposed **only in secure
contexts** (HTTPS / localhost); when CoreScope is served over plain HTTP
— common for self-hosted instances — `crypto.subtle` is `undefined`,
and:

- `computeChannelHash(key)` → `Cannot read properties of undefined (reading 'digest')`
- `verifyMAC(...)` → `Cannot read properties of undefined (reading 'importKey')`

Both throws are swallowed by `addUserChannel`'s `try/catch`, so the only
user-visible signal is the toast `"Failed to decrypt"` with no
console-friendly explanation. Verdict: PR #1021 only fixed half of the
crypto-in-insecure-context problem.

### Reproduction (no browser required)

`test-channel-decrypt-insecure-context.js` loads the production
`public/channel-decrypt.js` in a `vm` sandbox where `crypto.subtle` is
undefined (mirrors HTTP browser). Pre-fix it failed 8/8 with the exact
error above; post-fix it passes 8/8.

### Fix

- New `public/vendor/sha256-hmac.js`: minimal pure-JS SHA-256 + HMAC-SHA256 (FIPS-180-4 + RFC 2104, ~120 LOC, MIT). Verified against Node `crypto` for SHA-256 (empty / "abc" / 1000 bytes) and RFC 4231 HMAC-SHA256 TC1.
- `public/channel-decrypt.js`: `hasSubtle()` guard. `deriveKey`, `computeChannelHash`, and `verifyMAC` use `crypto.subtle` when available and fall back to `window.PureCrypto` otherwise. Same API, same return types, same async signatures.
- `public/index.html`: load `vendor/sha256-hmac.js` immediately before `channel-decrypt.js` (mirrors the `vendor/aes-ecb.js` wiring from #1021).

### TDD

- **Red** (`8075b55`): `test-channel-decrypt-insecure-context.js` — runs the **unmodified** prod module in a no-`subtle` sandbox, asserts on the known PSK key (hash byte `0xb7`) and synthetic encrypted packet round-trip. Compiles, runs, **fails 8/8 on assertions** (not on import errors).
- **Green** (`232add6`): vendor + delegate. Test passes 8/8.
- Wired into `test-all.sh` and `.github/workflows/deploy.yml` so CI gates the regression.

### Validation (all green post-fix)

| Test | Result |
|---|---|
| `test-channel-decrypt-insecure-context.js` | 8/8 |
| `test-channel-decrypt-ecb.js` (#1021 KAT) | 7/7 |
| `test-channel-decrypt-m345.js` (existing) | 24/24 |
| `test-channel-psk-ux.js` (#1024) | 19/19 |
| `test-packet-filter.js` | 69/69 |

### Files changed

- `public/vendor/sha256-hmac.js` — **new** (~150 LOC, MIT, decrypt-side only)
- `public/channel-decrypt.js` — `hasSubtle()` guard + fallback in `deriveKey`/`computeChannelHash`/`verifyMAC`
- `public/index.html` — script tag for `vendor/sha256-hmac.js`
- `test-channel-decrypt-insecure-context.js` — **new** (8 assertions, pure Node, no browser)
- `test-all.sh` + `.github/workflows/deploy.yml` — wire the test

### Risk / scope

- Frontend-only, decrypt-side only. No server, schema, or config changes (Config Documentation Rule N/A).
- Secure-context behaviour unchanged (still uses Web Crypto when present).
- HMAC `secret` building, MAC truncation (2 bytes), and AES-ECB delegation untouched.
- Hash vector for the user's PSK key matches: `SHA-256(372a9c93260507adcbf36a84bec0f33d) = b7ce04…`, channel hash byte `0xb7` (183) — confirmed against Node `crypto` and against the new pure-JS path.

### Note on the PSK test data in the new test

The PSK `372a9c93260507adcbf36a84bec0f33d` is shared test data from the
bug report, not a real channel secret.

---------

Co-authored-by: openclaw-bot <bot@openclaw.local>
2026-05-03 21:06:59 -07:00
Kpa-clawbot f229e15869 feat(packet-filter): transport boolean + T_FLOOD/T_DIRECT route aliases (#339) (#1014)
## Summary

Adds Wireshark-style filter support for transport route type to the
packets-page filter engine, per #339.

## New filter syntax

| Filter | Matches |
|---|---|
| `transport == true` | route_type 0 (TRANSPORT_FLOOD) or 3 (TRANSPORT_DIRECT) |
| `transport == false` | route_type 1 (FLOOD) or 2 (DIRECT) |
| `transport` | bare truthy — same as `transport == true` |
| `route == T_FLOOD` | alias for `route == TRANSPORT_FLOOD` |
| `route == T_DIRECT` | alias for `route == TRANSPORT_DIRECT` |
| `route == TRANSPORT_FLOOD` / `TRANSPORT_DIRECT` | already worked — canonical names |

Aliases are case-insensitive (`route == t_flood` works).

## Implementation

- `public/packet-filter.js`: new `transport` virtual boolean field driven by `isTransportRouteType(rt)`, which returns `rt === 0 || rt === 3`, mirroring `isTransportRoute()` in `cmd/server/decoder.go`.
- `ROUTE_ALIASES = { t_flood: 'TRANSPORT_FLOOD', t_direct: 'TRANSPORT_DIRECT' }` resolved in the equality comparator, same pattern as the existing `TYPE_ALIASES`.
- All client-side; no backend changes (issue noted this).
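
The implementation points above can be sketched as follows; the constants come from the PR description, while the function shapes around them are illustrative.

```javascript
// Constants from the PR description; function shapes are illustrative.
const ROUTE_ALIASES = { t_flood: 'TRANSPORT_FLOOD', t_direct: 'TRANSPORT_DIRECT' };

// route_type 0 (TRANSPORT_FLOOD) or 3 (TRANSPORT_DIRECT), mirroring decoder.go.
function isTransportRouteType(rt) {
  return rt === 0 || rt === 3;
}

// Case-insensitive alias lookup; canonical names pass through unchanged.
function resolveRouteName(value) {
  return ROUTE_ALIASES[String(value).toLowerCase()] || value;
}
```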

## Tests / TDD

Red commit: `9d8fdf0` — five new assertion-failing test cases + wires `test-packet-filter.js` into CI (it existed but wasn't being executed).
Green commit: `c67612b` — implementation makes all 69 tests pass.

The CI wiring is part of the red commit on purpose: previously `test-packet-filter.js` was never run by CI, so a frontend filter regression couldn't fail the build. Now it can.

## CI gating proof

Run `git revert c67612b` locally → `node test-packet-filter.js` reports
5 assertion failures (not build/import errors). Re-applying the green
commit returns all tests to passing.

Fixes #339

---------

Co-authored-by: openclaw-bot <bot@openclaw.local>
2026-05-03 17:40:12 -07:00
Kpa-clawbot 7aef3c355c fix(ci): freshen fixture timestamps before E2E to avoid time-based filter exclusion (#955) (#957)
## Problem

The E2E fixture DB (`test-fixtures/e2e-fixture.db`) has static
timestamps from March 29, 2026. The map page applies a default
`lastHeard=30d` filter, so once the fixture ages past 30 days all nodes
are excluded from `/api/nodes?lastHeard=30d` — causing the "Map page
loads with markers" test to fail deterministically.

This started blocking all CI on ~April 28, 2026 (30 days after March
29).

Closes #955 (RCA #1: time-based fixture rot)

## Fix

Added `tools/freshen-fixture.sh` — a small script that shifts all
`last_seen`/`first_seen` timestamps forward so the newest is near
`now()`, preserving relative ordering between nodes. Runs in CI before
the Go server starts. Does **not** modify the checked-in fixture (no
binary blob churn).
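
The shift arithmetic can be sketched in JavaScript (the real tool is a shell script that updates the SQLite file in place; `freshenTimestamps` is a hypothetical name):

```javascript
// Hypothetical sketch of the timestamp shift; the real tool is a shell
// script operating on the SQLite file directly.
function freshenTimestamps(rows, nowSec) {
  const newest = Math.max(...rows.map((r) => r.last_seen));
  const offset = nowSec - newest; // shift so the newest row lands at "now"
  // Adding the same offset everywhere preserves relative ordering.
  return rows.map((r) => ({
    first_seen: r.first_seen + offset,
    last_seen: r.last_seen + offset,
  }));
}
```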

## Verification

```
$ cp test-fixtures/e2e-fixture.db /tmp/fix4.db
$ bash tools/freshen-fixture.sh /tmp/fix4.db
Fixture timestamps freshened in /tmp/fix4.db
nodes: min=2026-05-01T07:10:00Z max=2026-05-01T14:51:33Z

$ ./corescope-server -port 13585 -db /tmp/fix4.db -public public &
$ curl -s "http://localhost:13585/api/nodes?limit=200&lastHeard=30d" | jq '{total, count: (.nodes | length)}'
{
  "total": 200,
  "count": 200
}
```

All 200 nodes returned with the 30-day filter after freshening (vs 0
without the fix).

Co-authored-by: you <you@example.com>
2026-05-01 08:06:19 -07:00
Kpa-clawbot 57e272494d feat(server): /api/healthz readiness endpoint gated on store load (#955) (#956)
## Summary

Fixes RCA #2 from #955: the HTTP listener and `/api/stats` go live
before background goroutines (pickBestObservation, neighbor graph build)
finish, causing CI readiness checks to pass prematurely.

## Changes

1. **`cmd/server/healthz.go`** — New `GET /api/healthz` endpoint:
   - Returns `503 {"ready":false,"reason":"loading"}` while background init is running
   - Returns `200 {"ready":true,"loadedTx":N,"loadedObs":N}` once ready

2. **`cmd/server/main.go`** — Added `sync.WaitGroup` tracking pickBestObservation and neighbor graph build goroutines. A coordinator goroutine sets `readiness.Store(1)` when all complete. `backfillResolvedPathsAsync` is NOT gated (async by design, can take 20+ min).

3. **`cmd/server/routes.go`** — Wired `/api/healthz` before system endpoints.

4. **`.github/workflows/deploy.yml`** — CI wait-for-ready loop now polls `/api/healthz` instead of `/api/stats`.

5. **`cmd/server/healthz_test.go`** — Tests for 503-before-ready, 200-after-ready, JSON shape, and anti-tautology gate.
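
The gating pattern, sketched here in JavaScript for brevity (the real implementation is Go, using a `sync.WaitGroup` and an atomic readiness flag; `makeHealthz` is a hypothetical name):

```javascript
// Language-agnostic sketch of the readiness gate; Promise.all plays the
// role of the Go sync.WaitGroup coordinator.
function makeHealthz(backgroundTasks) {
  let ready = false;
  // Coordinator: flip the flag only once every background init task completes.
  Promise.all(backgroundTasks).then(() => { ready = true; });
  return function healthz() {
    return ready
      ? { status: 200, body: { ready: true } }
      : { status: 503, body: { ready: false, reason: 'loading' } };
  };
}
```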

## Rule 18 Verification

Built and ran against `test-fixtures/e2e-fixture.db` (499 tx):
- With the small fixture DB, init completes in <300ms so both immediate
and delayed curls return 200
- Unit tests confirm 503 behavior when `readiness=0` (simulating slow
init)
- On production DBs with 100K+ txs, the 503 window would be 5-15s
(pickBestObservation processes in 5000-tx chunks with 10ms yields)

## Test Results

```
=== RUN   TestHealthzNotReady    --- PASS
=== RUN   TestHealthzReady       --- PASS  
=== RUN   TestHealthzAntiTautology --- PASS
ok  github.com/corescope/server  19.662s (full suite)
```

Co-authored-by: you <you@example.com>
2026-05-01 07:55:57 -07:00
Kpa-clawbot d81852736d ci: re-enable staging deploy now that VM is back (#932)
Reverts the `if: false` guard from #908.

## Why
- Azure subscription was blocked, staging VM `meshcore-runner-2`
deallocated.
- Subscription unblocked, VM started, runner online, smoke CI [run
#25117292530](https://github.com/Kpa-clawbot/CoreScope/actions/runs/25117292530)
passed.
- Time to resume automatic staging deploys on master pushes.

## Changes
- `deploy` job: `if: false` → `if: github.event_name == 'push'`
(original condition from before #908).
- `publish` job: `needs: [build-and-publish]` → `needs: [deploy]`
(original wiring restored).

## Verify after merge
- Next master push triggers the full chain: go-test → e2e-test →
build-and-publish → deploy → publish.
- `docker ps` on staging VM shows `corescope-staging-go` updated to the
new commit.

Co-authored-by: you <you@example.com>
2026-04-30 19:40:51 -07:00
Kpa-clawbot f4484adb52 ci: move to GitHub-hosted runners, disable staging deploy (#908)
## Why

The Azure staging VM (`meshcore-vm`) is offline. Self-hosted runners are
unavailable, blocking all CI.

## What changed (per job)

| Job | Change | Revert |
|-----|--------|--------|
| `e2e-test` | `runs-on: [self-hosted, Linux]` → `ubuntu-latest`; removed self-hosted-specific "Free disk space" step | Change `runs-on` back to `[self-hosted, Linux]`, restore disk cleanup step |
| `build-and-publish` | `runs-on: [self-hosted, meshcore-runner-2]` → `ubuntu-latest`; removed "Free disk space" prune step (noop on fresh GH-hosted runners) | Change `runs-on` back, restore prune step |
| `deploy` | `if: false # disabled` (was `github.event_name == 'push'`); `runs-on` kept as-is | Change `if:` back to `github.event_name == 'push'` |
| `publish` | `runs-on: [self-hosted, Linux]` → `ubuntu-latest`; `needs: [deploy]` → `needs: [build-and-publish]` | Change both back |

## Notes

- `go-test` and `release-artifacts` were already on `ubuntu-latest` —
untouched.
- The `deploy` job is disabled via `if: false` for trivial one-line
revert when the VM returns.
- No new `setup-*` actions were needed — `setup-node`, `setup-go`,
`docker/setup-buildx-action`, and `docker/login-action` were already
present.

Co-authored-by: you <you@example.com>
2026-04-24 17:25:53 -07:00
Kpa-clawbot 99029e41aa ci(#768): publish multi-arch (amd64+arm64) Docker image (#869)
## Problem

`docker pull` on ARM devices fails because the published image is
amd64-only.

## Fix

Enable multi-arch Docker builds via `docker buildx`. **Builder stage
uses native Go cross-compilation; only the runtime-stage `RUN` steps use
QEMU emulation.**

### Changes

| File | Change |
|------|--------|
| `Dockerfile` | Pin builder stage to `--platform=$BUILDPLATFORM` (always native), accept `ARG TARGETOS`/`ARG TARGETARCH` from buildx, set `GOOS=$TARGETOS GOARCH=$TARGETARCH CGO_ENABLED=0` on every `go build` |
| `.github/workflows/deploy.yml` | Add `docker/setup-buildx-action@v3` + `docker/setup-qemu-action@v3` (latter needed only for runtime-stage RUNs), set `platforms: linux/amd64,linux/arm64` |

### Build architecture

- **Builder stage** (`FROM --platform=$BUILDPLATFORM
golang:1.22-alpine`) — runs natively on amd64. Go toolchain
cross-compiles the binaries to `$TARGETARCH` via `GOOS/GOARCH`. No
emulation, ~10× faster than emulated builds. Works because
`modernc.org/sqlite` is pure Go (no CGO).
- **Runtime stage** (`FROM alpine:3.20`) — buildx pulls the per-arch
base. RUN steps (`apk add`, `mkdir/chown`, `chmod`) execute inside the
target-arch image, so QEMU is required to interpret arm64 binaries on
the amd64 host. Only a handful of short shell commands run under
emulation, so the QEMU cost is small.

### Verify

After merge, on an ARM device:
```bash
docker pull ghcr.io/kpa-clawbot/corescope:edge
docker inspect ghcr.io/kpa-clawbot/corescope:edge --format '{{.Architecture}}'
# → arm64
```

> First arm64 image appears on the next push to master after this merges.

Closes #768

---------

Co-authored-by: you <you@example.com>
Co-authored-by: Kpa-clawbot <agent@corescope.local>
2026-04-21 10:32:02 -07:00
Kpa-clawbot ff05db7367 ci: fix staging smoke test port — read STAGING_GO_HTTP_PORT, not hardcoded 82 (#854)
## Problem
The "Deploy Staging" job's Smoke Test always fails with `Staging
/api/stats did not return engine field`.

Root cause: the step hardcodes `http://localhost:82/api/stats`, but
`docker-compose.staging.yml:21` publishes the container on
`${STAGING_GO_HTTP_PORT:-80}:80`. Default is port 80, not 82. curl gets
ECONNREFUSED, `-sf` swallows the error, `grep -q engine` sees empty
input → failure.

Verified on staging VM: `ss -lntp` shows only `:80` listening; `docker
ps` confirms `0.0.0.0:80->80/tcp`. A `curl http://localhost:82` returns
connection-refused.

## Fix
Read `STAGING_GO_HTTP_PORT` (same default as compose) so the smoke test
tracks the port the container was actually launched on. Failure message
now includes the resolved port to make future port mismatches
self-diagnosing.

## Tested
Logic only — the curl + grep pattern is unchanged. If any CI env
override sets `STAGING_GO_HTTP_PORT`, the smoke test now follows it.

Co-authored-by: Kpa-clawbot <agent@corescope.local>
2026-04-21 16:23:50 +00:00
you fa348efe2a fix: force-remove staging container before deploy — handles both compose and docker-run containers
The deploy step used only 'docker compose down' which can't remove
containers created via 'docker run'. Now explicitly stops+removes
the named container first, then runs compose down as cleanup.

Permanent fix for the recurring CI deploy failure.
2026-04-17 05:08:32 +00:00
Kpa-clawbot c233c14156 feat: CLI tool to decrypt and export hashtag channel messages (#724)
## Summary

Adds `corescope-decrypt` — a standalone CLI tool that decrypts and
exports MeshCore hashtag channel messages from a CoreScope SQLite
database.

### What it does

MeshCore hashtag channels use symmetric encryption with keys derived
from the channel name. The CoreScope ingestor stores **all** GRP_TXT
packets, even those it can't decrypt. This tool enables retroactive
decryption — decrypt historical messages for any channel whose name you
learn after the fact.

### Architecture

- **`internal/channel/`** — Shared crypto package extracted from
ingestor logic:
  - `DeriveKey()` — `SHA-256("#name")[:16]`
  - `ChannelHash()` — 1-byte packet filter (`SHA-256(key)[0]`)
  - `Decrypt()` — HMAC-SHA256 MAC verify + AES-128-ECB
  - `ParsePlaintext()` — timestamp + flags + "sender: message" parsing

- **`cmd/decrypt/`** — CLI binary with three output formats:
  - `--format json` — Full metadata (observers, path, raw hex)
  - `--format html` — Self-contained interactive viewer with search/sort
  - `--format irc` (or `log`) — Plain-text IRC-style log, greppable

### Usage

```bash
# JSON export
corescope-decrypt --channel "#wardriving" --db meshcore.db

# Interactive HTML viewer
corescope-decrypt --channel wardriving --db meshcore.db --format html --output wardriving.html

# Greppable log
corescope-decrypt --channel "#wardriving" --db meshcore.db --format irc | grep "KE6QR"

# From Docker
docker exec corescope-prod /app/corescope-decrypt --channel "#wardriving" --db /app/data/meshcore.db
```

### Build & deployment

- Statically linked (`CGO_ENABLED=0`) — zero dependencies
- Added to Dockerfile (available at `/app/corescope-decrypt` in
container)
- CI: builds and tests in go-test job
- CI: attaches linux/amd64 and linux/arm64 binaries to GitHub Releases
on tags

### Testing

- `internal/channel/` — 9 tests: key derivation, encrypt/decrypt
round-trip, MAC rejection, wrong-channel rejection, plaintext parsing
- `cmd/decrypt/` — 7 tests: payload extraction, channel hash
consistency, all 3 output formats, JSON parseability, fixture DB
integration
- Verified against real fixture DB: successfully decrypts 17
`#wardriving` messages

### Limitations

- Hashtag channels only (name-derived keys). Custom PSK channels not
supported.
- No DM decryption (asymmetric, per-peer keys).
- Read-only database access.

Fixes #723

---------

Co-authored-by: you <you@example.com>
2026-04-12 22:07:41 -07:00
you 00953207fb ci: remove arm64 build + QEMU — amd64 only
Removes linux/arm64 from multi-platform build and drops QEMU setup.
All infra (prod + staging) is x86. QEMU emulation was adding ~12min
to every CI run for an unused architecture.
2026-04-08 05:23:41 +00:00
Kpa-clawbot 243de9fba1 fix: consolidate CI pipeline — build, publish to GHCR, then deploy staging (#636)
## Consolidate CI Pipeline — Build + Publish to GHCR + Deploy Staging

### What
Merges the separate `publish.yml` workflow into `deploy.yml`, creating a
single CI/CD pipeline:

**`go-test → e2e-test → build-and-publish → deploy → publish-badges`**

### Why
- Two workflows doing overlapping builds was wasteful and error-prone
- `publish.yml` had a bug: `BUILD_TIME=$(date ...)` in a `with:` block
never executed (literal string)
- The old build job had duplicate/conflicting `APP_VERSION` assignments

### Changes
- **`build-and-publish` job** replaces old `build` job — builds locally
for staging, then does multi-arch GHCR push (gated to push events only,
PRs skip)
- **Build metadata** computed in a dedicated step, passed via
`GITHUB_OUTPUT` — no more shell expansion bugs
- **`APP_VERSION`** is `v1.2.3` on tag push, `edge` on master push
- **Deploy** now pulls the `edge` image from GHCR and tags for compose
compatibility, with fallback to local build
- **`publish.yml` deleted** — no duplicate workflow
- **Top-level `permissions`** block with `packages:write` for GHCR auth
- **Triggers** now include `tags: ['v*']` for release publishing

### Status
- Rebased onto master
- Self-reviewed (all checklist items pass)
- Ready for merge

Co-authored-by: you <you@example.com>
2026-04-05 18:09:20 -07:00
you af9754dbea ci: move staging build+deploy to meshcore-runner-2
Prod VM (meshcore-vm) is now prod-only. Staging builds and
deploys on the secondary runner.
2026-04-05 17:33:15 +00:00
you ddce26ff2d ci: pin build and deploy jobs to meshcore-vm runner 2026-04-04 04:21:48 +00:00
Kpa-clawbot 9f14c74b3e ci: add Docker cleanup before build to prevent disk space exhaustion (#473)
## Summary

Fixes #472

The Docker build job on the self-hosted runner fails with `no space left
on device` because Docker build cache and Go module downloads accumulate
between runs. The existing cleanup (line ~330) runs in the **deploy**
step *after* the build — too late to help.

## Changes

- Added a "Free disk space" step at the start of the build job, **before** "Build Go Docker image":
  - `docker system prune -af` — removes all unused images, containers, networks
  - `docker builder prune -af` — clears the build cache
  - `df -h /` — logs available disk space for visibility
- Kept the existing post-deploy cleanup as belt-and-suspenders

---------

Co-authored-by: you <you@example.com>
Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com>
2026-04-01 22:27:12 -07:00
Kpa-clawbot 38e5f02a00 ci: add Docker image cleanup to prevent runner disk exhaustion (#333)
## Problem

The self-hosted runner (`meshcore-runner-2`) filled its 29GB disk to
100%, blocking all CI runs:

```
Filesystem  Size  Used Avail Use%
/dev/root    29G   29G  2.3M 100%

Docker Images: 67 total, 2 active, 18.83GB reclaimable (99%)
```

Root cause: no Docker image cleanup after builds. Each CI run builds a
new image but never prunes old ones.

## Fix

### 1. Docker image cleanup after deploy (`deploy` job)
- Runs with `if: always()` so it executes even if deploy fails
- `docker image prune -af --filter "until=24h"` — removes images older
than 24h (safe: current build is minutes old)
- `docker builder prune -f --keep-storage=1GB` — caps build cache
- Logs before/after `docker system df` for visibility

### 2. Runner log cleanup at start of E2E job
- Prunes runner diagnostic logs older than 3 days (was 53MB and growing)
- Reports `df -h` for disk visibility in CI output

## Impact

After manual cleanup today, disk went from 100% → 35% (19GB free). This
PR prevents recurrence.

## Test plan
- [x] Manual cleanup verified on runner via `az vm run-command`
- [ ] Next CI run should show cleanup step output in deploy job logs

Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-31 16:00:31 -07:00
Kpa-clawbot 5aa4fbb600 chore: normalize all files to LF line endings 2026-03-30 22:52:46 -07:00
Kpa-clawbot 92188e8c12 ci: add manual workflow_dispatch trigger (#302)
Adds `workflow_dispatch` trigger to the CI/CD pipeline so it can be
manually triggered from the Actions tab.

Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-30 19:28:46 -07:00
Kpa-clawbot 16a99159cc fix: config.json lives in data dir, not bind-mounted as file (#282)
Removes the separate config.json file bind mount from both compose
files. The data directory mount already covers it, and the Go server
searches /app/data/config.json via LoadConfig.

- Entrypoint symlinks /app/data/config.json for ingestor compatibility
- manage.sh setup creates config in data dir, prompts admin if missing
- manage.sh start checks config exists before starting, offers to create
- deploy.yml simplified — no more sudo rm or directory cleanup
- Backup/restore updated to use data dir path

Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-30 18:55:44 +00:00
Kpa-clawbot 61ff72fc80 Revert "fix: config.json lives in data dir, not bind-mounted as file"
This reverts commit 57ebd76070.
2026-03-30 10:00:17 -07:00
Kpa-clawbot 57ebd76070 fix: config.json lives in data dir, not bind-mounted as file
Removes the separate config.json file bind mount from both compose
files. The data directory mount already covers it, and the Go server
searches /app/data/config.json via LoadConfig.

- Entrypoint symlinks /app/data/config.json for ingestor compatibility
- manage.sh setup creates config in data dir, prompts admin if missing
- manage.sh start checks config exists before starting, offers to create
- deploy.yml simplified — no more sudo rm or directory cleanup
- Backup/restore updated to use data dir path

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-30 09:58:22 -07:00
Kpa-clawbot 726b041740 fix: staging config.json directory mount and wrong source file
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-30 09:16:07 -07:00
Kpa-clawbot 4cbb66d8e9 ci: fix badge publish — use admin PAT via Contents API to bypass branch protection [skip ci] 2026-03-29 20:03:59 -07:00
Kpa-clawbot 5777780fc8 refactor: parallel coverage collector (~30-60s vs 8min) (#272)
## Summary

Redesigned frontend coverage collector with 7 parallel browser contexts.
Coverage collector runs on master pushes only (skipped on PRs).

### Architecture
7 groups run simultaneously via `Promise.allSettled()`:
- G1: Home + Customizer
- G2: Nodes + Node Detail
- G3: Packets + Packet Detail
- G4: Map
- G5: Analytics + Channels + Observers
- G6: Live + Perf + Traces + Globals
- G7: Utility functions (page.evaluate)

### Speed gains
- `safeClick` 500ms → 100ms
- `navHash` 150ms → 50ms
- Removed redundant page visits and E2E-duplicate interactions
- Wall time = slowest group (~30-60s estimated)

### 821 lines → ~450 lines
Each group writes its own coverage JSON, nyc merges automatically.

### CI behavior
- **PRs:** Coverage collector skipped (fast CI)
- **Master:** Coverage collector runs (full synthetic user validation)

Co-authored-by: you <you@example.com>
2026-03-29 19:46:01 -07:00
Kpa-clawbot ada53ff899 ci: fix badge artifacts not uploading (include-hidden-files for .badges/) 2026-03-30 01:38:31 +00:00
you 3dd68d4418 fix: staging deploy failures — OOM + config.json directory mount
Root causes from CI logs:
1. 'read /app/config.json: is a directory' — Docker creates a directory
   when bind-mounting a non-existent file. The entrypoint now detects
   and removes directory config.json before falling back to example.
2. 'unable to open database file: out of memory (14)' — old container
   (3GB) not fully exited when new one starts. Deploy now uses
   'docker compose down' with timeout and waits for memory reclaim.
3. Supervisor gave up after 3 fast retries (FATAL in ~6s). Increased
   startretries to 10 and startsecs to 2 for server and ingestor.

Additional:
- Deploy step ensures staging config.json exists before starting
- Healthcheck: added start_period=60s, increased timeout and retries
- No longer uses manage.sh (CI working dir != repo checkout dir)
2026-03-29 23:16:46 +00:00
Kpa-clawbot 900cbf6392 fix: deploy uses manage.sh restart staging instead of raw compose
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-29 14:06:37 -07:00
Kpa-clawbot 067b101e14 fix: split prod/staging compose and harden deploy/manage staging control
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-29 14:01:29 -07:00
Kpa-clawbot c271093795 fix: use docker compose down (not stop) to properly tear down staging
stop leaves the container/network in place, blocking port rebind.
down removes everything cleanly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-29 12:53:18 -07:00
Kpa-clawbot 424e4675ae ci: restrict staging deploy container cleanup
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-29 12:42:31 -07:00
Kpa-clawbot fd162a9354 fix: CI kills legacy meshcore-* containers before deploy (#261)
Old meshcore-analyzer container still running from pre-rename era. Freed
2.2GB by killing it. CI now cleans up both old and new container names.

Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-29 12:30:13 -07:00
Kpa-clawbot 075dcaed4d fix: CI staging OOM — wait for old container before starting new (#259)
Old staging container wasn't fully stopped before new one started. Both
loaded 300MB stores simultaneously → OOM. Now properly waits and
verifies. Ref:
https://github.com/Kpa-clawbot/CoreScope/actions/runs/23716535123/job/69084603590

Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-29 12:08:56 -07:00
you 2817877380 ci: pass BUILD_TIME to Docker build 2026-03-29 18:55:37 +00:00
you 251b7fa5c2 ci: rename frontend-tests badge to e2e-tests in README, remove copy hack 2026-03-29 18:49:01 +00:00
you f31e0b42a0 ci: clean up stale badges, add Go coverage placeholders, fix frontend-tests.json name 2026-03-29 18:48:04 +00:00
you 78e0347055 ci: fix staging deploy — only stop staging container, don't nuke prod 2026-03-29 18:46:33 +00:00
you 8ab195b45f ci: fix Go cache warnings on E2E step + fix staging deploy OOM (proper container cleanup) 2026-03-29 18:45:50 +00:00
you 6c7a3c1614 ci: clean Go module cache before setup to prevent tar extraction warnings 2026-03-29 18:37:59 +00:00
you a5a3a85fc0 ci: disable coverage collector — E2E extracts window.__coverage__ directly 2026-03-29 18:33:46 +00:00
Kpa-clawbot ec7ae19bb5 ci: restructure pipeline — sequential fail-fast, Go server E2E, remove deprecated JS tests (#256)
## Summary

Complete CI pipeline restructure. Sequential fail-fast chain, E2E tests
against Go server with real staging data, all deprecated Node.js server
tests removed.

### Pipeline (PR):
1. **Go unit tests** — fail-fast, coverage + badges
2. **Playwright E2E** — against Go server with fixture DB, frontend
coverage, fail-fast on first failure
3. **Docker build** — verify containers build

### Pipeline (master merge):
Same chain + deploy to staging + badge publishing

### Removed:
- All Node.js server-side unit tests (deprecated JS server)
- `npm ci` / `npm run test` steps
- JS server coverage collection (`COVERAGE=1 node server.js`)
- Changed-files detection logic
- Docs-only CI skip logic
- Cancel-workflow API hacks

### Added:
- `test-fixtures/e2e-fixture.db` — real data from staging (200 nodes, 31
observers, 500 packets)
- `scripts/capture-fixture.sh` — refresh fixture from staging API
- Go server launches with `-port 13581 -db test-fixtures/e2e-fixture.db
-public public-instrumented`

---------

Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com>
Co-authored-by: you <you@example.com>
2026-03-29 11:24:22 -07:00
you 75637afcc8 ci: upgrade upload/download-artifact to v6 (Node.js 24) 2026-03-29 18:05:03 +00:00
you 97486cfa21 ci: temporarily disable node-test job (CI restructure in progress) 2026-03-29 17:32:07 +00:00
you bb43b5696c ci: use Go server instead of Node.js for E2E tests
The Playwright E2E tests were starting `node server.js` (the deprecated
JS server) instead of the Go server, meaning E2E tests weren't testing
the production backend at all.

Changes:
- Add Go 1.22 setup and build steps to the node-test job
- Build the Go server binary before E2E tests run
- Replace `node server.js` with `./corescope-server` in both the
  instrumented (coverage) and quick (no-coverage) E2E server starts
- Use `-port 13581` and `-public` flags to configure the Go server
- For coverage runs, serve from `public-instrumented/` directory

The Go server serves the same static files and exposes compatible
/api/* routes (stats, packets, health, perf) that the E2E tests hit.
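As a workflow-step sketch, the instrumented server start described above could look roughly like this (the step name and YAML placement are assumptions; the binary name, port, and flags are the commit's own):

```yaml
# Hedged sketch — not the repository's actual workflow file.
- name: Start Go server for E2E (instrumented)
  run: |
    go build -o corescope-server .
    ./corescope-server -port 13581 -public public-instrumented &
```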
2026-03-29 10:22:26 -07:00
Kpa-clawbot 5bb9bc146e docs: remove letsmesh.net reference from README (#233)
* docs: remove letsmesh.net reference from README

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* ci: remove paths-ignore from pull_request trigger

PR #233 only touches .md files, which were excluded by paths-ignore,
causing CI to be skipped entirely. Remove paths-ignore from the
pull_request trigger so all PRs get validated. Keep paths-ignore on
push to avoid unnecessary deploys for docs-only changes to master.

* ci: skip heavy CI jobs for docs-only PRs

Instead of using paths-ignore (which skips the entire workflow and
blocks required status checks), detect docs-only changes at the start
of each job and skip heavy steps while still reporting success.

This allows doc-only PRs to merge without waiting for Go builds,
Node.js tests, or Playwright E2E runs.

Reverts the approach from 7546ece (removing paths-ignore entirely)
in favor of a proper conditional skip within the jobs themselves.

* fix: update engine tests to match engine-badge HTML format

Tests expected [go]/[node] text but formatVersionBadge now renders
<span class="engine-badge">go</span>. Updated 6 assertions to
check for engine-badge class and engine name in HTML output.

---------

Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com>
Co-authored-by: you <you@example.com>
2026-03-29 16:25:51 +00:00
you 12d1174e39 perf: speed up frontend coverage tests (~3x faster)
Three optimizations to the CI frontend test pipeline:

1. Run E2E tests and coverage collection concurrently
   - Previously sequential (E2E ~1.5min, then coverage ~5.75min)
   - Now both run in parallel against the same instrumented server
   - Expected savings: ~1.5 min wall-clock (coverage, the longer leg, now overlaps E2E instead of following it)

2. Replace networkidle with domcontentloaded in coverage collector
   - SPA uses hash routing — networkidle waits 500ms for network silence
     on every navigation, adding ~10-15s of dead time across 23 navigations
   - domcontentloaded fires immediately once HTML is parsed; JS initializes
     the route handler synchronously
   - For in-page hash changes, use 200ms setTimeout instead of
     waitForLoadState (which would never re-fire for same-document nav)

3. Extract coverage from E2E tests too
   - E2E tests already exercise the app against the instrumented server
   - Now writes window.__coverage__ to .nyc_output/e2e-coverage.json
   - nyc merges both coverage files for higher total coverage

Also:
- Split Playwright install into browser + deps steps (deps skip if present)
- Replace `sleep 5` with a health-check poll in the quick E2E path
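The `sleep 5` replacement could be a poll along these lines (a sketch — the retry budget, interval, and endpoint URL are assumptions, not the exact CI script):

```shell
# Poll a health URL until it answers, instead of sleeping a fixed 5s.
wait_for_server() {
  url="$1"
  tries="${2:-25}"   # ~5s worst case at 0.2s per attempt
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS "$url" >/dev/null 2>&1; then
      return 0       # server answered; stop waiting immediately
    fi
    i=$((i + 1))
    sleep 0.2
  done
  echo "server did not become healthy at $url" >&2
  return 1
}
```

Invoked as e.g. `wait_for_server http://localhost:13581/api/health` before launching Playwright, this returns as soon as the server is up rather than always burning the full five seconds.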
2026-03-29 09:12:23 -07:00
you 1b09c733f5 ci: restrict self-hosted jobs to Linux runners
The Windows self-hosted runner picks up jobs and fails because bash
scripts run in PowerShell. Node.js tests need Chromium/Playwright
(Linux-only), and build/deploy/publish use Docker (Linux-only).

Changes:
- node-test: runs-on: [self-hosted, Linux]
- build: runs-on: [self-hosted, Linux]
- deploy: runs-on: [self-hosted, Linux]
- publish: runs-on: [self-hosted, Linux]
- go-test: unchanged (ubuntu-latest)
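In workflow YAML, the change above amounts to something like this sketch (job bodies elided; only the labels come from the commit, and the prior label set is an assumption):

```yaml
jobs:
  node-test:
    runs-on: [self-hosted, Linux]   # previously lacked the Linux label (assumption)
  build:
    runs-on: [self-hosted, Linux]
  deploy:
    runs-on: [self-hosted, Linux]
  publish:
    runs-on: [self-hosted, Linux]
  go-test:
    runs-on: ubuntu-latest          # unchanged
```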
2026-03-29 14:58:15 +00:00
Kpa-clawbot 553c0e4963 ci: bump GitHub Actions to Node 24 compatible versions
checkout v4→v5, setup-go v5→v6, setup-node v4→v5,
upload-artifact v4→v5, download-artifact v4→v5

Fixes the Node.js 20 deprecation warning.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-29 07:51:48 -07:00
you 074f3d3760 ci: cancel workflow run immediately when any test job fails
When go-test or node-test fails, the workflow run is now cancelled
via the GitHub API so the sibling job doesn't sit queued/running.

Also fixed build job to need both go-test AND node-test (was only
waiting on go-test despite the pipeline comment saying both gate it).
2026-03-29 14:20:22 +00:00
you 206d9bd64a fix: use per-PR concurrency group to prevent cross-PR cancellation
The flat 'deploy' concurrency group caused ALL PRs to share one queue,
so pushing to any PR would cancel CI runs on other PRs.

Changed to `deploy-${{ github.event.pull_request.number || github.ref }}`
so each PR gets its own concurrency group while re-pushes to the same
PR still cancel the previous run.
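As workflow YAML, the fix sketches out roughly like this (top-level placement and `cancel-in-progress` are assumptions; the group expression is the commit's own):

```yaml
concurrency:
  group: deploy-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true   # re-pushes to the same PR cancel the prior run
```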
2026-03-29 14:14:57 +00:00