85 Commits

Author SHA1 Message Date
Kpa-clawbot
4cbb66d8e9 ci: fix badge publish — use admin PAT via Contents API to bypass branch protection [skip ci] 2026-03-29 20:03:59 -07:00
Kpa-clawbot
5777780fc8 refactor: parallel coverage collector (~30-60s vs 8min) (#272)
## Summary

Redesigned frontend coverage collector with 7 parallel browser contexts.
Coverage collector runs on master pushes only (skipped on PRs).

### Architecture
7 groups run simultaneously via `Promise.allSettled()`:
- G1: Home + Customizer
- G2: Nodes + Node Detail
- G3: Packets + Packet Detail
- G4: Map
- G5: Analytics + Channels + Observers
- G6: Live + Perf + Traces + Globals
- G7: Utility functions (page.evaluate)

### Speed gains
- `safeClick` 500ms → 100ms
- `navHash` 150ms → 50ms
- Removed redundant page visits and E2E-duplicate interactions
- Wall time = slowest group (~30-60s estimated)

### 821 lines → ~450 lines
Each group writes its own coverage JSON, nyc merges automatically.

### CI behavior
- **PRs:** Coverage collector skipped (fast CI)
- **Master:** Coverage collector runs (full synthetic user validation)

Co-authored-by: you <you@example.com>
2026-03-29 19:46:01 -07:00
Kpa-clawbot
ada53ff899 ci: fix badge artifacts not uploading (include-hidden-files for .badges/) 2026-03-30 01:38:31 +00:00
Kpa-clawbot
54e39c241d chore: add squad agent, workflows, and gitattributes
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-29 18:32:22 -07:00
you
3dd68d4418 fix: staging deploy failures — OOM + config.json directory mount
Root causes from CI logs:
1. 'read /app/config.json: is a directory' — Docker creates a directory
   when bind-mounting a non-existent file. The entrypoint now detects
   and removes directory config.json before falling back to example.
2. 'unable to open database file: out of memory (14)' — old container
   (3GB) not fully exited when new one starts. Deploy now uses
   'docker compose down' with timeout and waits for memory reclaim.
3. Supervisor gave up after 3 fast retries (FATAL in ~6s). Increased
   startretries to 10 and startsecs to 2 for server and ingestor.

Additional:
- Deploy step ensures staging config.json exists before starting
- Healthcheck: added start_period=60s, increased timeout and retries
- No longer uses manage.sh (CI working dir != repo checkout dir)
2026-03-29 23:16:46 +00:00
Kpa-clawbot
900cbf6392 fix: deploy uses manage.sh restart staging instead of raw compose
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-29 14:06:37 -07:00
Kpa-clawbot
067b101e14 fix: split prod/staging compose and harden deploy/manage staging control
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-29 14:01:29 -07:00
Kpa-clawbot
c271093795 fix: use docker compose down (not stop) to properly tear down staging
stop leaves the container/network in place, blocking port rebind.
down removes everything cleanly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-29 12:53:18 -07:00
Kpa-clawbot
424e4675ae ci: restrict staging deploy container cleanup
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-29 12:42:31 -07:00
Kpa-clawbot
fd162a9354 fix: CI kills legacy meshcore-* containers before deploy (#261)
Old meshcore-analyzer container still running from pre-rename era. Freed
2.2GB by killing it. CI now cleans up both old and new container names.

Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-29 12:30:13 -07:00
Kpa-clawbot
075dcaed4d fix: CI staging OOM — wait for old container before starting new (#259)
Old staging container wasn't fully stopped before new one started. Both
loaded 300MB stores simultaneously → OOM. Now properly waits and
verifies. Ref:
https://github.com/Kpa-clawbot/CoreScope/actions/runs/23716535123/job/69084603590

Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-29 12:08:56 -07:00
you
2817877380 ci: pass BUILD_TIME to Docker build 2026-03-29 18:55:37 +00:00
you
251b7fa5c2 ci: rename frontend-tests badge to e2e-tests in README, remove copy hack 2026-03-29 18:49:01 +00:00
you
f31e0b42a0 ci: clean up stale badges, add Go coverage placeholders, fix frontend-tests.json name 2026-03-29 18:48:04 +00:00
you
78e0347055 ci: fix staging deploy — only stop staging container, don't nuke prod 2026-03-29 18:46:33 +00:00
you
8ab195b45f ci: fix Go cache warnings on E2E step + fix staging deploy OOM (proper container cleanup) 2026-03-29 18:45:50 +00:00
you
6c7a3c1614 ci: clean Go module cache before setup to prevent tar extraction warnings 2026-03-29 18:37:59 +00:00
you
a5a3a85fc0 ci: disable coverage collector — E2E extracts window.__coverage__ directly 2026-03-29 18:33:46 +00:00
Kpa-clawbot
ec7ae19bb5 ci: restructure pipeline — sequential fail-fast, Go server E2E, remove deprecated JS tests (#256)
## Summary

Complete CI pipeline restructure. Sequential fail-fast chain, E2E tests
against Go server with real staging data, all deprecated Node.js server
tests removed.

### Pipeline (PR):
1. **Go unit tests** — fail-fast, coverage + badges
2. **Playwright E2E** — against Go server with fixture DB, frontend
coverage, fail-fast on first failure
3. **Docker build** — verify containers build

### Pipeline (master merge):
Same chain + deploy to staging + badge publishing

### Removed:
- All Node.js server-side unit tests (deprecated JS server)
- `npm ci` / `npm run test` steps
- JS server coverage collection (`COVERAGE=1 node server.js`)
- Changed-files detection logic
- Docs-only CI skip logic
- Cancel-workflow API hacks

### Added:
- `test-fixtures/e2e-fixture.db` — real data from staging (200 nodes, 31
observers, 500 packets)
- `scripts/capture-fixture.sh` — refresh fixture from staging API
- Go server launches with `-port 13581 -db test-fixtures/e2e-fixture.db
-public public-instrumented`

---------

Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com>
Co-authored-by: you <you@example.com>
2026-03-29 11:24:22 -07:00
you
75637afcc8 ci: upgrade upload/download-artifact to v6 (Node.js 24) 2026-03-29 18:05:03 +00:00
you
97486cfa21 ci: temporarily disable node-test job (CI restructure in progress) 2026-03-29 17:32:07 +00:00
you
bb43b5696c ci: use Go server instead of Node.js for E2E tests
The Playwright E2E tests were starting `node server.js` (the deprecated
JS server) instead of the Go server, meaning E2E tests weren't testing
the production backend at all.

Changes:
- Add Go 1.22 setup and build steps to the node-test job
- Build the Go server binary before E2E tests run
- Replace `node server.js` with `./corescope-server` in both the
  instrumented (coverage) and quick (no-coverage) E2E server starts
- Use `-port 13581` and `-public` flags to configure the Go server
- For coverage runs, serve from `public-instrumented/` directory

The Go server serves the same static files and exposes compatible
/api/* routes (stats, packets, health, perf) that the E2E tests hit.
2026-03-29 10:22:26 -07:00
Kpa-clawbot
5bb9bc146e docs: remove letsmesh.net reference from README (#233)
* docs: remove letsmesh.net reference from README

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* ci: remove paths-ignore from pull_request trigger

PR #233 only touches .md files, which were excluded by paths-ignore,
causing CI to be skipped entirely. Remove paths-ignore from the
pull_request trigger so all PRs get validated. Keep paths-ignore on
push to avoid unnecessary deploys for docs-only changes to master.

* ci: skip heavy CI jobs for docs-only PRs

Instead of using paths-ignore (which skips the entire workflow and
blocks required status checks), detect docs-only changes at the start
of each job and skip heavy steps while still reporting success.

This allows doc-only PRs to merge without waiting for Go builds,
Node.js tests, or Playwright E2E runs.

Reverts the approach from 7546ece (removing paths-ignore entirely)
in favor of a proper conditional skip within the jobs themselves.

* fix: update engine tests to match engine-badge HTML format

Tests expected [go]/[node] text but formatVersionBadge now renders
<span class="engine-badge">go</span>. Updated 6 assertions to
check for engine-badge class and engine name in HTML output.

---------

Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com>
Co-authored-by: you <you@example.com>
2026-03-29 16:25:51 +00:00
you
12d1174e39 perf: speed up frontend coverage tests (~3x faster)
Three optimizations to the CI frontend test pipeline:

1. Run E2E tests and coverage collection concurrently
   - Previously sequential (E2E ~1.5min, then coverage ~5.75min)
   - Now both run in parallel against the same instrumented server
   - Expected savings: ~5 min (coverage runs alongside E2E instead of after)

2. Replace networkidle with domcontentloaded in coverage collector
   - SPA uses hash routing — networkidle waits 500ms for network silence
     on every navigation, adding ~10-15s of dead time across 23 navigations
   - domcontentloaded fires immediately once HTML is parsed; JS initializes
     the route handler synchronously
   - For in-page hash changes, use 200ms setTimeout instead of
     waitForLoadState (which would never re-fire for same-document nav)

3. Extract coverage from E2E tests too
   - E2E tests already exercise the app against the instrumented server
   - Now writes window.__coverage__ to .nyc_output/e2e-coverage.json
   - nyc merges both coverage files for higher total coverage

Also:
- Split Playwright install into browser + deps steps (deps skip if present)
- Replace sleep 5 with health-check poll in quick E2E path
2026-03-29 09:12:23 -07:00
you
1b09c733f5 ci: restrict self-hosted jobs to Linux runners
The Windows self-hosted runner picks up jobs and fails because bash
scripts run in PowerShell. Node.js tests need Chromium/Playwright
(Linux-only), and build/deploy/publish use Docker (Linux-only).

Changes:
- node-test: runs-on: [self-hosted, Linux]
- build: runs-on: [self-hosted, Linux]
- deploy: runs-on: [self-hosted, Linux]
- publish: runs-on: [self-hosted, Linux]
- go-test: unchanged (ubuntu-latest)
2026-03-29 14:58:15 +00:00
Kpa-clawbot
553c0e4963 ci: bump GitHub Actions to Node 24 compatible versions
checkout v4→v5, setup-go v5→v6, setup-node v4→v5,
upload-artifact v4→v5, download-artifact v4→v5

Fixes the Node.js 20 deprecation warning.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-29 07:51:48 -07:00
you
074f3d3760 ci: cancel workflow run immediately when any test job fails
When go-test or node-test fails, the workflow run is now cancelled
via the GitHub API so the sibling job doesn't sit queued/running.

Also fixed build job to need both go-test AND node-test (was only
waiting on go-test despite the pipeline comment saying both gate it).
2026-03-29 14:20:22 +00:00
you
206d9bd64a fix: use per-PR concurrency group to prevent cross-PR cancellation
The flat 'deploy' concurrency group caused ALL PRs to share one queue,
so pushing to any PR would cancel CI runs on other PRs.

Changed to deploy-${{ github.event.pull_request.number || github.ref }}
so each PR gets its own concurrency group while re-pushes to the same
PR still cancel the previous run.
2026-03-29 14:14:57 +00:00
Kpa-clawbot
202d0d87d7 ci: Add pull_request trigger to CI workflow
- Add pull_request trigger for PRs against master
- Add 'if: github.event_name == push' to build/deploy/publish jobs
- Test jobs (go-test, node-test) now run on both push and PRs
- Build/deploy/publish only run on push to master

This fixes the chicken-and-egg problem where branch protection requires
CI checks but CI doesn't run on PRs. Now PRs get test validation before
merge while keeping production deployments only on master pushes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-28 15:15:35 -07:00
Kpa-clawbot
cdcaa476f2 rename: MeshCore Analyzer → CoreScope (Phase 1 — backend + infra)
Rename product branding, binary names, Docker images, container names,
Go modules, proto go_package, CI, manage.sh, and documentation.

Preserved (backward compat):
- meshcore.db database filename
- meshcore-data / meshcore-staging-data directory paths
- MQTT topics (meshcore/#, meshcore/+/+/packets, etc.)
- proto package namespace (meshcore.v1)
- localStorage keys

Changes by category:
- Go modules: github.com/corescope/{server,ingestor}
- Binaries: corescope-server, corescope-ingestor
- Docker images: corescope:latest, corescope-go:latest
- Containers: corescope-prod, corescope-staging, corescope-staging-go
- Supervisord programs: corescope, corescope-server, corescope-ingestor
- Branding: siteName, heroTitle, startup logs, fallback HTML
- Proto go_package: github.com/corescope/proto/v1
- CI: container refs, deploy path
- Docs: 8 markdown files updated

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-28 14:08:15 -07:00
Kpa-clawbot
aa2e8ed420 ci: remove Node deploy steps, update badges for Go
- Remove build-node and deploy-node jobs (Node staging on port 81)
- Rename build-go → build and deploy-go → deploy
- Update publish job to depend only on deploy (not deploy-node)
- Update README badges to show Go coverage (server/ingestor) instead of Node backend
- Remove Node staging references from deployment summary
- node-test job remains (frontend tests + Playwright)

Pipeline is now: node-test + go-test → build → deploy → publish

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-28 01:59:31 -07:00
Kpa-clawbot
11fee9526d Fix CI failures: increase Go health timeout to 120s, make WS capture non-blocking, clean stale ports/containers
Problem 1 (Go staging timeout): Increased healthcheck from 60s to 120s to allow 50K+ packets to load into memory.

Problem 2 (Node staging timeout): Added forced cleanup of stale containers, volumes, and ports before starting staging containers to prevent conflicts.

Problem 3 (Proto validation WS timeout): Made WebSocket message capture non-blocking using timeout command. If no live packets are available, it now skips with a warning instead of failing the entire proto validation pipeline.

Problem 4 (Playwright E2E failures): Added forced cleanup of stale server on port 13581 before starting test server, plus better diagnostics on failure.

All health checks now include better logging (tail 50 instead of 30 lines) for debugging.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-28 00:57:18 -07:00
Kpa-clawbot
387818ae6b Fix #199 (CI): Go test failures now fail the pipeline
Added 'set -e -o pipefail' to both Go test steps. Without pipefail, the exit code from 'go test' was being lost when piped to tee, causing test failures to appear as successes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 22:04:58 -07:00
Kpa-clawbot
a48b09f4e0 fix: broken CI YAML — inline Python at column 1 broke YAML parser
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 21:46:48 -07:00
Kpa-clawbot
9d7a3eb2d1 feat: capture one fixture per packet type (fixes #177)
Add per-payload-type packet detail fixtures captured from production:
- packet-type-advert.json (payload_type=4, ADVERT)
- packet-type-grptxt-decrypted.json (payload_type=5, decrypted GRP_TXT)
- packet-type-grptxt-undecrypted.json (payload_type=5, decryption_failed GRP_TXT)
- packet-type-txtmsg.json (payload_type=1, TXT_MSG)
- packet-type-req.json (payload_type=0, REQ)

Update validate-protos.py to validate all 5 new fixtures against
PacketDetailResponse proto message.

Update CI deploy workflow to automatically capture per-type fixtures
on each deploy, including both decrypted and undecrypted GRP_TXT.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 18:19:55 -07:00
Kpa-clawbot
4a6ac482e6 ci: fix proto syntax check command — fixes #173
The proto validation infrastructure was added in commit e70ba44 but used
an invalid --syntax_check flag. Changed to use --descriptor_set_out=/dev/null
which validates syntax without generating files.

Proto validation flow (now complete):
1. go-test job: verify .proto files compile (syntax check) 
2. deploy-node job: validate protos match prod API responses 

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 15:43:18 -07:00
Kpa-clawbot
e70ba440c0 security: scrub PII — remove real name and IP from committed files
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 15:41:38 -07:00
Kpa-clawbot
6ec23acfc8 Fix CI: Add Node.js setup to build-node job
The build-node job was failing with 'node: not found' because it
runs scripts/validate.sh (which uses 'node -c' for syntax checking)
but didn't have the actions/setup-node@v4 step.

Added Node.js 22 setup before the validate step to match the pattern
used in other jobs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 15:39:12 -07:00
Kpa-clawbot
b2dc02ee11 fix: capture proto fixtures from prod (stable reference), not staging
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 15:03:40 -07:00
Kpa-clawbot
d938e27abb ci: Capture all 33 proto fixtures including dynamic ID endpoints
Previously only captured 19 simple endpoints. Now captures all 33:
- 19 simple endpoints (stats, health, nodes, etc.)
- 14 dynamic ID endpoints (node-detail, packet-detail, etc.)

Dynamic ID resolution:
- Extracts real pubkey from /api/nodes for node detail endpoints
- Extracts real hash from /api/packets for packet-detail
- Extracts real observer ID from /api/observers for observer endpoints
- Gracefully skips fixtures if DB is empty (no data yet)

WebSocket capture:
- Uses node -e with ws module to capture one live WS message
- Falls back gracefully if no live packets available

The validator already handles missing fixtures without failing, so this
will work even when staging container has no data yet.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 15:01:56 -07:00
Kpa-clawbot
dc57168d96 ci: add proto validation step to catch API contract drift
Added a CI step that:
- Refreshes Node fixtures from the staging container after deployment
- Runs tools/validate-protos.py to validate proto definitions match actual API responses
- Fails the pipeline if proto drift is detected

This ensures nobody can merge a Node change that breaks the Go proto contract
without updating the .proto definitions.

The step runs after the Node staging healthcheck, capturing fresh responses
from 19 API endpoints (stats, health, nodes, analytics/*, config/*, etc.).
Endpoints requiring parameters (node-detail, packet-detail) use existing
fixtures and aren't auto-refreshed.

Co-authored-by: Copilot &lt;223556219+Copilot@users.noreply.github.com&gt;
2026-03-27 14:57:21 -07:00
Kpa-clawbot
385d2ae578 ci: split pipeline into two independent tracks (Node + Go)
- build-node depends only on node-test
- build-go depends only on go-test
- deploy-node depends only on build-node
- deploy-go depends only on build-go
- publish job waits for both deploy-node and deploy-go to complete
- Badges and deployment summary moved to final publish step

Result: Go staging no longer waits for Node tests to complete.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 13:36:45 -07:00
Kpa-clawbot
85047eab08 ci: deploy-go no longer waits for node-test or deploy-node
Go staging now deploys immediately after build completes, in parallel
with Node staging. Both test suites still gate the build job.

Before:
  go-test + node-test → build → deploy-node → deploy-go

After:
  go-test + node-test → build → deploy-node (parallel)
                                 deploy-go  (parallel)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 13:34:22 -07:00
Kpa-clawbot
2d17f91639 ci: fix 3 deploy.yml warnings (Node24, Go cache, badge artifacts)
- Add FORCE_JAVASCRIPT_ACTIONS_TO_NODE24 env var for Node.js 20 deprecation
- Add cache-dependency-path for go.sum files in cmd/server and cmd/ingestor
- Add if-no-files-found: ignore to go-badges upload-artifact step

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 13:28:35 -07:00
Kpa-clawbot
5539bc9fde ci: restructure deploy.yml into 5 clear jobs with readable step names
Split the monolithic 3-job pipeline (go-build, test, deploy) into 5
focused jobs that each do ONE thing:

  go-test      - Go Build & Test (coverage badges, runs on ubuntu-latest)
  node-test    - Node.js Tests (backend + Playwright E2E, coverage)
  build        - Build Docker Images (Node + Go, badge publishing)
  deploy-node  - Deploy Node Staging (port 81, healthcheck, smoke test)
  deploy-go    - Deploy Go Staging (port 82, healthcheck, smoke test)

Dependency chain: go-test + node-test (parallel) -> build -> deploy-node -> deploy-go

Every step now has a human-readable name describing exactly what it does.
Job names include emoji for visual scanning on GitHub Actions.
All existing functionality preserved - just reorganized for clarity.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 13:24:29 -07:00
Kpa-clawbot
7bd14dce6a fix: run go tool cover from module directory, not repo root
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 10:21:59 -07:00
Kpa-clawbot
7807063967 ci: add Go test coverage reporting to CI pipeline
- Go server and ingestor tests now run with -coverprofile
- Coverage percentages parsed and printed in CI output
- Badge JSON files generated (.badges/go-server-coverage.json,
  .badges/go-ingestor-coverage.json) matching existing format
- Badges uploaded as artifacts from go-build job, downloaded
  in test job, and published alongside existing Node.js badges
- Coverage summary table added to GitHub Step Summary

fixes #141

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 10:14:14 -07:00
Kpa-clawbot
0d9b535451 feat: add version and git commit to /api/stats and /api/health
Node.js: reads version from package.json, commit from .git-commit file
or git rev-parse --short HEAD at runtime, with unknown fallback.

Go: uses -ldflags build-time variables (Version, Commit) with fallback
to .git-commit file and git command at runtime.

Dockerfile: copies .git-commit if present (CI bakes it before build).
Dockerfile.go: passes APP_VERSION and GIT_COMMIT as build args to ldflags.
deploy.yml: writes GITHUB_SHA to .git-commit before docker build steps.
docker-compose.yml: passes build args to Go staging build.

Tests updated to verify version and commit fields in both endpoints.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 09:39:49 -07:00
Kpa-clawbot
ab879b78fe fix: remove continue-on-error from Go staging deploy — broken deploys should fail CI
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 09:37:14 -07:00
Kpa-clawbot
013a67481f ci: add Go staging auto-deploy to CI pipeline
Build and deploy the Go staging container (port 82) after Node staging
is healthy. Uses continue-on-error so Go staging failures don't block
the Node.js deploy. Health-checks the Go container for up to 60s and
verifies /api/stats returns the engine field.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 09:34:16 -07:00