4 Commits

Author SHA1 Message Date
Kpa-clawbot efd66ea3f5 feat(mqtt): per-source status endpoint + Observers panel (#1682)
## Summary

Adds MQTT source status visibility per #1043 acceptance criteria:

- **Ingestor:** per-source counter registry
(`cmd/ingestor/source_status.go`) tracking `connected`,
`lastConnectUnix`, `lastDisconnectUnix`, `lastPacketUnix`,
`connectCount`, `disconnectCount`, `packetsTotal`, `packetsLast5m`
(sliding 5-min window via per-second buckets keyed by unix second — no
stale-leak), `lastError`. Wired at the existing OnConnect /
ConnectionLost / DefaultPublish callsites alongside the liveness
watchdog. Idempotent registration so counters survive reconnects.
Snapshot emitted in the existing stats file under `source_statuses`
(additive, `omitempty`).
- **Backend:** new `GET /api/mqtt/status` handler reads the ingestor
stats file and returns the per-source list. **Broker passwords are
masked** via a regex over the `scheme://user:pass@host` form (covers
mqtt/mqtts/tcp/ssl/ws/wss). Mask is also applied to `lastError` as
defense-in-depth (broker libs occasionally quote the failing URL).
OpenAPI completeness gate satisfied with a `routeDescriptions` entry.
- **Frontend:** small self-contained panel
(`public/mqtt-status-panel.js`) mounted above the Observers table.
Auto-refreshes every 10s, color-codes each row (green = connected +
recent packet, yellow = connected idle, red = disconnected), and tears
down its timer on SPA route change.
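
The `packetsLast5m` counter described above (per-second buckets keyed by unix second) can be sketched roughly as follows. This is a minimal illustration, not the code in `cmd/ingestor/source_status.go`; the `window` type and its `Add` / `Count` methods are hypothetical names, and a fixed ring of 300 slots is an assumption about how the buckets are stored.

```go
package main

import (
	"fmt"
	"sync"
)

const windowSeconds = 300 // 5 minutes

// bucket holds the packet count for one specific unix second.
type bucket struct {
	sec   int64
	count uint64
}

// window is a hypothetical fixed-size ring of per-second buckets.
type window struct {
	mu      sync.Mutex
	buckets [windowSeconds]bucket
}

// Add records one packet observed at unix second now.
func (w *window) Add(now int64) {
	w.mu.Lock()
	defer w.mu.Unlock()
	b := &w.buckets[now%windowSeconds]
	if b.sec != now {
		// The slot was last used for an older second: reset it
		// instead of accumulating, so stale counts cannot leak in.
		b.sec, b.count = now, 0
	}
	b.count++
}

// Count sums the buckets whose second lies within the last 5 minutes.
func (w *window) Count(now int64) uint64 {
	w.mu.Lock()
	defer w.mu.Unlock()
	var total uint64
	for i := range w.buckets {
		b := w.buckets[i]
		if b.count > 0 && b.sec > now-windowSeconds && b.sec <= now {
			total += b.count
		}
	}
	return total
}

func main() {
	var w window
	w.Add(1000)
	w.Add(1000)
	w.Add(1200)
	fmt.Println(w.Count(1200)) // seconds 1000 and 1200 are both inside the window
	fmt.Println(w.Count(1300)) // second 1000 has aged out
}
```

Because every bucket remembers which second it belongs to, an idle source needs no background sweeper: old buckets are simply ignored by `Count` and overwritten by the next `Add` that lands on the same slot.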

## TDD

- Red commit `f19a93b5` — stub `/api/mqtt/status` handler + assertion
test that the broker password is `****`-redacted. Test fails on the
assertion (handler passes the URL through verbatim). Compile-clean —
assertion-fail, not build-fail.
- Green commit `77042e41` — `maskBrokerURL` helper + table-driven unit
tests across all schemes + handler rewires to mask both `Broker` and
`LastError`.
- Subsequent commits land the ingestor wiring and the frontend panel.
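
A rough sketch of the masking step follows, assuming a single regex over the `scheme://user:pass@host` form. The name `maskBrokerURLSketch` and the exact pattern are illustrative assumptions; the real `maskBrokerURL` may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// credRe matches scheme://user:pass@ for the broker schemes named in the PR
// (mqtt, mqtts, tcp, ssl, ws, wss). Group 1 keeps "scheme://user:", group 2
// is the password, group 3 is the "@" separator.
// Limitation of this sketch: a password containing a raw "@" or "/" is not
// fully matched.
var credRe = regexp.MustCompile(`(?i)\b((?:mqtts?|tcp|ssl|wss?)://[^\s:/@]+:)([^\s@/]+)(@)`)

// maskBrokerURLSketch replaces every embedded password with "****".
// It works on bare URLs and on free text such as error messages.
func maskBrokerURLSketch(s string) string {
	return credRe.ReplaceAllString(s, "${1}****${3}")
}

func main() {
	fmt.Println(maskBrokerURLSketch("mqtts://alice:s3cret@broker.example:8883"))
	fmt.Println(maskBrokerURLSketch("dial wss://u:p@h/mqtt: timeout"))
	fmt.Println(maskBrokerURLSketch("tcp://broker.example:1883"))
}
```

Running the same function over `LastError` is what makes the defense-in-depth case cheap: a URL quoted inside an error string is masked by the same pattern, and a URL with only a port (no credentials) is left untouched.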

## Tests

```
$ cd cmd/server && go test -run 'TestMqttStatus|TestMaskBrokerURL' -v ./...
PASS: TestMqttStatus_MasksBrokerPassword
PASS: TestMqttStatus_EmptyWhenNoStatsFile
PASS: TestMaskBrokerURL_Patterns (10 subtests)

$ cd cmd/ingestor && go test -run 'TestSourceStatus|TestSnapshotSourceStatuses' -v ./...
PASS: TestSourceStatus_BasicLifecycle
PASS: TestSourceStatus_Disconnect
PASS: TestSnapshotSourceStatuses_ReturnsAll

$ node test-mqtt-status-panel.js
7 passed, 0 failed
```

Full `go test ./...` clean in both `cmd/server` and `cmd/ingestor`.

## Preflight overrides

- `cross-stack`: justified — issue #1043 is intrinsically full-stack
(ingestor stats → server endpoint → observers panel). Per-stack split
would land an unreachable endpoint or a fetch with no backend.
- `check-xss-sinks` (public/mqtt-status-panel.js:55): justified — the
flagged `innerHTML=` is a fully-static literal (empty-state placeholder,
no payload data interpolated). All payload-bearing `innerHTML=` sites in
this file run through `escapeHTML` (defined in the same file); the test
`renderPanel never echoes a plaintext password (defense-in-depth)`
exercises the rendered HTML against payload strings.

## Acceptance criteria

- [x] `/api/mqtt/status` returns per-source connection state —
`cmd/server/mqtt_status.go`
- [x] UI panel shows all configured sources with live status —
`public/mqtt-status-panel.js`
- [x] Connection state updates on reconnect/disconnect events —
`MarkConnect` / `MarkDisconnect` wired in `cmd/ingestor/main.go`
- [x] Broker URLs don't expose passwords in the API response —
`maskBrokerURL` + 13 test cases
- [x] Works with 1-N sources — registry is keyed per-source, snapshot
iterates the map

**Partial fix for #1043** — per-packet `mqtt_source` attribution (the
issue's "Follow-up" section) is **deferred** per the `mc-bot-triaged:v1`
triage and the autofix comment ("Per-packet attribution deferred to
follow-up issue"). That work requires a new observation-row column and
DB schema migration, both explicitly out of scope for this PR.

Refs #1043

---------

Co-authored-by: openclaw-bot <bot@openclaw.local>
2026-06-12 08:11:02 -07:00
Kpa-clawbot 1da2034341 refactor(db): move all writes from server to ingestor; server truly read-only (fixes #1283) (#1286)
**Red commit:** f6290b63 — CI run will appear at
https://github.com/Kpa-clawbot/CoreScope/actions

Fixes #1283.

## What

Moves all four DB write operations out of `cmd/server/` into
`cmd/ingestor/`, making the server truly read-only and eliminating the
SQLITE_BUSY VACUUM bug at its root: the server can no longer race the
ingestor for the write lock because the server has no write path.

## The four operations

| # | Was in | Now in |
|---|--------|--------|
| 1 | `cmd/server/vacuum.go` (`checkAutoVacuum`, full VACUUM + `auto_vacuum=INCREMENTAL` migration) | `cmd/ingestor/db.go` `Store.CheckAutoVacuum` (already existed; ingestor runs it at startup **before** the MQTT subscriber starts → no contention) |
| 2 | `cmd/server/db.go` `PruneOldPackets` (`DELETE FROM transmissions`) | `cmd/ingestor/maintenance.go` `Store.PruneOldPackets` (new) + 24h ticker in `cmd/ingestor/main.go` |
| 3 | `cmd/server/db.go` `PruneOldMetrics` (`DELETE FROM observer_metrics`) | `cmd/ingestor/db.go` `Store.PruneOldMetrics` (already existed) |
| 4 | `cmd/server/db.go` `RemoveStaleObservers` (`UPDATE observers SET inactive=1`) | `cmd/ingestor/db.go` `Store.RemoveStaleObservers` (already existed) |
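
The 24h ticker in row 2 could look something like the loop below. This is a sketch only: `runEvery` and `countRuns` are hypothetical helpers, and the job passed in stands in for a call such as `Store.PruneOldPackets`, whose real signature is not shown here.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runEvery calls job once per interval until ctx is cancelled.
// A failed run is logged and the loop continues with the next tick.
func runEvery(ctx context.Context, interval time.Duration, job func() error) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			if err := job(); err != nil {
				fmt.Println("maintenance job failed:", err)
			}
		}
	}
}

// countRuns is a small harness: it runs a no-op job every intervalMs
// milliseconds for totalMs milliseconds and reports how often it fired.
func countRuns(intervalMs, totalMs int) int {
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(totalMs)*time.Millisecond)
	defer cancel()
	n := 0
	runEvery(ctx, time.Duration(intervalMs)*time.Millisecond, func() error {
		n++
		return nil
	})
	return n
}

func main() {
	// In the ingestor the interval would be 24*time.Hour; short values
	// are used here so the example finishes quickly.
	fmt.Println("runs:", countRuns(5, 100))
}
```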

## HTTP surface

- **Removed:** `POST /api/admin/prune` (`handleAdminPrune`, route,
openapi entry). Operators trigger an ad-hoc prune by restarting the
ingestor.
- **Kept:** `GET /api/backup` — uses `VACUUM INTO` which writes to a
separate file, not the live DB; read-only-safe.

## Tests

- `cmd/server/readonly_invariant_test.go` (RED gate) — reflect-asserts
`PruneOldPackets`/`PruneOldMetrics`/`RemoveStaleObservers` are NOT
methods on the server's `*DB`. Fails on master, passes after this PR.
- `cmd/ingestor/issue1283_test.go` — exercises `Store.PruneOldPackets`
and the auto_vacuum=NONE → INCREMENTAL migration through
`Store.CheckAutoVacuum` with `vacuumOnStartup=true`.
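
The reflect-based gate can be approximated as below. The `DB` type here is a stand-in for the server's handle and `forbiddenMethods` is a hypothetical helper; the actual test in `readonly_invariant_test.go` may be structured differently.

```go
package main

import (
	"fmt"
	"reflect"
)

// DB stands in for the server's read-only handle (hypothetical shape).
type DB struct{}

// QueryPackets is a placeholder read method.
func (d *DB) QueryPackets() []string { return nil }

// forbiddenMethods returns the names from banned that are present in the
// method set of v. Only exported methods are visible to reflection.
func forbiddenMethods(v any, banned ...string) []string {
	t := reflect.TypeOf(v)
	var found []string
	for _, name := range banned {
		if _, ok := t.MethodByName(name); ok {
			found = append(found, name)
		}
	}
	return found
}

func main() {
	writes := []string{"PruneOldPackets", "PruneOldMetrics", "RemoveStaleObservers"}
	if bad := forbiddenMethods(&DB{}, writes...); len(bad) > 0 {
		fmt.Println("read-only invariant violated:", bad)
		return
	}
	fmt.Println("read-only invariant holds")
}
```

A check of this kind fails as soon as someone re-adds one of the named methods to the server type, which is what lets it act as a red gate on master and a regression guard afterwards. It does not detect write paths added under a different name.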

## Why the bug is gone

The SQLITE_BUSY VACUUM failure happened because supervisord launched
both ingestor + server in one container; the ingestor took the write
lock for INSERTs and the server's `checkAutoVacuum` then failed to
acquire it within `busy_timeout=5000`. After this PR, only the ingestor
ever opens a writable connection, and it runs `CheckAutoVacuum`
**before** spawning the MQTT subscriber → no contention possible.

## Scope notes

- `cachedRW()` still has three pre-existing callers in `cmd/server/`
(`neighbor_persist.go`, `ensure_indexes.go`,
`from_pubkey_migration.go`). These pre-date #1283 and are not in the
issue's four-operation list. Leaving them for follow-up keeps this PR
honest about scope; AGENTS.md documents the invariant so new write paths
can't sneak in.
- PII preflight reports false positives on the Go method name
`requireAPIKey` in `routes.go` diff context — no real PII.
- Server-side neighbor-edge prune (`PruneNeighborEdges`) intentionally
left in place — out of scope of #1283.

---------

Co-authored-by: MeshCore Bot <bot@meshcore.local>
2026-05-18 23:52:27 -07:00
Kpa-clawbot b06adf9f2a feat: /api/backup — one-click SQLite database export (#474) (#1022)
## Summary

Implements `GET /api/backup` — one-click SQLite database export per
#474.

Operators can now grab a complete, consistent snapshot of the analyzer
DB with a single authenticated request — no SSH, no scripts, no DB
tooling.

## Endpoint

```
GET /api/backup
X-API-Key: <key>            # required
→ 200 OK
  Content-Type: application/octet-stream
  Content-Disposition: attachment; filename="corescope-backup-<unix>.db"
  <body: complete SQLite database file>
```

## Approach

Uses SQLite's `VACUUM INTO 'path'` to produce an atomic, defragmented
copy of the database into a fresh file:

- **Consistent**: VACUUM INTO reads the source database inside a single
read transaction, so the snapshot reflects a single point in time even
while the ingestor is writing to the WAL.
- **Non-blocking**: writers continue uninterrupted; we never hold a
write lock.
- **Works on read-only connections**: verified manually against a
WAL-mode source DB (`mode=ro` connection successfully produces a
snapshot).
- **No corruption risk**: even if the live on-disk DB has issues, VACUUM
INTO surfaces what the server can read rather than copying broken pages
byte-for-byte.

The snapshot is staged in `os.MkdirTemp(...)` and removed after the
response body is fully streamed (deferred cleanup). Requesting client IP
is logged for audit.
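
The stage, stream, and clean-up flow described above might be sketched as follows. To keep the example runnable without a SQLite driver, the snapshot step is injected as a function; `backupHandlerSketch`, `snapshotFunc`, `fakeSnapshot`, and `fetchBackup` are all hypothetical names, and the `VACUUM INTO ?` call in the comment is an assumption about how the real handler invokes SQLite.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
	"time"
)

// snapshotFunc writes a consistent copy of the live DB to dest.
// Assumption: with database/sql and a SQLite driver this could be
// db.ExecContext(ctx, "VACUUM INTO ?", dest).
type snapshotFunc func(ctx context.Context, dest string) error

// backupHandlerSketch stages a snapshot in a temp dir, streams it, then cleans up.
func backupHandlerSketch(snap snapshotFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		dir, err := os.MkdirTemp("", "backup-sketch-*")
		if err != nil {
			http.Error(w, "backup failed", http.StatusInternalServerError)
			return
		}
		// Deferred cleanup runs only after the body below has been streamed.
		defer os.RemoveAll(dir)

		// VACUUM INTO needs a target that does not exist yet (or is empty);
		// a fresh temp dir guarantees that.
		dest := filepath.Join(dir, "snapshot.db")
		if err := snap(r.Context(), dest); err != nil {
			http.Error(w, "backup failed", http.StatusInternalServerError)
			return
		}
		f, err := os.Open(dest)
		if err != nil {
			http.Error(w, "backup failed", http.StatusInternalServerError)
			return
		}
		defer f.Close()

		name := fmt.Sprintf("corescope-backup-%d.db", time.Now().Unix())
		w.Header().Set("Content-Type", "application/octet-stream")
		w.Header().Set("Content-Disposition", `attachment; filename="`+name+`"`)
		w.Header().Set("X-Content-Type-Options", "nosniff")
		_, _ = io.Copy(w, f)
	}
}

// fakeSnapshot stands in for the real VACUUM INTO call.
func fakeSnapshot(payload string, fail bool) snapshotFunc {
	return func(_ context.Context, dest string) error {
		if fail {
			return errors.New("snapshot failed")
		}
		return os.WriteFile(dest, []byte(payload), 0o600)
	}
}

// fetchBackup drives the handler in-process and returns the recorded response.
func fetchBackup(snap snapshotFunc) *httptest.ResponseRecorder {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/api/backup", nil)
	backupHandlerSketch(snap).ServeHTTP(rec, req)
	return rec
}

func main() {
	rec := fetchBackup(fakeSnapshot("SQLite format 3\x00", false))
	fmt.Println(rec.Code, rec.Header().Get("Content-Type"), rec.Body.Len())
}
```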

The issue suggested an alternative in-memory rebuild path; `VACUUM INTO`
is simpler, faster, and produces a strictly more accurate copy of what
the server actually sees, so this PR uses it.

## Security

- Mounted under `requireAPIKey` middleware — same gate as other admin
endpoints (`/api/admin/prune`, `/api/perf/reset`).
- Returns 401 without a valid `X-API-Key` header.
- Returns 403 if no API key is configured server-side.
- `X-Content-Type-Options: nosniff` set on the response.
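
The 401 / 403 behaviour listed above can be illustrated with a small middleware sketch. `requireAPIKeySketch` and `statusFor` are hypothetical names, and the constant-time comparison is an assumption added for the example; the real `requireAPIKey` is not shown in this PR description.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// requireAPIKeySketch gates next behind the X-API-Key header.
//   - no key configured server-side -> 403
//   - missing or wrong key          -> 401
func requireAPIKeySketch(configured string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if configured == "" {
			http.Error(w, "API key not configured", http.StatusForbidden)
			return
		}
		got := r.Header.Get("X-API-Key")
		// Constant-time compare so the check does not leak how many
		// leading bytes matched (an assumption of this sketch).
		if subtle.ConstantTimeCompare([]byte(got), []byte(configured)) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one in-process request and returns the status code.
func statusFor(configured, sent string) int {
	h := requireAPIKeySketch(configured, http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			w.WriteHeader(http.StatusOK)
		}))
	req := httptest.NewRequest(http.MethodGet, "/api/backup", nil)
	if sent != "" {
		req.Header.Set("X-API-Key", sent)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("", "anything")) // no key configured
	fmt.Println(statusFor("secret", ""))   // header missing
	fmt.Println(statusFor("secret", "secret"))
}
```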

## TDD

- **Red** (`99548f2`): `cmd/server/backup_test.go` adds
`TestBackupRequiresAPIKey` + `TestBackupReturnsValidSQLiteSnapshot`.
Stub handler returns 200 with no body so the tests fail on assertions
(Content-Type / Content-Disposition / SQLite magic header), not on
import or build errors.
- **Green** (`837b2fe`): real implementation lands; both tests pass;
full `go test ./...` suite stays green.

## Files

- `cmd/server/backup.go` — handler implementation
- `cmd/server/backup_test.go` — red-then-green tests
- `cmd/server/routes.go` — route registration under `requireAPIKey`
- `cmd/server/openapi.go` — OpenAPI metadata so `/api/openapi`
advertises the endpoint

## Out of scope (follow-ups)

- Rate limiting (issue suggested 1 req/min). Not added here —
admin-key-gated endpoint with a fast snapshot path is acceptable for v1;
happy to add a token-bucket limiter in a follow-up if operators report
hammering.
- UI button to trigger the download (frontend work — separate PR).

Fixes #474

---------

Co-authored-by: corescope-bot <bot@corescope.local>
2026-05-03 17:56:42 -07:00
Kpa-clawbot 0f5e2db5cf feat: auto-generated OpenAPI 3.0 spec endpoint + Swagger UI (#530) (#632)
## Summary

Auto-generated OpenAPI 3.0.3 spec endpoint (`/api/spec`) and Swagger UI
(`/api/docs`) for the CoreScope API.

## What

- **`cmd/server/openapi.go`** — Route metadata map
(`routeDescriptions()`) + spec builder that walks the mux router to
generate a complete OpenAPI 3.0.3 spec at runtime. Includes:
  - All 47 API endpoints grouped by tag (admin, analytics, channels,
    config, nodes, observers, packets)
  - Query parameter documentation for key endpoints (packets, nodes,
    search, resolve-hops)
  - Path parameter extraction from mux `{name}` patterns
  - `ApiKeyAuth` security scheme for API-key-protected endpoints
  - Swagger UI served as a self-contained HTML page using unpkg CDN

- **`cmd/server/openapi_test.go`** — Tests for spec endpoint (validates
JSON structure, required fields, path count, security schemes,
self-exclusion of `/api/spec` and `/api/docs`), Swagger UI endpoint, and
`extractPathParams` helper.

- **`cmd/server/routes.go`** — Stores router reference on `Server`
struct for spec generation; registers `/api/spec` and `/api/docs`
routes.
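
Path parameter extraction from mux `{name}` patterns might be done along these lines. This is a sketch under the assumption that a single regex is sufficient; `extractPathParamsSketch` is a hypothetical name and the real `extractPathParams` helper may behave differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// paramRe matches {name} and {name:regex} segments in a route template.
// Limitation of this sketch: an inline regex that itself contains braces,
// such as {id:[0-9]{3}}, is not handled.
var paramRe = regexp.MustCompile(`\{([A-Za-z_][A-Za-z0-9_]*)(?::[^{}]*)?\}`)

// extractPathParamsSketch returns the parameter names in order of appearance.
func extractPathParamsSketch(pattern string) []string {
	var names []string
	for _, m := range paramRe.FindAllStringSubmatch(pattern, -1) {
		names = append(names, m[1])
	}
	return names
}

func main() {
	fmt.Println(extractPathParamsSketch("/api/nodes/{pubkey}/paths/{id:[0-9]+}"))
	fmt.Println(extractPathParamsSketch("/api/stats"))
}
```

Each extracted name would then become a required `in: path` parameter on the corresponding operation in the generated spec.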

## Design Decisions

- **Runtime spec generation** vs static YAML: The spec builder walks the
actual router, so the path list cannot drift from the registered routes.
Route metadata (summaries, descriptions, tags, auth flags) is maintained
in a parallel map; the test enforces a minimum path count to catch drift.
- **No external dependencies**: Uses only stdlib + existing gorilla/mux.
Swagger UI loaded from unpkg CDN (no vendored assets).
- **Security tagging**: Auth-protected endpoints (those behind
`requireAPIKey` middleware) are tagged with `security: [{ApiKeyAuth:
[]}]` in the spec, matching the actual middleware configuration.

## Testing

- `go test -run TestOpenAPI` — validates spec structure, field presence,
path count ≥ 20, security schemes
- `go test -run TestSwagger` — validates HTML response with swagger-ui
references
- `go test -run TestExtractPathParams` — unit tests for path parameter
extraction

---------

Co-authored-by: you <you@example.com>
2026-04-05 15:05:20 -07:00