Fixes#1258 — Perf dashboard (/#/perf) was slow because of three
frontend issues; backend APIs were never the problem.
## Findings
1. **`/api/health` fetched sequentially after `Promise.all`** in
`refresh()` — added a full RTT (~50-200ms) on every 5s tick on top of
the parallel batch.
2. **Endpoints table not actually sorted** despite the heading "sorted
by total time". JSON shape is `map[string]EndpointStatsResp` (no defined
order); frontend rendered map iteration order. Visible correctness bug
surfaced during investigation.
3. **`setInterval(refresh, 5000)` kept firing while tab was hidden**,
rebuilding the entire ~10-section `innerHTML` (cards + 3 tables) in the
background. On tab return the user saw a backlog thrash + felt the page
was "slow to render".
## Fix (`public/perf.js`)
- Move `/api/health` into the same `Promise.all` as the other 4
endpoints — saves one RTT per refresh.
- Sort `Object.entries(server.endpoints)` by `count * avgMs` DESC
client-side.
- Add `document.hidden` guard in the interval tick + `visibilitychange`
listener that refreshes once on return; `destroy()` removes the
listener.
## Tests
`test-perf-render-1258.js` (new):
- All 5 initial fetches issued in parallel (including `/api/health`)
- Refresh suppressed while `document.hidden`
- Endpoints table sorted by total time DESC, regardless of input map
order
RED commit first (`6b54f9e8`, 0/3 pass) → GREEN commit (`be81303b`, 3/3
pass). Existing `test-perf-go-runtime.js` (13/13) and
`test-perf-disk-io-1120.js` (15/15) still green.
## Investigation exemption
No Playwright timing test — sandbox can't run a real browser. Static
analysis + render-shape unit tests cover the three identified
bottlenecks. Documented per AGENTS "investigation surfaces" exemption.
## Measurement
Before: refresh = parallel batch (~max(server-side)) + sequential
`/api/health` (~50ms) + full innerHTML rebuild every 5s including hidden
tabs.
After: refresh = single parallel batch, runs only while visible.
Expected improvement on tab-return ≈ -1 RTT per refresh + zero
background work.
---------
Co-authored-by: corescope-bot <bot@corescope.local>
Red commit: e964ec9c46 (CI run: pending —
workflow only triggers on PR open)
Partial fix for #1120 — finishes the four follow-up items left open
after PR #1123 (cancelled writes, ingestor I/O, threshold-flag tests,
docs).
## What's done
- **`cancelledWriteBytesPerSec`** — server `/proc/self/io` parser
handles `cancelled_write_bytes`; `/api/perf/io` exposes the per-second
rate; Perf page renders it next to Read/Write with ⚠️ when sustained >1
MB/s.
- **Ingestor `/proc/<pid>/io`** — `cmd/ingestor/stats_file.go` samples
its own `/proc/self/io` each tick and includes `procIO` in the snapshot.
The server's `/api/perf/io` reads it and surfaces `.ingestor`. Frontend
renders an `Ingestor process` Disk I/O block alongside the existing
`server process` block (issue mockup: "Both ingestor and server").
- **Threshold + anomaly tests** — `test-perf-disk-io-1120.js` now
asserts ⚠️ fires/suppresses on WAL>100MB, cache_hit<90%, and the
backfill-rate-vs-tx-rate guard with the `tx_inserted >= 100` baseline
floor. Drops the tautological `|| ... === false` short-circuits flagged
in MINOR m4.
- **Docs (m8)** — `config.example.json` adds `_comment_ingestorStats`
(env var, default path, shared-tmp security note);
`cmd/ingestor/README.md` adds `CORESCOPE_INGESTOR_STATS` to the env-var
table plus a `Stats file` section.
## What's NOT done (deferred)
m1 sync.Map → map+RWMutex, m2 perfIOMu rate caching, m3 negative
cacheSize translation, m5 deterministic-write test, m7 ctx-aware
shutdown — pure polish; will file a follow-up issue if the operator
wants them tracked.
## TDD
- Red: `e964ec9` — adds failing tests + stub field/handler shape
(cancelled missing from struct, ingestor stub returns nil, ingestor
procIO absent).
- Green: `1240703` — wires up the parser case, ingestor sampler,
frontend rendering, docs.
E2E assertion added: test-perf-disk-io-1120.js:108
---------
Co-authored-by: clawbot <clawbot@users.noreply.github.com>
Co-authored-by: Kpa-clawbot <bot@kpa-clawbot.local>
Co-authored-by: Kpa-clawbot <bot@kpa-clawbot>
## Summary
Implements per-component disk I/O + write source metrics on the Perf
page so operators can self-diagnose write-volume anomalies (cf. the
BackfillPathJSON loop debugged in #1119) without SSHing in to run
iotop/fatrace.
Partial fix for #1120
## What's done (4/6 ACs)
- ✅ `/api/perf/io` — server-process `/proc/self/io` delta rates
(read/write bytes per sec, syscalls)
- ✅ `/api/perf/sqlite` — WAL size, page count, page size, cache hit rate
- ✅ `/api/perf/write-sources` — per-component counters from ingestor
(tx/obs/upserts/backfill_*)
- ✅ Frontend Perf page — three new sections with anomaly thresholds +
per-second rate columns
## What's NOT done (deferred to follow-up)
- ❌ `cancelledWriteBytesPerSec` field — issue #1120 lists this under
server-process I/O ("writes the kernel discarded — interesting signal");
not exposed in this PR
- ❌ Ingestor `/proc/<pid>/io` — issue #1120 says "Both ingestor and
server"; only server-process I/O lands here. Adding ingestor I/O
requires either a unix socket back to the server, or surfacing the
ingestor pid through the stats file. Doable without changing the
existing API shape.
- ❌ Adaptive baselining — anomaly thresholds remain static (10×, 100 MB,
90%); steady-state baselining can come once we have enough deployed
Perf-page telemetry
Per AGENTS.md rule 34, this PR uses "Partial fix for #1120" rather than
"Fixes #1120" so the issue stays open until the remaining ACs land.
## Backend
**Server (`cmd/server/perf_io.go`)**
- `GET /api/perf/io` — reads `/proc/self/io` and returns delta-rate
`{readBytesPerSec, writeBytesPerSec, syscallsRead, syscallsWrite}` since
last call (in-memory tracker, no allocation per sample).
- `GET /api/perf/sqlite` — returns `{walSize, walSizeMB, pageCount,
pageSize, cacheSize, cacheHitRate}`. `cacheHitRate` is proxied from the
in-process row cache (closest available signal under the modernc sqlite
driver).
- `GET /api/perf/write-sources` — reads the ingestor's stats JSON file
and returns a flat `{sources: {...}, sampleAt}` payload.
**Ingestor (`cmd/ingestor/`)**
- `DBStats` gains `WALCommits atomic.Int64` (incremented on every
successful `tx.Commit()` and on every auto-commit `InsertTransmission`
write) and `BackfillUpdates sync.Map` keyed by backfill name with
`IncBackfill(name)` / `SnapshotBackfills()` helpers.
- `BackfillPathJSONAsync` now increments `BackfillUpdates["path_json"]`
per row write — the BackfillPathJSON-style infinite loop becomes
immediately visible at `backfill_path_json` in the Write Sources table.
- New `StartStatsFileWriter` publishes a JSON snapshot to
`/tmp/corescope-ingestor-stats.json` (override via
`CORESCOPE_INGESTOR_STATS`) every second using atomic tmp+rename. The
tmp file is opened with `O_CREATE|O_WRONLY|O_TRUNC|O_NOFOLLOW` mode
`0o600` so a pre-planted symlink in a world-writable `/tmp` cannot
redirect the write to an arbitrary file.
## Frontend (`public/perf.js`)
Three new sections on the Perf page, all auto-refreshed via the existing
5s interval:
- **Disk I/O (server process)** — read/write rates (formatted
B/KB/MB-per-sec) + syscall counts. Write rate >10 MB/s flags ⚠️.
- **Write Sources** — sorted table of per-component counters with a
per-second rate column derived from snapshot deltas. Backfill rows show
⚠️ only when `tx_inserted >= 100` (meaningful baseline) AND the
backfill's per-second rate exceeds 10× the live tx rate. Avoids the
startup-spurious-alarm where cumulative-vs-cumulative was a tautology.
- **SQLite (WAL + Cache Hit)** — WAL size (⚠️ when >100 MB), page count,
page size, cache hit rate (⚠️ when <90%).
## Tests
- **Backend** (`cmd/server/perf_io_test.go`) —
`TestPerfIOEndpoint_ReturnsValidJSON`,
`TestPerfSqliteEndpoint_ReturnsValidJSON`,
`TestPerfWriteSourcesEndpoint_ReturnsSources` exercise the three new
endpoints. Skips the `/proc/self/io` non-zero-rate assertion when
`/proc` is unavailable.
- **Frontend** (`test-perf-disk-io-1120.js`) — vm-sandbox runs `perf.js`
with stubbed `fetch`, asserts the three new sections render with their
headings + values.
E2E assertion added: test-perf-disk-io-1120.js:91
## TDD
1. Red commit (`21abd22`) — added the three handlers as no-op stubs
returning empty values; tests fail on assertion mismatches (non-zero
rate, `pageSize > 0`, headings present).
2. Green commit (`d8da54c`) — fills in the real `/proc/self/io` parser,
PRAGMA queries, ingestor stats writer, and Perf page rendering.
---------
Co-authored-by: corescope-bot <bot@corescope.local>
Co-authored-by: Kpa-clawbot <kpa-clawbot@users.noreply.github.com>
## Summary
The perf page "Memory Used" tile displayed `estimatedMB` (Go
`runtime.HeapAlloc`), which includes all Go runtime allocations — not
just packet store data. This made the displayed value misleading: it
showed ~2.4GB heap when only ~833MB was actual tracked packet data.
## Changes
### Frontend (`public/perf.js`)
- Primary tile now shows `trackedMB` as **"Tracked Memory"** — the
self-accounted packet store memory
- Added separate **"Heap (debug)"** tile showing `estimatedMB` for
runtime visibility
### Backend
- **`types.go`**: Added `TrackedMB` field to `HealthPacketStoreStats`
struct
- **`routes.go`**: Populate `TrackedMB` in `/health` endpoint response
from `GetPerfStoreStatsTyped()`
- **`routes_test.go`**: Assert `trackedMB` exists in health endpoint's
`packetStore`
- **`testdata/golden/shapes.json`**: Updated shape fixture with new
field
### What was already correct
- `/api/perf/stats` already exposed both `estimatedMB` and `trackedMB`
- `trackedMemoryMB()` method already existed in store.go
- Eviction logic already used `trackedBytes` (not HeapAlloc)
## Testing
- All Go tests pass (`go test ./... -count=1`)
- No frontend logic changes beyond template string field swap
Fixes#717
Co-authored-by: you <you@example.com>
- perf.js: toFixed(1) on all ms/MB values in Go Runtime section
- style.css: white-space: nowrap on .nav-stats to prevent the · separator
from wrapping onto its own line
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
#203: Live page node detail panel becomes a bottom-sheet on mobile
(width:100%, bottom:0, max-height:60vh, rounded top corners).
#204: Perf page reduces padding to 12px, perf-cards stack in 2-col
grid, tables get smaller font/padding on mobile.
#205: Nodes table hides Public Key column on mobile via .col-pubkey
class (same pattern as packets page .col-region/.col-rpt).
Cache busters bumped in index.html.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When engine=go, the perf page now renders Go-specific runtime stats
(goroutines, GC collections, GC pause times, heap breakdown, CPUs)
instead of the misleading Node.js Event Loop metrics. Falls back to
the existing Node UI when engine is not 'go' or goRuntime data is
missing. Includes color-coded GC pause thresholds.
fixes#153
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Move --status-green/yellow/red from home.css to style.css :root (light+dark)
- Replace hardcoded status colors in style.css (.tl-snr, .health-dot, .byop-err,
.badge-hash-*, .fav-star.on, .spark-fill) with CSS variable references
- Replace hardcoded colors in live.css (VCR mode, stat pills, fdc-link, playhead)
- Replace --primary/--bg-secondary/--text-primary/--text-secondary dead vars with
canonical --accent/--input-bg/--text/--text-muted in style.css, map.js, live.js,
traces.js, packets.js
- Fix nodes.js legend colors to use ROLE_COLORS globals instead of hardcoded hex
- Replace hardcoded hex in home.js (SNR), perf.js (indicators), map.js (accuracy
circles) with CSS variable references via getComputedStyle or var()
- Add --detail-bg to customizer (THEME_CSS_MAP, DEFAULTS, ADVANCED_KEYS, labels)
- Move font/mono out of ADVANCED_KEYS into separate Fonts section in customizer
- Remove debug console.log lines from customize.js
- Bump cache busters in index.html
Perf page now accessible from main nav (⚡ Perf).
Shows in-memory packet store metrics: packets in RAM, memory used/limit,
queries served, live inserts, evictions, index sizes.