## Summary
- Adds `readCgroupMemoryMB()` to detect container memory ceiling from
cgroup v2 (`/sys/fs/cgroup/memory.max`) and v1
(`/sys/fs/cgroup/memory.limit_in_bytes`)
- Adds `warnIfMemlimitUnderprovisioned()` called once from `main()`
after the existing memlimit block — logs a `[memlimit] WARN` at startup
if the effective GOMEMLIMIT is below 50% of the container limit
- Works whether the limit was set via `GOMEMLIMIT` env var or derived
from `packetStore.maxMemoryMB`
- Adds `readCgroupMemoryMBFn` package-level hook for test injection
(same pattern as `readProcSelfIOFn` in the ingestor)
Fixes #1264. In the reported incident, GOMEMLIMIT was 1536 MiB on a 7.7
GB container; GC consumed 82% of CPU and all endpoints were 3–100×
slower. This warning fires at startup so operators catch the
misconfiguration before it causes an incident.
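The detection and comparison described above could look roughly like the following. This is a minimal sketch, not the PR's code: the function names `readCgroupMemoryMB` and `memlimitUnderprovisioned` follow the names mentioned in this description, but their signatures, the `parseCgroupLimitMB` helper, and the "treat absurdly large v1 values as unlimited" threshold are assumptions made for illustration.

```go
package main

import (
	"os"
	"strconv"
	"strings"
)

// unlimitedThresholdMB is an assumed cutoff (1 PiB expressed in MiB). cgroup v1
// reports a huge number rather than "max" when no limit is set, so anything
// above this is treated as "no limit".
const unlimitedThresholdMB = int64(1) << 30

// parseCgroupLimitMB converts the contents of a cgroup memory-limit file to MiB.
// It returns ok=false for "max" (cgroup v2 unlimited), unparsable input,
// non-positive values, and values above unlimitedThresholdMB.
func parseCgroupLimitMB(raw string) (mb int64, ok bool) {
	s := strings.TrimSpace(raw)
	if s == "" || s == "max" {
		return 0, false
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil || n <= 0 {
		return 0, false
	}
	mb = n / (1024 * 1024)
	if mb > unlimitedThresholdMB {
		return 0, false
	}
	return mb, true
}

// readCgroupMemoryMB tries the cgroup v2 file first, then the v1 file.
// ok=false means no container limit could be determined.
func readCgroupMemoryMB() (int64, bool) {
	paths := []string{
		"/sys/fs/cgroup/memory.max",            // cgroup v2
		"/sys/fs/cgroup/memory.limit_in_bytes", // cgroup v1
	}
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			continue
		}
		if mb, ok := parseCgroupLimitMB(string(data)); ok {
			return mb, true
		}
	}
	return 0, false
}

// memlimitUnderprovisioned reports whether the effective limit is strictly
// below 50% of the container limit. Unknown (non-positive) inputs never warn.
func memlimitUnderprovisioned(effectiveMB, cgroupMB int64) bool {
	if effectiveMB <= 0 || cgroupMB <= 0 {
		return false
	}
	return effectiveMB*2 < cgroupMB
}
```

Keeping the parsing and the comparison as pure functions is what makes a package-level hook such as `readCgroupMemoryMBFn` useful: tests can substitute the file read and exercise the threshold logic without a real cgroup filesystem.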
## Test plan
- [ ] `TestWarnIfMemlimitUnderprovisioned_EmitsWarning` — warning fires
when effective < 50% of cgroup
- [ ] `TestWarnIfMemlimitUnderprovisioned_NoWarnWhenAdequate` — no
warning at boundary (effective = 1024 MiB, cgroup = 1536 MiB)
- [ ] `TestWarnIfMemlimitUnderprovisioned_NoCgroupNoLog` — silent on
non-container hosts
- [ ] `TestWarnIfMemlimitUnderprovisioned_NoneSource` — no warning when
`source="none"` (no limit configured, runtime returns math.MaxInt64)
- [ ] `TestMemlimitUnderprovisioned` — boundary table for the comparison
helper
- [ ] All existing `TestApplyMemoryLimit_*` still pass
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
- Adds `TestHandleNodePaths_HopName_CanonicalPathShowsTarget_1144` as a
regression test for issue #1144
- When two nodes share a short pubkey prefix (e.g. `"37"`), the biased
hop resolver (`resolveWithContext`) could pick a GPS-having sibling over
the actual target node, producing the wrong name in hop display
- The bug was already fixed during the #1352 canonical-path work: the
canonical-path branch (Option A) uses `lookupNode(resolvedPK)` with the
full pubkey from `resolved_path`, bypassing the biased resolver entirely
- This PR documents and locks in the correct behaviour with a targeted
test
## Test setup
- `targetPK` (`37cf...`): no GPS
- `siblingPK` (`37bb...`): has GPS — the biased resolver's tier-3 picks
this without the fix
- One TX with `resolved_path = [targetPK]` → Option A fires →
`lookupNode(targetPK)` → hop shows `"CJS SF Mission"`, not `"Templeton
Hills"`
If Option A were removed (bug re-introduced), `resolveWithContext("37",
...)` on the two candidates would return the GPS-having sibling,
triggering the test failure.
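To make the failure mode concrete, here is a small self-contained sketch of the two lookup strategies. It is illustrative only: `resolveByPrefixPreferGPS` is a hypothetical stand-in for the biased resolver (not the real `resolveWithContext`), `lookupNode` here takes the node list as an explicit argument, and the pubkeys used in the usage note are invented short values.

```go
package main

import "strings"

// node is a simplified stand-in for the server's node record.
type node struct {
	PubKey string
	Name   string
	HasGPS bool
}

// lookupNode returns the node whose pubkey matches exactly.
func lookupNode(nodes []node, fullPK string) (node, bool) {
	for _, n := range nodes {
		if n.PubKey == fullPK {
			return n, true
		}
	}
	return node{}, false
}

// resolveByPrefixPreferGPS mimics a biased resolver: among all nodes sharing
// the prefix it prefers one with GPS, otherwise the first match.
func resolveByPrefixPreferGPS(nodes []node, prefix string) (node, bool) {
	var first node
	found := false
	for _, n := range nodes {
		if !strings.HasPrefix(n.PubKey, prefix) {
			continue
		}
		if n.HasGPS {
			return n, true
		}
		if !found {
			first, found = n, true
		}
	}
	return first, found
}

// hopName uses the full resolved pubkey when one is available and only falls
// back to prefix resolution otherwise.
func hopName(nodes []node, prefix, resolvedPK string) string {
	if resolvedPK != "" {
		if n, ok := lookupNode(nodes, resolvedPK); ok {
			return n.Name
		}
	}
	if n, ok := resolveByPrefixPreferGPS(nodes, prefix); ok {
		return n.Name
	}
	return prefix
}
```

With a GPS-less target `{"37cf01", "CJS SF Mission"}` and a GPS-having sibling `{"37bb02", "Templeton Hills"}`, `hopName(nodes, "37", "37cf01")` yields the target's name, while passing an empty resolved pubkey falls through to the prefix path and yields the sibling's name.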
## Test plan
- [x] `go test -run TestHandleNodePaths_HopName -v` passes
- [x] Full `go test ./...` passes
- [x] Code review addressed (collapsed redundant error checks)
Closes #1144
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
- Removes the TTL-based inline rebuild from `GetRepeaterRelayInfoMap`
and `GetRepeaterUsefulnessScoreMap`
- When the cache is non-nil it is returned immediately, regardless of
age — no more 700ms on-request recompute
- Inline compute is retained only as a nil-cache guard (edge case: tests
without a running recomputer)
- Fixes the stale `// 15s-TTL gate` comment in
`recomputeRepeaterEnrichmentSafe`
**Root cause:** `computeRepeaterRelayInfoMap` runs inline when the TTL
expires, taking ~700ms on a busy instance.
`StartRepeaterEnrichmentRecomputer` (introduced in #1262) already keeps
the cache warm via a synchronous prewarm at startup plus 5-min ticks, so
the inline path is effectively dead code: it fires only when the TTL is
shorter than the recomputer interval (e.g. custom
`analytics.defaultIntervalSeconds > 600`).
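The resulting access pattern (serve whatever is cached, compute inline only when nothing has been cached yet, and let a background recomputer replace the value) can be sketched as below. The type and method names are hypothetical and the map payload is simplified to `map[string]int`; the real getters and recomputer are not shown in this description.

```go
package main

import "sync"

// relayInfoCache is a hypothetical cache that never expires on read.
type relayInfoCache struct {
	mu      sync.Mutex
	data    map[string]int
	compute func() map[string]int // the expensive build step
}

// Get returns the cached map whenever one exists, regardless of its age.
// The inline compute runs only as a nil-cache guard.
func (c *relayInfoCache) Get() map[string]int {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.data == nil {
		c.data = c.compute()
	}
	return c.data
}

// Refresh is what a background recomputer would call on each tick. The
// expensive work happens outside the lock so readers keep getting the
// previous (stale) value until the swap.
func (c *relayInfoCache) Refresh() {
	fresh := c.compute()
	c.mu.Lock()
	c.data = fresh
	c.mu.Unlock()
}
```

The trade-off is explicit: request latency no longer depends on cache age, at the cost of readers seeing data that is at most one recomputer interval old.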
## Test plan
- [ ] `TestGetRepeaterRelayInfoMap_ServesStaleOnTTLExpiry` — regression
guard: stale sentinel is returned without recompute
- [ ] `TestGetRepeaterUsefulnessScoreMap_ServesStaleOnTTLExpiry` — same
for usefulness score map
- [ ] `TestGetRepeaterRelayInfoMap_BuildsWhenNil` — nil-cache fallback
still works
- [ ] Full `-short` suite passes (`go test -short ./...`)
Closes #1272
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes #1434.
## Problem
The ingestor's `Checkpoint()` (`PRAGMA wal_checkpoint(TRUNCATE)`) was
only called on shutdown. SQLite's built-in auto-checkpoint runs in
PASSIVE mode, which cannot truncate the WAL while the server holds an
active read connection. Result: the WAL grows at ~40–50 MB/hour and is
never reset while the instance is running.
Observed on analyzer.on8ar.eu: **183.4 MB WAL** after ~4h uptime.
## Changes
**`cmd/ingestor/main.go`**
- Add a periodic goroutine that calls `Checkpoint()` every hour,
staggered 30s after startup
- Hoist `walCheckpointTicker` to function scope so it is stopped cleanly
at shutdown alongside all other tickers
**`cmd/ingestor/db.go`**
- Switch `Checkpoint()` from `Exec` to `QueryRow(...).Scan` to capture
SQLite's 3-column result (`busy`, `log`, `checkpointed`)
- Return the checkpointed frame count (callers that discard it are
unaffected)
- Log only when `walFrames > 0` — silent when WAL is already empty,
avoiding log spam
- Log `blocked=true/false` instead of raw `busy` integer to make it
clear when the server's read lock is preventing full truncation
## Behaviour after fix
Each hourly tick flushes all WAL frames not held by an active server
reader. Worst-case WAL size is now bounded to roughly one hour of write
traffic (~45 MB) instead of unbounded growth. If the server holds a read
lock at checkpoint time, the log shows `blocked=true` and remaining
frames are retried on the next tick.
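A rough sketch of the pieces described above: scanning the three-column pragma result, deciding whether to log, and running the call on a delayed ticker. The type name, the log format, and the choice to run the first checkpoint right after the initial delay are assumptions for illustration; running `checkpoint` for real also requires a registered SQLite driver, which is not imported here.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"time"
)

// walCheckpointResult mirrors the three columns SQLite returns from
// PRAGMA wal_checkpoint: busy, log, checkpointed.
type walCheckpointResult struct {
	Busy         int
	WALFrames    int
	Checkpointed int
}

// checkpoint runs a TRUNCATE checkpoint and captures the result row.
func checkpoint(ctx context.Context, db *sql.DB) (walCheckpointResult, error) {
	var r walCheckpointResult
	err := db.QueryRowContext(ctx, "PRAGMA wal_checkpoint(TRUNCATE)").
		Scan(&r.Busy, &r.WALFrames, &r.Checkpointed)
	return r, err
}

// logLine returns a log message and true only when there were WAL frames to
// report, so an already-empty WAL stays silent. The format is hypothetical.
func (r walCheckpointResult) logLine() (string, bool) {
	if r.WALFrames <= 0 {
		return "", false
	}
	return fmt.Sprintf("[wal] checkpoint: frames=%d checkpointed=%d blocked=%t",
		r.WALFrames, r.Checkpointed, r.Busy != 0), true
}

// runCheckpointLoop waits initialDelay, then calls do once per interval until
// ctx is cancelled. The ticker is stopped on exit.
func runCheckpointLoop(ctx context.Context, initialDelay, interval time.Duration, do func()) {
	select {
	case <-ctx.Done():
		return
	case <-time.After(initialDelay):
	}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		do()
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}
```

In this sketch the loop would be started with something like `go runCheckpointLoop(ctx, 30*time.Second, time.Hour, ...)` and stopped by cancelling `ctx` during shutdown.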
## Test plan
- [x] `go build ./...` (ingestor module)
- [x] `go test ./...` passes
- [x] Code review addressed (ticker stop on shutdown, log message
clarity)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
CoreScope's ingestor already supports WebSocket MQTT connections today —
`paho.mqtt.golang` v1.5.0 handles `ws://` and `wss://` natively via
gorilla/websocket. However this support was **undocumented, untested,
and had a TLS gap** for `wss://` connections.
This PR closes those gaps without any breaking changes.
## Changes
### `cmd/ingestor/config.go`
- Added godoc comment to `ResolvedSources()` explaining all four
supported schemes and which ones require translation vs. pass-through
- `ws://` and `wss://` explicitly documented as native paho schemes
requiring no mapping
### `cmd/ingestor/main.go`
- Extended TLS config to cover `wss://` in addition to `ssl://`
- Before: `wss://` connections would use paho's default TLS (no explicit
`tls.Config` set), which works for valid certs but doesn't apply the
same predictable setup as `ssl://`
- After: both `ssl://` and `wss://` get `tls.Config{}` (system CA pool),
matching behavior; `rejectUnauthorized: false` still works for
self-signed certs on both schemes
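As an illustration of the "both TLS schemes get the same explicit config" rule, a helper along these lines would cover it. `tlsConfigForBroker` and its parameters are hypothetical names, not the ingestor's actual code; the only behaviour taken from this description is that `ssl://` and `wss://` receive a `tls.Config` and that `rejectUnauthorized: false` disables certificate verification.

```go
package main

import (
	"crypto/tls"
	"net/url"
)

// tlsConfigForBroker returns an explicit TLS config for ssl:// and wss://
// broker URLs and nil for plaintext schemes. A zero-value tls.Config uses the
// system CA pool; rejectUnauthorized=false skips verification so self-signed
// certificates are accepted.
func tlsConfigForBroker(brokerURL string, rejectUnauthorized bool) (*tls.Config, error) {
	u, err := url.Parse(brokerURL)
	if err != nil {
		return nil, err
	}
	switch u.Scheme {
	case "ssl", "wss":
		return &tls.Config{InsecureSkipVerify: !rejectUnauthorized}, nil
	default:
		return nil, nil
	}
}
```

A non-nil result would then be handed to the paho client options via `SetTLSConfig`; a nil result leaves the options untouched.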
### `cmd/ingestor/config_test.go`
Two new tests:
- `TestResolvedSourcesSchemeMapping`: validates all six scheme
variations (`mqtt://`, `mqtts://`, `tcp://`, `ssl://`, `ws://`,
`wss://`) including paths like `wss://host/mqtt`
- `TestLoadConfigWSSource`: full round-trip of a dual-source config (TCP
+ wss:// with username/password), verifies scheme unchanged through
`LoadConfig` and `ResolvedSources`
### `config.example.json`
- Added `wsmqtt` example entry showing `wss://` with username/password
- Updated `_comment_mqttSources` to enumerate all supported schemes:
`mqtt://`, `mqtts://`, `ws://`, `wss://`
## Motivation
We run
[meshcore-mqtt-broker](https://github.com/andrewjfreyer/meshcore-mqtt-broker)
(a WebSocket MQTT bridge with JWT auth) alongside Mosquitto, and
subscribe to both via `mqttSources`. The dual-source config works in
production but nothing in the docs or example config made this
discoverable for other operators.
## Testing
```
cd cmd/ingestor && go test ./...
ok github.com/corescope/ingestor 1.568s
```
All existing tests pass. Two new tests added.
## No breaking changes
- Existing configs: no change in behavior
- `ws://` / `wss://` configs that were already working: same behavior +
explicit TLS setup for `wss://`
## Summary
- `/api/nodes/{pk}/paths` returned paths in non-deterministic map
iteration order; with many paths the UI showed a random ordering on each
page load
- Now sorted by `LastSeen` descending (newest-first), with `Count` as a
tiebreaker (higher first)
- Nil `LastSeen` sorts last (treated as oldest)
- `LastSeen` is an RFC 3339 string, so lexicographic comparison is
correct as long as all values share the same UTC offset and precision
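The ordering rules in this list map onto a single comparator. The sketch below uses a hypothetical `pathEntry` type with only the fields needed for sorting; the real response struct has more fields and may differ in naming.

```go
package main

import "sort"

// pathEntry is a simplified stand-in for one entry in the paths response.
type pathEntry struct {
	Hops     string
	LastSeen *string // RFC 3339 timestamp; nil when never seen
	Count    int
}

// sortPathsByRecency orders paths newest-first, breaks LastSeen ties by
// higher Count, and places nil LastSeen entries last.
func sortPathsByRecency(paths []pathEntry) {
	sort.SliceStable(paths, func(i, j int) bool {
		a, b := paths[i], paths[j]
		switch {
		case a.LastSeen == nil && b.LastSeen == nil:
			return a.Count > b.Count
		case a.LastSeen == nil:
			return false // nil sorts after any real timestamp
		case b.LastSeen == nil:
			return true
		case *a.LastSeen != *b.LastSeen:
			return *a.LastSeen > *b.LastSeen // string comparison, newest first
		default:
			return a.Count > b.Count
		}
	})
}
```

Because the input comes from map iteration, `sort.SliceStable` alone does not make the output deterministic for entries that tie on both keys; a further tiebreaker (for example on the hop list) would be needed if that matters.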
Closes #1145.
## Test plan
- [ ] `TestHandleNodePaths_SortByRecency_1145` — 3 distinct paths (via
relay1, relay2, direct), verifies newest appears first
- [ ] `TestHandleNodePaths_SortCountTiebreaker_1145` — two paths with
identical `LastSeen`, verifies higher-count path wins the tiebreak
- [ ] All existing `TestHandleNodePaths_*` tests still pass
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
`observer.last_seen` (and `last_packet_at`) answer "when did the
analyzer last hear from this observer" — fundamentally an ingest-time
question. Previously both the status-message handler and the
packet-message handler passed the MQTT envelope timestamp into
`UpsertObserverAt` / `stmtUpdateObserverLastSeen`, which let buggy
observer clocks drag `last_seen` hours into the past even when the
timestamp parsed cleanly as RFC3339 (so #1464's naive-clamp didn't catch
it).
California observers on `analyzer.00id.net` consistently appeared 3-7h
stale for this reason.
## Fix
- `cmd/ingestor/main.go` status handler: pass `""` to `UpsertObserverAt`
so it falls back to `time.Now()`.
- `cmd/ingestor/main.go` packet-path observer upsert: same.
- `cmd/ingestor/db.go` `InsertTransmission`'s
`stmtUpdateObserverLastSeen.Exec` call: use `ingestNow` for both
`last_seen` and `last_packet_at` (was `rxTime`).
Per-packet rxTime semantics (`transmissions.first_seen`,
`observations.timestamp`) are unchanged — those continue to use envelope
time with the naive-clamp / 14h-future / 30d-past guards from #1463 /
#1464. Per-hop SNR-vs-time analysis still works.
## TDD
- Red: `test(#1465): observer.last_seen uses ingest time even with
well-formed envelope (red)`
- 3 new tests in `observer_lastseen_1465_test.go`: status-past,
status-future, packet-path-past.
- Status-past and packet-path-past assertions failed on master (envelope
time stored verbatim).
- Green: `fix(#1465): observer.last_seen always uses ingest time, not
envelope`
- All 3 new tests pass.
- Pre-existing `TestInsertTransmissionUpdatesObserverLastSeen` and
`TestLastPacketAtUpdatedOnPacketOnly` were encoding the buggy behavior;
updated to assert ingest-time semantics.
- Full `go test ./cmd/ingestor/...` green.
## Refs
- Refs #1463 (root-cause investigation)
- Refs #1464 (naive-clamp fix that handled malformed timestamps)
- Closes #1465
---------
Co-authored-by: openclaw-bot <bot@openclaw.local>
Red commit: fc6ed65f (CI fails on
`TestResolveRxTimeNaiveTimestampClamp`)
Green commit: 80bf1285
## Problem
California observers (UTC−7) had `last_seen` perpetually pinned ~7h
behind wall-clock and rendered "Stale" in the UI despite active MQTT
status traffic. Root cause: `parseEnvelopeTime` parses zone-less ISO
timestamps (python `datetime.now().isoformat()`) as UTC, leaving a
residual offset equal to the observer's UTC offset. The existing
soft-clamp at `resolveRxTime` only caught the future-skew (UTC+N) mirror
case.
## Fix — Option B (symmetric clamp)
- `parseEnvelopeTime` now returns a `(time.Time, naive bool, error)`
tuple so callers can tell zone-aware from zone-less parses.
- `resolveRxTime` applies a 15-minute symmetric tolerance window for
`naive==true` values: anything further off than 15 min collapses to
ingest time and emits a warning log.
- Well-behaved observers (Z-suffixed or explicit `±HH:MM` offset) are
completely untouched regardless of skew — legitimate buffered uploads
remain accurate to the second.
Chose option B over option A (reject naive outright) because some
observers may be sending naive *UTC* strings — those would suddenly lose
their own time. Symmetric clamp preserves the well-synced naive case (<
15 min off) and rescues every other zone.
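The symmetric clamp can be sketched as follows. This is a simplified illustration under stated assumptions: the parameter lists of `parseEnvelopeTime` and `resolveRxTime`, the extra `clamped` return value, and the `mustUTC` helper are invented for the example, and the other guards (14h future, 30-day floor) plus the warning log are omitted.

```go
package main

import "time"

// naiveTolerance is the symmetric window inside which a zone-less timestamp
// is trusted as if it were UTC.
const naiveTolerance = 15 * time.Minute

// parseEnvelopeTime parses an envelope timestamp and reports whether it was
// zone-less ("naive"). Zone-aware values are canonicalized to UTC.
func parseEnvelopeTime(s string) (time.Time, bool, error) {
	if t, err := time.Parse(time.RFC3339Nano, s); err == nil {
		return t.UTC(), false, nil
	}
	// Zone-less ISO form, e.g. Python's datetime.now().isoformat().
	// time.Parse accepts optional fractional seconds here and assumes UTC.
	t, err := time.Parse("2006-01-02T15:04:05", s)
	if err != nil {
		return time.Time{}, false, err
	}
	return t, true, nil
}

// resolveRxTime returns the time to store and whether it was clamped to
// ingest time. Only naive values are subject to the tolerance window.
func resolveRxTime(envelope string, ingestNow time.Time) (time.Time, bool) {
	t, naive, err := parseEnvelopeTime(envelope)
	if err != nil {
		return ingestNow, true // assumption: unparsable input uses ingest time
	}
	if naive {
		diff := ingestNow.Sub(t)
		if diff < 0 {
			diff = -diff
		}
		if diff > naiveTolerance {
			return ingestNow, true
		}
	}
	return t, false
}

// mustUTC is a small helper for examples and tests; it panics on bad input.
func mustUTC(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t.UTC()
}
```

With an ingest time of `2026-01-01T12:00:00Z`, the naive value `2026-01-01T05:00:00` (a UTC−7 wall clock) collapses to ingest time, while `2026-01-01T05:00:00Z` and `2026-01-01T05:00:00-07:00` are kept as sent because they carry an explicit zone.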
## Tests
- New `TestResolveRxTimeNaiveTimestampClamp` covers naive past, naive
future, naive w/ microseconds, Z-suffixed past (verbatim),
offset-suffixed (canonicalized to UTC), naive within tolerance
(verbatim).
- `TestParseEnvelopeTime` updated for new signature, asserts `naive`
flag.
- All existing rxtime tests preserved (factory date, 30-day floor, 14h
future, plausible past).
- The red commit ran first and failed on assertions; the green commit
makes everything pass.
## Operator visibility
`naive timestamp "..." off by 7h, using ingest time` now appears in the
ingestor log so operators can identify upstream observer scripts that
should switch to `datetime.now(timezone.utc).isoformat()`.
Fixes #1463
---------
Co-authored-by: openclaw-bot <bot@openclaw.local>
## Summary
Master CI has been failing on `test-channel-color-picker-e2e.js` — the
"outside click closes popover" step — most recently on run
[26574358472](https://github.com/Kpa-clawbot/CoreScope/actions/runs/26574358472)
(master push `d24246395`). The previous deflake attempt (#1317, commit
62a81776) only papered over part of the race.
## Root cause
`showPopover` in `public/channel-color-picker.js:148-152` installs the
document-level outside-click listener inside a `setTimeout(0)`:
```js
setTimeout(function() {
document.addEventListener('click', onOutsideClick, true);
document.addEventListener('keydown', onEscape, true);
}, 0);
```
The previous fix tried to wait for that listener with a `rect.width > 0`
"popover visible" proxy — but visibility ≠ listener install. Under CI
load, the macrotask can be deferred past Playwright's polling
resolution, so `page.mouse.click(700, 500)` fires before the listener
exists, the click is dropped, and the second `waitForFunction` runs out
the 8s default timeout.
## Fix (test-only)
1. **Drain pending macrotasks node-side** with `requestAnimationFrame` ×
2 + `setTimeout(0)` before clicking, so the same scheduler tier the
listener uses has definitely run.
2. **Retry the outside click in a small loop** (up to 10×, 1s each).
Even if the very first synthetic click still races install, subsequent
clicks land cleanly. Each retry is cheap (~ms), and `assert(closed,
...)` gives a clear failure message if the popover never hides.
## Verification
| Scenario | Old test | New test |
|---|---|---|
| Baseline (no artificial delay) | passes | 45/45 clean runs locally |
| Artificially delay listener install to **250ms** | **5/5 FAIL** | 5/5 PASS (popover closes on retry #2) |
Production code untouched. Comment block in-test captures the history so
the next person doesn't re-introduce the race.
## Linked
- Supersedes the partial fix in #1317
- CI run that exposed it:
https://github.com/Kpa-clawbot/CoreScope/actions/runs/26574358472
Co-authored-by: Kpa-clawbot <bot@kpa-clawbot.local>
## Closes #1415: packets cross-viewport jank
## Closes #1458: Tufte mobile-packets P0 findings (folded into same branch)
Single PR covers both issues — they touch the same files
(`public/packets.js`,
`public/style.css`) and a split would invite merge thrash.
### #1415 — column priority + chrome compaction
Locked column-priority tiers (operator spec):
| Tier | Viewport | Columns |
|---|---|---|
| 1 | always (mobile through desktop) | expand · time · type · details |
| 2 | tablet+ (>768px) | path |
| 3 | desktop only (>1024px) | hash · observer · rpt |
Enforced via existing `data-priority` system in `TableResponsive.apply`
(priorities 3 → hide ≤1024, 5 → hide ≤768).
CSS:
- `.col-expand` pinned to `width/min-width/max-width: 32px` at every
viewport
— kills the 50–180px dead column that pushed every data column right.
- `.col-details` capped at `max-width: 480px` so wide viewports stop
wasting
hundreds of px on the last column.
- `@media (max-width: 480px)` hides page-header BYOP, shrinks the h2,
and
tightens row padding → pre-table chrome drops from ~280px to ~140px.
### #1458 — Tufte mobile P0 findings
**P0-A: semantic-first detail panel.** Was: `"Packet Byte Breakdown (134 bytes)"`
title + giant neon hex grid above the meaningful fields. Now: type badge +
decoded summary + hop count + `src → dst` lead the panel, followed by the
existing `.detail-meta` dl (reordered: Payload Type → Path → Timestamp →
Observer).
**P0-B: raw-bytes disclosure.** Hex legend / hex dump / field table
wrapped in
`<details class="detail-technical">`. Disclosure copy reads "Show raw
bytes".
Collapsed by default on phones (`window.innerWidth ≤ 480`), expanded on
tablet+.
**P0-C: mobile filter-zone collapse.** The always-on filter-expression
input
above `.filter-bar` is now wrapped with `.pkt-filter-expr` and hidden
under
the `@media (max-width: 480px)` block. Reveals when the existing
"Filters ▾"
toggle adds `.filters-expanded` to the sibling `.filter-bar` (CSS
`:has()`
selector — one tap reveals both chrome rows together).
### TDD
`test-issue-1415-packets-layout.js` — pure source-grep, no browser:
- col-expand class on first `<th>` + `<td>` + CSS 32px pin
- locked column-priority tier values per column
- `.col-details` max-width ≤ 480px
- mobile @media block: hides BYOP, hides `.pkt-filter-expr` (revealed by
`.filters-expanded`)
- detail-meta order: Payload Type before Observer
- `<details class="detail-technical">` wrapper exists with "Show raw
bytes"
summary
- detail-title leads with a type badge; `.detail-srcdst` emitted
- old "Packet Byte Breakdown (N bytes)" title literal removed
Red commit `d4372d82` (8 assertion failures, no compile errors), green
commit `4fab9dbd` (#1415 work), follow-up commit `a5218035` (#1458 work)
keeps everything green. 26 assertions, 0 failed.
---------
Co-authored-by: openclaw-bot <bot@openclaw>
## Summary
Rename the "Usefulness" UI label to "Traffic share", add hover tooltips
for both Traffic share and Bridge score, and introduce a new
`traffic_share_score` field on `/api/nodes` (alongside the legacy
`usefulness_score`, kept for API back-compat).
Closes #1456.
## Why
The "Usefulness" label implied a composite score that doesn't exist yet
— only the Traffic-share axis (axis 1 of 4 from #672) and the Bridge
axis (axis 2 of 4 from #1275) are wired today. A node with low traffic
but critical structural position read as "not useful" — exactly wrong.
Neither score had a tooltip explaining what it measured.
## Changes
### Frontend (`public/nodes.js`)
- Visible label `Usefulness` → `Traffic share` (with ⓘ glyph)
- Tooltip explains traffic-share semantics, cross-references Bridge for
structural importance, points at #672 for the 4-axis roadmap
- Bridge row gets a parallel ⓘ glyph and a tooltip naming "betweenness
centrality" + the "quiet but irreplaceable chokepoint" interpretation
- Prefers new `traffic_share_score` with graceful fallback to legacy
`usefulness_score`
### Backend (`cmd/server/routes.go`)
- `/api/nodes` and `/api/nodes/{pubkey}` now emit BOTH
`usefulness_score` (kept for API compat) AND `traffic_share_score` (new
canonical name), populated with the same value
- Inline comment documents the deprecation path: when the #672 composite
ships, `usefulness_score` becomes the composite and
`traffic_share_score` keeps the per-axis value
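The dual-field emission amounts to serializing one value under two JSON keys. A minimal sketch, assuming a dedicated struct (the real handler may build a map or a larger node struct instead; `nodeScores` and the helper names are hypothetical):

```go
package main

import "encoding/json"

// nodeScores carries the same per-axis value under both the legacy and the
// new canonical field name.
type nodeScores struct {
	UsefulnessScore   float64 `json:"usefulness_score"`    // legacy, kept for API compat
	TrafficShareScore float64 `json:"traffic_share_score"` // new canonical name
}

// newNodeScores populates both fields from a single traffic-share value.
func newNodeScores(trafficShare float64) nodeScores {
	return nodeScores{
		UsefulnessScore:   trafficShare,
		TrafficShareScore: trafficShare,
	}
}

// encodeNodeScores returns the JSON encoding of the score pair.
func encodeNodeScores(trafficShare float64) (string, error) {
	b, err := json.Marshal(newNodeScores(trafficShare))
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```

Routing both fields through one constructor keeps them from drifting apart until the composite score lands and `usefulness_score` is intentionally repointed.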
## Tests
- `test-issue-1456-score-labels.js` — file-grep pins on `nodes.js`
(label, tooltip fragments, percent formatting, dual-field read with
fallback)
- `cmd/server/traffic_share_score_test.go` — `/api/nodes` +
`/api/nodes/{pk}` responses contain both fields with equal values
TDD: red commit (`8bd235a0`) added failing tests; green commit
(`c4d3aee5`) implemented. `go test ./cmd/server/...` passes (47s).
## Out of scope
- Renaming the backend field (would break consumers)
- Wiring axes 3 (Coverage) and 4 (Redundancy) — tracked in #672
- Changing the score calculation
---------
Co-authored-by: clawbot <bot@openclaw.local>
## Summary
Adds a customizer checkbox that toggles
`localStorage["channels-show-encrypted"]` — the read-gate that controls
whether `/api/channels` is fetched with `?includeEncrypted=true`. Today
operators can only flip that gate from DevTools; this PR gives them the
obvious affordance.
Default behavior is unchanged: key remains unset → server filters
encrypted entries → ~19 channels rendered. Toggle ON sets the key to
`"true"` → fetch grows to ~265 with `Encrypted (0xAB)` entries.
## Behavior
- **Display tab → new "Channels" subsection → "Show encrypted channels"
checkbox.**
- ON writes `localStorage["channels-show-encrypted"] = "true"`.
- OFF *removes* the key (never writes `"false"`) so the read-gate
cleanly returns false and the customizer match-default detection still
works.
- Toggling dispatches `mc-channels-show-encrypted-changed`;
`channels.js` listens and re-fetches via `loadChannels()` — no page
reload.
- Tooltip / hint copy: "Encrypted channels appear as 'Encrypted (0xAB)'
with no name. Operators usually leave this off."
## TDD
`test-issue-1454-channels-toggle.js` — source-grep invariants:
- Red commit `feb9dcee`: assertions on customizer + listener — failed
(production code not yet present).
- Green commit `d8742f2c`: production patch — passes.
Read-gate at `public/channels.js:1564` is left untouched; the test
asserts it.
## Out of scope
- Migration of legacy localStorage values into customizer overrides (no
override store needed — we keep using the raw localStorage key as the
single source of truth).
- Per-region toggle.
- Decryption key UI.
Closes #1454
---------
Co-authored-by: openclaw-bot <bot@openclaw.local>