mirror of
https://github.com/Kpa-clawbot/meshcore-analyzer.git
synced 2026-06-06 09:51:36 +00:00
fe997fefb2b083062ee252f957b147cd6a3de106
141 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
f15d2efe81 |
fix(#1386): #1324 follow-up — test coverage + RWMutex + lock-hold-time + dead code + cadence (#1390)
# #1324 follow-up — test coverage + RWMutex + lock-hold-time + dead code + cadence Addresses the post-merge audit findings in #1386 on PR #1324 (multi-byte capability persistence). Two independent audits (Kent Beck test-quality + Carmack perf) surfaced one top-level test-coverage gap and three perf concerns. This PR closes all of them; cadence cleanup is included. Red commit: `<RED_SHA>` (CI: `<RED_URL>`) ## What 1. **Tests** (`cmd/ingestor/multibyte_persist_test.go`): - `TestRunMultibyteCapPersist_RoundTrip` — end-to-end persist → close store → reopen → assert DB state survived. - `TestRunMultibyteCapPersist_MalformedSnapshot` — corrupt snapshot must log + no-op, not crash. - `TestRunMultibyteCapPersist_MissingSchemaColumns` — legacy DB without `multibyte_sup` cols must skip with explicit log, not panic / silently swallow. - `TestRunMultibyteCapPersist_PreservesConfirmedOnUnknown` — status=`unknown` MUST NOT clobber an existing `confirmed` row (mutation guard for the data-destruction check). 2. **`cmd/server/store.go`** - `cacheMu sync.Mutex` → `sync.RWMutex`. The per-node `GetMultibyteCapFor` read path in `/api/nodes` (`routes.go:1215`) uses `RLock` now; no longer serializes against itself or against analytics readers. - Build the multi-byte index map OUTSIDE `cacheMu`, then swap the pointer inside. Removes a 2400-iteration allocation hold from the analytics-cycle critical section. - Drop the dead `GetMultiByteCapMap` (zero callers confirmed by `rg`) and the stale `multibyteStatusToInt` tombstone comment. 3. **`cmd/ingestor/multibyte_persist.go`** - Replace the per-entry pair of `UPDATE nodes` + `UPDATE inactive_nodes` (50% guaranteed-miss) with a single dispatch-by-table-membership `UPDATE` per entry. ~50% fewer prepared-stmt round-trips. - Explicit `MalformedSnapshot` log line distinct from cold-start. - Defensive schema-presence check via `PRAGMA table_info` once at start; logs `[multibyte-persist] schema missing` and returns clean stats on legacy DBs. 4. **`cmd/server/analytics_recomputer.go` / `config.example.json`** — bump default snapshot cadence from 15s to 1m (the snapshot is a derived cache the ingestor only reads every 5 min; 4× less disk churn, no observable freshness loss). ## Why Direct quotes from the audit (#1386): > *"No end-to-end persist→restart→load round-trip — the documented > value prop of the PR ('survives restart') has no single test > exercising the full path."* (Kent Beck) > *"`cacheMu` is `sync.Mutex` not `sync.RWMutex` + per-node read in > `handleNodes` — 2400 serialized lock acquisitions per `/api/nodes` > call, contended against every analytics-cache reader/writer. > The O(1) win is consumed by lock contention."* (Carmack #1) > *"Map construction held under shared `cacheMu` — every 15s > analytics cycle blocks every API cache read for the duration of a > 2400-entry map build. Build outside the lock, swap pointer > inside."* (Carmack #2) > *"`UPDATE nodes` + `UPDATE inactive_nodes` per entry … 4800 > prepared-stmt round-trips, 2400 guaranteed-empty."* (Carmack #3) > *"Server writes 20 snapshots for every one the ingestor reads. > Cadence mismatch — server could publish every 1 min and lose > nothing."* (Carmack §2) ## TDD Red commit adds the four tests above. Two of the four (`MalformedSnapshot`, `MissingSchemaColumns`) fail on assertions against the pre-fix `multibyte_persist.go`; the other two (`RoundTrip`, `PreservesConfirmedOnUnknown`) are regression coverage of behaviour the original implementation already honoured but never exercised — they exist to guard future mutation (the audit's mutation-suggestion lens). Green commit lands the implementation. ## Bench `go test -bench BenchmarkGetMultibyteCapFor -benchmem -count=10` (local, idle laptop, n=2400-entry index, 8 reader goroutines vs. one analytics writer): | variant | ns/op | allocs/op | |--------------------|------:|----------:| | `sync.Mutex` (pre) | n/a — see note | — | | `sync.RWMutex` | n/a — see note | — | Note: did not produce a concurrent benchmark in this PR (would require non-trivial test scaffolding around the cache lifecycle). The win is structural — `RLock` allows the ~2400 per-`/api/nodes` reads to proceed in parallel rather than serializing on the same mutex held by every analytics writer. Documenting honestly per AGENTS.md "perf claims require proof": full microbench deferred to a follow-up. ## Manual verification (staging) - New tests: `go test ./... -count=1 -timeout 300s` in `cmd/ingestor` and `cmd/server` — green. - All multibyte-area tests (`#1366`, `#1368`, `#1372` regression suites in `multibyte_capability_test.go`, `multibyte_enrich_test.go`, `multibyte_region_filter_test.go`): green. - Preflight: `bash ~/.openclaw/skills/pr-preflight/scripts/run-all.sh origin/master` — exit 0. Fixes #1386 --------- Co-authored-by: claw <claw@openclaw.local> |
||
|
|
0b35c7eef3 |
feat(server): persist multi-byte capability across restart + O(1) per-key lookup (#903) (#1324)
## Summary Follows the reconciliation recommendation in #916 — extracts only the NET-NEW persistence layer from that PR (which is now superseded by #1002 for the overlay UI) into a focused 6-file change against current master. **What this adds:** - `multibyte_sup_v1` migration: `multibyte_sup INTEGER NOT NULL DEFAULT 0` + `multibyte_evidence TEXT` on `nodes`/`inactive_nodes` so capability survives restart - `hasMultibyteSupCols` schema detection gates the persist/load paths - `loadMultibyteCapFromDB()`: pre-populates `mbCapSnapshot`/`mbCapIndex` at startup — cold starts serve last-known capability without waiting for the first ~15s analytics cycle - `maybePersistMultibyteCapability()` + `persistMultibyteCapability()`: after each analytics cycle; TryLock-gated (concurrent cycles coalesce); skips `sup==0` entries (data-destruction guard) - `GetMultibyteCapFor(pk)`: O(1) map lookup; both `handleNodes` and node-detail call sites updated from the O(N)-alloc `GetMultiByteCapMap()` **What this explicitly does NOT change:** - API field names (`multi_byte_status`, `multi_byte_evidence`, `multi_byte_max_hash_size`) - `EnrichNodeWithMultiByte` — unchanged - `GetMultiByteCapMap` — still present for any external callers - `public/map.js`, `public/live.css`, `Dockerfile`, `docs/` — zero frontend churn ## Test plan - [x] `TestMultibyteCapPersistRoundTrip` — confirmed values survive persist → fresh-store load - [x] `TestMultibyteCapPersistSkipsUnknown` — data-destruction guard: `sup==0` entry does not overwrite DB-confirmed value - [x] `TestMultibyteCapMaybePersistCoalesces` — TryLock coalesces 10 concurrent callers without deadlock - [x] `TestMultibyteCapGetMultibyteCapForO1` — O(1) index returns correct entry / false for unknown pubkey - [x] `TestMultibyteCapLoadFromDB` — only `sup>0` rows loaded; `sup==0` row excluded - [x] `TestSchemaMultibyteSupColumns` — migration adds columns to both tables; idempotent on second `OpenStore` - [x] All existing `TestMultiByteCapability_*` tests pass unchanged - [x] Full ingestor test suite: `ok` in 27s - [x] `go build ./cmd/server/ && go build ./cmd/ingestor/` clean 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: openclaw-bot <bot@openclaw> |
||
|
|
9d3dd8df0a |
fix(packets): order by ingest id, not rxTime — fresh activity visible on packets page (#1345) (#1349)
## Summary Fixes #1345 — the packets page shows "no recent activity" while MQTT ingest is healthy because the default `/api/packets` query was `ORDER BY first_seen DESC`, and PR #1233 redefined `first_seen` as the observer's radio receive time (rxTime). When an observer buffers offline and uploads hours later, its packets land with hours-old `first_seen` values; older-ingested packets with fresher rxTime then crowd the top of the list and the visually freshest activity disappears. ## Fix Switch the default ordering to `t.id DESC` (ingest order) on `/api/packets` and the closely-related endpoints. `id` is monotonic with ingest time and immune to buffered uploads. Endpoints changed (all use the same fix for the same reason): | Path | Function | File | |------|----------|------| | `GET /api/packets` (default) | `DB.QueryPackets`, `Store.QueryPackets` | `cmd/server/db.go`, `cmd/server/store.go` | | `GET /api/packets?nodes=…` | `DB.QueryMultiNodePackets`, `Store.QueryMultiNodePackets` | same | | Node detail "recent transmissions" | `DB.GetRecentTransmissionsForNode` | `cmd/server/db.go` | ## `since=` semantic — preserved `since=` still filters by `first_seen` (RFC3339 path uses the observations.timestamp subquery), i.e. "packets the network received since X." Buffered uploads of older packets are still excluded from a `since=15m` view even if they were ingested in the last 15 minutes. Only the **display order** changes; filtering by receive time is unchanged. ## Audit — NOT changed - `Store.QueryGroupedPackets` already sorts by `LatestSeen` (max observation timestamp), which is correct for the grouped view and immune to the buffered-upload regression. - `GetChannelMessages` and channel `sample_json` subqueries keep `first_seen DESC` — channel message chronology is meaningful for message UX; if buffered uploads become a problem here too it's a separate UX call (out of scope for #1345). - `s.packets` insertion ordering (Load + ingest) — untouched. The fix sorts at query time so we don't perturb `oldestLoaded` invariants. ## Tests — TDD red → green - Red: `508f4371` adds `cmd/server/packets_order_test.go` with two cases — order assertion (failed on master with `[fresh, buffered]`) and since-filter semantic (RFC3339 path uses observation timestamps). - Green: `0fd685e7` switches the SQL + in-memory ordering. Tests pass; full `cmd/server` suite green locally (44s). ## Out of scope - Re-thinking #1233's first_seen semantics - Adding a UI sort toggle (issue's option 2) - Channel-message page ordering ## Preflight Clean (`bash ~/.openclaw/skills/pr-preflight/scripts/run-all.sh origin/master`). --------- Co-authored-by: openclaw-bot <bot@openclaw.local> |
||
|
|
88bc5d9d3b |
fix(#1373): drop ghost "unknown" channel bucket from /api/channels for encrypted-no-key packets (#1377)
## What Drops the ghost `unknown` channel bucket from `/api/channels` for encrypted GRP_TXT packets whose decoded JSON sets `channel=""` (server has no PSK to decrypt). Fix A from issue #1373 — cosmetic / immediate. Fix B (server-side decryption / key sharing) is intentionally out of scope and remains for a follow-up issue. ## Why When an operator adds a PSK channel key client-side (via the channel customizer), the channel list shows the newly-decrypted channel correctly — but it ALSO shows a stale `unknown` bucket holding the SAME packets the new channel just decrypted. The bucket is a server-side debug catch-all (`if channelName == "" { channelName = "unknown" }`) that leaks into the user-facing channel list. It's not a real channel; dropping it from `/api/channels` is the right fix until/unless server-side decryption lands. Choice made: keep the `channelName = "unknown"` fallback path removed by adding an early `continue` BEFORE the bucket is created. This keeps the diff minimal, preserves the `hasGarbageChars` filter ordering, and makes the intent obvious ("encrypted-no-key packets are not channels"). The DB path (`cmd/server/db.go`) already filters NULL `channel_hash` at the SQL level and `continue`s on empty; the test pins that contract. ## TDD - Red commit: `35b8ba51c74dcc6200d5cf4a87dc7a0b63b2b2c2` — seeds 5 encrypted GRP_TXT (Channel="") + 3 decrypted (#real) into both PacketStore and DB paths; asserts `GetChannels` returns exactly 1 channel (#real). Fails on assertions, not compile. - Green commit: see follow-up commit on this branch — drops the `"unknown"` fallback in `cmd/server/store.go` `GetChannels`; DB path unchanged (already correct, test pins it). ## Manual verification (staging) After deploy, on a staging instance with encrypted GRP_TXT traffic and no PSKs configured: 1. `curl -s https://staging/api/channels | jq '[.[] | select(.name == "unknown")] | length'` → `0` 2. Real channels with known hashes still appear with correct messageCount. ## Files changed - `cmd/server/store.go` — drop the `if channelName == "" { channelName = "unknown" }` fallback; skip the packet instead. - `cmd/server/channels_no_unknown_bucket_1373_test.go` — new test covering both code paths. Fixes #1373 --------- Co-authored-by: openclaw-bot <bot@openclaw.local> |
||
|
|
c7ab5f3eb9 |
fix(#1366): channels view shows latest message time — backend emits LatestSeen, not FirstSeen (#1368)
Red commit:
|
||
|
|
317b59ab10 |
feat: area-based visual node filter — attribute packets by transmitter GPS (#804) (#839)
## Summary - Adds configurable GPS polygon areas to `config.json`; nodes are attributed to an area if their last-known position falls inside the polygon - New `Area: …` dropdown filter (matching the existing region filter style) appears on all analytics, nodes, packets, map, and live screens when areas are configured - Backend resolves area membership with a 30s TTL cache; area filter bypasses the 500-node cap on `/api/bulk-health` so all area nodes are always returned - Includes a polygon builder tool (`/area-map.html`) for drawing and exporting area boundaries ## Changes **Backend** - `AreaEntry` type + `Areas` config field - `GetNodePubkeysInArea` DB query + `resolveAreaNodes` (30s TTL, `areaNodeMu` RWMutex) - `PacketQuery.Area` + `filterPackets` polygon check - `?area=` param propagated through all analytics, topology, clock-health, and bulk-health routes - `/api/config/areas` endpoint **Frontend** - `area-filter.js`: single-select dropdown, persists to localStorage, cleans up stale keys on load - Wired into analytics, nodes, packets, channels, map, and live pages - Live map clears node markers on area change **Docs & tools** - `docs/user-guide/area-filter.md` — configuration and usage guide - `docs/api-spec.md` — updated with new endpoint and `?area=` param table - `tools/area-map.html` — polygon builder for defining area boundaries - Demo areas added to `config.example.json` ## Test plan - [x] No areas configured → filter dropdown does not appear on any page - [x] Areas configured → dropdown appears, "All" selected by default - [x] Selecting an area filters nodes/packets/topology/map correctly - [x] Selecting "All" restores unfiltered view - [x] Selection persists across page reloads (localStorage) - [x] Stale localStorage key (area removed from config) is cleared on load - [x] `/api/bulk-health?area=X` returns all nodes in area (no 500-node cap) - [x] `/api/config/areas` returns correct list 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Kpa-clawbot <kpaclawbot@outlook.com> Co-authored-by: openclaw-bot <bot@openclaw.local> |
||
|
|
ba6c2ac6ba |
feat: repeater liveness indicator with relay stats (#662) (#755)
## Summary
- **Backend**: adds `relayTimes` in-memory index (sorted unix-millis per
repeater pubkey), maintained in lockstep with `byPathHop`. Populated at
startup from all packet observations (not just best), updated on
ingest/evict/backfill. Exposes `relay_count_1h`, `relay_count_24h`,
`last_relayed` in both `/api/nodes` (for repeaters) and
`/api/nodes/{pubkey}/health`.
- **Frontend**: `getNodeStatus` extended to three-state (`relaying` /
`active` / `stale`) for repeaters based on relay_count_24h.
`getStatusInfo` is the single source of truth for status label,
explanation, and relay stats. Detail pane shows relay counts and last
relayed time. Nodes list gets a status emoji column with hover tooltip
showing relay info.
- **Correctness fixes**: relay index scans all observations per packet
(not just best); backfill now updates relay index after resolving paths;
pubkeys lowercased consistently throughout index.
## Changes
### `cmd/server/store.go`
- `relayTimes map[string][]int64` field added to `PacketStore`
- `addTxToRelayTimeIndex` / `removeFromRelayTimeIndex`: scan all
observations, idempotent sorted insert, lowercase keys
- `relayMetrics(times, nowMs)`: returns `(count1h, count24h,
lastRelayed)`
- `buildPathHopIndex`: populates `relayTimes` at startup
- `pollAndMerge`: updates relay index on ingest and eviction; new `else`
branch for path-unchanged observations
- `addTxToPathHopIndex` / `removeTxFromPathHopIndex`: lowercase resolved
pubkeys (fixes casing mismatch with lookup)
### `cmd/server/routes.go`
- `GetBulkHealth` / `GetNodeHealth`: include relay stats for repeater
nodes
- `handleNodes`: enriches repeater nodes with relay stats from
`relayTimes` so list view has same data as detail pane
### `cmd/server/neighbor_persist.go`
- `backfillResolvedPathsAsync`: calls `addTxToRelayTimeIndex` after
`pickBestObservation` to capture newly resolved pubkeys
### `public/roles.js`
- `getNodeStatus(role, lastSeenMs, relayCount24h)`: three-state logic
for repeaters
- `getStatusInfo(n)`: single source of truth returning status, label,
explanation, relay counts, last relayed
### `public/nodes.js`
- Detail pane: `n.stats` populated from health endpoint before
`getStatusInfo` call
- Nodes list: status emoji column with relay hover tooltip; status
filter uses `getStatusInfo`
### Tests
- `relay_liveness_test.go`: index functions, relay metrics, wiring
integration, bulk/single health endpoints
- `test-repeater-liveness.js`: three-state frontend logic, backward
compat
## Test plan
- [x] Repeater with recent relay traffic shows green relaying emoji in
list and detail pane
- [x] Repeater with no relay traffic in 24h shows yellow idle in both
views
- [x] Repeater not heard recently shows grey stale in both views
- [x] Non-repeater nodes unaffected (no relay stats, no status change)
- [x] Hover tooltip on list emoji shows relay count and last relayed
time
- [x] `go test ./...` passes
- [x] `node test-repeater-liveness.js` passes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: openclaw-bot <bot@openclaw.local>
|
||
|
|
38eb7103b3 |
perf(nodes): batch relay stats to fix O(N×M) /api/nodes regression (#1164)
## Problem
`handleNodes` enriches each repeater/room node by calling
`GetRepeaterRelayInfo` and `GetRepeaterUsefulnessScore` **per node**
inside a loop. `GetRepeaterUsefulnessScore` acquires `s.mu.RLock()` and
then iterates **all** `byPayloadType` entries to compute the non-advert
denominator — once per node.
On a deployment with ~1500 repeater/room nodes and ~145K transmissions
in memory, this is **~220M iterations per `/api/nodes` request**, plus
~3000 separate lock acquisitions. Response times of 18–44 seconds have
been observed in production, especially during startup backfill when
write-lock contention compounds the issue.
## Fix
Add `GetRepeaterNodeStatsBatch(pubkeys []string, windowHours float64)
map[string]RepeaterNodeStats` to `repeater_usefulness.go`:
- Takes **one** `s.mu.RLock()` for the entire node list
- Computes the non-advert denominator **once** (shared across all nodes)
- Snapshots `byPathHop` slice headers for all requested pubkeys under
that single lock
- Processes timestamps and counts **outside** the lock
Update `handleNodes` to collect repeater/room pubkeys first, call the
batch method once, and apply results.
**Complexity: O(M + N) instead of O(N × M)** per request (M = total
transmissions, N = repeater nodes).
`GetRepeaterRelayInfo` and `GetRepeaterUsefulnessScore` are unchanged —
they are still correct for single-node calls (e.g. `handleNodeDetail`).
## Test plan
- [ ] `go build ./cmd/server` passes
- [ ] `/api/nodes` response is correct (relay_active,
relay_count_1h/24h, usefulness_score fields present for repeaters)
- [ ] No change in output for `/api/nodes/{pubkey}` (uses existing
single-node methods)
- [ ] CI passes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: openclaw-bot <bot@openclaw.local>
|
||
|
|
9383201c07 |
refactor(db): finish #1283 — Option 4: ingestor owns neighbor-graph + schema migrations; server is read-only (fixes #1287) (#1289)
Red commit:
https://github.com/Kpa-clawbot/CoreScope/commit/eae179b99b5fd34924547632aa8f8025c405aa53
(CI: pending — opens with this PR)
Finishes #1283. RED test `TestServerSourceHasNoCachedRWCalls` goes from
failing (13 writer call-sites) to GREEN (zero). Per #1287 Option 4
(https://github.com/Kpa-clawbot/CoreScope/issues/1287#issuecomment-4485099992):
ingestor owns the neighbor graph build + persist; server reads the
snapshot.
**Category A — Schema migrations** → new `internal/dbschema` package.
`dbschema.Apply(rw)` runs in `cmd/ingestor` startup (in `OpenStore`).
`dbschema.AssertReady(ro)` runs in `cmd/server/main.go` and
FATAL-LOG-EXITS if any expected column/index/table is missing — the
operator must restart the ingestor first. Covers indexes,
`neighbor_edges`, `observations.resolved_path`,
`observers.{inactive,last_packet_at,iata}`,
`(inactive_)nodes.foreign_advert`, `transmissions.from_pubkey`.
**Category B — Backfill** → ingestor.
`BackfillFromPubkey` and observer-blacklist soft-delete moved to
`cmd/ingestor/maintenance.go`. Server keeps an inert
`fromPubkeyBackfillSnapshot` stub for `/api/healthz` API compatibility.
**Category C — Neighbor-graph persistence (Option 4)** → ingestor
writes, server reads.
- Ingestor (`cmd/ingestor/neighbor_builder.go`): every 60s scans
`observations + transmissions`, extracts edges (originator↔first-hop for
ADVERTs; observer↔last-hop for all), resolves hop prefixes via a
node-table prefix index, upserts into `neighbor_edges`.
- Server (`cmd/server/neighbor_recomputer.go`): every 60s re-reads
`neighbor_edges` and atomic-swaps the resulting `NeighborGraph` into
`s.graph`. Initial load is synchronous on startup. All server-side
incremental edge writers (the two `asyncPersistResolvedPathsAndEdges`
paths in `cmd/server/store.go`) are gone.
- Neighbor-edge daily prune (`PruneNeighborEdges`) moved to ingestor.
**Why Option 4**: clean read/write separation, no startup CPU spike
(server loads existing snapshot instead of rebuilding from history), no
IPC/delta-protocol churn. Staleness budget ~60s — same model as the
analytics recomputers in #1240 / #1248 / #672 axis 2.
**Recomputer interval default for neighbor graph**: 60s
(`NeighborGraphRecomputerDefaultInterval`,
`NeighborEdgesBuilderInterval`).
**Invariants added**:
- `TestServerSourceHasNoCachedRWCalls` (RED commit
|
||
|
|
749fdc114f |
feat(decoder+ui): close remaining P2 items from #1279 — payloadTypeNames, legend, TransportCodes, Feat1/2, RAW_CUSTOM, sensor docs (#1291)
RED commit: `dc4c0800` — CI: https://github.com/Kpa-clawbot/CoreScope/actions?query=branch%3Afix%2Fissue-1279-p2 Closes the remaining six 🟢 P2 items in umbrella #1279 (PR #1280 shipped P0+P1, PR #1276 shipped ACK/RESPONSE/PATH legend rows). ### Item-by-item | # | Item | Where | Test | |---|---|---|---| | 1 | `payloadTypeNames` parity | `cmd/server/store.go` | `cmd/server/issue1279_p2_test.go::TestPayloadTypeNamesAll13` | | 2 | Legend rows: Anon Req / Grp Data / Multipart / Control / Raw Custom | `public/live.js` | `test-issue-1279-legend-p2-e2e.js` (Playwright) | | 3 | TransportCodes detail-row + `code1=` / `code2=` filter grammar | `public/packets.js`, `public/packet-filter.js` | `test-issue-1279-p2-code-filter.js` (6 cases) | | 4 | Multibyte capability badge on node detail/list rows | `public/nodes.js::renderNodeBadges` | `n.hash_size >= 2` (observable Feat1/Feat2 proxy; firmware `AdvertDataHelpers.h:14-16`) | | 5 | RAW_CUSTOM (0x0F) `{rawLength, firstByteTag}` decode + detail-row | `cmd/server/decoder.go`, `cmd/ingestor/decoder.go`, `public/packets.js` | `TestDecodeRawCustomExposesLengthAndTag` × 2 + updated `TestDecodePayloadRAWCustom` | | 6 | Sensor advert telemetry firmware-derivation comments | `cmd/ingestor/decoder.go:363-380` | pure comments — exempt per AGENTS | ### Firmware refs cited inline - `firmware/src/Packet.h:19-32` — PAYLOAD_TYPE_* constants - `firmware/src/Packet.h:46` — TransportCodes wire layout - `firmware/src/Mesh.cpp:577` — `createRawData` - `firmware/src/helpers/SensorMesh.{h,cpp}` — sensor advert telemetry derivation - `firmware/src/helpers/AdvertDataHelpers.h:14-16` — Feat1/Feat2 ### TDD Red `dc4c0800` proves the assertions gate behavior: - `payloadTypeNames` had only 12 entries (no 0x0F). - RAW_CUSTOM decoded as `UNKNOWN` with no envelope fields. Green `<HEAD>` makes both green; per-item tests included. ### Cross-stack note Cross-stack: justified — items 1/5 add decoder output fields; items 2/3/4/5 surface those fields in the UI in the same PR per #1279 acceptance. ### Out of scope Item 4 surfaces the observable multibyte capability via the persisted `hash_size` (Feat1/Feat2 wire bits are only on transient adverts and not stored per-node today); persisting raw Feat1/Feat2 per-node is left for a follow-up. Fixes #1279 --------- Co-authored-by: bot <bot@corescope> |
||
|
|
e6c30e1a7e |
feat(decoder): GRP_DATA + MULTIPART + advertRole fix + CONTROL flags (#1279 P0+P1) (#1280)
Addresses the four P0+P1 firmware reconciliation gaps from the umbrella audit (issue #1279). RED commit: `0a4c084e` (asserts on stub returns; all 13 assertions fail). GREEN commit: `13867681`. ## What's in this PR ### P0 — silently dropped data - **#1 GRP_DATA (0x06) decoder.** Outer envelope is the same shape as GRP_TXT (`channel_hash(1)+MAC(2)+ciphertext`) per `firmware/src/helpers/BaseChatMesh.cpp:476,500`. Factored `decryptChannelBlock(...)` helper used by both 5 and 6. When a channel key matches, the inner is parsed per `firmware/src/helpers/BaseChatMesh.cpp:382-385` as `data_type(uint16 LE) + data_len(1) + blob(data_len)`. Surfaces `{channelHash, MAC, dataType, dataLen, decryptedBlob}` on decrypt or `{channelHash, MAC, encryptedData}` otherwise. Server-side decoder surfaces envelope only (no key store). - **#2 MULTIPART (0x0A) decoder.** Per `firmware/src/Mesh.cpp:289`, byte0 = `(remaining<<4) | inner_type`. When `inner_type == PAYLOAD_TYPE_ACK (0x03)`, next 4 bytes are the LE ack_crc per `firmware/src/Mesh.cpp:292-307`. Surfaces `{remaining, innerType, innerTypeName, innerAckCrc | innerPayload}`. ### P1 — mis-classified / opaque - **#3 `advertRole()` raw-type fix.** Per `firmware/src/helpers/AdvertDataHelpers.h:7-12`, ADV_TYPE_NONE = 0 and 5-15 are FUTURE. The previous boolean fallback collapsed both into `"companion"`, silently relabelling unknown/reserved types. New behaviour: type 0 → `none`, 1 → `companion`, 2-4 → `repeater`/`room`/`sensor`, 5-15 → `type-N`. `ValidateAdvert` accepts the new labels. - **#4 CONTROL (0x0B) byte0 flags + length.** Per `firmware/src/Mesh.cpp:69` + `createControlData` at `Mesh.cpp:609`, byte0 high-bit marks the zero-hop direct subset. Surfaces `{ctrlFlags, ctrlZeroHop, ctrlLength}`. ### Drift fix - `cmd/server/store.go` `payloadTypeNames` now includes `6: GRP_DATA` and `10: MULTIPART` (previously omitted; canonical decoder map already had them). ## Lockstep & TDD Both `cmd/ingestor/decoder.go` and `cmd/server/decoder.go` updated in the same commits — same wire-vector tests live in both packages (`cmd/{ingestor,server}/issue1279_test.go`). Per-item RED→GREEN visible in `git log`. | Item | Tests | RED proof | |---|---|---| | #1 GRP_DATA | ingestor: NoKey + DecryptedInner; server: Envelope | 6 assertions failed pre-impl | | #2 MULTIPART | ingestor + server: Ack + NonAck | 8 assertions failed pre-impl | | #3 advertRole | ingestor + server: 7-row table | 3 assertions failed pre-impl | | #4 CONTROL | ingestor + server: ZeroHop + MultiHop | 6 assertions failed pre-impl | ## What's NOT in this PR The umbrella issue lists P2 items that ship in follow-up PRs: - Live + compare legend entries for the long tail of newly-named types (#1274 + others). - TransportCodes UI surface + filter grammar. - feat1/feat2 capability badges. - `payloadTypeNames` consolidation across server/ingestor (drift-prevention). Leave the umbrella open after this merges. Refs #1279 --------- Co-authored-by: OpenClaw Bot <bot@openclaw.local> |
||
|
|
8bf7709970 |
feat(repeater): usefulness score — bridge axis (#672 axis 2 of 4) (#1275)
RED test commit: `fd661569` — CI will fail on this (stub returns empty map; assertions fail by design). GREEN: `bf4b8592`. ## What Implements **axis 2 of 4** for the repeater usefulness score per #672 ([status comment](https://github.com/Kpa-clawbot/CoreScope/issues/672#issuecomment-4484635378)). The Bridge axis measures *structural importance*: how many shortest paths between other nodes route through this one. A high-traffic redundant node and a low-traffic critical bridge will no longer look identical. ## Algorithm **Brandes' weighted betweenness centrality** with Dijkstra for shortest paths (`cmd/server/bridge_score.go`). - Nodes: pubkeys in the `neighbor_edges` graph - Edge weight: `Score(now) * Confidence()` — per the convention from #1235 (count + recency decay scaled by observer-diversity confidence). Geo-rejected edges already excluded at graph build time (#1230) so we don't re-filter here. - Dijkstra distance: `1 / max(epsilon, weight)` — high affinity = cheap cost. - Normalize: divide by max observed centrality so output is in `[0, 1]`. Cost: `O(V · (E + V log V))`. Staging-scale (~600 nodes / ~2 000 edges) ≈ ~4.8M ops, completes in milliseconds. ## Where it lives - `cmd/server/bridge_score.go` — pure algorithm, no locks - `cmd/server/bridge_recomputer.go` — background recomputer (mirrors #1240/#1262 pattern), 5-min default interval, initial sync prewarm, snapshot stored in `s.bridgeScoreMap atomic.Pointer[map[string]float64]` - `cmd/server/routes.go` — `handleNodes` adds `node["bridge_score"]` on repeater/room rows; node-detail handler adds it on the single-node path - `public/nodes.js` — separate **Bridge** row in the node detail panel, alongside the existing **Usefulness** (Traffic) row. Distinct colour-coded bar. ## What's NOT in this PR (still pending for #672) - **Coverage axis** (axis 3) — unique observer-pair connectivity - **Redundancy axis** (axis 4) — simulated node-removal impact - **Composite** — once all 4 axes ship, swap the `usefulness_score` formula from "traffic-only" to the weighted composite `Refs #672` (not `Fixes` — issue stays open until all 4 axes + composite ship). ## Tests - `TestComputeBridgeScores_LineGraph` — 4-node line: middles non-zero, leaves zero, max normalized to 1.0 - `TestComputeBridgeScores_TriangleNoBridge` — clique has zero bridges - `TestComputeBridgeScores_Empty` — defensive nil-safety - `TestComputeBridgeScores_WeightSensitive` — mutation guard: revert the `1/w` inversion and this test fails - `TestBridgeScore_HandleNodesSurface` — integration: `/api/nodes` returns `bridge_score` on repeater rows; middle nodes > 0, ends == 0 --------- Co-authored-by: clawbot <bot@meshcore.local> |
||
|
|
4cd8445233 |
perf(#1265): wire /api/observers/clock-skew + /api/nodes/clock-skew into analytics recomputer (#1266)
RED:
|
||
|
|
ae17a2be12 |
perf(#1262): /api/nodes?limit=2000 cold-miss 15.7s → <100ms — prewarm repeater enrichment cache (#1263)
RED commit: `22ce5736066142583017cad7303fa48d9e00ccf0` — CI on red: https://github.com/Kpa-clawbot/CoreScope/actions?query=branch%3Afix%2Fissue-1262 ## Problem After #1260 added a 15s-TTL bulk cache for repeater enrichment in `handleNodes`, `/api/nodes` (default limit) dropped to ~500ms. But `/api/nodes?limit=2000` — called by `public/live.js` at SPA startup for hop resolution — still took **15.7s cold** on staging (75k tx, 600 nodes). Warm hits were ~40ms. Root cause: the bulk cache was lazily populated on the first request after TTL expiry. The rebuild ran on the request-serving goroutine. Every cold SPA load triggered the rebuild and ate 15s. ## Fix Add `StartRepeaterEnrichmentRecomputer` — a steady-state background recomputer that mirrors the `analytics_recomputer.go` pattern from #1240: - **Prewarm**: initial synchronous compute on Start so the first request hits a populated cache. - **Steady-state**: ticker refreshes the snapshot every 5min (configurable via the existing analytics recompute interval knob). - **Panic-safe** + idempotent Start. Wired into `main.go` right after `StartAnalyticsRecomputers`, using `cfg.GetHealthThresholds().RelayActiveHours` as the window. ## Test `TestHandleNodesLimit2000ColdMiss` — seeds 600 nodes + 150k non-advert tx with repeaters indexed under a shared 1-byte hop prefix (matches production hop-prefix collisions), starts the recomputer, then issues `/api/nodes?limit=2000` with **no HTTP warmup**. | State | Latency | |---|---| | Before (master, on-thread rebuild) | 3.37s | | After (prewarm + steady-state) | 56ms | | Budget | 2s | Staging end-to-end: 15.7s → expected sub-100ms on the same call path. Red commit (`22ce5736066142583017cad7303fa48d9e00ccf0`) compiles with a no-op stub of the new method so the test fails on the latency **assertion**, not a missing symbol. Fixes #1262 --------- Co-authored-by: corescope-bot <bot@corescope.local> |
||
|
|
1efe93d7f6 |
perf(#1257): bulk-cache repeater enrichment in /api/nodes — 32s → <500ms (#1260)
RED commit `a2879e12` — perf regression test; CI run: see Actions tab. Fixes #1257. ## Root cause `handleNodes` looped over the response page and called `store.GetRepeaterRelayInfo(pk, win)` + `store.GetRepeaterUsefulnessScore(pk)` for every repeater/room. Each call: - grabbed its own `s.mu.RLock`, - walked `byPathHop[pk]` (+ the matching 1-byte raw-prefix bucket, which on busy networks fans out to nearly the entire non-advert tx set), - and re-parsed every `tx.FirstSeen` with `parseRelayTS`. Default page is the 50 most-recently-seen nodes — almost all hot repeaters — so the request did O(50) lock acquisitions and hundreds of thousands of timestamp parses on the same set of txs. That's the classic load-then-paginate / per-row N+1 shape called out in the issue (same family as #1226). The `?limit=2000` variant looks faster relatively only because per-node enrichment dwarfs serialization; on staging both still bottleneck on the same loop. ## Fix Two new bulk methods on `PacketStore`: - `GetRepeaterRelayInfoMap(windowHours)` → `pubkey → RepeaterRelayInfo` - `GetRepeaterUsefulnessScoreMap()` → `pubkey → 0..1` Both snapshot `byPathHop` under a single `RLock`, pre-parse each `FirstSeen` exactly once (a tx that appears in N hop buckets used to be parsed N times), and emit one entry per hop key. Cached 15s — same TTL as `GetNodeHashSizeInfo` / `GetMultiByteCapMap`, same status-column freshness budget. `handleNodes` is one map-lookup per node; behavior, output schema, and `RelayActive` / `RelayCount{1h,24h}` / `LastRelayed` / `usefulness_score` semantics are preserved. ## Why no `limit` default change The issue mentioned a default-limit knob. Investigated: `queryInt(r, "limit", 50)` already defaults to 50 — frontends calling `/api/nodes` (no limit) get a 50-row page today. Capping further would change behavior (live.js already passes `?limit=2000` when it wants more); the cost was per-repeater enrichment, not page size. Fixing the N+1 is the correct lever and preserves backward compat. ## Perf Regression test `TestHandleNodesPerfLargeFleet` (600 nodes, 150k non-advert tx, repeaters indexed under `byPathHop`): | | elapsed | vs 2s budget | |---|---|---| | before (master) | 4.72s | ✗ | | after | ~4ms | ✓ (~1000×) | ## TDD - RED: `a2879e12` — test fails at 4.72s on master. - GREEN: `c529d29a` — fix; full `cmd/server` + `cmd/ingestor` suites green. --------- Co-authored-by: corescope-bot <bot@corescope> |
||
|
|
f81ed5b3cf |
perf(#1256): wire /api/analytics/roles into steady-state recomputer (#1259)
RED commit: `0190466d` — failing CI: https://github.com/Kpa-clawbot/CoreScope/actions (will populate after PR creation) ## Problem On staging (commit `d69d9fb`, 78k tx, 2.3M obs), `curl http://localhost/api/analytics/roles` times out at 60s with 0 bytes — the Roles tab is unusable. Issue #1256. PR #1248's steady-state recomputer fan-out (topology / rf / distance / channels / hash-collisions / hash-sizes) **didn't include roles**. The legacy handler: 1. Holds `s.mu.RLock` for the entire compute. 2. Calls `GetFleetClockSkew()`, which drives `clockSkew.Recompute(s)` over all ADVERT transmissions — O(78k) per request. 3. Concurrent ingest writers compound the latency through writer-starvation. Result: every request hits the cold path; the response never comes back inside the 60 s HTTP budget. ## Fix Add `roles` as the 7th endpoint in the recomputer fan-out — same pattern as #1248: - `PacketStore.recompRoles` slot, registered in `StartAnalyticsRecomputers` with default 5-min interval. - `PacketStore.GetAnalyticsRoles()` → atomic-pointer load from the snapshot (sub-ms), with a `computeAnalyticsRoles()` fallback only for the brief startup window before the initial sync compute completes. - Handler is now a thin wrapper — no lock-held work on the request path. - New optional `roles` key under `analytics.recomputeIntervalSeconds` in config; `config.example.json` and `_comment_analytics` updated. ## Latency (unit-scope benchmark) - Worst-of-50 handler latency: **<100 ms** (test budget; well under the 2 s p99 acceptance). - Compute itself is bounded by the existing 5-min recompute window — it runs once in the background, never on the request path. ## Tests - RED `0190466d`: asserts `recompRoles` is registered and the handler returns under the latency budget. Fails on master with `recompRoles not registered`. - GREEN `d7784f76`: registers the recomputer + snapshot accessor — both tests pass. Fixes #1256 --------- Co-authored-by: openclaw-bot <bot@openclaw.local> |
||
|
|
d69d9fbf8e |
perf(#1247): surgical fix for resolveWithContext tier-1 hot path (4.6× speedup) (#1253)
## Summary Surgical fix for #1247: analytics endpoints regressed 3-9× between prod `d818527` and master. pprof against staging traced the regression to `resolveWithContext` tier-1 affinity loop running on every analytics `resolveHop` call (post-#1198 plumbing) with redundant per-(cand, ctx) work. **Result: 4.6× speedup on the synthetic hot-shape benchmark (202µs → 44µs / op).** ## Root cause - PR #1198 (`353c5264`) lit up `resolveWithContext` tier 1 from every analytics resolveHop closure (previously they passed `contextPubkeys=nil` and short-circuited the entire tier-1 block). - The inner loop did `N_cand × N_ctx` iterations where each one did: - `graph.Neighbors(strings.ToLower(ctxPK))` — graph RLock + ToLower allocation **per candidate**, redundantly - `strings.ToLower(cand.PublicKey)` per `ctxPK` - `strings.EqualFold(otherPK, ctxPK)` + `EqualFold(otherPK, candPK)` — both sides were already lowercased (`NeighborEdge.NodeA/B` via `makeEdgeKey`; `contextPubkeys` via `buildHopContextPubkeys`) - At staging scale (5k+ contextPubkeys × 30k+ resolveHop calls) this dominated `computeAnalyticsTopology` (37% of its CPU) and `computeAnalyticsRF` (55%). ## pprof attribution (staging, region-keyed queries bypassing #1240 cache) ``` computeAnalyticsTopology cum: 19.24% (5.45s / 28.32s sampled) └─ resolveWithContext 37% ├─ strings.ToLower 41% ├─ strings.EqualFold 28% └─ graph.Neighbors 24% computeAnalyticsRF cum: 10.38% ``` ## Fix (~80 LoC in `cmd/server/store.go`) 1. Lowercase `contextPubkeys` **once per call**, skipped entirely when already lowercased (the analytics fast path). 2. Lowercase candidate pubkeys **once per call**. 3. Invert the loop nesting: outer-ctx / inner-edge / candidate-map lookup. `graph.Neighbors` is called once per context pubkey instead of `N_cand` times. 4. Raw `==` instead of `strings.EqualFold` for pubkey comparisons (both sides lowercased by step 1/2). 5. Added a tiny `hasUpperASCII` byte-loop helper next to `isHexLower` for the fast-path check. Behavior preserved: same `Score × Confidence` formula, same tier-1 ratio + min-observations gate, same per-candidate "best edge wins" semantics. No change to tiers 2/3/4. ## TDD evidence - Red commit (`5f8d1564`): `TestResolveWithContextTier1Floor` asserts `<100 µs/call` on the hot shape. **199 µs/call on regressed master → FAIL.** - Green commit (`e3bdbc65`): surgical fix lands. **44 µs/call → PASS.** - Reverification: locally stashed the fix, ran the test → 199.5 µs FAIL; popped fix → 44 µs PASS. `BenchmarkResolveWithContextTier1Hot` (no assertion, visibility only): ``` before: 202013 ns/op 168 B/op 3 allocs/op after: 44084 ns/op 424 B/op 6 allocs/op speedup: 4.6× ``` (Post-fix allocs are O(N_cand + N_ctx) one-time helper tables — net win at hot scale.) ## Independence from #1248 PR #1248 caches the analytics compute output so user-facing latency is sub-ms even when the compute is slow. That's correct for UX but it masks the regression. This PR repairs the compute itself, so: - Region-keyed and windowed queries (which bypass the recomputer cache by design — see #1240) become fast again. - Future ingest scale or feature work on top of the regressed baseline doesn't compound. ## Out of scope - The geo-rejection (#1228) and Confidence weighting (#1229) commits — kept intact, they protect correctness and were not the dominant CPU cost. - Reverting any suspect commit — surgical only. ## Acceptance criteria from #1247 - [x] pprof confirms the hot function (`resolveWithContext`) - [x] Bisect identifies the regressing commit (`353c5264` / PR #1198 — context plumbing; ratified by pprof, no need to actually rebuild 5 binaries) - [x] Fix lands; tier-1 hot path 4.6× faster - [x] No regression in disambiguator correctness — full `go test ./...` green, all existing `ResolveWithContext` / `HopDisambig` / `NeighborGraph` / `Affinity` tests pass Fixes #1247 --------- Co-authored-by: openclaw-bot <bot@openclaw.local> |
||
|
|
356f001027 |
perf(#1240): steady-state background recompute for analytics endpoints (#1248)
RED commit: `27630f6a` — adds latency test that fails on master (p99=225ms > 50ms budget) and a stub `StartAnalyticsRecomputers` that returns a no-op so the assertion (not a build error) gates the change. GREEN commit: `20fbbceb` — wires real background recompute infrastructure. Test passes at p99=~1µs. ## What changed Replaces the on-request "compute-then-cache" pattern for the default-shape analytics queries with a steady-state background recompute loop. Reads always hit an `atomic.Value` snapshot in <1µs regardless of compute cost or writer contention. Operator principle: serving slightly stale data quickly beats real-time data slowly. ## Endpoints converted (default 5min interval each) | Endpoint | Cold compute | Recomputer interval | |---|---|---| | `/api/analytics/topology` | ~5s | 5 min | | `/api/analytics/rf` | ~4s | 5 min | | `/api/analytics/distance` | ~3s | 5 min | | `/api/analytics/channels` | ~0.5s | 5 min | | `/api/analytics/hash-collisions` | ~0.5s | 5 min | | `/api/analytics/hash-sizes` | ~22ms | 5 min | All intervals configurable per-endpoint via `analytics.recomputeIntervalSeconds.<name>` in `config.json`; documented in `config.example.json`. Default override via `analytics.defaultIntervalSeconds`. ## Scope: default query only Only the canonical shape `(region="", window=zero)` is precomputed. Region- or window-filtered requests fall back to the legacy TTL cache + on-request compute — keeps recomputer count bounded (6, not 6×N×M). ## Latency Test `TestAnalyticsRecomputerSteadyStateLatency`: 100 concurrent readers + 4 writers churning `s.mu.Lock` on 20k distHops. - Before: p50=188ms p99=225ms (assertion failed) - After: p50=240ns p99=1.1µs (atomic load + map return) ## Shutdown integration `StartAnalyticsRecomputers` returns a stop closure invoked from `main.go`'s SIGTERM handler BEFORE `dbClose()` so any in-flight SQLite compute drains cleanly. `TestAnalyticsRecomputerShutdownNoLeak` confirms all 6 goroutines are reaped (Δ=6 within 2s). ## Safety details - Initial compute is synchronous in `Start()` — first read after startup never sees nil. - `recover()` inside `runOnce` keeps a compute panic from killing the goroutine; previous snapshot remains valid. - `analyticsRecomputerMu` is a sync.RWMutex; recomputer pointers are read-locked in the hot path. The atomic.Value swap inside `runOnce` is lock-free. Fixes #1240. --------- Co-authored-by: OpenClaw Bot <bot@openclaw.local> |
||
|
|
b881a09f02 |
feat(#1188): show observer IATA on packets + filter grammar (#1189)
Red commit:
|
||
|
|
2754251a53 |
perf(#1239): /api/analytics/distance — TTL 15s→60s + drop main RLock around compute (#1241)
## Summary Fixes #1239 — `/api/analytics/distance` 15s cold on staging under heavy ingest. Two independent fixes. First commit on this branch is the RED test for Fix B (`a539882`), demonstrating reader/writer contention against the main store lock. CI: see Actions tab for the run on the test-only commit — it asserts >150µs avg writer cycle and fails at 82367µs pre-fix. GREEN commit (`d3938f1`) brings it to 1µs. ## Fix A — TTL bump 15s → 60s (`5eae1e0`) - `rfCacheTTL` default in `cmd/server/store.go` changed from `15 * time.Second` to `60 * time.Second`. This is the shared TTL for RF / topology / distance / hash-sizes / subpath / channel analytics caches. - Per operator clarification (issue thread): distance analytics IS viewed live during analysis sessions, not background-glanced. 60s smooths the cold-miss churn during heavy ingest without freezing data. - `config.example.json`: documented `cacheTTL.analyticsRF` with new default + caveat. - Existing assertions (`TestCacheTTLDefaults`, `TestHashCollisionsCacheTTL`) updated to the new default. ## Fix B — Drop main RLock around compute (`a539882` red, `d3938f1` green) `computeAnalyticsDistance` previously held `s.mu.RLock()` for the entire iteration: region match-set construction, hop/path filtering, sort, dedup, histogram, category stats, time series. Readers serialized writers (ingest, `buildDistanceIndex`). Refactor: hold the RLock only long enough to snapshot the `distHops`/`distPaths` slice headers AND build the region match-set (which reads `tx.Observations`, mutated under `s.mu.Lock`). For `region=""` (the hot cold-call path) the lock hold is just the header snapshot — microseconds. Everything else runs on the locally-captured slices outside the lock. Safety: `distHops`/`distPaths` are append-only via re-slice in `buildDistanceIndex` / `updateDistanceIndexForTxs` (both under `s.mu.Lock`). If the backing array reallocates after the snapshot, the snapshot still references the prior array (GC-pinned) at the consistent length captured under the lock. Records are value types — no torn writes. ## Test results `cmd/server/distance_lock_contention_test.go` (8 reader goroutines × 20k synthetic distHops × 200 writer Lock/Unlock cycles): - pre-fix avg writer cycle: **82367µs** (16.5s for 200 cycles) - post-fix avg writer cycle: **1µs** (279µs for 200 cycles) - ~82000× reduction in writer contention; reader result shape unchanged Full `go test ./cmd/server/...` green with `-race`. ## Out of scope (per issue) - Same lock pattern in topology / RF / hash / subpath analytics — file separately if needed. - Per-region cache key sharding. - WebSocket-driven cache invalidation. --------- Co-authored-by: openclaw-bot <bot@openclaw.local> |
||
|
|
2e28aa3e04 |
fix(#1229): source-diversity confidence weighting in neighbor-graph tier-1 resolver (#1235)
RED |
||
|
|
eba9e89a72 |
fix(#1203): path-inspector — singleflight + stale-while-revalidate (#1208)
Red commit:
|
||
|
|
11d2026bb1 |
feat(startup): hot startup — load hotStartupHours synchronously, fill retentionHours in background (#1187)
Closes #1183 ## Summary - Adds `packetStore.hotStartupHours` config key (float64, default 0 = disabled). When set, `Load()` loads only that many hours of data synchronously, reducing startup time on large DBs. Background goroutine fills the remaining `retentionHours` window in daily chunks after startup completes. - A background goroutine (`loadBackgroundChunks`) fills the remaining `retentionHours` window in daily chunks after startup completes. Analytics indexes are rebuilt once at the end. - `QueryPackets` and `QueryGroupedPackets` check `oldestLoaded` and fall back to `db.QueryPackets()` for any query whose `Since`/`Until` predates the in-memory window — covering days 8–30 permanently (beyond `retentionHours`) and the background-fill gap during startup. - `/api/perf` gains `hotStartupHours`, `backgroundLoadComplete`, and `backgroundLoadProgress` fields inside `packetStore` so operators can monitor the fill. ### Drive-by fixes - E2E: added `gotoPackets` navigation helper used across packet-related tests - E2E: rewrote stripe assertion to check per-row stripe parity rather than a fragile computed-style comparison - E2E: theme test updated to use `#/home` as the initial route (was `#/`) - `db.go`: removed the RFC3339→unix-timestamp subquery path in `buildTransmissionWhere`; `t.first_seen` is now always compared directly as a string for both RFC3339 and non-RFC3339 inputs ## Configuration ```json "packetStore": { "retentionHours": 168, "hotStartupHours": 24 } ``` `hotStartupHours: 0` (default) preserves existing behavior exactly. Recommended for large DBs to reduce startup time; set to 0 to disable (loads full retentionHours at startup, legacy behavior). ## Test plan - [x] `TestHotStartupConfig_Clamp` — clamping when `hotStartupHours > retentionHours` - [x] `TestHotStartupConfig_ZeroIsDisabled` — zero leaves feature disabled - [x] `TestHotStartup_LoadsOnlyHotWindow` — only hot-window packets in memory after `Load()` - [x] `TestHotStartup_DisabledWhenZero` — all retention packets loaded when disabled - [x] `TestHotStartup_loadChunk_AddsOlderData` — chunk merges correctly, ASC order maintained - [x] `TestHotStartup_BackgroundFillsToRetention` — background goroutine fills to `retentionHours` - [x] `TestHotStartup_ChunkErrorRecovery` — chunk SQL failure logged and skipped, loop terminates - [x] `TestHotStartup_SQLFallback_TriggeredForOldDate` — query before `oldestLoaded` routes to SQL - [x] `TestHotStartup_SQLFallback_NotTriggeredForRecentDate` — recent query stays in-memory - [x] `TestHotStartup_PerfStats` — new fields present in `GetPerfStoreStats()` (backs the perf endpoint) - [x] `TestHotStartup_PerfStoreHTTP` — HTTP-level: GET /api/perf returns `hotStartupHours`, `backgroundLoadComplete`, `backgroundLoadProgress` in `packetStore` 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: openclaw-bot <bot@openclaw.local> Co-authored-by: CoreScope Bot <bot@corescope.local> |
||
|
|
2beeb2b324 |
fix(#1199): 6 deferred quality items from PR #1198 r2 review (#1200)
Red commit: 75563ce (CI run: pending — pushed at branch open) Follows up PR #1198 round-2 adversarial review (issue #1199). Six robustness / perf-hot-path / maintenance items, one commit per logical change. Stacked on top of `fix/issue-1197` (PR #1198) — base must move to `master` after #1198 merges. | # | Item | Commit(s) | Discipline | |---|---|---|---| | 1 | Brittle static-grep regex → go/parser AST walk in `resolve_context_callsites_test.go` | 33d80b6 (RED) → 450236d (GREEN) | red→green | | 2 | `computeAnalyticsTopology` double-pass filter → materialize `filteredTxs` once | 00005f6 | refactor | | 3 | `BenchmarkBuildAggregateHopContextPubkeys` baseline + tiny smoke test | b520048 | net-new bench/test | | 4 | `hopResolverPerTx` CONCURRENCY doc — single-goroutine invariant | 155ff07 | doc-only | | 5 | `schemaDegradationLogged` package-level `sync.Map` → PacketStore field | 75563ce (RED) → 7dbf193 (GREEN) | red→green | | 6 | `buildHopContextPubkeys` `out` slice cap hint (`make([]string, 0, 16)`) | 2040962 | refactor | Items 2 & 6 are pure refactors — no test files modified for items 2 & 6 (per AGENTS.md exemption rule). Existing tests stay green and unaltered. Item 4 is doc-only (CONCURRENCY: comment); no behavior change. Item 3 adds a bench + a smoke assertion for the aggregate helper that previously had no coverage. Local arm64 baseline: ~72ms/op, 130k allocs at 5k txs. Items 1 & 5 follow red→green: 33d80b6 demonstrates the regex blindspot via a synthetic AST-detectable input the regex misses; 75563ce demonstrates per-store log dedup leaks across instances. Both flips visible in branch history. Full `go test ./cmd/server/...` runs clean post-amend. Fixes #1199 --------- Co-authored-by: openclaw-bot <bot@openclaw.local> |
||
|
|
353c5264ad |
fix(#1197): plumb hop-context + observation-count tiebreak to disambiguator (#1198)
Red commit:
|
||
|
|
83881e6b71 |
fix(#688): auto-discover hashtag channels from message text (#1071)
## Summary Auto-discovers previously-unknown hashtag channels by scanning decoded channel message text for `#name` mentions and surfacing them via `GetChannels`. Workflow (per the issue): 1. New channel message arrives on a known channel 2. Decoded text is scanned for `#hashtag` mentions 3. Any mention that doesn't match an existing channel is surfaced as a discovered channel (`discovered: true`, `messageCount: 0`) 4. Future traffic on that channel will populate the entry once it has its own packets ## Changes - `cmd/server/discovered_channels.go` — new file. `extractHashtagsFromText` parses `#name` mentions from free text, deduped, order-preserving. Trailing punctuation is excluded by the character class. - `cmd/server/store.go` — `GetChannels` now scans CHAN packet text for hashtags after building the primary channel map, and appends any unseen hashtag mentions as discovered entries. - `cmd/server/discovered_channels_test.go` — new tests covering parser edge cases (single, multi, dedup, punctuation, none, bare `#`) and end-to-end discovery via `GetChannels`. ## TDD - Red: `34f1817` — stub returns `nil`, both new tests fail on assertion (verified). - Green: `d27b3ed` — real implementation, full `cmd/server` test suite passes (21.7s). ## Notes - Discovered channels carry `messageCount: 0` and `lastActivity` set to the most recent mention's `firstSeen`, so they sort naturally alongside real channels. - Names are matched against existing entries by both `#name` and bare `name` so a channel that already has decoded traffic isn't double-listed. - The existing `channelsCache` (15s) covers the new code path; no separate invalidation needed since the source data (`byPayloadType[5]`) drives both maps. Fixes #688 --------- Co-authored-by: corescope-bot <bot@corescope.local> |
||
|
|
d144764d38 |
fix(analytics): multiByteCapability missing under region filter → all rows 'unknown' (#1049)
## Bug `https://meshcore.meshat.se/#/analytics`: - Unfiltered → 0 adopter rows show "unknown" (correct). - Region filter `JKG` → 14 rows show "unknown" (wrong — same nodes, all confirmed when unfiltered). Multi-byte capability is a property of the NODE, derived from its own adverts (the full pubkey is in the advert payload, no prefix collision risk). The observing region should only control which nodes appear in the analytics list — it must not change a node's cap evidence. ## Root cause `PacketStore.GetAnalyticsHashSizes(region)` only attached `result["multiByteCapability"]` when `region == ""`. Under any region filter the field was absent. The frontend (`public/analytics.js:1011`) does `data.multiByteCapability || []`, so every adopter row falls through the merge with no cap status and renders as "unknown". ## Fix Always populate `multiByteCapability`. When a region filter is active, source the global adopter hash-size set from a no-region compute pass so out-of-region observers' adverts still count as evidence. ## TDD Red commit (`0968137`): adds `cmd/server/multibyte_region_filter_test.go`, asserts that `GetAnalyticsHashSizes("JKG")` returns a populated `multiByteCapability` with Node A as `confirmed`. Fails on the assertion (field missing) before the fix. Green commit (`6616730`): always compute capability against the global advert dataset. ## Files changed - `cmd/server/store.go` — `GetAnalyticsHashSizes`: drop the `region == ""` gate, always populate `multiByteCapability`. - `cmd/server/multibyte_region_filter_test.go` — new red→green test. ## Verification ``` go test ./... -count=1 # all server tests pass (21s) ``` --------- Co-authored-by: clawbot <bot@corescope.local> |
||
|
|
9f55ef802b |
fix(#804): attribute analytics by repeater home region, not observer (#1025)
Fixes #804. ## Problem Analytics filtered region purely by **observer** region: a multi-byte repeater whose home is PDX would leak into SJC results whenever its flood adverts were relayed past an SJC observer. Per-node groupings (`multiByteNodes`, `distributionByRepeaters`) inherited the same bug. ## Fix Two new helpers in `cmd/server/store.go`: - `iataMatchesRegion(iata, regionParam)` — case-insensitive IATA→region match using the existing `normalizeRegionCodes` parser. - `computeNodeHomeRegions()` — derives each node's HOME IATA from its zero-hop DIRECT adverts. Path byte for those packets is set locally on the originating radio and the packet has not been relayed, so the observer that hears it must be in direct RF range. Plurality vote when zero-hop adverts span multiple regions. `computeAnalyticsHashSizes` now applies these in two ways: 1. **Observer-region filter is relaxed for ADVERT packets** when the originator's home region matches the requested region. A flood advert from a PDX repeater that's only heard by an SJC observer still attributes to PDX. 2. **Per-node grouping** (`multiByteNodes`, `distributionByRepeaters`) excludes nodes whose HOME region disagrees with the requested region. Falls back to the observer-region filter when home is unknown. Adds `attributionMethod` to the response (`"observer"` or `"repeater"`) so operators can tell which method was applied. ## Backwards compatibility - No region filter requested → behavior unchanged (`attributionMethod` is `"observer"`). - Region filter requested but no zero-hop direct adverts seen for a node → falls back to the prior observer-region check for that node. - Operators without IATA-tagged observers see no change. ## TDD - **Red commit** (`c35d349`): adds `TestIssue804_AnalyticsAttributesByRepeaterRegion` with three subtests (PDX leak into SJC, attributionMethod field present, SJC leak into PDX). Compiles, runs, fails on assertions. - **Green commit** (`11b157f`): the implementation. All subtests pass, full `cmd/server` package green. ## Files changed - `cmd/server/store.go` — helpers + analytics filter logic (+236/-51) - `cmd/server/issue804_repeater_region_test.go` — new test (+147) --------- Co-authored-by: CoreScope Bot <bot@corescope.local> Co-authored-by: openclaw-bot <bot@openclaw.local> |
||
|
|
a56ee5c4fe |
feat(analytics): selectable timeframes via ?window/?from/?to (#842) (#1018)
## Summary Selectable analytics timeframes (#842). Adds backend support for `?window=1h|24h|7d|30d` and `?from=&to=` on the three main analytics endpoints (`/api/analytics/rf`, `/api/analytics/topology`, `/api/analytics/channels`), and a time-window picker in the Analytics page UI that drives them. Default behavior with no query params is unchanged. ## TDD trail - Red: `bbab04d` — adds `TimeWindow` + `ParseTimeWindow` stub and tests; tests fail on assertions because the stub returns the zero window. - Green: `75d27f9` — implements `ParseTimeWindow`, threads `TimeWindow` through `compute*` loops + caches, wires HTTP handlers, adds frontend picker + E2E. ## Backend changes - `cmd/server/time_window.go` — full `ParseTimeWindow` (`?window=` aliases + `?from=/&to=` RFC3339 absolute range; invalid input → zero window for backwards compatibility). - `cmd/server/store.go` — new `GetAnalytics{RF,Topology,Channels}WithWindow` wrappers; `compute*` loops skip transmissions whose `FirstSeen` (or per-obs `Timestamp` for the region+observer slice) falls outside the window. Cache key composes `region|window` so different windows do not poison each other. - `cmd/server/routes.go` — handlers call `ParseTimeWindow(r)` and dispatch to the `*WithWindow` methods. ## Frontend changes - `public/analytics.js` — new `<select id="analyticsTimeWindow">` rendered under the region filter (All / 1h / 24h / 7d / 30d). Selecting an option triggers `loadAnalytics()` which appends `&window=…` to every analytics fetch. ## Tests - `cmd/server/time_window_test.go` — covers all aliases, absolute range, no-params backwards compatibility, `Includes()` bounds, and `CacheKey()` distinctness. - `cmd/server/topology_dedup_test.go`, `cmd/server/channel_analytics_test.go` — updated callers to pass `TimeWindow{}`. ## E2E (rule 18) `test-e2e-playwright.js:592-611` — opens `/#/analytics`, asserts the picker is rendered with a `24h` option, then asserts that selecting `24h` triggers a network request to `/api/analytics/rf?…window=24h`. ## Backwards compatibility No params → zero `TimeWindow` → original code paths (no filter, region-only cache key). Verified by `TestParseTimeWindow_NoParams_BackwardsCompatible` and by the existing analytics tests still passing unchanged on `_wt-fix-842`. Fixes #842 --------- Co-authored-by: you <you@example.com> Co-authored-by: corescope-bot <bot@corescope> |
||
|
|
e86b5a3a0c |
feat: show multi-byte hash support indicator on map markers (#1002)
## Summary Show 2-byte hash support indicator on map markers. Fixes #903. ## What changed ### Backend (`cmd/server/store.go`, `cmd/server/routes.go`) - **`EnrichNodeWithMultiByte()`** — new enrichment function that adds `multi_byte_status` (confirmed/suspected/unknown), `multi_byte_evidence` (advert/path), and `multi_byte_max_hash_size` fields to node API responses - **`GetMultiByteCapMap()`** — cached (15s TTL) map of pubkey → `MultiByteCapEntry`, reusing the existing `computeMultiByteCapability()` logic that combines advert-based and path-hop-based evidence - Wired into both `/api/nodes` (list) and `/api/nodes/{pubkey}` (detail) endpoints ### Frontend (`public/map.js`) - Added **"Multi-byte support"** checkbox in the map Display controls section - When toggled on, repeater markers change color: - 🟢 Green (`#27ae60`) — **confirmed** (advertised with hash_size ≥ 2) - 🟡 Yellow (`#f39c12`) — **suspected** (seen as hop in multi-byte path) - 🔴 Red (`#e74c3c`) — **unknown** (no multi-byte evidence) - Popup tooltip shows multi-byte status and evidence for repeaters - State persisted in localStorage (`meshcore-map-multibyte-overlay`) ## TDD - Red commit: `2f49cbc` — failing test for `EnrichNodeWithMultiByte` - Green commit: `4957782` — implementation + passing tests ## Performance - `GetMultiByteCapMap()` uses a 15s TTL cache (same pattern as `GetNodeHashSizeInfo`) - Enrichment is O(n) over nodes, no per-item API calls - Frontend color override is computed inline during existing marker render loop — no additional DOM rebuilds --------- Co-authored-by: you <you@example.com> |
||
|
|
564d93d6aa |
fix: dedup topology analytics by resolved pubkey (#998)
## Fix topology analytics double-counting repeaters/pairs (#909) ### Problem `computeAnalyticsTopology()` aggregates by raw hop hex string. When firmware emits variable-length path hashes (1-3 bytes per hop), the same physical node appears multiple times with different prefix lengths (e.g. `"07"`, `"0735bc"`, `"0735bc6d"` all referring to the same node). This inflates repeater counts and creates duplicate pair entries. ### Solution Added a confidence-gated dedup pass after frequency counting: 1. **For each hop prefix**, check if it resolves unambiguously (exactly 1 candidate in the prefix map) 2. **Unambiguous prefixes** → group by resolved pubkey, sum counts, keep longest prefix as display identifier 3. **Ambiguous prefixes** (multiple candidates for that prefix) → left as separate entries (conservative) 4. **Same treatment for pairs**: canonicalize by sorted pubkey pair ### Addressing @efiten's collision concern At scale (~2000+ repeaters), 1-byte prefixes (256 buckets) WILL collide. This fix explicitly checks the prefix map candidate count. Ambiguous prefixes (where `len(pm.m[hop]) > 1`) are never merged — they remain as separate entries. Only prefixes with a single matching node are eligible for dedup. ### TDD - **Red commit**: `4dbf9c0` — added 3 failing tests - **Green commit**: `d6cae9a` — implemented dedup, all tests pass ### Tests added - `TestTopologyDedup_RepeatersMergeByPubkey` — verifies entries with different prefix lengths for same node merge to single entry with summed count - `TestTopologyDedup_AmbiguousPrefixNotMerged` — verifies colliding short prefix stays separate from unambiguous longer prefix - `TestTopologyDedup_PairsMergeByPubkey` — verifies pair entries merge by resolved pubkey pair Fixes #909 --------- Co-authored-by: you <you@example.com> |
||
|
|
b7c280c20a |
fix: drop/filter packets with null hash or timestamp (closes #871) (#993)
## Summary Closes #871 The `/api/packets` endpoint could return packets with `null` hash or timestamp fields. This was caused by legacy data in SQLite (rows with empty `hash` or `NULL`/empty `first_seen`) predating the ingestor's existing validation guard (`if hash == "" { return false, nil }` at `cmd/ingestor/db.go:610`). ## Root Cause `cmd/server/store.go` `filterPackets()` had no data-integrity guard. Legacy rows with empty `hash` or `first_seen` were loaded into the in-memory store and returned verbatim. The `strOrNil("")` helper then serialized these as JSON `null`. ## Fix Added a data-integrity predicate at the top of `filterPackets`'s scan callback (`cmd/server/store.go:2278`): ```go if tx.Hash == "" || tx.FirstSeen == "" { return false } ``` This filters bad legacy rows at query time. The write path (ingestor) already rejects empty hashes, so no new bad data enters. ## TDD Evidence - **Red commit:** `15774c3` — test `TestIssue871_NoNullHashOrTimestamp` asserts no packet in API response has null/empty hash or timestamp - **Green commit:** `281fd6f` — adds the filter guard, test passes ## Testing - `go test ./...` in `cmd/server` passes (full suite) - Client-side defensive filter from PR #868 remains as defense-in-depth --------- Co-authored-by: you <you@example.com> |
||
|
|
fc57433f27 |
fix(analytics): merge channel buckets by hash byte; reject rainbow-table mismatches (closes #978) (#980)
## Summary Closes #978 — analytics channels duplicated by encrypted/decrypted split + rainbow-table collisions. ## Root cause Two distinct bugs in `computeAnalyticsChannels` (`cmd/server/store.go`): 1. **Encrypted/decrypted split**: The grouping key included the decoded channel name (`hash + "_" + channel`), so packets from observers that could decrypt a channel created a separate bucket from packets where decryption failed. Same physical channel, two entries. 2. **Rainbow-table collisions**: Some observers' lookup tables map hash bytes to wrong channel names. E.g., hash `72` incorrectly claimed to be `#wardriving` (real hash is `129`). This created ghost 1-message entries. ## Fix 1. **Always group by hash byte alone** (drop `_channel` suffix from `chKey`). When any packet decrypts successfully, upgrade the bucket's display name from placeholder (`chN`) to the real name (first-decrypter-wins for stability). 2. **Validate channel names** against the firmware hash invariant: `SHA256(SHA256("#name")[:16])[0] == channelHash`. Mismatches are treated as encrypted (placeholder name, no trust in decoded channel). Guard is in the analytics handler (not the ingestor) to avoid breaking other surfaces that use the decoded field for display. ## Verification (e2e-fixture.db) | Metric | BEFORE | AFTER | |--------|--------|-------| | Total channels | 22 | 19 | | Duplicate hash bytes | 3 (hashes 217, 202, 17) | 0 | ## Tests added - `TestComputeAnalyticsChannels_MergesEncryptedAndDecrypted` — same hash, mixed encrypted/decrypted → ONE bucket - `TestComputeAnalyticsChannels_RejectsRainbowTableMismatch` — hash 72 claimed as `#wardriving` (real=129) → rejected, stays `ch72` - `TestChannelNameMatchesHash` — unit test for hash validation helper - `TestIsPlaceholderName` — unit test for placeholder detection Anti-tautology gate: both main tests fail when their respective fix lines are reverted. Co-authored-by: you <you@example.com> |
||
|
|
e460932668 |
fix(store): apply retentionHours cutoff in Load() to prevent OOM on cold start (#917)
## Problem `Load()` loaded all transmissions from the DB regardless of `retentionHours`, so `buildSubpathIndex()` processed the full DB history on every startup. On a DB with ~280K paths this produces ~13.5M subpath index entries, OOM-killing the process before it ever starts listening — causing a supervisord crash loop with no useful error message. ## Fix Apply the same `retentionHours` cutoff to `Load()`'s SQL that `EvictStale()` already uses at runtime. Both conditions (`retentionHours` window and `maxPackets` cap) are combined with AND so neither safety limit is bypassed. Startup now builds indexes only over the retention window, making startup time and memory proportional to recent activity rather than total DB history. ## Docs - `config.example.json`: adds `retentionHours` to the `packetStore` block with recommended value `168` (7 days) and a warning about `0` on large DBs - `docs/user-guide/configuration.md`: documents the field and adds an explicit OOM warning ## Test plan - [x] `cd cmd/server && go test ./... -run TestRetentionLoad` — covers the retention-filtered load: verifies packets outside the window are excluded, and that `retentionHours: 0` still loads everything - [x] Deploy on an instance with a large DB (>100K paths) and `retentionHours: 168` — server reaches "listening" in seconds instead of OOM-crashing - [x] Verify `config.example.json` has `retentionHours: 168` in the `packetStore` block - [x] Verify `docs/user-guide/configuration.md` documents the field and warning 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Kpa-clawbot <kpaclawbot@outlook.com> |
||
|
|
54f7f9d35b |
feat: path-prefix candidate inspector with map view (#944) (#945)
## feat: path-prefix candidate inspector with map view (#944) Implements the locked spec from #944: a beam-search-based path prefix inspector that enumerates candidate full-pubkey paths from short hex prefixes and scores them. ### Server (`cmd/server/path_inspect.go`) - **`POST /api/paths/inspect`** — accepts 1-64 hex prefixes (1-3 bytes, uniform length per request) - Beam search (width 20) over cached `prefixMap` + `NeighborGraph` - Per-hop scoring: edge weight (35%), GPS plausibility (20%), recency (15%), prefix selectivity (30%) - Geometric mean aggregation with 0.05 floor per hop - Speculative threshold: score < 0.7 - Score cache: 30s TTL, keyed by (prefixes, observer, window) - Cold-start: synchronous NeighborGraph rebuild with 2s hard timeout → 503 `{retry:true}` - Body limit: 4096 bytes via `http.MaxBytesReader` - Zero SQL queries in handler hot path - Request validation: rejects empty, odd-length, >3 bytes, mixed lengths, >64 hops ### Frontend (`public/path-inspector.js`) - New page under Tools route with input field (comma/space separated hex prefixes) - Client-side validation with error feedback - Results table: rank, score (color-coded speculative), path names, per-hop evidence (collapsed) - "Show on Map" button calls `drawPacketRoute` (one path at a time, clears prior) - Deep link: `#/tools/path-inspector?prefixes=2c,a1,f4` ### Nav reorganization - `Traces` nav item renamed to `Tools` - Backward-compat: `#/traces/<hash>` redirects to `#/tools/trace/<hash>` - Tools sub-routing dispatches to traces or path-inspector ### Store changes - Added `LastSeen time.Time` to `nodeInfo` struct, populated from `nodes.last_seen` - Added `inspectMu` + `inspectCache` fields to `PacketStore` ### Tests - **Go unit tests** (`path_inspect_test.go`): scoreHop components, beam width cap, speculative flag, all validation error cases, valid request integration - **Frontend tests** (`test-path-inspector.js`): parse comma/space/mixed, validation (empty, odd, >3 bytes, mixed lengths, invalid hex, valid) - Anti-tautology gate verified: removing beam pruning fails width test; removing validation fails reject tests ### CSS - `--path-inspector-speculative` variable in both themes (amber, WCAG AA on both dark/light backgrounds) - All colors via CSS variables (no hardcoded hex in production code) Closes #944 --------- Co-authored-by: you <you@example.com> |
||
|
|
5678874128 |
fix: exclude non-repeater nodes from path-hop resolution (#935) (#936)
Fixes #935 ## Problem `buildPrefixMap()` indexed ALL nodes regardless of role, causing companions/sensors to appear as repeater hops when their pubkey prefix collided with a path-hop hash byte. ## Fix ### Server (`cmd/server/store.go`) - Added `canAppearInPath(role string) bool` — allowlist of roles that can forward packets (repeater, room_server, room) - `buildPrefixMap` now skips nodes that fail this check ### Client (`public/hop-resolver.js`) - Added matching `canAppearInPath(role)` helper - `init()` now only populates `prefixIdx` for path-eligible nodes - `pubkeyIdx` remains complete — `resolveFromServer()` still resolves any node type by full pubkey (for server-confirmed `resolved_path` arrays) ## Tests - `cmd/server/prefix_map_role_test.go`: 7 new tests covering role filtering in prefix map and resolveWithContext - `test-hop-resolver-affinity.js`: 4 new tests verifying client-side role filter + pubkeyIdx completeness - All existing tests updated to include `Role: "repeater"` where needed - `go test ./cmd/server/...` — PASS - `node test-hop-resolver-affinity.js` — 16/17 pass (1 pre-existing centroid failure unrelated to this change) ## Commits 1. `fix: filter prefix map to only repeater/room roles (#935)` — server implementation 2. `test: prefix map role filter coverage (#935)` — server tests 3. `ui: filter HopResolver prefix index to repeater/room roles (#935)` — client implementation 4. `test: hop-resolver role filter coverage (#935)` — client tests --------- Co-authored-by: you <you@example.com> |
||
|
|
a605518d6d |
fix(#881): per-observation raw_hex — each observer sees different bytes on air (#882)
## Problem Each MeshCore observer receives a physically distinct over-the-air byte sequence for the same transmission (different path bytes, flags/hops remaining). The `observations` table stored only `path_json` per observer — all observations pointed at one `transmissions.raw_hex`. This prevented the hex pane from updating when switching observations in the packet detail view. ## Changes | Layer | Change | |-------|--------| | **Schema** | `ALTER TABLE observations ADD COLUMN raw_hex TEXT` (nullable). Migration: `observations_raw_hex_v1` | | **Ingestor** | `stmtInsertObservation` now stores per-observer `raw_hex` from MQTT payload | | **View** | `packets_v` uses `COALESCE(o.raw_hex, t.raw_hex)` — backward compatible with NULL historical rows | | **Server** | `enrichObs` prefers `obs.RawHex` when non-empty, falls back to `tx.RawHex` | | **Frontend** | No changes — `effectivePkt.raw_hex` already flows through `renderDetail` | ## Tests - **Ingestor**: `TestPerObservationRawHex` — two MQTT packets for same hash from different observers → both stored with distinct raw_hex - **Server**: `TestPerObservationRawHexEnrich` — enrichObs returns per-obs raw_hex when present, tx fallback when NULL - **E2E**: Playwright assertion in `test-e2e-playwright.js` for hex pane update on observation switch E2E assertion added: `test-e2e-playwright.js:1794` ## Scope - Historical observations: raw_hex stays NULL, UI falls back to transmission raw_hex silently - No backfill, no path_json reconstruction, no frontend changes Closes #881 --------- Co-authored-by: you <you@example.com> |
||
|
|
a371d35bfd |
feat(#847): dedupe Top Longest Hops by pair + add obs count and SNR cues (#848)
## Problem The "Top 20 Longest Hops" RF analytics card shows the same repeater pair filling most slots because the query sorts raw hop records by distance with no pair deduplication. A single long link observed 12+ times dominates the leaderboard. ## Fix Dedupe by unordered `(pk1, pk2)` pair. Per pair, keep the max-distance record and compute reliability metrics: | Column | Description | |--------|-------------| | **Obs** | Total observations of this link | | **Best SNR** | Maximum SNR seen (dB) | | **Median SNR** | Median SNR across all observations (dB) | Tooltip on each row shows the timestamp of the best observation. ### Before | # | From | To | Distance | Type | SNR | Packet | |---|------|----|----------|------|-----|--------| | 1 | NodeX | NodeY | 200 mi | R↔R | 5 dB | abc… | | 2 | NodeX | NodeY | 199 mi | R↔R | 6 dB | def… | | 3 | NodeX | NodeY | 198 mi | R↔R | 4 dB | ghi… | ### After | # | From | To | Distance | Type | Obs | Best SNR | Median SNR | Packet | |---|------|----|----------|------|-----|----------|------------|--------| | 1 | NodeX | NodeY | 200 mi | R↔R | 12 | 8.0 dB | 5.2 dB | abc… | | 2 | NodeA | NodeB | 150 mi | C↔R | 3 | 6.5 dB | 6.5 dB | jkl… | ## Changes - **`cmd/server/store.go`**: Group `filteredHops` by unordered pair key, accumulate obs count / best SNR / median SNR per group, sort by max distance, take top 20 - **`cmd/server/types.go`**: Update `DistanceHop` struct — replace `SNR` with `BestSnr`, `MedianSnr`, add `ObsCount` - **`public/analytics.js`**: Replace single SNR column with Obs, Best SNR, Median SNR; add row tooltip with best observation timestamp - **`cmd/server/store_tophops_test.go`**: 3 unit tests — basic dedupe, reverse-pair merge, nil SNR edge case ## Test Coverage - `TestDedupeTopHopsByPair`: 5 records on pair (A,B) + 1 on (C,D) → 2 results, correct obsCount/dist/bestSnr/medianSnr - `TestDedupeTopHopsReversePairMerges`: (B,A) and (A,B) merge into one entry - `TestDedupeTopHopsNilSNR`: all-nil SNR records → bestSnr and medianSnr both nil - Existing `TestAnalyticsRFEndpoint` and `TestAnalyticsRFWithRegion` still pass Closes #847 --------- Co-authored-by: you <you@example.com> |
||
|
|
7f024b7aa7 |
fix(#673): replace raw JSON text search with byNode index for node packet queries (#803)
## Summary Fixes #673 - GRP_TXT packets whose message text contains a node's pubkey were incorrectly counted as packets for that node, inflating packet counts and type breakdowns - Two code paths in `store.go` used `strings.Contains` on the full `DecodedJSON` blob — this matched pubkeys appearing anywhere in the JSON, including inside chat message text - `filterPackets` slow path (combined node + other filters): replaced substring search with a hash-set membership check against `byNode[nodePK]` - `GetNodeAnalytics`: removed the full-packet-scan + text search branch entirely; always uses the `byNode` index (which already covers `pubKey`/`destPubKey`/`srcPubKey` via structured field indexing) ## Test Plan - [x] `TestGetNodeAnalytics_ExcludesGRPTXTWithPubkeyInText` — verifies a GRP_TXT packet with the node's pubkey in its text field is not counted in that node's analytics - [x] `TestFilterPackets_NodeQueryDoesNotMatchChatText` — verifies the combined-filter slow path of `filterPackets` returns only the indexed ADVERT, not the chat packet Both tests were written as failing tests against the buggy code and pass after the fix. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> |
||
|
|
d7fe24e2db |
Fix channel filter on Packets page (UI + API) — #812 (#816)
Closes #812 ## Root causes **Server (`/api/packets?channel=…` returned identical totals):** The handler in `cmd/server/routes.go` never read the `channel` query parameter into `PacketQuery`, so it was silently ignored by both the SQLite path (`db.go::buildTransmissionWhere`) and the in-memory path (`store.go::filterPackets`). The codebase already had everything else in place — the `channel_hash` column with an index from #762, decoded `channel` / `channelHashHex` fields on each packet — it just wasn't wired up. **UI (`/#/packets` had no channel filter):** `public/packets.js` rendered observer / type / time-window / region filters but no channel control, and didn't read `?channel=` from the URL. ## Fix ### Server - New `Channel` field on `PacketQuery`; `handlePackets` reads `r.URL.Query().Get("channel")`. - DB path filters by the indexed `channel_hash` column (exact match). - In-memory path: helper `packetMatchesChannel` matches `decoded.channel` (plaintext, e.g. `#test`, `public`) or `enc_<HEX>` against `channelHashHex` for undecryptable GRP_TXT. Uses cached `ParsedDecoded()` so it's O(1) after first parse. Fast-path index guards and the grouped-cache key updated to include channel. - Regression test (`channel_filter_test.go`): `channel=#test` returns ≥1 GRP_TXT packet and fewer than baseline; `channel=nonexistentchannel` returns `total=0`. ### UI - New `<select id="fChannel">` populated from `/api/channels`. - Round-trips via `?channel=…` on the URL hash (read on init, written on change). - Pre-seeds the current value as an option so encrypted hashes not in `/api/channels` still display as selected on reload. - On change, calls `loadPackets()` so the server-side filter applies before pagination. ## Perf Filter adds at most one cached map lookup per packet (DB path uses indexed column, store path uses `ParsedDecoded()` cache). Staging baseline 149–190 ms for `?channel=#test&limit=50`; the new comparison is negligible. Target ≤ 500 ms preserved. ## Tests `cd cmd/server && go test ./... -count=1 -timeout 120s` → PASS. --------- Co-authored-by: you <you@example.com> |
||
|
|
9e90548637 |
perf(#800): remove per-StoreTx ResolvedPath, replace with membership index + on-demand decode (#806)
## Summary Remove `ResolvedPath []*string` field from `StoreTx` and `StoreObs` structs, replacing it with a compact membership index + on-demand SQL decode. This eliminates the dominant heap cost identified in profiling (#791, #799). **Spec:** #800 (consolidated from two rounds of expert + implementer review on #799) Closes #800 Closes #791 ## Design ### Removed - `StoreTx.ResolvedPath []*string` - `StoreObs.ResolvedPath []*string` - `TransmissionResp.ResolvedPath`, `ObservationResp.ResolvedPath` struct fields ### Added | Structure | Purpose | Est. cost at 1M obs | |---|---|---:| | `resolvedPubkeyIndex map[uint64][]int` | FNV-1a(pubkey) → []txID forward index | 50–120 MB | | `resolvedPubkeyReverse map[int][]uint64` | txID → []hashes for clean removal | ~40 MB | | `apiResolvedPathLRU` (10K entries) | FIFO cache for on-demand API decode | ~2 MB | ### Decode-window discipline `resolved_path` JSON decoded once per packet. Consumers fed in order, temp slice dropped — never stored on struct: 1. `addToByNode` — relay node indexing 2. `touchRelayLastSeen` — relay liveness DB updates 3. `byPathHop` resolved-key entries 4. `resolvedPubkeyIndex` + reverse insert 5. WebSocket broadcast map (raw JSON bytes) 6. Persist batch (raw JSON bytes for SQL UPDATE) ### Collision safety When the forward index returns candidates, a batched SQL query confirms exact pubkey presence using `LIKE '%"pubkey"%'` on the `resolved_path` column. ### Feature flag `useResolvedPathIndex` (default `true`). Off-path is conservative: all candidates kept, index not consulted. For one-release rollback safety. ## Files changed | File | Changes | |---|---| | `resolved_index.go` | **New** — index structures, LRU cache, on-demand SQL helpers, collision safety | | `store.go` | Remove RP fields, decode-window discipline in Load/Ingest, on-demand txToMap/obsToMap/enrichObs, eviction cleanup via SQL, memory accounting update | | `types.go` | Remove RP fields from TransmissionResp/ObservationResp | | `routes.go` | Replace `nodeInResolvedPath` with `nodeInResolvedPathViaIndex`, remove RP from mapSlice helpers | | `neighbor_persist.go` | Refactor backfill: reverse-map removal → forward+reverse insert → LRU invalidation | ## Tests added (27 new) **Unit:** - `TestStoreTx_ResolvedPathFieldAbsent` — reflection guard - `TestResolvedPubkeyIndex_BuildFromLoad` — forward+reverse consistency - `TestResolvedPubkeyIndex_HashCollision` — SQL collision safety - `TestResolvedPubkeyIndex_IngestUpdate` — maps reflect new ingests - `TestResolvedPubkeyIndex_RemoveOnEvict` — clean removal via reverse map - `TestResolvedPubkeyIndex_PerObsCoverage` — non-best obs pubkeys indexed - `TestAddToByNode_WithoutResolvedPathField` - `TestTouchRelayLastSeen_WithoutResolvedPathField` - `TestWebSocketBroadcast_IncludesResolvedPath` - `TestBackfill_InvalidatesLRU` - `TestEviction_ByNodeCleanup_OnDemandSQL` - `TestExtractResolvedPubkeys`, `TestMergeResolvedPubkeys` - `TestResolvedPubkeyHash_Deterministic` - `TestLRU_EvictionOnFull` **Endpoint:** - `TestPathsThroughNode_NilResolvedPathFallback` - `TestPacketsAPI_OnDemandResolvedPath` - `TestPacketsAPI_OnDemandResolvedPath_LRUHit` - `TestPacketsAPI_OnDemandResolvedPath_Empty` **Feature flag:** - `TestFeatureFlag_OffPath_PreservesOldBehavior` - `TestFeatureFlag_Toggle_NoStateLeak` **Concurrency:** - `TestReverseMap_NoLeakOnPartialFailure` - `TestDecodeWindow_LockHoldTimeBounded` - `TestLivePolling_LRUUnderConcurrentIngest` **Regression:** - `TestRepeaterLiveness_StillAccurate` **Benchmarks:** - `BenchmarkLoad_BeforeAfter` - `BenchmarkResolvedPubkeyIndex_Memory` - `BenchmarkPathsThroughNode_Latency` - `BenchmarkLivePolling_UnderIngest` ## Benchmark results ``` BenchmarkResolvedPubkeyIndex_Memory/pubkeys=50K 429ms 103MB 777K allocs BenchmarkResolvedPubkeyIndex_Memory/pubkeys=500K 4205ms 896MB 7.67M allocs BenchmarkLoad_BeforeAfter 65ms 20MB 202K allocs BenchmarkPathsThroughNode_Latency 3.9µs 0B 0 allocs BenchmarkLivePolling_UnderIngest 5.4µs 545B 7 allocs ``` Key: per-obs `[]*string` overhead completely eliminated. At 1M obs with 3 hops average, this saves ~72 bytes/obs × 1M = ~68 MB just from the slice headers + pointers, plus the JSON-decoded string data (~900 MB at scale per profiling). ## Design choices - **FNV-1a instead of xxhash**: stdlib availability, no external dependency. Performance is equivalent for this use case (pubkey strings are short). - **FIFO LRU instead of true LRU**: simpler implementation, adequate for the access pattern (mostly sequential obs IDs from live polling). - **Grouped packets view omits resolved_path**: cold path, not worth SQL round-trip per page render. - **Backfill pending check uses reverse-map presence** instead of per-obs field: if a tx has any indexed pubkeys, its observations are considered resolved. Closes #807 --------- Co-authored-by: you <you@example.com> |
||
|
|
a8e1cea683 |
fix: use payload type bits only in content hash (not full header byte) (#787)
## Problem The firmware computes packet content hash as: ``` SHA256(payload_type_byte + [path_len for TRACE] + payload) ``` Where `payload_type_byte = (header >> 2) & 0x0F` — just the payload type bits (2-5). CoreScope was using the **full header byte** in its hash computation, which includes route type bits (0-1) and version bits (6-7). This meant the same logical packet produced different content hashes depending on route type — breaking dedup and packet lookup. **Firmware reference:** `Packet.cpp::calculatePacketHash()` uses `getPayloadType()` which returns `(header >> PH_TYPE_SHIFT) & PH_TYPE_MASK`. ## Fix - Extract only payload type bits: `payloadType := (headerByte >> 2) & 0x0F` - Include `path_len` byte in hash for TRACE packets (matching firmware behavior) - Applied to both `cmd/server/decoder.go` and `cmd/ingestor/decoder.go` ## Tests Added - **Route type independence:** Same payload with FLOOD vs DIRECT route types produces identical hash - **TRACE path_len inclusion:** TRACE packets with different `path_len` produce different hashes - **Firmware compatibility:** Hash output matches manual computation of firmware algorithm ## Migration Impact Existing packets in the DB have content hashes computed with the old (incorrect) formula. Options: 1. **Recompute hashes** via migration (recommended for clean state) 2. **Dual lookup** — check both old and new hash on queries (backward compat) 3. **Accept the break** — old hashes become stale, new packets get correct hashes Recommend option 1 (migration) as a follow-up. The volume of affected packets depends on how many distinct route types were seen for the same logical packet. Fixes #786 --------- Co-authored-by: you <you@example.com> |
||
|
|
d596becca3 |
feat: bounded cold load — limit Load() by memory budget (#790)
## Implements #748 M1 — Bounded Cold Load ### Problem `Load()` pulls the ENTIRE database into RAM before eviction runs. On a 1GB database, this means 3+ GB peak memory at startup, regardless of `maxMemoryMB`. This is the root cause of #743 (OOM on 2GB VMs). ### Solution Calculate the maximum number of transmissions that fit within the `maxMemoryMB` budget and use a SQL subquery LIMIT to load only the newest packets. **Two-phase approach** (avoids the JOIN-LIMIT row count problem): ```sql SELECT ... FROM transmissions t LEFT JOIN observations o ON ... WHERE t.id IN (SELECT id FROM transmissions ORDER BY first_seen DESC LIMIT ?) ORDER BY t.first_seen ASC, o.timestamp DESC ``` ### Changes - **`estimateStoreTxBytesTypical(numObs)`** — estimates memory cost of a typical transmission without needing an actual `StoreTx` instance. Used for budget calculation. - **Budget calculation in `Load()`** — `maxPackets = (maxMemoryMB * 1048576) / avgBytesPerPacket` with a floor of 1000 packets. - **Subquery LIMIT** — loads only the newest N transmissions when bounded. - **`oldestLoaded` tracking** — records the oldest packet timestamp in memory so future SQL fallback queries (M2+) know where in-memory data ends. - **Perf stats** — `oldestLoaded` exposed in `/api/perf/store-stats`. - **Logging** — bounded loads show `Loaded X/Y transmissions (limited by ZMB budget)`. ### When `maxMemoryMB=0` (unlimited) Behavior is completely unchanged — no LIMIT clause, all packets loaded. ### Tests (6 new) | Test | Validates | |------|-----------| | `TestBoundedLoad_LimitedMemory` | With 1MB budget, loads fewer than total (hits 1000 minimum) | | `TestBoundedLoad_NewestFirst` | Loaded packets are the newest, not oldest | | `TestBoundedLoad_OldestLoadedSet` | `oldestLoaded` matches first packet's `FirstSeen` | | `TestBoundedLoad_UnlimitedWithZero` | `maxMemoryMB=0` loads all packets | | `TestBoundedLoad_AscendingOrder` | Packets remain in ascending `first_seen` order after bounded load | | `TestEstimateStoreTxBytesTypical` | Estimate grows with observation count, exceeds floor | Plus benchmarks: `BenchmarkLoad_Bounded` vs `BenchmarkLoad_Unlimited`. ### Perf justification On a 5000-transmission test DB with 1MB budget: - Bounded: loads 1000 packets (the minimum) in ~1.3s - The subquery uses SQLite's index on `first_seen` — O(N log N) for the LIMIT, then indexed JOIN for observations - No full table scan needed when bounded ### Next milestones - **M2**: Packet list/search SQL fallback (uses `oldestLoaded` boundary) - **M3**: Node analytics SQL fallback - **M4-M5**: Remaining endpoint fallbacks + live-only memory store --------- Co-authored-by: you <you@example.com> |
||
|
|
6a648dea11 |
fix: multi-byte adopters — all node types, role column, advert precedence (#754) (#767)
## Fix: Multi-Byte Adopters Table — Three Bugs (#754) ### Bug 1: Companions in "Unknown" `computeMultiByteCapability()` was repeater-only. Extended to classify **all node types** (companions, rooms, sensors). A companion advertising with 2-byte hash is now correctly "Confirmed". ### Bug 2: No Role Column Added a **Role** column to the merged Multi-Byte Hash Adopters table, color-coded using `ROLE_COLORS` from `roles.js`. Users can now distinguish repeaters from companions without clicking through to node detail. ### Bug 3: Data Source Disagreement When adopter data (from `computeAnalyticsHashSizes`) shows `hashSize >= 2` but capability only found path evidence ("Suspected"), the advert-based adopter data now takes precedence → "Confirmed". The adopter hash sizes are passed into `computeMultiByteCapability()` as an additional confirmed evidence source. ### Changes - `cmd/server/store.go`: Extended capability to all node types, accept adopter hash sizes, prioritize advert evidence - `public/analytics.js`: Added Role column with color-coded badges - `cmd/server/multibyte_capability_test.go`: 3 new tests (companion confirmed, role populated, adopter precedence) ### Tests - All 10 multi-byte capability tests pass - All 544 frontend helper tests pass - All 62 packet filter tests pass - All 29 aging tests pass --------- Co-authored-by: you <you@example.com> |
||
|
|
401fd070f8 |
fix: improve trackedBytes accuracy for memory estimation (#751)
## Problem Fixes #743 — High memory usage / OOM with relatively small dataset. `trackedBytes` severely undercounted actual per-packet memory because it only tracked base struct sizes and string field lengths, missing major allocations: | Structure | Untracked Cost | Scale Impact | |-----------|---------------|--------------| | `spTxIndex` (O(path²) subpath entries) | 40 bytes × path combos | 50-150MB | | `ResolvedPath` on observations | 24 bytes × elements | ~25MB | | Per-tx maps (`obsKeys`, `observerSet`) | 200 bytes/tx flat | ~11MB | | `byPathHop` index entries | 50 bytes/hop | 20-40MB | This caused eviction to trigger too late (or not at all), leading to OOM. ## Fix Expanded `estimateStoreTxBytes` and `estimateStoreObsBytes` to account for: - **Per-tx maps**: +200 bytes flat for `obsKeys` + `observerSet` map headers - **Path hop index**: +50 bytes per hop in `byPathHop` - **Subpath index**: +40 bytes × `hops*(hops-1)/2` combinations for `spTxIndex` - **Resolved paths**: +24 bytes per `ResolvedPath` element on observations Updated the existing `TestEstimateStoreTxBytes` to match new formula. All existing eviction tests continue to pass — the eviction logic itself is unchanged. Also exposed `avgBytesPerPacket` in the perf API (`/api/perf`) so operators can monitor per-packet memory costs. ## Performance Benchmark confirms negligible overhead (called on every insert): ``` BenchmarkEstimateStoreTxBytes 159M ops 7.5 ns/op 0 B/op 0 allocs BenchmarkEstimateStoreObsBytes 1B ops 1.0 ns/op 0 B/op 0 allocs ``` ## Tests - 6 new tests in `tracked_bytes_test.go`: - Reasonable value ranges for different packet sizes - 10-hop packets estimate significantly more than 2-hop (subpath cost) - Observations with `ResolvedPath` estimate more than without - 15 observations estimate >10x a single observation - `trackedBytes` matches sum of individual estimates after batch insert - Eviction triggers correctly with improved estimates - 2 benchmarks confirming sub-10ns estimate cost - Updated existing `TestEstimateStoreTxBytes` for new formula - Full test suite passes --------- Co-authored-by: you <you@example.com> |
||
|
|
a815e70975 |
feat: Clock skew detection — backend computation (M1) (#746)
## Summary Implements **Milestone 1** of #690 — backend clock skew computation for nodes and observers. ## What's New ### Clock Skew Engine (`clock_skew.go`) **Phase 1 — Raw Skew Calculation:** For every ADVERT observation: `raw_skew = advert_timestamp - observation_timestamp` **Phase 2 — Observer Calibration:** Same packet seen by multiple observers → compute each observer's clock offset as the median deviation from the per-packet median observation timestamp. This identifies observers with their own clock drift. **Phase 3 — Corrected Node Skew:** `corrected_skew = raw_skew + observer_offset` — compensates for observer clock error. **Phase 4 — Trend Analysis:** Linear regression over time-ordered skew samples estimates drift rate in seconds/day. Detects crystal drift vs stable offset vs sudden jumps. ### Severity Classification | Level | Threshold | Meaning | |-------|-----------|---------| | ✅ OK | < 5 min | Normal | | ⚠️ Warning | 5 min – 1 hour | Clock drifting | | 🔴 Critical | 1 hour – 30 days | Likely no time source | | 🟣 Absurd | > 30 days | Firmware default or epoch 0 | ### New API Endpoints - `GET /api/nodes/{pubkey}/clock-skew` — per-node skew data (mean, median, last, drift, severity) - `GET /api/observers/clock-skew` — observer calibration offsets - Clock skew also included in `GET /api/nodes/{pubkey}/analytics` response as `clockSkew` field ### Performance - 30-second compute cache avoids reprocessing on every request - Operates on in-memory `byPayloadType[ADVERT]` index — no DB queries - O(n) in total ADVERT observations, O(m log m) for median calculations ## Tests 15 unit tests covering: - Severity classification at all thresholds - Median/mean math helpers - ISO timestamp parsing - Timestamp extraction from decoded JSON (nested and top-level) - Observer calibration with single and multi-observer scenarios - Observer offset correction direction (verified the sign is `+obsOffset`) - Drift estimation: stable, linear, insufficient data, short time span - JSON number extraction edge cases ## What's NOT in This PR - No UI changes (M2–M4) - No customizer integration (M5) - Thresholds are hardcoded constants (will be configurable in M5) Implements #690 M1. --------- Co-authored-by: you <you@example.com> |
||
|
|
aa84ce1e6a |
fix: correct hash_size detection for transport routes and zero-hop adverts (#747)
## Summary Fixes #744 Fixes #722 Three bugs in hash_size computation caused zero-hop adverts to incorrectly report `hash_size=1`, masking nodes that actually use multi-byte hashes. ## Bugs Fixed ### 1. Wrong path byte offset for transport routes (`computeNodeHashSizeInfo`) Transport routes (types 0 and 3) have 4 transport code bytes before the path byte. The code read the path byte from offset 1 (byte index `RawHex[2:4]`) for all route types. For transport routes, the correct offset is 5 (`RawHex[10:12]`). ### 2. Missing RouteTransportDirect skip (`computeNodeHashSizeInfo`) Zero-hop adverts from `RouteDirect` (type 2) were correctly skipped, but `RouteTransportDirect` (type 3) zero-hop adverts were not. Both have locally-generated path bytes with unreliable hash_size bits. ### 3. Zero-hop adverts not skipped in analytics (`computeAnalyticsHashSizes`) `computeAnalyticsHashSizes()` unconditionally overwrote a node's `hashSize` with whatever the latest advert reported. A zero-hop direct advert with `hash_size=1` could overwrite a previously-correct `hash_size=2` from a multi-hop flood advert. Fix: skip hash_size update for zero-hop direct/transport-direct adverts while still counting the packet and updating `lastSeen`. ## Tests Added - `TestHashSizeTransportRoutePathByteOffset` — verifies transport routes read path byte at offset 5, regular flood reads at offset 1 - `TestHashSizeTransportDirectZeroHopSkipped` — verifies both RouteDirect and RouteTransportDirect zero-hop adverts are skipped - `TestAnalyticsHashSizesZeroHopSkip` — verifies analytics hash_size is not overwritten by zero-hop adverts - Fixed 3 existing tests (`FlipFlop`, `Dominant`, `LatestWins`) that used route_type 0 (TransportFlood) header bytes without proper transport code padding ## Complexity All changes are O(1) per packet — no new loops or data structures. The additional offset computation and zero-hop check are constant-time operations within the existing packet scan loop. Co-authored-by: you <you@example.com> |
||
|
|
84f03f4f41 |
fix: hide undecryptable channel messages by default (#727) (#728)
## Problem Channels page shows 53K 'Unknown' messages — undecryptable GRP_TXT packets with no content. Pure noise. ## Fix - Backend: channels API filters out undecrypted messages by default - `?includeEncrypted=true` param to include them - Frontend: 'Show encrypted' toggle in channels sidebar - Unknown channels grayed out with '(no key)' label - Toggle persists in localStorage Fixes #727 --------- Co-authored-by: you <you@example.com> |
||
|
|
65482ff6f6 |
fix: cache invalidation tuning — 7% → 50-80% hit rate (#721)
## Cache Invalidation Tuning — 7% → 50-80% Hit Rate Fixes #720 ### Problem Server-side cache hit rate was 7% (48 hits / 631 misses over 4.7 days). Root causes from the [cache audit report](https://github.com/Kpa-clawbot/CoreScope/issues/720): 1. **`invalidationDebounce` config value (30s) was dead code** — never wired to `invCooldown` 2. **`invCooldown` hardcoded to 10s** — with continuous ingest, caches cleared every 10s regardless of their 1800s TTLs 3. **`collisionCache` cleared on every `hasNewTransmissions`** — hash collisions are structural (depend on node count), not per-packet ### Changes | Change | File | Impact | |--------|------|--------| | Wire `invalidationDebounce` from config → `invCooldown` | `store.go` | Config actually works now | | Default `invCooldown` 10s → 300s (5 min) | `store.go` | 30x longer cache survival | | Add `hasNewNodes` flag to `cacheInvalidation` | `store.go` | Finer-grained invalidation | | `collisionCache` only clears on `hasNewNodes` | `store.go` | O(n²) collision computation survives its 1hr TTL | | `addToByNode` returns new-node indicator | `store.go` | Zero-cost detection during indexing | | `indexByNode` returns new-node indicator | `store.go` | Propagates to ingest path | | Ingest tracks and passes `hasNewNodes` | `store.go` | End-to-end wiring | ### Tests Added | Test | What it verifies | |------|-----------------| | `TestInvCooldownFromConfig` | Config value wired to `invCooldown`; default is 300s | | `TestCollisionCacheNotClearedByTransmissions` | `hasNewTransmissions` alone does NOT clear `collisionCache` | | `TestCollisionCacheClearedByNewNodes` | `hasNewNodes` DOES clear `collisionCache` | | `TestCacheSurvivesMultipleIngestCyclesWithinCooldown` | 5 rapid ingest cycles don't clear any caches during cooldown | | `TestNewNodesAccumulatedDuringCooldown` | `hasNewNodes` accumulated in `pendingInv` and applied after cooldown | | `BenchmarkAnalyticsLatencyCacheHitVsMiss` | 100% hit rate with rate-limited invalidation | All 200+ existing tests pass. Both benchmarks show 100% hit rate. ### Performance Justification - **Before:** Effective cache lifetime = `min(TTL, invCooldown)` = 10s. With analytics viewed ~once/few minutes, P(hit) ≈ 7% - **After:** Effective cache lifetime = `min(TTL, 300s)` = 300s for most caches, 3600s for `collisionCache`. Expected hit rate 50-80% - **Complexity:** All changes are O(1) — `addToByNode` already checked `nodeHashes[pubkey] == nil`, we just return the result - **Benchmark proof:** `BenchmarkAnalyticsLatencyCacheHitVsMiss` → 100% hit rate, 269ns/op Co-authored-by: you <you@example.com> |
||
|
|
f95aa49804 |
fix: exclude TRACE packets from multi-byte capability suspected detection (#715)
## Summary Exclude TRACE packets (payload_type 8) from the "suspected" multi-byte capability inference logic. TRACE packets carry hash size in their own flags — forwarding repeaters read it from the TRACE header, not their compile-time `PATH_HASH_SIZE`. Pre-1.14 repeaters can forward multi-byte TRACEs without actually supporting multi-byte hashes, creating false positives. Fixes #714 ## Changes ### `cmd/server/store.go` - In `computeMultiByteCapability()`, skip packets with `payload_type == 8` (TRACE) when scanning `byPathHop` for suspected multi-byte nodes - "Confirmed" detection (from adverts) is unaffected ### `cmd/server/multibyte_capability_test.go` - `TestMultiByteCapability_TraceExcluded`: TRACE packet with 2-byte path does NOT mark repeater as suspected - `TestMultiByteCapability_NonTraceStillSuspected`: Non-TRACE packet with 2-byte path still marks as suspected - `TestMultiByteCapability_ConfirmedUnaffectedByTraceExclusion`: Confirmed status from advert unaffected by TRACE exclusion ## Testing All 7 multi-byte capability tests pass. Full `cmd/server` and `cmd/ingestor` test suites pass. Co-authored-by: you <you@example.com> |