mirror of
https://github.com/Kpa-clawbot/meshcore-analyzer.git
synced 2026-05-11 20:24:43 +00:00
153308134ecbde4ee9fc951dd0f97fdfcbc23f35
245 Commits
153308134e
feat: add global observer IATA whitelist config (#1001)
## Summary
Adds a global `observerIATAWhitelist` config field that restricts which
observer IATA regions are processed by the ingestor.
## Problem
Operators running regional instances (e.g., Sweden) want to ensure only
observers physically in their region contribute data. The existing
per-source `iataFilter` only filters packet messages but still allows
status messages through, meaning observers from other regions appear in
the database.
## Solution
New top-level config field `observerIATAWhitelist`:
- When non-empty, **all** messages (status + packets) from observers
outside the whitelist are silently dropped
- Case-insensitive matching
- Empty list = all regions allowed (fully backwards compatible)
- Lazy O(1) lookup via cached uppercase set (same pattern as
`observerBlacklist`)
### Config example
```json
{
  "observerIATAWhitelist": ["ARN", "GOT"]
}
```
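For illustration, a minimal sketch of the lazily cached, case-insensitive lookup (the `ObserverIATAWhitelist` field and `IsObserverIATAAllowed` method are the names this PR adds; the cached-set fields and the body are assumptions, not the merged code):
```go
package ingestor

import (
	"strings"
	"sync"
)

// Config shows only the fields relevant to this sketch; the real struct in
// cmd/ingestor/config.go has many more fields.
type Config struct {
	ObserverIATAWhitelist []string `json:"observerIATAWhitelist"`

	iataOnce sync.Once
	iataSet  map[string]struct{} // assumed cache, mirroring the observerBlacklist pattern
}

// IsObserverIATAAllowed reports whether an observer's IATA region passes the
// whitelist. An empty whitelist allows every region (backwards compatible);
// matching is case-insensitive via a lazily built uppercase set.
func (c *Config) IsObserverIATAAllowed(iata string) bool {
	if len(c.ObserverIATAWhitelist) == 0 {
		return true
	}
	c.iataOnce.Do(func() {
		c.iataSet = make(map[string]struct{}, len(c.ObserverIATAWhitelist))
		for _, v := range c.ObserverIATAWhitelist {
			c.iataSet[strings.ToUpper(strings.TrimSpace(v))] = struct{}{}
		}
	})
	_, ok := c.iataSet[strings.ToUpper(strings.TrimSpace(iata))]
	return ok
}
```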
## TDD
- **Red commit:** `f19c2b2` — tests for `ObserverIATAWhitelist` field
and `IsObserverIATAAllowed` method (build fails)
- **Green commit:** `782f516` — implementation + integration test
## Files changed
- `cmd/ingestor/config.go` — new field, new method
`IsObserverIATAAllowed`
- `cmd/ingestor/main.go` — whitelist check in `handleMessage` before
status processing
- `cmd/ingestor/config_test.go` — unit tests for config parsing and
matching
- `cmd/ingestor/main_test.go` — integration test for handleMessage
filtering
Fixes #914
---------
Co-authored-by: you <you@example.com>
e86b5a3a0c
feat: show multi-byte hash support indicator on map markers (#1002)
## Summary Show 2-byte hash support indicator on map markers. Fixes #903. ## What changed ### Backend (`cmd/server/store.go`, `cmd/server/routes.go`) - **`EnrichNodeWithMultiByte()`** — new enrichment function that adds `multi_byte_status` (confirmed/suspected/unknown), `multi_byte_evidence` (advert/path), and `multi_byte_max_hash_size` fields to node API responses - **`GetMultiByteCapMap()`** — cached (15s TTL) map of pubkey → `MultiByteCapEntry`, reusing the existing `computeMultiByteCapability()` logic that combines advert-based and path-hop-based evidence - Wired into both `/api/nodes` (list) and `/api/nodes/{pubkey}` (detail) endpoints ### Frontend (`public/map.js`) - Added **"Multi-byte support"** checkbox in the map Display controls section - When toggled on, repeater markers change color: - 🟢 Green (`#27ae60`) — **confirmed** (advertised with hash_size ≥ 2) - 🟡 Yellow (`#f39c12`) — **suspected** (seen as hop in multi-byte path) - 🔴 Red (`#e74c3c`) — **unknown** (no multi-byte evidence) - Popup tooltip shows multi-byte status and evidence for repeaters - State persisted in localStorage (`meshcore-map-multibyte-overlay`) ## TDD - Red commit: `2f49cbc` — failing test for `EnrichNodeWithMultiByte` - Green commit: `4957782` — implementation + passing tests ## Performance - `GetMultiByteCapMap()` uses a 15s TTL cache (same pattern as `GetNodeHashSizeInfo`) - Enrichment is O(n) over nodes, no per-item API calls - Frontend color override is computed inline during existing marker render loop — no additional DOM rebuilds --------- Co-authored-by: you <you@example.com> |
2e3a94b86d
chore(db): one-time cleanup of legacy packets with empty hash or null timestamp (closes #994) (#997)
## Summary
One-time startup migration that deletes legacy packets (transmissions + observations) with empty hash or empty `first_seen` timestamp. This is the write-side cleanup following #993's read-side filter.
### Migration: `cleanup_legacy_null_hash_ts`
- Checks `_migrations` table for marker
- If not present: deletes observations referencing bad transmissions, then deletes the transmissions themselves
- Logs count of deleted rows
- Records marker for idempotency
### TDD
- **Red commit:** `b1a24a1` — test asserts migration deletes bad rows (fails without implementation)
- **Green commit:** `2b94522` — implements the migration, all tests pass
Fixes #994
---------
Co-authored-by: you <you@example.com>
564d93d6aa
fix: dedup topology analytics by resolved pubkey (#998)
## Fix topology analytics double-counting repeaters/pairs (#909) ### Problem `computeAnalyticsTopology()` aggregates by raw hop hex string. When firmware emits variable-length path hashes (1-3 bytes per hop), the same physical node appears multiple times with different prefix lengths (e.g. `"07"`, `"0735bc"`, `"0735bc6d"` all referring to the same node). This inflates repeater counts and creates duplicate pair entries. ### Solution Added a confidence-gated dedup pass after frequency counting: 1. **For each hop prefix**, check if it resolves unambiguously (exactly 1 candidate in the prefix map) 2. **Unambiguous prefixes** → group by resolved pubkey, sum counts, keep longest prefix as display identifier 3. **Ambiguous prefixes** (multiple candidates for that prefix) → left as separate entries (conservative) 4. **Same treatment for pairs**: canonicalize by sorted pubkey pair ### Addressing @efiten's collision concern At scale (~2000+ repeaters), 1-byte prefixes (256 buckets) WILL collide. This fix explicitly checks the prefix map candidate count. Ambiguous prefixes (where `len(pm.m[hop]) > 1`) are never merged — they remain as separate entries. Only prefixes with a single matching node are eligible for dedup. ### TDD - **Red commit**: `4dbf9c0` — added 3 failing tests - **Green commit**: `d6cae9a` — implemented dedup, all tests pass ### Tests added - `TestTopologyDedup_RepeatersMergeByPubkey` — verifies entries with different prefix lengths for same node merge to single entry with summed count - `TestTopologyDedup_AmbiguousPrefixNotMerged` — verifies colliding short prefix stays separate from unambiguous longer prefix - `TestTopologyDedup_PairsMergeByPubkey` — verifies pair entries merge by resolved pubkey pair Fixes #909 --------- Co-authored-by: you <you@example.com> |
b7c280c20a
fix: drop/filter packets with null hash or timestamp (closes #871) (#993)
## Summary
Closes #871
The `/api/packets` endpoint could return packets with `null` hash or timestamp fields. This was caused by legacy data in SQLite (rows with empty `hash` or `NULL`/empty `first_seen`) predating the ingestor's existing validation guard (`if hash == "" { return false, nil }` at `cmd/ingestor/db.go:610`).
## Root Cause
`cmd/server/store.go` `filterPackets()` had no data-integrity guard. Legacy rows with empty `hash` or `first_seen` were loaded into the in-memory store and returned verbatim. The `strOrNil("")` helper then serialized these as JSON `null`.
## Fix
Added a data-integrity predicate at the top of `filterPackets`'s scan callback (`cmd/server/store.go:2278`):
```go
if tx.Hash == "" || tx.FirstSeen == "" {
	return false
}
```
This filters bad legacy rows at query time. The write path (ingestor) already rejects empty hashes, so no new bad data enters.
## TDD Evidence
- **Red commit:** `15774c3` — test `TestIssue871_NoNullHashOrTimestamp` asserts no packet in API response has null/empty hash or timestamp
- **Green commit:** `281fd6f` — adds the filter guard, test passes
## Testing
- `go test ./...` in `cmd/server` passes (full suite)
- Client-side defensive filter from PR #868 remains as defense-in-depth
---------
Co-authored-by: you <you@example.com>
d43c95a4bb
fix(ingestor): warn when TRACE payload decode fails but observation stored (closes #889) (#992)
## Summary Closes #889. When a TRACE packet's payload is too short to decode (< 9 bytes), `decodeTrace` returns an error in `Payload.Error` but the observation is still stored with empty `Path.Hops`. Previously this was completely silent — no log, no anomaly flag, no indication the row is degraded. This fix populates `DecodedPacket.Anomaly` with the decode error message (e.g., `"TRACE payload decode failed: too short"`) so operators and downstream consumers can identify degraded observations. ## TDD Commit History 1. **Red commit** `04e0165` — failing test asserting `Anomaly` is set when TRACE payload decode fails 2. **Green commit** `d3e72d1` — 3-line fix in `decoder.go` line 601-603: check `payload.Error != ""` for TRACE packets and set anomaly ## What Changed `cmd/ingestor/decoder.go` (lines 601-603): Added a check before the existing TRACE path-parsing block. If `payload.Error` is non-empty for a TRACE packet, `anomaly` is set to `"TRACE payload decode failed: <error>"`. `cmd/ingestor/decoder_test.go`: Added `TestDecodeTracePayloadFailSetsAnomaly` — constructs a TRACE packet with a 4-byte payload (too short), asserts the packet is still returned (observation stored) and `Anomaly` is populated. ## Verification - `go build ./...` ✓ - `go test ./...` ✓ (all pass including new test) - Anti-tautology: reverting the fix causes the new test to fail (asserts `pkt.Anomaly == ""` → error) --------- Co-authored-by: you <you@example.com> |
dd2f044f2b
fix: cache RW SQLite connection + dedup DBConfig (closes #921) (#982)
Closes #921 ## Summary Follow-up to #920 (incremental auto-vacuum). Addresses both items from the adversarial review: ### 1. RW connection caching Previously, every call to `openRW(dbPath)` opened a new SQLite RW connection and closed it after use. This happened in: - `runIncrementalVacuum` (~4x/hour) - `PruneOldPackets`, `PruneOldMetrics`, `RemoveStaleObservers` - `buildAndPersistEdges`, `PruneNeighborEdges` - All neighbor persist operations Now a single `*sql.DB` handle (with `MaxOpenConns(1)`) is cached process-wide via `cachedRW(dbPath)`. The underlying connection pool manages serialization. The original `openRW()` function is retained for one-shot test usage. ### 2. DBConfig dedup `DBConfig` was defined identically in both `cmd/server/config.go` and `cmd/ingestor/config.go`. Extracted to `internal/dbconfig/` as a shared package; both binaries now use a type alias (`type DBConfig = dbconfig.DBConfig`). ## Tests added | Test | File | |------|------| | `TestCachedRW_ReturnsSameHandle` | `cmd/server/rw_cache_test.go` | | `TestCachedRW_100Calls_SingleConnection` | `cmd/server/rw_cache_test.go` | | `TestGetIncrementalVacuumPages_Default` | `internal/dbconfig/dbconfig_test.go` | | `TestGetIncrementalVacuumPages_Configured` | `internal/dbconfig/dbconfig_test.go` | ## Verification ``` ok github.com/corescope/server 20.069s ok github.com/corescope/ingestor 47.117s ok github.com/meshcore-analyzer/dbconfig 0.003s ``` Both binaries build cleanly. 100 sequential `cachedRW()` calls return the same handle with exactly 1 entry in the cache map. --------- Co-authored-by: you <you@example.com> |
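For reference, a minimal sketch of the process-wide RW-handle cache described in the entry above, assuming a mutex-guarded map and the project's pure-Go `modernc.org/sqlite` driver; `cachedRW` is the name used in the PR, the rest is illustrative:
```go
package store

import (
	"database/sql"
	"sync"

	_ "modernc.org/sqlite" // pure-Go driver used by the project; registers the "sqlite" driver name
)

var (
	rwMu    sync.Mutex
	rwCache = map[string]*sql.DB{}
)

// cachedRW returns a shared read-write handle for dbPath, opening it on first
// use. MaxOpenConns(1) lets the pool serialize writers instead of relying on
// open/close churn in every caller.
func cachedRW(dbPath string) (*sql.DB, error) {
	rwMu.Lock()
	defer rwMu.Unlock()
	if db, ok := rwCache[dbPath]; ok {
		return db, nil
	}
	db, err := sql.Open("sqlite", dbPath)
	if err != nil {
		return nil, err
	}
	db.SetMaxOpenConns(1)
	rwCache[dbPath] = db
	return db, nil
}
```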
58484ad924
feat(ingestor): backfill observations.path_json from raw_hex (closes #888) (#983)
## Summary Adds an idempotent startup migration to the ingestor that backfills `observations.path_json` from per-observation `raw_hex` (added in #882). **Approach: Server-side migration (Option B)** — runs automatically at startup, chunked in batches of 1000, tracked via `_migrations` table. Chosen over a standalone script because: 1. Follows existing migration pattern (channel_hash, last_packet_at, etc.) 2. Zero operator action required — just deploy 3. Idempotent — safe to restart mid-migration (uncommitted rows get picked up next run) ## What it does - Selects observations where `raw_hex` is populated but `path_json` is NULL/empty/`[]` - Excludes TRACE packets (`payload_type = 9`) at the SQL level — their header bytes are SNR values, not hops - Decodes hops via `packetpath.DecodePathFromRawHex` (reuses existing helper) - Updates `path_json` with the decoded JSON array - Marks rows with undecoded/empty hops as `'[]'` to prevent infinite re-scanning - Records `backfill_path_json_from_raw_hex_v1` in `_migrations` when complete ## Safety - **Never overwrites** existing non-empty `path_json` — only fills where missing - **Batched** (1000 rows per iteration) — won't OOM on large DBs - **TRACE-safe** — excluded at query level per `packetpath.PathBytesAreHops` semantics ## Test `TestBackfillPathJsonFromRawHex` — creates synthetic observations with: - Empty path_json + valid raw_hex → verifies backfill populates correctly - NULL path_json → verifies backfill populates - Existing path_json → verifies NO overwrite - TRACE packet → verifies skip Anti-tautology: test asserts specific decoded values (`["AABB","CCDD"]`) from known raw_hex input, not just "something changed." Closes #888 Co-authored-by: you <you@example.com> |
fc57433f27
fix(analytics): merge channel buckets by hash byte; reject rainbow-table mismatches (closes #978) (#980)
## Summary Closes #978 — analytics channels duplicated by encrypted/decrypted split + rainbow-table collisions. ## Root cause Two distinct bugs in `computeAnalyticsChannels` (`cmd/server/store.go`): 1. **Encrypted/decrypted split**: The grouping key included the decoded channel name (`hash + "_" + channel`), so packets from observers that could decrypt a channel created a separate bucket from packets where decryption failed. Same physical channel, two entries. 2. **Rainbow-table collisions**: Some observers' lookup tables map hash bytes to wrong channel names. E.g., hash `72` incorrectly claimed to be `#wardriving` (real hash is `129`). This created ghost 1-message entries. ## Fix 1. **Always group by hash byte alone** (drop `_channel` suffix from `chKey`). When any packet decrypts successfully, upgrade the bucket's display name from placeholder (`chN`) to the real name (first-decrypter-wins for stability). 2. **Validate channel names** against the firmware hash invariant: `SHA256(SHA256("#name")[:16])[0] == channelHash`. Mismatches are treated as encrypted (placeholder name, no trust in decoded channel). Guard is in the analytics handler (not the ingestor) to avoid breaking other surfaces that use the decoded field for display. ## Verification (e2e-fixture.db) | Metric | BEFORE | AFTER | |--------|--------|-------| | Total channels | 22 | 19 | | Duplicate hash bytes | 3 (hashes 217, 202, 17) | 0 | ## Tests added - `TestComputeAnalyticsChannels_MergesEncryptedAndDecrypted` — same hash, mixed encrypted/decrypted → ONE bucket - `TestComputeAnalyticsChannels_RejectsRainbowTableMismatch` — hash 72 claimed as `#wardriving` (real=129) → rejected, stays `ch72` - `TestChannelNameMatchesHash` — unit test for hash validation helper - `TestIsPlaceholderName` — unit test for placeholder detection Anti-tautology gate: both main tests fail when their respective fix lines are reverted. Co-authored-by: you <you@example.com> |
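The hash invariant used for validation can be sketched as follows (the helper name comes from the PR's test list; how the leading `#` is normalized is an assumption):
```go
package analytics

import "crypto/sha256"

// channelNameMatchesHash checks the firmware invariant described above: the
// one-byte channel hash must equal the first byte of SHA256 over the first 16
// bytes of SHA256 of the channel name. The decoded name is only trusted when
// this holds; name is assumed to already carry its leading '#'.
func channelNameMatchesHash(name string, channelHash byte) bool {
	inner := sha256.Sum256([]byte(name))
	outer := sha256.Sum256(inner[:16])
	return outer[0] == channelHash
}
```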
5aa8f795cd
feat(ingestor): per-source MQTT connect timeout (#931) (#977)
## Summary Per-source MQTT connect timeout, correctly targeting the `WaitTimeout` startup gate (#931). ## What changed - Added `connectTimeoutSec` field to `MQTTSource` struct (per-source, not global) — `config.go:24` - Added `ConnectTimeoutOrDefault()` helper returning configured value or 30 (default from #926) — `config.go:29` - Replaced hardcoded `WaitTimeout(30 * time.Second)` with `WaitTimeout(time.Duration(connectTimeout) * time.Second)` — `main.go:173` - Updated `config.example.json` with field at source level - Unit tests for default (30) and custom values ## Why this supersedes #976 PR #976 made paho's `SetConnectTimeout` (per-TCP-dial, was 10s) configurable via a **global** `mqttConnectTimeoutSeconds` field. Issue #931 explicitly references the **30s timeout** — which is `WaitTimeout(30s)`, the startup gate from #926. It also requests **per-source** config, not global. This PR targets the correct timeout at the correct granularity. ## Live verification (Rule 18) Two sources pointed at unreachable brokers: - `fast` (`connectTimeoutSec: 5`): timed out in 5s ✅ - `default` (unset): timed out in 30s ✅ ``` 19:00:35 MQTT [fast] connect timeout: 5s 19:00:40 MQTT [fast] initial connection timed out — retrying in background 19:00:40 MQTT [default] connect timeout: 30s 19:01:10 MQTT [default] initial connection timed out — retrying in background ``` Closes #931 Supersedes #976 Co-authored-by: you <you@example.com> |
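A sketch of the per-source timeout described above; `connectTimeoutSec` and `ConnectTimeoutOrDefault()` are named in the PR, while `connectWithGate` and the struct shape are illustrative:
```go
package ingestor

import (
	"log"
	"time"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

// MQTTSource shows only the field relevant here; the real struct also holds
// broker, credentials, topic, etc.
type MQTTSource struct {
	Tag               string
	ConnectTimeoutSec int `json:"connectTimeoutSec"`
}

// ConnectTimeoutOrDefault returns the per-source startup-gate timeout,
// defaulting to the 30s introduced in #926.
func (s *MQTTSource) ConnectTimeoutOrDefault() int {
	if s.ConnectTimeoutSec > 0 {
		return s.ConnectTimeoutSec
	}
	return 30
}

// connectWithGate (hypothetical helper) shows where the configurable value
// replaces the hardcoded WaitTimeout(30s); a timed-out client keeps retrying
// in the background via ConnectRetry.
func connectWithGate(src *MQTTSource, client mqtt.Client) {
	timeout := time.Duration(src.ConnectTimeoutOrDefault()) * time.Second
	if tok := client.Connect(); !tok.WaitTimeout(timeout) {
		log.Printf("MQTT [%s] initial connection timed out — retrying in background", src.Tag)
	}
}
```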
1e7c187521
fix(ingestor): address review BLOCKERs from PR #926 (goroutine leak + guard semantics) [v2] (#974)
## fix(ingestor): address review BLOCKERs from PR #926 (goroutine leak + guard semantics) Supersedes #970. Rebased onto current master to resolve merge conflicts. ### Changes (same as #970) - **BL1 (goroutine leak):** Call `client.Disconnect(0)` on the error path after `Connect()` fails with `ConnectRetry=true`, preventing Paho's internal retry goroutines from leaking. - **BL2 (guard semantics):** Use `connectedCount == 0` instead of `len(clients) == 0` to detect zero-connected state, since timed-out clients are appended to the slice. - **Tests:** `TestBL1_GoroutineLeakOnHardFailure` and `TestBL2_ZeroConnectedFatals` covering both blockers. ### Context - Fixes blockers raised in review of #926 - Related: #910 (original hang bug) Co-authored-by: you <you@example.com> |
4b8d8143f4
feat(server): explicit CORS policy with configurable origin allowlist (#883) (#971)
## Summary Adds explicit CORS policy support to the CoreScope API server, closing #883. ### Problem The API relied on browser same-origin defaults with no way for operators to configure cross-origin access. Operators running dashboards or third-party frontends on different origins had no supported way to make API calls. ### Solution **New config option:** `corsAllowedOrigins` (string array, default `[]`) **Middleware behavior:** | Config | Behavior | |--------|----------| | `[]` (default) | No `Access-Control-*` headers added — browsers enforce same-origin. **Preserves current behavior.** | | `["https://dashboard.example.com"]` | Echoes matching `Origin`, sets `Allow-Methods`/`Allow-Headers` | | `["*"]` | Sets `Access-Control-Allow-Origin: *` (explicit opt-in only) | **Headers set when origin matches:** - `Access-Control-Allow-Origin: <origin>` (or `*`) - `Access-Control-Allow-Methods: GET, POST, OPTIONS` - `Access-Control-Allow-Headers: Content-Type, X-API-Key` - `Vary: Origin` (non-wildcard only) **Preflight handling:** `OPTIONS` → `204 No Content` with CORS headers (or `403` if origin not in allowlist). ### Config example ```json { "corsAllowedOrigins": ["https://dashboard.example.com", "https://monitor.internal"] } ``` ### Files changed | File | Change | |------|--------| | `cmd/server/cors.go` | New CORS middleware | | `cmd/server/cors_test.go` | 7 unit tests covering all branches | | `cmd/server/config.go` | `CORSAllowedOrigins` field | | `cmd/server/routes.go` | Wire middleware before all routes | ### Testing **Unit tests (7):** - Default config → no CORS headers - Allowlist match → headers present with `Vary: Origin` - Allowlist miss → no CORS headers - Preflight allowed → 204 with headers - Preflight rejected → 403 - Wildcard → `*` without `Vary` - No `Origin` header → pass-through **Live verification (Rule 18):** ``` # Default (empty corsAllowedOrigins): $ curl -I -H "Origin: https://evil.example" localhost:19883/api/health HTTP/1.1 200 OK # No Access-Control-* headers ✓ # With corsAllowedOrigins: ["https://good.example"]: $ curl -I -H "Origin: https://good.example" localhost:19884/api/health Access-Control-Allow-Origin: https://good.example Access-Control-Allow-Methods: GET, POST, OPTIONS Access-Control-Allow-Headers: Content-Type, X-API-Key Vary: Origin ✓ $ curl -I -H "Origin: https://evil.example" localhost:19884/api/health # No Access-Control-* headers ✓ $ curl -I -X OPTIONS -H "Origin: https://good.example" localhost:19884/api/health HTTP/1.1 204 No Content Access-Control-Allow-Origin: https://good.example ✓ ``` Closes #883 Co-authored-by: you <you@example.com> |
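A sketch of the allowlist middleware behavior in the entry above; header names and status codes follow the PR's table, but the function name and wiring are assumptions:
```go
package httpapi

import "net/http"

// corsMiddleware sketches the allowlist behavior described above.
func corsMiddleware(allowedOrigins []string, next http.Handler) http.Handler {
	allowed := make(map[string]bool, len(allowedOrigins))
	for _, o := range allowedOrigins {
		allowed[o] = true
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		origin := r.Header.Get("Origin")
		// Empty allowlist (default) or non-CORS request: add nothing, keep same-origin behavior.
		if len(allowedOrigins) == 0 || origin == "" {
			next.ServeHTTP(w, r)
			return
		}
		switch {
		case allowed["*"]:
			w.Header().Set("Access-Control-Allow-Origin", "*")
		case allowed[origin]:
			w.Header().Set("Access-Control-Allow-Origin", origin)
			w.Header().Set("Vary", "Origin")
		default:
			// Origin not allowlisted: reject preflights, pass other requests through untouched.
			if r.Method == http.MethodOptions {
				w.WriteHeader(http.StatusForbidden)
				return
			}
			next.ServeHTTP(w, r)
			return
		}
		w.Header().Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
		w.Header().Set("Access-Control-Allow-Headers", "Content-Type, X-API-Key")
		if r.Method == http.MethodOptions {
			w.WriteHeader(http.StatusNoContent) // preflight answered here
			return
		}
		next.ServeHTTP(w, r)
	})
}
```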
3364eed303
feat: separate "Last Status Update" from "Last Packet Observation" for observers (v3 rebase) (#969)
Rebased version of #968 (which was itself a rebase of #905) — resolves merge conflict with #906 (clock-skew UI) that landed on master. ## Conflict resolution **`public/observers.js`** — master (#906) added "Clock Offset" column to observer table; #968 split "Last Seen" into "Last Status" + "Last Packet" columns. Combined both: the table now has Status | Name | Region | Last Status | Last Packet | Packets | Packets/Hour | Clock Offset | Uptime. ## What this PR adds (unchanged from #968/#905) - `last_packet_at` column in observers DB table - Separate "Last Status Update" and "Last Packet Observation" display in observers list and detail page - Server-side migration to add the column automatically - Backfill heuristic for existing data - Tests for ingestor and server ## Verification - All Go tests pass (`cmd/server`, `cmd/ingestor`) - Frontend tests pass (`test-packets.js`, `test-hash-color.js`) - Built server, hit `/api/observers` — `last_packet_at` field present in JSON - Observer table header has all 9 columns including both Last Packet and Clock Offset ## Prior PRs - #905 — original (conflicts with master) - #968 — first rebase (conflicts after #906 landed) - This PR — second rebase, resolves #906 conflict Supersedes #968. Closes #905. --------- Co-authored-by: you <you@example.com> |
d65122491e
fix(ingestor): unblock startup when one of multiple MQTT sources is unreachable (#926)
## Summary - With `ConnectRetry=true`, paho's `token.Wait()` only returns on success — it blocks forever for unreachable brokers, stalling the entire startup loop before any other source connects - Switches to `token.WaitTimeout(30s)`: on timeout the client is still tracked so `ConnectRetry` keeps retrying in background; `OnConnect` fires and subscribes when it eventually connects - Adds `TestMQTTConnectRetryTimeoutDoesNotBlock` to confirm `WaitTimeout` returns within deadline for unreachable brokers (regression guard for this exact failure mode) Fixes #910 ## Test plan - [x] Two MQTT sources configured, one unreachable: ingestor reaches `Running` status and ingests from the reachable source immediately on startup - [x] Unreachable source logs `initial connection timed out — retrying in background` and reconnects automatically when the broker comes back - [x] Single source, reachable: behaviour unchanged (`Running — 1 MQTT source(s) connected`) - [x] Single source, unreachable: `Running — 0 MQTT source(s) connected, 1 retrying in background`; ingestion starts once broker is available - [x] `go test ./...` passes (excluding pre-existing `TestOpenStoreInvalidPath` failure on master) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> |
40c3aa13f9
fix(paths): exclude false-positive paths from short-prefix collisions (#930)
Fixes #929 ## Summary - `handleNodePaths` pulls candidates from `byPathHop` using 2-char and 4-char prefix keys (e.g. `"7a"` for a node using 1-byte adverts) - When two nodes share the same short prefix, paths through the *other* node are included as candidates - The `resolved_path` post-filter covers decoded packets but falls through conservatively (`inIndex = true`) when `resolved_path` is NULL, letting false positives reach the response **Fix:** during the aggregation phase (which already calls `resolveHop` per hop), add a `containsTarget` check. If every hop resolves to a different node's pubkey, skip the path. Packets confirmed via the full-pubkey index key or via SQL bypass the check. Unresolvable hops are kept conservatively. ## Test plan - [x] `TestHandleNodePaths_PrefixCollisionExclusion`: two nodes sharing `"7a"` prefix; verifies the path with no `resolved_path` (false positive) is excluded and the SQL-confirmed path (true positive) is included - [x] Full test suite: `go test github.com/corescope/server` — all pass 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> |
b47587f031
feat(#690): expose observer skew + per-hash evidence in clock UI (#906)
## Summary UI completion of #690 — surfaces observer clock skew and per-hash evidence that the backend already computes but wasn't exposed in the frontend. **Not related to #845/PR #894** (bimodal detection) — this is the UI surface for the original #690 scope. ## Changes ### Backend: per-hash evidence in node clock-skew API (commit 1) - Extended `GET /api/nodes/{pubkey}/clock-skew` to return `recentHashEvidence` (most recent 10 hashes with per-observer raw/corrected skew and observer offset) and `calibrationSummary` (total/calibrated/uncalibrated counts). - Evidence is cached during `ClockSkewEngine.Recompute()` — route handler is cheap. - Fleet endpoint omits evidence to keep payload small. ### Frontend: observer list page — clock offset column (commit 2) - Added "Clock Offset" column to observers table. - Fetches `/api/observers/clock-skew` once on page load, joins by ObserverID. - Color-coded severity badge + sample count tooltip. - Singleton observers show "—" not "0". ### Frontend: observer-detail clock card (commit 3) - Added clock offset card mirroring node clock card style. - Shows: offset value, sample count, severity badge. - Inline explainer describing how offset is computed from multi-observer packets. ### Frontend: node clock card evidence panel (commit 4) - Collapsible "Evidence" section in existing node clock skew card. - Per-hash breakdown: observer count, median corrected skew, per-observer raw/corrected/offset. - Calibration summary line and plain-English severity reason at top. ## Test Results ``` go test ./... (cmd/server) — PASS (19.3s) go test ./... (cmd/ingestor) — PASS (31.6s) Frontend helpers: 610 passed, 0 failed ``` New test: `TestNodeClockSkew_EvidencePayload` — 3-observer scenario verifying per-hash array shape, corrected = raw + offset math, and median. No frontend JS smoke test added — no existing test harness for clock/observer rendering. Noted for future. ## Screenshots Screenshots TBD ## Perf justification Evidence is computed inside the existing `Recompute()` cycle (already O(n) on samples). The `hashEvidence` map adds ~32 bytes per sample of memory. Evidence is stripped from fleet responses. Per-node endpoint returns at most 10 evidence entries — bounded payload. --------- Co-authored-by: you <you@example.com> |
b3a9677c52
feat(ingestor + server): observerBlacklist config (#962) (#963)
## Summary Implements `observerBlacklist` config — mirrors the existing `nodeBlacklist` pattern for observers. Drop observers by pubkey at ingest, with defense-in-depth filtering on the server side. Closes #962 ## Changes ### Ingestor (`cmd/ingestor/`) - **`config.go`**: Added `ObserverBlacklist []string` field + `IsObserverBlacklisted()` method (case-insensitive, whitespace-trimmed) - **`main.go`**: Early return in `handleMessage` when `parts[2]` (observer ID from MQTT topic) matches blacklist — before status handling, before IATA filter. No UpsertObserver, no observations, no metrics insert. Log line: `observer <pubkey-short> blacklisted, dropping` ### Server (`cmd/server/`) - **`config.go`**: Same `ObserverBlacklist` field + `IsObserverBlacklisted()` with `sync.Once` cached set (same pattern as `nodeBlacklist`) - **`routes.go`**: Defense-in-depth filtering in `handleObservers` (skip blacklisted in list) and `handleObserverDetail` (404 for blacklisted ID) - **`main.go`**: Startup `softDeleteBlacklistedObservers()` marks matching rows `inactive=1` so historical data is hidden - **`neighbor_persist.go`**: `softDeleteBlacklistedObservers()` implementation ### Tests - `cmd/ingestor/observer_blacklist_test.go`: config method tests (case-insensitive, empty, nil) - `cmd/server/observer_blacklist_test.go`: config tests + HTTP handler tests (list excludes blacklisted, detail returns 404, no-blacklist passes all, concurrent safety) ## Config ```json { "observerBlacklist": [ "EE550DE547D7B94848A952C98F585881FCF946A128E72905E95517475F83CFB1" ] } ``` ## Verification (Rule 18 — actual server output) **Before blacklist** (no config): ``` Total: 31 DUBLIN in list: True ``` **After blacklist** (DUBLIN Observer pubkey in `observerBlacklist`): ``` [observer-blacklist] soft-deleted 1 blacklisted observer(s) Total: 30 DUBLIN in list: False ``` Detail endpoint for blacklisted observer returns **404**. All existing tests pass (`go test ./...` for both server and ingestor). --------- Co-authored-by: you <you@example.com> |
e1a1be1735
fix(server): add observers.inactive column at startup if missing (root cause of CI flake) (#961)
## The actual root cause PR #954 added `WHERE inactive IS NULL OR inactive = 0` to the server's observer queries, but the `inactive` column is only added by the **ingestor** migration (`cmd/ingestor/db.go:344-354`). When the server runs against a DB the ingestor never touched (e.g. the e2e fixture), the column doesn't exist: ``` $ sqlite3 test-fixtures/e2e-fixture.db "SELECT COUNT(*) FROM observers WHERE inactive IS NULL OR inactive = 0;" Error: no such column: inactive ``` The server's `db.QueryRow().Scan()` swallows that error → `totalObservers` stays 0 → `/api/observers` returns empty → map test fails with "No map markers/overlays found". This explains all the failing CI runs since #954 merged. PR #957 (freshen fixture) helped with the `nodes` time-rot but couldn't fix the missing-column problem. PR #960 (freshen observers) added the right timestamps but the column was still missing. PR #959 (data-loaded in finally) fixed a different real bug. None of those touched the actual mechanism. ## Fix Mirror the existing `ensureResolvedPathColumn` pattern: add `ensureObserverInactiveColumn` that runs at server startup, checks if the column exists via `PRAGMA table_info`, adds it with `ALTER TABLE observers ADD COLUMN inactive INTEGER DEFAULT 0` if missing. Wired into `cmd/server/main.go` immediately after `ensureResolvedPathColumn`. ## Verification End-to-end on a freshened fixture: ``` $ sqlite3 /tmp/e2e-verify.db "PRAGMA table_info(observers);" | grep inactive (no output — column absent) $ ./cs-fixed -port 13702 -db /tmp/e2e-verify.db -public public & [store] Added inactive column to observers $ curl 'http://localhost:13702/api/observers' returned=31 # was 0 before fix ``` `go test ./...` passes (19.8s). ## Lessons I should have run `sqlite3 fixture "SELECT ... WHERE inactive ..."` directly the first time the map test failed after #954 instead of writing four "fix" PRs that didn't address the actual mechanism. Apologies for the wild goose chase. Co-authored-by: Kpa-clawbot <bot@example.invalid> |
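A sketch of the startup guard described above, assuming the same `PRAGMA table_info` + `ALTER TABLE` shape as the existing `ensureResolvedPathColumn` pattern; error handling and logging are simplified:
```go
package server

import "database/sql"

// ensureObserverInactiveColumn inspects PRAGMA table_info and adds the column
// only if the ingestor migration never ran against this database.
func ensureObserverInactiveColumn(db *sql.DB) error {
	rows, err := db.Query(`PRAGMA table_info(observers)`)
	if err != nil {
		return err
	}
	defer rows.Close()
	for rows.Next() {
		var cid, notNull, pk int
		var name, colType string
		var dflt any
		if err := rows.Scan(&cid, &name, &colType, &notNull, &dflt, &pk); err != nil {
			return err
		}
		if name == "inactive" {
			return nil // column already present
		}
	}
	if err := rows.Err(); err != nil {
		return err
	}
	_, err = db.Exec(`ALTER TABLE observers ADD COLUMN inactive INTEGER DEFAULT 0`)
	return err
}
```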
568de4b441
fix(observers): exclude soft-deleted observers from /api/observers and totalObservers (#954)
## Bug `/api/observers` returned soft-deleted (inactive=1) observers. Operators saw stale observers in the UI even after the auto-prune marked them inactive on schedule. Reproduced on staging: 14 observers older than 14 days returned by the API; all of them had `inactive=1` in the DB. ## Root cause `DB.GetObservers()` (`cmd/server/db.go:974`) ran `SELECT ... FROM observers ORDER BY last_seen DESC` with no WHERE filter. The `RemoveStaleObservers` path correctly soft-deletes by setting `inactive=1`, but the read path didn't honor it. `statsRow` (`cmd/server/db.go:234`) had the same bug — `totalObservers` count included soft-deleted rows. ## Fix Add `WHERE inactive IS NULL OR inactive = 0` to both: ```go // GetObservers "SELECT ... FROM observers WHERE inactive IS NULL OR inactive = 0 ORDER BY last_seen DESC" // statsRow.TotalObservers "SELECT COUNT(*) FROM observers WHERE inactive IS NULL OR inactive = 0" ``` `NULL` check preserves backward compatibility with rows from before the `inactive` migration. ## Tests Added regression `TestGetObservers_ExcludesInactive`: - Seed two observers, mark one inactive, assert `GetObservers()` returns only the other. - **Anti-tautology gate verified**: reverting the WHERE clause causes the test to fail with `expected 1 observer, got 2` and `inactive observer obs2 should be excluded`. `go test ./...` passes (19.6s). ## Out of scope - `GetObserverByID` lookup at line 1009 still returns inactive observers — this is intentional, so an old deep link to `/observers/<id>` shows "inactive" rather than 404. - Frontend may also have its own caching layer; this fix is server-side only. --------- Co-authored-by: Kpa-clawbot <bot@example.invalid> Co-authored-by: you <you@example.com> Co-authored-by: KpaBap <kpabap@gmail.com> |
57e272494d
feat(server): /api/healthz readiness endpoint gated on store load (#955) (#956)
## Summary Fixes RCA #2 from #955: the HTTP listener and `/api/stats` go live before background goroutines (pickBestObservation, neighbor graph build) finish, causing CI readiness checks to pass prematurely. ## Changes 1. **`cmd/server/healthz.go`** — New `GET /api/healthz` endpoint: - Returns `503 {"ready":false,"reason":"loading"}` while background init is running - Returns `200 {"ready":true,"loadedTx":N,"loadedObs":N}` once ready 2. **`cmd/server/main.go`** — Added `sync.WaitGroup` tracking pickBestObservation and neighbor graph build goroutines. A coordinator goroutine sets `readiness.Store(1)` when all complete. `backfillResolvedPathsAsync` is NOT gated (async by design, can take 20+ min). 3. **`cmd/server/routes.go`** — Wired `/api/healthz` before system endpoints. 4. **`.github/workflows/deploy.yml`** — CI wait-for-ready loop now polls `/api/healthz` instead of `/api/stats`. 5. **`cmd/server/healthz_test.go`** — Tests for 503-before-ready, 200-after-ready, JSON shape, and anti-tautology gate. ## Rule 18 Verification Built and ran against `test-fixtures/e2e-fixture.db` (499 tx): - With the small fixture DB, init completes in <300ms so both immediate and delayed curls return 200 - Unit tests confirm 503 behavior when `readiness=0` (simulating slow init) - On production DBs with 100K+ txs, the 503 window would be 5-15s (pickBestObservation processes in 5000-tx chunks with 10ms yields) ## Test Results ``` === RUN TestHealthzNotReady --- PASS === RUN TestHealthzReady --- PASS === RUN TestHealthzAntiTautology --- PASS ok github.com/corescope/server 19.662s (full suite) ``` Co-authored-by: you <you@example.com> |
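A sketch of the readiness gate described above; the JSON shapes match the PR, while the counter variables are illustrative stand-ins:
```go
package server

import (
	"fmt"
	"net/http"
	"sync/atomic"
)

// readiness flips to 1 once the gated background goroutines finish; the two
// counters are stand-ins for whatever the store reports.
var (
	readiness atomic.Int32
	loadedTx  atomic.Int64
	loadedObs atomic.Int64
)

// handleHealthz returns 503 while background init is running and 200 with
// load counts once ready.
func handleHealthz(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	if readiness.Load() == 0 {
		w.WriteHeader(http.StatusServiceUnavailable)
		fmt.Fprint(w, `{"ready":false,"reason":"loading"}`)
		return
	}
	fmt.Fprintf(w, `{"ready":true,"loadedTx":%d,"loadedObs":%d}`, loadedTx.Load(), loadedObs.Load())
}
```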
6345c6fb05
fix(ingestor): observability + bounded backoff for MQTT reconnect (#947) (#949)
## Summary Fixes #947 — MQTT ingestor silently stalls after `pingresp not received` disconnect due to paho's default 10-minute reconnect backoff and zero observability of reconnect attempts. ## Changes ### `cmd/ingestor/main.go` - **Extract `buildMQTTOpts()`** — encapsulates MQTT client option construction for testability - **`SetMaxReconnectInterval(30s)`** — bounds paho's default 10-minute exponential backoff (source: `options.go:137` in `paho.mqtt.golang@v1.5.0`) - **`SetConnectTimeout(10s)`** — prevents stuck connect attempts from blocking reconnect cycle - **`SetWriteTimeout(10s)`** — prevents stuck publish writes - **`SetReconnectingHandler`** — logs `MQTT [<tag>] reconnecting to <broker>` on every reconnect attempt, giving operators visibility into retry behavior - **Enhanced `SetConnectionLostHandler`** — now includes broker address in log line for multi-source disambiguation ### `cmd/ingestor/mqtt_opts_test.go` (new) - Tests verify `MaxReconnectInterval`, `ConnectTimeout`, `WriteTimeout` are set correctly - Tests verify credential and TLS configuration - Anti-tautology: tests fail if timing settings are removed from `buildMQTTOpts()` ## Operator impact After this change, a pingresp disconnect produces: ``` MQTT [staging] disconnected from tcp://broker:1883: pingresp not received, disconnecting MQTT [staging] reconnecting to tcp://broker:1883 MQTT [staging] reconnecting to tcp://broker:1883 MQTT [staging] connected to tcp://broker:1883 MQTT [staging] subscribed to meshcore/# ``` Max gap between disconnect and first reconnect attempt: ~30s (was up to 10 minutes). --------- Co-authored-by: you <you@example.com> |
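A sketch of the option set listed above, using the paho settings named in the PR; credential and TLS wiring from the real `buildMQTTOpts()` is omitted:
```go
package ingestor

import (
	"log"
	"time"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

// buildMQTTOpts bounds paho's reconnect backoff and adds reconnect visibility.
func buildMQTTOpts(tag, broker string) *mqtt.ClientOptions {
	opts := mqtt.NewClientOptions().AddBroker(broker)
	opts.SetAutoReconnect(true)
	opts.SetConnectRetry(true)
	opts.SetMaxReconnectInterval(30 * time.Second) // bound paho's default 10-minute backoff
	opts.SetConnectTimeout(10 * time.Second)       // keep a stuck dial from blocking the reconnect cycle
	opts.SetWriteTimeout(10 * time.Second)         // keep a stuck publish write from blocking
	opts.SetReconnectingHandler(func(_ mqtt.Client, _ *mqtt.ClientOptions) {
		log.Printf("MQTT [%s] reconnecting to %s", tag, broker)
	})
	opts.SetConnectionLostHandler(func(_ mqtt.Client, err error) {
		log.Printf("MQTT [%s] disconnected from %s: %v", tag, broker, err)
	})
	return opts
}
```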
e460932668
fix(store): apply retentionHours cutoff in Load() to prevent OOM on cold start (#917)
## Problem `Load()` loaded all transmissions from the DB regardless of `retentionHours`, so `buildSubpathIndex()` processed the full DB history on every startup. On a DB with ~280K paths this produces ~13.5M subpath index entries, OOM-killing the process before it ever starts listening — causing a supervisord crash loop with no useful error message. ## Fix Apply the same `retentionHours` cutoff to `Load()`'s SQL that `EvictStale()` already uses at runtime. Both conditions (`retentionHours` window and `maxPackets` cap) are combined with AND so neither safety limit is bypassed. Startup now builds indexes only over the retention window, making startup time and memory proportional to recent activity rather than total DB history. ## Docs - `config.example.json`: adds `retentionHours` to the `packetStore` block with recommended value `168` (7 days) and a warning about `0` on large DBs - `docs/user-guide/configuration.md`: documents the field and adds an explicit OOM warning ## Test plan - [x] `cd cmd/server && go test ./... -run TestRetentionLoad` — covers the retention-filtered load: verifies packets outside the window are excluded, and that `retentionHours: 0` still loads everything - [x] Deploy on an instance with a large DB (>100K paths) and `retentionHours: 168` — server reaches "listening" in seconds instead of OOM-crashing - [x] Verify `config.example.json` has `retentionHours: 168` in the `packetStore` block - [x] Verify `docs/user-guide/configuration.md` documents the field and warning 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Kpa-clawbot <kpaclawbot@outlook.com> |
aeae7813bc
fix: enable SQLite incremental auto-vacuum so DB shrinks after retention (#919) (#920)
Closes #919 ## Summary Enables SQLite incremental auto-vacuum so the database file actually shrinks after retention reaper deletes old data. Previously, `DELETE` operations freed pages internally but never returned disk space to the OS. ## Changes ### 1. Auto-vacuum on new databases - `PRAGMA auto_vacuum = INCREMENTAL` set via DSN pragma before `journal_mode(WAL)` in the ingestor's `OpenStoreWithInterval` - Must be set before any tables are created; DSN ordering ensures this ### 2. Post-reaper incremental vacuum - `PRAGMA incremental_vacuum(N)` runs after every retention reaper cycle (packets, metrics, observers, neighbor edges) - N defaults to 1024 pages, configurable via `db.incrementalVacuumPages` - Noop on `auto_vacuum=NONE` databases (safe before migration) - Added to both server and ingestor ### 3. Opt-in full VACUUM for existing databases - Startup check logs a clear warning if `auto_vacuum != INCREMENTAL` - `db.vacuumOnStartup: true` config triggers one-time `PRAGMA auto_vacuum = INCREMENTAL; VACUUM` - Logs start/end time for operator visibility ### 4. Documentation - `docs/user-guide/configuration.md`: retention section notes that lowering retention doesn't immediately shrink the DB - `docs/user-guide/database.md`: new guide covering WAL, auto-vacuum, migration, manual VACUUM ### 5. Tests - `TestNewDBHasIncrementalAutoVacuum` — fresh DB gets `auto_vacuum=2` - `TestExistingDBHasAutoVacuumNone` — old DB stays at `auto_vacuum=0` - `TestVacuumOnStartupMigratesDB` — full VACUUM sets `auto_vacuum=2` - `TestIncrementalVacuumReducesFreelist` — DELETE + vacuum shrinks freelist - `TestCheckAutoVacuumLogs` — handles both modes without panic - `TestConfigIncrementalVacuumPages` — config defaults and overrides ## Migration path for existing databases 1. On startup, CoreScope logs: `[db] auto_vacuum=NONE — DB needs one-time VACUUM...` 2. Set `db.vacuumOnStartup: true` in config.json 3. Restart — VACUUM runs (blocks startup, minutes on large DBs) 4. Remove `vacuumOnStartup` after migration ## Test results ``` ok github.com/corescope/server 19.448s ok github.com/corescope/ingestor 30.682s ``` --------- Co-authored-by: you <you@example.com> |
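The post-reaper step can be sketched as follows (the function name appears in later PRs; the real code reads the page count from `db.incrementalVacuumPages`):
```go
package store

import (
	"database/sql"
	"fmt"
)

// runIncrementalVacuum returns up to `pages` freed pages to the OS after a
// retention sweep. It is a harmless no-op on databases still at
// auto_vacuum=NONE.
func runIncrementalVacuum(db *sql.DB, pages int) error {
	if pages <= 0 {
		pages = 1024 // documented default
	}
	_, err := db.Exec(fmt.Sprintf("PRAGMA incremental_vacuum(%d)", pages))
	return err
}
```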
54f7f9d35b
feat: path-prefix candidate inspector with map view (#944) (#945)
## feat: path-prefix candidate inspector with map view (#944) Implements the locked spec from #944: a beam-search-based path prefix inspector that enumerates candidate full-pubkey paths from short hex prefixes and scores them. ### Server (`cmd/server/path_inspect.go`) - **`POST /api/paths/inspect`** — accepts 1-64 hex prefixes (1-3 bytes, uniform length per request) - Beam search (width 20) over cached `prefixMap` + `NeighborGraph` - Per-hop scoring: edge weight (35%), GPS plausibility (20%), recency (15%), prefix selectivity (30%) - Geometric mean aggregation with 0.05 floor per hop - Speculative threshold: score < 0.7 - Score cache: 30s TTL, keyed by (prefixes, observer, window) - Cold-start: synchronous NeighborGraph rebuild with 2s hard timeout → 503 `{retry:true}` - Body limit: 4096 bytes via `http.MaxBytesReader` - Zero SQL queries in handler hot path - Request validation: rejects empty, odd-length, >3 bytes, mixed lengths, >64 hops ### Frontend (`public/path-inspector.js`) - New page under Tools route with input field (comma/space separated hex prefixes) - Client-side validation with error feedback - Results table: rank, score (color-coded speculative), path names, per-hop evidence (collapsed) - "Show on Map" button calls `drawPacketRoute` (one path at a time, clears prior) - Deep link: `#/tools/path-inspector?prefixes=2c,a1,f4` ### Nav reorganization - `Traces` nav item renamed to `Tools` - Backward-compat: `#/traces/<hash>` redirects to `#/tools/trace/<hash>` - Tools sub-routing dispatches to traces or path-inspector ### Store changes - Added `LastSeen time.Time` to `nodeInfo` struct, populated from `nodes.last_seen` - Added `inspectMu` + `inspectCache` fields to `PacketStore` ### Tests - **Go unit tests** (`path_inspect_test.go`): scoreHop components, beam width cap, speculative flag, all validation error cases, valid request integration - **Frontend tests** (`test-path-inspector.js`): parse comma/space/mixed, validation (empty, odd, >3 bytes, mixed lengths, invalid hex, valid) - Anti-tautology gate verified: removing beam pruning fails width test; removing validation fails reject tests ### CSS - `--path-inspector-speculative` variable in both themes (amber, WCAG AA on both dark/light backgrounds) - All colors via CSS variables (no hardcoded hex in production code) Closes #944 --------- Co-authored-by: you <you@example.com> |
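The geometric-mean aggregation with a 0.05 per-hop floor, sketched in isolation (function name is hypothetical; the weighted per-hop scoring happens elsewhere):
```go
package pathinspect

import "math"

// scorePath aggregates per-hop scores with a geometric mean, flooring each
// hop at 0.05 so one weak hop cannot zero out a whole candidate. The weighted
// per-hop scores (edge weight, GPS plausibility, recency, selectivity) are
// computed separately.
func scorePath(hopScores []float64) float64 {
	if len(hopScores) == 0 {
		return 0
	}
	logSum := 0.0
	for _, s := range hopScores {
		if s < 0.05 {
			s = 0.05
		}
		logSum += math.Log(s)
	}
	return math.Exp(logSum / float64(len(hopScores)))
}
```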
5678874128
fix: exclude non-repeater nodes from path-hop resolution (#935) (#936)
Fixes #935 ## Problem `buildPrefixMap()` indexed ALL nodes regardless of role, causing companions/sensors to appear as repeater hops when their pubkey prefix collided with a path-hop hash byte. ## Fix ### Server (`cmd/server/store.go`) - Added `canAppearInPath(role string) bool` — allowlist of roles that can forward packets (repeater, room_server, room) - `buildPrefixMap` now skips nodes that fail this check ### Client (`public/hop-resolver.js`) - Added matching `canAppearInPath(role)` helper - `init()` now only populates `prefixIdx` for path-eligible nodes - `pubkeyIdx` remains complete — `resolveFromServer()` still resolves any node type by full pubkey (for server-confirmed `resolved_path` arrays) ## Tests - `cmd/server/prefix_map_role_test.go`: 7 new tests covering role filtering in prefix map and resolveWithContext - `test-hop-resolver-affinity.js`: 4 new tests verifying client-side role filter + pubkeyIdx completeness - All existing tests updated to include `Role: "repeater"` where needed - `go test ./cmd/server/...` — PASS - `node test-hop-resolver-affinity.js` — 16/17 pass (1 pre-existing centroid failure unrelated to this change) ## Commits 1. `fix: filter prefix map to only repeater/room roles (#935)` — server implementation 2. `test: prefix map role filter coverage (#935)` — server tests 3. `ui: filter HopResolver prefix index to repeater/room roles (#935)` — client implementation 4. `test: hop-resolver role filter coverage (#935)` — client tests --------- Co-authored-by: you <you@example.com> |
6ca5e86df6
fix: compute hex-dump byte ranges client-side from per-obs raw_hex (#891)
## Symptom The colored byte strip in the packet detail pane is offset from the labeled byte breakdown below it. Off by N bytes where N is the difference between the top-level packet's path length and the displayed observation's path length. ## Root cause Server computes `breakdown.ranges` once from the top-level packet's raw_hex (in `BuildBreakdown`) and ships it in the API response. After #882 we render each observation's own raw_hex, but we keep using the top-level breakdown — so a 7-hop top-level packet shipped "Path: bytes 2-8", and when we rendered an 8-hop observation we coloured 7 of the 8 path bytes and bled into the payload. The labeled rows below (which use `buildFieldTable`) parse the displayed raw_hex on the client, so they were correct — they just didn't match the strip above. ## Fix Port `BuildBreakdown()` to JS as `computeBreakdownRanges()` in `app.js`. Use it in `renderDetail()` from the actually-rendered (per-obs) raw_hex. ## Test Manually verified the JS function output matches the Go implementation for FLOOD/non-transport, transport, ADVERT, and direct-advert (zero hops) cases. Closes nothing (caught in post-tag bug bash). --------- Co-authored-by: you <you@example.com> |
56ec590bc4
fix(#886): derive path_json from raw_hex at ingest (#887)
## Problem Per-observation `path_json` disagrees with `raw_hex` path section for TRACE packets. **Reproducer:** packet `af081a2c41281b1e`, observer `lutin🏡` - `path_json`: `["67","33","D6","33","67"]` (5 hops — from TRACE payload) - `raw_hex` path section: `30 2D 0D 23` (4 bytes — SNR values in header) ## Root Cause `DecodePacket` correctly parses TRACE packets by replacing `path.Hops` with hop IDs from the payload's `pathData` field (the actual route). However, the header path bytes for TRACE packets contain **SNR values** (one per completed hop), not hop IDs. `BuildPacketData` used `decoded.Path.Hops` to build `path_json`, which for TRACE packets contained the payload-derived hops — not the header path bytes that `raw_hex` stores. This caused `path_json` and `raw_hex` to describe completely different paths. ## Fix - Added `DecodePathFromRawHex(rawHex)` — extracts header path hops directly from raw hex bytes, independent of any TRACE payload overwriting. - `BuildPacketData` now calls `DecodePathFromRawHex(msg.Raw)` instead of using `decoded.Path.Hops`, guaranteeing `path_json` always matches the `raw_hex` path section. ## Tests (8 new) **`DecodePathFromRawHex` unit tests:** - hash_size 1, 2, 3, 4 - zero-hop direct packets - transport route (4-byte transport codes before path) **`BuildPacketData` integration tests:** - TRACE packet: asserts path_json matches raw_hex header path (not payload hops) - Non-TRACE packet: asserts path_json matches raw_hex header path All existing tests continue to pass (`go test ./...` for both ingestor and server). Fixes #886 --------- Co-authored-by: you <you@example.com> |
a605518d6d
fix(#881): per-observation raw_hex — each observer sees different bytes on air (#882)
## Problem Each MeshCore observer receives a physically distinct over-the-air byte sequence for the same transmission (different path bytes, flags/hops remaining). The `observations` table stored only `path_json` per observer — all observations pointed at one `transmissions.raw_hex`. This prevented the hex pane from updating when switching observations in the packet detail view. ## Changes | Layer | Change | |-------|--------| | **Schema** | `ALTER TABLE observations ADD COLUMN raw_hex TEXT` (nullable). Migration: `observations_raw_hex_v1` | | **Ingestor** | `stmtInsertObservation` now stores per-observer `raw_hex` from MQTT payload | | **View** | `packets_v` uses `COALESCE(o.raw_hex, t.raw_hex)` — backward compatible with NULL historical rows | | **Server** | `enrichObs` prefers `obs.RawHex` when non-empty, falls back to `tx.RawHex` | | **Frontend** | No changes — `effectivePkt.raw_hex` already flows through `renderDetail` | ## Tests - **Ingestor**: `TestPerObservationRawHex` — two MQTT packets for same hash from different observers → both stored with distinct raw_hex - **Server**: `TestPerObservationRawHexEnrich` — enrichObs returns per-obs raw_hex when present, tx fallback when NULL - **E2E**: Playwright assertion in `test-e2e-playwright.js` for hex pane update on observation switch E2E assertion added: `test-e2e-playwright.js:1794` ## Scope - Historical observations: raw_hex stays NULL, UI falls back to transmission raw_hex silently - No backfill, no path_json reconstruction, no frontend changes Closes #881 --------- Co-authored-by: you <you@example.com> |
42ff5a291b
fix(#866): full-page obs-switch — update hex + path + direction per observation (#870)
## Problem On `/#/packets/<hash>?obs=<id>`, clicking a different observation updated summary fields (Observer, SNR/RSSI, Timestamp) but **not** hex payload or path details. Sister bug to #849 (fixed in #851 for the detail dialog). ## Root Causes | Cause | Impact | |-------|--------| | `selectPacket` called `renderDetail` without `selectedObservationId` | Initial render missed observation context on some code paths | | `ObservationResp` missing `direction`, `resolved_path`, `raw_hex` | Frontend obs-switch lost direction and resolved_path context | | `obsPacket` construction omitted `direction` field | Direction not preserved when switching observations | ## Fix - `selectPacket` explicitly passes `selectedObservationId` to `renderDetail` - `ObservationResp` gains `Direction`, `ResolvedPath`, `RawHex` fields - `mapSliceToObservations` copies the three new fields - `obsPacket` spreads include `direction` from the observation ## Tests 7 new tests in `test-frontend-helpers.js`: - Observation switch updates `effectivePkt` path - `raw_hex` preserved from packet when obs has none - `raw_hex` from obs overrides when API provides it - `direction` carried through observation spread - `resolved_path` carried through observation spread - `getPathLenOffset` cross-check for transport routes - URL hash `?obs=` round-trip encoding All 584 frontend + 62 filter + 29 aging tests pass. Go server tests pass. Fixes #866 Co-authored-by: you <you@example.com> |
441409203e
feat(#845): bimodal_clock severity — surface flaky-RTC nodes instead of hiding as 'No Clock' (#850)
## Problem Nodes with flaky RTC (firmware emitting interleaved good and nonsense timestamps) were classified as `no_clock` because the broken samples poisoned the recent median. Operators lost visibility into these nodes — they showed "No Clock" even though ~60% of their adverts had valid timestamps. Observed on staging: a node with 31K samples where recent adverts interleave good skew (-6.8s, -13.6s) with firmware nonsense (-56M, -60M seconds). Under the old logic, median of the mixed window → `no_clock`. ## Solution New `bimodal_clock` severity tier that surfaces flaky-RTC nodes with their real (good-sample) skew value. ### Classification order (first match wins) | Severity | Good Fraction | Description | |----------|--------------|-------------| | `no_clock` | < 10% | Essentially no real clock | | `bimodal_clock` | 10–80% (and bad > 0) | Mixed good/bad — flaky RTC | | `ok`/`warn`/`critical`/`absurd` | ≥ 80% | Normal classification | "Good" = `|skew| <= 1 hour`; "bad" = likely uninitialized RTC nonsense. When `bimodal_clock`, `recentMedianSkewSec` is computed from **good samples only**, so the dashboard shows the real working-clock value (e.g. -7s) instead of the broken median. ### Backend changes - New constant `BimodalSkewThresholdSec = 3600` - New severity `bimodal_clock` in classification logic - New API fields: `goodFraction`, `recentBadSampleCount`, `recentSampleCount` ### Frontend changes - Amber `Bimodal` badge with tooltip showing bad-sample percentage - Bimodal nodes render skew value like ok/warn/severe (not the "No Clock" path) - Warning line below sparkline: "⚠️ X of last Y adverts had nonsense timestamps (likely RTC reset)" ### Tests - 3 new Go unit tests: bimodal (60% good → bimodal_clock), all-bad (→ no_clock), 90%-good (→ ok) - 1 new frontend test: bimodal badge rendering with tooltip - Existing `TestReporterScenario_789` passes unchanged Builds on #789 (recent-window severity). Closes #845 --------- Co-authored-by: you <you@example.com> |
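The first-match-wins classification order can be sketched as follows (the function name is hypothetical and `classifySkew` stands in for the existing ok/warn/critical/absurd classifier, with placeholder thresholds):
```go
package clockskew

// classifyRecentWindow applies the first-match-wins order from the table
// above. The good/bad split treats |skew| <= 1 hour as "good".
func classifyRecentWindow(goodFraction float64, recentBadSamples int, absRecentGoodMedian float64) string {
	switch {
	case goodFraction < 0.10:
		return "no_clock"
	case goodFraction < 0.80 && recentBadSamples > 0:
		return "bimodal_clock"
	default:
		return classifySkew(absRecentGoodMedian)
	}
}

// classifySkew stands in for the existing severity classifier; the thresholds
// below are placeholders, not the project's real values.
func classifySkew(absSkewSec float64) string {
	switch {
	case absSkewSec <= 60:
		return "ok"
	case absSkewSec <= 600:
		return "warn"
	default:
		return "critical"
	}
}
```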
a371d35bfd
feat(#847): dedupe Top Longest Hops by pair + add obs count and SNR cues (#848)
## Problem The "Top 20 Longest Hops" RF analytics card shows the same repeater pair filling most slots because the query sorts raw hop records by distance with no pair deduplication. A single long link observed 12+ times dominates the leaderboard. ## Fix Dedupe by unordered `(pk1, pk2)` pair. Per pair, keep the max-distance record and compute reliability metrics: | Column | Description | |--------|-------------| | **Obs** | Total observations of this link | | **Best SNR** | Maximum SNR seen (dB) | | **Median SNR** | Median SNR across all observations (dB) | Tooltip on each row shows the timestamp of the best observation. ### Before | # | From | To | Distance | Type | SNR | Packet | |---|------|----|----------|------|-----|--------| | 1 | NodeX | NodeY | 200 mi | R↔R | 5 dB | abc… | | 2 | NodeX | NodeY | 199 mi | R↔R | 6 dB | def… | | 3 | NodeX | NodeY | 198 mi | R↔R | 4 dB | ghi… | ### After | # | From | To | Distance | Type | Obs | Best SNR | Median SNR | Packet | |---|------|----|----------|------|-----|----------|------------|--------| | 1 | NodeX | NodeY | 200 mi | R↔R | 12 | 8.0 dB | 5.2 dB | abc… | | 2 | NodeA | NodeB | 150 mi | C↔R | 3 | 6.5 dB | 6.5 dB | jkl… | ## Changes - **`cmd/server/store.go`**: Group `filteredHops` by unordered pair key, accumulate obs count / best SNR / median SNR per group, sort by max distance, take top 20 - **`cmd/server/types.go`**: Update `DistanceHop` struct — replace `SNR` with `BestSnr`, `MedianSnr`, add `ObsCount` - **`public/analytics.js`**: Replace single SNR column with Obs, Best SNR, Median SNR; add row tooltip with best observation timestamp - **`cmd/server/store_tophops_test.go`**: 3 unit tests — basic dedupe, reverse-pair merge, nil SNR edge case ## Test Coverage - `TestDedupeTopHopsByPair`: 5 records on pair (A,B) + 1 on (C,D) → 2 results, correct obsCount/dist/bestSnr/medianSnr - `TestDedupeTopHopsReversePairMerges`: (B,A) and (A,B) merge into one entry - `TestDedupeTopHopsNilSNR`: all-nil SNR records → bestSnr and medianSnr both nil - Existing `TestAnalyticsRFEndpoint` and `TestAnalyticsRFWithRegion` still pass Closes #847 --------- Co-authored-by: you <you@example.com> |
3f26dc7190
obs: surface real RSS alongside tracked store bytes in /api/stats (#832) (#835)
Closes #832.
## Root cause confirmed
`trackedMB` (`s.trackedBytes` in `store.go`) only sums per-packet struct + payload sizes recorded at insertion. It excludes the index maps (`byHash`, `byTxID`, `byNode`, `byObserver`, `byPathHop`, `byPayloadType`, hash-prefix maps, name lookups), the analytics LRUs (rfCache/topoCache/hashCache/distCache/subpathCache/chanCache/collisionCache), WS broadcast queues, and Go runtime overhead. It's "useful packet bytes," not RSS — typically 3–5× off on staging.
## Fix (Option C from the issue)
Expose four memory fields on `/api/stats` from a single cached snapshot:
| Field | Source | Semantics |
|---|---|---|
| `storeDataMB` | `s.trackedBytes` | in-store packet bytes; eviction watermark input |
| `goHeapInuseMB` | `runtime.MemStats.HeapInuse` | live Go heap |
| `goSysMB` | `runtime.MemStats.Sys` | total Go-managed memory |
| `processRSSMB` | `/proc/self/status` VmRSS (Linux), falls back to `goSysMB` | what the kernel sees |
`trackedMB` is retained as a deprecated alias for `storeDataMB` so existing dashboards/QA scripts keep working. Field invariants are documented on `MemorySnapshot`: `processRSSMB ≥ goSysMB ≥ goHeapInuseMB ≥ storeDataMB` (typical).
## Performance
Single `getMemorySnapshot` call cached for 1s — `runtime.ReadMemStats` (stop-the-world) and the `/proc/self/status` read are amortized across burst polling. `/proc` read is bounded to 8 KiB, parsed with `strconv` only — no shell-out, no untrusted input. `cgoBytesMB` is omitted: the build uses pure-Go `modernc.org/sqlite`, so there is no cgo allocator to measure. Documented in code comment.
## Tests
`cmd/server/stats_memory_test.go` asserts presence, types, sign, and ordering invariants. Avoids the flaky "matches RSS to ±X%" pattern.
```
$ go test ./... -count=1 -timeout 180s
ok  github.com/corescope/server  19.410s
```
## QA plan
§1.4 now compares `processRSSMB` against procfs RSS (the right invariant); threshold stays at 0.20.
---------
Co-authored-by: MeshCore Agent <meshcore-agent@openclaw.local>
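The `/proc/self/status` read can be sketched as follows (function name is hypothetical; the real implementation caches the result inside the 1s memory snapshot):
```go
package stats

import (
	"os"
	"strconv"
	"strings"
)

// readProcessRSSMB parses VmRSS from /proc/self/status (Linux only), reading
// at most 8 KiB and using strconv only. Callers fall back to the Go runtime's
// Sys figure when the second return value is false.
func readProcessRSSMB() (float64, bool) {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return 0, false
	}
	defer f.Close()
	buf := make([]byte, 8192)
	n, _ := f.Read(buf)
	for _, line := range strings.Split(string(buf[:n]), "\n") {
		if !strings.HasPrefix(line, "VmRSS:") {
			continue
		}
		fields := strings.Fields(line) // e.g. ["VmRSS:", "123456", "kB"]
		if len(fields) < 2 {
			break
		}
		kb, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			break
		}
		return kb / 1024, true
	}
	return 0, false
}
```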
||
|
|
886aabf0ae |
fix(#827): /api/packets/{hash} falls back to DB when in-memory store misses (#831)
Closes #827.

## Problem
`/api/packets/{hash}` only consulted the in-memory `PacketStore`. When a packet aged out of memory, the handler 404'd — even though SQLite still had it and `/api/nodes/{pubkey}` `recentAdverts` (which reads from the DB) was actively surfacing the hash. Net effect: the **Analyze →** link on older adverts in the node detail page led to a dead "Not found".

Two-store inconsistency: DB has the packet, in-memory doesn't, node detail surfaces it from DB → packet detail can't serve it.

## Fix
In `handlePacketDetail`:
- After in-memory miss, fall back to `db.GetPacketByHash` (already existed) for hash lookups, and `db.GetTransmissionByID` for numeric IDs.
- Track when the result came from the DB; if so and the store has no observations, populate from DB via a new `db.GetObservationsForHash` so the response shows real observations instead of the misleading `observation_count = 1` fallback.

## Tests
- `TestPacketDetailFallsBackToDBWhenStoreMisses` — insert a packet directly into the DB after `store.Load()`, confirm store doesn't have it, assert 200 + populated observations.
- `TestPacketDetail404WhenAbsentFromBoth` — neither store nor DB → 404 (no false positives).
- `TestPacketDetailPrefersStoreOverDB` — both have it; store result wins (no double-fetch).
- `TestHandlePacketDetailNoStore` updated: it previously asserted the old buggy 404 behavior; now asserts the correct DB-fallback 200.

All `go test ./... -run "PacketDetail|Packet|GetPacket"` and the full `cmd/server` suite pass.

## Out of scope
The `/api/packets?hash=` filter is the live in-memory list endpoint and intentionally store-only for performance. Not touched here — happy to file a follow-up if you'd rather harmonise.

## Repro context
Verified against prod with a recently-adverting repeater whose recent advert hash lives in `recentAdverts` (DB) but had been evicted from the in-memory store; pre-fix 404, post-fix 200 with full observations.

Co-authored-by: you <you@example.com>
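
A simplified sketch of the lookup order, with hypothetical store/DB interfaces standing in for the real ones:

```go
package server

// Minimal stand-ins for the real store/DB shapes (hypothetical).
type Packet struct {
	Hash         string
	Observations []Observation
}

type Observation struct {
	ObserverID int
	SNR        *float64
}

type packetStore interface {
	GetByHash(hash string) (*Packet, bool)
}

type packetDB interface {
	GetPacketByHash(hash string) (*Packet, error)
	GetObservationsForHash(hash string) ([]Observation, error)
}

// lookupPacket prefers the in-memory store, then falls back to SQLite.
// A nil packet with a nil error means "absent from both" (handler returns 404).
func lookupPacket(store packetStore, db packetDB, hash string) (*Packet, []Observation, error) {
	if p, ok := store.GetByHash(hash); ok {
		return p, p.Observations, nil // store wins when present, no double-fetch
	}
	p, err := db.GetPacketByHash(hash)
	if err != nil || p == nil {
		return nil, nil, err
	}
	obs, err := db.GetObservationsForHash(hash)
	if err != nil {
		return p, nil, nil // packet still served; observations are best-effort
	}
	return p, obs, nil
}
```
|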
||
|
|
a0fddb50aa |
fix(#789): severity from recent samples; Theil-Sen drift with outlier rejection (#828)
Closes #789.

## The two bugs
1. **Severity from stale median.** `classifySkew(absMedian)` used the all-time `MedianSkewSec` over every advert ever recorded for the node. A repeater that was off for hours and then GPS-corrected stayed pinned to `absurd` because hundreds of historical bad samples poisoned the median. Reporter's case: `medianSkewSec: -59,063,561.8` while `lastSkewSec: -0.8` — current health was perfect, dashboard said catastrophic.
2. **Drift from a single correction jump.** Drift used OLS over every `(ts, skew)` pair, with no outlier rejection. A single GPS-correction event (skew jumps millions of seconds in ~30s) dominated the regression and produced `+1,793,549.9 s/day` — physically nonsense; the existing `maxReasonableDriftPerDay` cap then zeroed it (better than absurd, but still useless).

## The two fixes
1. **Recent-window severity.** New field `recentMedianSkewSec` = median over the last `N=5` samples or last `1h`, whichever is narrower (more current view). Severity now derives from `abs(recentMedianSkewSec)`. `MeanSkewSec`, `MedianSkewSec`, `LastSkewSec` are preserved unchanged so the frontend, fleet view, and any external consumers continue to work.
2. **Theil-Sen drift with outlier filter.** Drift now uses the Theil-Sen estimator (median of all pairwise slopes — textbook robust regression, ~29% breakdown point) on a series pre-filtered to drop samples whose skew jumps more than `maxPlausibleSkewJumpSec = 60s` from the previous accepted point. Real µC drift is fractions of a second per advert; clock corrections fall well outside. Capped at `theilSenMaxPoints = 200` (most-recent) so O(n²) stays bounded for chatty nodes.

## What stays the same
- Epoch-0 / out-of-range advert filter (PR #769).
- `minDriftSamples = 5` floor.
- `maxReasonableDriftPerDay = 86400` hard backstop.
- API shape: only additions (`recentMedianSkewSec`); no fields removed or renamed.

## Tests
All in `cmd/server/clock_skew_test.go`:
- `TestSeverityUsesRecentNotMedian` — 100 bad samples (-60s) + 5 good (-1s) → severity = `ok`, historical median still huge.
- `TestDriftRejectsCorrectionJump` — 30 min of clean linear drift + one 1000s jump → drift small (~12 s/day).
- `TestTheilSenMatchesOLSWhenClean` — clean linear data, Theil-Sen within ~1% of OLS.
- `TestReporterScenario_789` — exact reproducer: 1662 samples, 1657 @ -683 days then 5 @ -1s → severity `ok`, `recentMedianSkewSec ≈ 0`, drift bounded; legacy `medianSkewSec` preserved as historical context.

`go test ./... -count=1` (cmd/server) and `node test-frontend-helpers.js` both pass.

---------

Co-authored-by: clawbot <bot@corescope.local>
Co-authored-by: you <you@example.com>
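
A compact sketch of the filtered Theil-Sen estimate described above; the 200-point cap and the severity wiring are omitted, and names are illustrative rather than the actual `clock_skew.go` API:

```go
package clockskew

import (
	"math"
	"sort"
)

// theilSenDriftPerDay estimates drift (seconds/day) from (unix-seconds, skew-seconds)
// samples. Samples whose skew jumps more than maxJumpSec from the previous accepted
// point are dropped as clock corrections, not drift. Returns ok=false when fewer
// than minSamples survive the filter.
func theilSenDriftPerDay(ts, skew []float64, maxJumpSec float64, minSamples int) (float64, bool) {
	var xs, ys []float64
	for i := range ts {
		if len(ys) > 0 && math.Abs(skew[i]-ys[len(ys)-1]) > maxJumpSec {
			continue
		}
		xs = append(xs, ts[i])
		ys = append(ys, skew[i])
	}
	if len(xs) < minSamples {
		return 0, false
	}
	slopes := make([]float64, 0, len(xs)*(len(xs)-1)/2)
	for i := 0; i < len(xs); i++ {
		for j := i + 1; j < len(xs); j++ {
			if dx := xs[j] - xs[i]; dx != 0 {
				slopes = append(slopes, (ys[j]-ys[i])/dx)
			}
		}
	}
	if len(slopes) == 0 {
		return 0, false
	}
	sort.Float64s(slopes)
	return slopes[len(slopes)/2] * 86400, true // median pairwise slope, scaled to s/day
}
```
|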
||
|
|
cad1f11073 |
fix: bypass IATA filter for status messages, fill SNR on duplicate obs (#694) (#802)
## Problems
Two independent ingestor bugs identified in #694:

### 1. IATA filter drops status messages from out-of-region observers
The IATA filter ran at the top of `handleMessage()` before any message-type discrimination. Status messages carrying observer metadata (`noise_floor`, battery, airtime) from observers outside the configured IATA regions were silently discarded before `UpsertObserver()` and `InsertMetrics()` ran.

**Impact:** Observers running `meshcoretomqtt/1.0.8.0` in BFL and LAX — the only client versions that include `noise_floor` in status messages — had their health data dropped entirely on prod instances filtering to SJC.

**Fix:** Moved the IATA filter to the packet path only (after the `parts[3] == "status"` branch). Status messages now always populate observer health data regardless of configured region filter.

### 2. `INSERT OR IGNORE` discards SNR/RSSI on late arrival
When the same `(transmission_id, observer_idx, path_json)` observation arrived twice — first without RF fields, then with — `INSERT OR IGNORE` silently discarded the SNR/RSSI from the second arrival.

**Fix:** Changed to `ON CONFLICT(...) DO UPDATE SET snr = COALESCE(excluded.snr, snr), rssi = ..., score = ...`. A later arrival with SNR fills in a `NULL`; a later arrival without SNR does not overwrite an existing value.

## Tests
- `TestIATAFilterDoesNotDropStatusMessages` — verifies BFL status message is processed when IATA filter includes only SJC, and that BFL packet is still filtered
- `TestInsertObservationSNRFillIn` — verifies SNR fills in on second arrival, and is not overwritten by a subsequent null arrival

## Related
Partially addresses #694 (upstream client issue of missing SNR in packet messages is out of scope)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
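
The COALESCE upsert spelled out as a sketch; the conflict key comes from the PR text, but the exact column list in `cmd/ingestor/db.go` may differ:

```go
package ingestor

// upsertObservation: a later arrival with SNR/RSSI fills in NULLs, a later
// arrival without them never overwrites an existing value.
const upsertObservation = `
INSERT INTO observations (transmission_id, observer_idx, path_json, snr, rssi, score)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT(transmission_id, observer_idx, path_json) DO UPDATE SET
  snr   = COALESCE(excluded.snr, snr),
  rssi  = COALESCE(excluded.rssi, rssi),
  score = COALESCE(excluded.score, score);`
```
|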
||
|
|
7f024b7aa7 |
fix(#673): replace raw JSON text search with byNode index for node packet queries (#803)
## Summary
Fixes #673

- GRP_TXT packets whose message text contains a node's pubkey were incorrectly counted as packets for that node, inflating packet counts and type breakdowns
- Two code paths in `store.go` used `strings.Contains` on the full `DecodedJSON` blob — this matched pubkeys appearing anywhere in the JSON, including inside chat message text
- `filterPackets` slow path (combined node + other filters): replaced substring search with a hash-set membership check against `byNode[nodePK]`
- `GetNodeAnalytics`: removed the full-packet-scan + text search branch entirely; always uses the `byNode` index (which already covers `pubKey`/`destPubKey`/`srcPubKey` via structured field indexing)

## Test Plan
- [x] `TestGetNodeAnalytics_ExcludesGRPTXTWithPubkeyInText` — verifies a GRP_TXT packet with the node's pubkey in its text field is not counted in that node's analytics
- [x] `TestFilterPackets_NodeQueryDoesNotMatchChatText` — verifies the combined-filter slow path of `filterPackets` returns only the indexed ADVERT, not the chat packet

Both tests were written as failing tests against the buggy code and pass after the fix.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
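
A minimal sketch of the replacement check: a set built from the `byNode` index instead of a substring match over decoded JSON (names illustrative):

```go
package store

// Before (buggy): substring match over the raw decoded JSON, which also matched
// pubkeys embedded in chat message text:
//
//	if strings.Contains(tx.DecodedJSON, nodePK) { ... }
//
// After (sketch): membership check against the structured byNode index, so only
// transmissions where the pubkey appears in an indexed field can match.
func txIDSetForNode(byNode map[string][]int, nodePK string) map[int]struct{} {
	ids := byNode[nodePK]
	set := make(map[int]struct{}, len(ids))
	for _, id := range ids {
		set[id] = struct{}{}
	}
	return set
}
```
|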
||
|
|
2460e33f94 |
fix(#810): /health.recentPackets resolved_path falls back to longest sibling obs (#821)
## What + why
`fetchResolvedPathForTxBest` (used by every API path that fills the
top-level `resolved_path`, including
`/api/nodes/{pk}/health.recentPackets`) picked the observation with the
longest `path_json` and queried SQL for that single obs ID. When the
longest-path obs had `resolved_path` NULL but a shorter sibling had one,
the helper returned nil and the top-level field was dropped — even
though the data exists. QA #809 §2.1 caught it on the health endpoint
because that page surfaces it per-tx.
Fix: keep the LRU-friendly fast path (try the longest-path obs), then
fall back to scanning all observations of the tx and picking the longest
`path_json` that actually has a stored `resolved_path`.
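A sketch of that order, with hypothetical helpers standing in for the real
`resolved_index.go` API:

```go
package server

type obsRef struct {
	ID       int
	PathJSON string
}

// resolvedPathForTx tries the longest-path observation first (single lookup,
// dominant case), then scans siblings for the longest path_json that actually
// has a stored resolved_path. fetch is the LRU-backed SQL lookup; it returns
// nil when nothing is stored for that observation.
func resolvedPathForTx(longestObs obsRef, allObs []obsRef, fetch func(obsID int) []string) []string {
	if rp := fetch(longestObs.ID); rp != nil {
		return rp
	}
	var best []string
	bestLen := -1
	for _, o := range allObs {
		if len(o.PathJSON) <= bestLen {
			continue // cannot beat the current winner
		}
		if rp := fetch(o.ID); rp != nil {
			best, bestLen = rp, len(o.PathJSON)
		}
	}
	return best
}
```
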
## Changes
- `cmd/server/resolved_index.go`: extend `fetchResolvedPathForTxBest`
with a fallback through `fetchResolvedPathsForTx`.
- `cmd/server/issue810_repro_test.go`: regression test — seeds a tx
whose longest-path obs lacks `resolved_path` and a shorter sibling has
it, then asserts `/api/packets` and
`/api/nodes/{pk}/health.recentPackets` agree.
## Tests
`go test ./... -count=1` from `cmd/server` — PASS (full suite, ~19s).
## Perf
Fast path unchanged (single LRU/SQL lookup, dominant case). Fallback
only runs when the longest-path obs has NULL `resolved_path` — one
indexed query per affected tx, bounded by observations-per-tx (small).
Closes #810
---------
Co-authored-by: you <you@example.com>
|
||
|
|
d7fe24e2db |
Fix channel filter on Packets page (UI + API) — #812 (#816)
Closes #812

## Root causes
**Server (`/api/packets?channel=…` returned identical totals):** The handler in `cmd/server/routes.go` never read the `channel` query parameter into `PacketQuery`, so it was silently ignored by both the SQLite path (`db.go::buildTransmissionWhere`) and the in-memory path (`store.go::filterPackets`). The codebase already had everything else in place — the `channel_hash` column with an index from #762, decoded `channel` / `channelHashHex` fields on each packet — it just wasn't wired up.

**UI (`/#/packets` had no channel filter):** `public/packets.js` rendered observer / type / time-window / region filters but no channel control, and didn't read `?channel=` from the URL.

## Fix
### Server
- New `Channel` field on `PacketQuery`; `handlePackets` reads `r.URL.Query().Get("channel")`.
- DB path filters by the indexed `channel_hash` column (exact match).
- In-memory path: helper `packetMatchesChannel` matches `decoded.channel` (plaintext, e.g. `#test`, `public`) or `enc_<HEX>` against `channelHashHex` for undecryptable GRP_TXT. Uses cached `ParsedDecoded()` so it's O(1) after first parse. Fast-path index guards and the grouped-cache key updated to include channel.
- Regression test (`channel_filter_test.go`): `channel=#test` returns ≥1 GRP_TXT packet and fewer than baseline; `channel=nonexistentchannel` returns `total=0`.

### UI
- New `<select id="fChannel">` populated from `/api/channels`.
- Round-trips via `?channel=…` on the URL hash (read on init, written on change).
- Pre-seeds the current value as an option so encrypted hashes not in `/api/channels` still display as selected on reload.
- On change, calls `loadPackets()` so the server-side filter applies before pagination.

## Perf
Filter adds at most one cached map lookup per packet (DB path uses indexed column, store path uses `ParsedDecoded()` cache). Staging baseline 149–190 ms for `?channel=#test&limit=50`; the new comparison is negligible. Target ≤ 500 ms preserved.

## Tests
`cd cmd/server && go test ./... -count=1 -timeout 120s` → PASS.

---------

Co-authored-by: you <you@example.com>
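
A sketch of `packetMatchesChannel` under the matching rules described above; the struct below is a stand-in for the cached `ParsedDecoded()` result, not the real type:

```go
package store

import "strings"

// decodedChannelFields stands in for the cached ParsedDecoded() result.
type decodedChannelFields struct {
	Channel        string // plaintext channel, e.g. "#test" or "public"
	ChannelHashHex string // channel hash for undecryptable GRP_TXT
}

// packetMatchesChannel: exact (case-insensitive) plaintext match, or an
// "enc_<HEX>" selector matched against the channel hash.
func packetMatchesChannel(d decodedChannelFields, want string) bool {
	if want == "" {
		return true // no filter applied
	}
	if strings.EqualFold(d.Channel, want) {
		return true
	}
	if h, ok := strings.CutPrefix(want, "enc_"); ok {
		return strings.EqualFold(d.ChannelHashHex, h)
	}
	return false
}
```
|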
||
|
|
9e90548637 |
perf(#800): remove per-StoreTx ResolvedPath, replace with membership index + on-demand decode (#806)
## Summary
Remove `ResolvedPath []*string` field from `StoreTx` and `StoreObs` structs, replacing it with a compact membership index + on-demand SQL decode. This eliminates the dominant heap cost identified in profiling (#791, #799).

**Spec:** #800 (consolidated from two rounds of expert + implementer review on #799)

Closes #800
Closes #791

## Design
### Removed
- `StoreTx.ResolvedPath []*string`
- `StoreObs.ResolvedPath []*string`
- `TransmissionResp.ResolvedPath`, `ObservationResp.ResolvedPath` struct fields

### Added
| Structure | Purpose | Est. cost at 1M obs |
|---|---|---:|
| `resolvedPubkeyIndex map[uint64][]int` | FNV-1a(pubkey) → []txID forward index | 50–120 MB |
| `resolvedPubkeyReverse map[int][]uint64` | txID → []hashes for clean removal | ~40 MB |
| `apiResolvedPathLRU` (10K entries) | FIFO cache for on-demand API decode | ~2 MB |

### Decode-window discipline
`resolved_path` JSON decoded once per packet. Consumers fed in order, temp slice dropped — never stored on struct:
1. `addToByNode` — relay node indexing
2. `touchRelayLastSeen` — relay liveness DB updates
3. `byPathHop` resolved-key entries
4. `resolvedPubkeyIndex` + reverse insert
5. WebSocket broadcast map (raw JSON bytes)
6. Persist batch (raw JSON bytes for SQL UPDATE)

### Collision safety
When the forward index returns candidates, a batched SQL query confirms exact pubkey presence using `LIKE '%"pubkey"%'` on the `resolved_path` column.

### Feature flag
`useResolvedPathIndex` (default `true`). Off-path is conservative: all candidates kept, index not consulted. For one-release rollback safety.

## Files changed
| File | Changes |
|---|---|
| `resolved_index.go` | **New** — index structures, LRU cache, on-demand SQL helpers, collision safety |
| `store.go` | Remove RP fields, decode-window discipline in Load/Ingest, on-demand txToMap/obsToMap/enrichObs, eviction cleanup via SQL, memory accounting update |
| `types.go` | Remove RP fields from TransmissionResp/ObservationResp |
| `routes.go` | Replace `nodeInResolvedPath` with `nodeInResolvedPathViaIndex`, remove RP from mapSlice helpers |
| `neighbor_persist.go` | Refactor backfill: reverse-map removal → forward+reverse insert → LRU invalidation |

## Tests added (27 new)
**Unit:**
- `TestStoreTx_ResolvedPathFieldAbsent` — reflection guard
- `TestResolvedPubkeyIndex_BuildFromLoad` — forward+reverse consistency
- `TestResolvedPubkeyIndex_HashCollision` — SQL collision safety
- `TestResolvedPubkeyIndex_IngestUpdate` — maps reflect new ingests
- `TestResolvedPubkeyIndex_RemoveOnEvict` — clean removal via reverse map
- `TestResolvedPubkeyIndex_PerObsCoverage` — non-best obs pubkeys indexed
- `TestAddToByNode_WithoutResolvedPathField`
- `TestTouchRelayLastSeen_WithoutResolvedPathField`
- `TestWebSocketBroadcast_IncludesResolvedPath`
- `TestBackfill_InvalidatesLRU`
- `TestEviction_ByNodeCleanup_OnDemandSQL`
- `TestExtractResolvedPubkeys`, `TestMergeResolvedPubkeys`
- `TestResolvedPubkeyHash_Deterministic`
- `TestLRU_EvictionOnFull`

**Endpoint:**
- `TestPathsThroughNode_NilResolvedPathFallback`
- `TestPacketsAPI_OnDemandResolvedPath`
- `TestPacketsAPI_OnDemandResolvedPath_LRUHit`
- `TestPacketsAPI_OnDemandResolvedPath_Empty`

**Feature flag:**
- `TestFeatureFlag_OffPath_PreservesOldBehavior`
- `TestFeatureFlag_Toggle_NoStateLeak`

**Concurrency:**
- `TestReverseMap_NoLeakOnPartialFailure`
- `TestDecodeWindow_LockHoldTimeBounded`
- `TestLivePolling_LRUUnderConcurrentIngest`

**Regression:**
- `TestRepeaterLiveness_StillAccurate`

**Benchmarks:**
- `BenchmarkLoad_BeforeAfter`
- `BenchmarkResolvedPubkeyIndex_Memory`
- `BenchmarkPathsThroughNode_Latency`
- `BenchmarkLivePolling_UnderIngest`

## Benchmark results
```
BenchmarkResolvedPubkeyIndex_Memory/pubkeys=50K    429ms   103MB   777K allocs
BenchmarkResolvedPubkeyIndex_Memory/pubkeys=500K  4205ms   896MB  7.67M allocs
BenchmarkLoad_BeforeAfter                           65ms    20MB   202K allocs
BenchmarkPathsThroughNode_Latency                  3.9µs      0B      0 allocs
BenchmarkLivePolling_UnderIngest                   5.4µs    545B      7 allocs
```
Key: per-obs `[]*string` overhead completely eliminated. At 1M obs with 3 hops average, this saves ~72 bytes/obs × 1M = ~68 MB just from the slice headers + pointers, plus the JSON-decoded string data (~900 MB at scale per profiling).

## Design choices
- **FNV-1a instead of xxhash**: stdlib availability, no external dependency. Performance is equivalent for this use case (pubkey strings are short).
- **FIFO LRU instead of true LRU**: simpler implementation, adequate for the access pattern (mostly sequential obs IDs from live polling).
- **Grouped packets view omits resolved_path**: cold path, not worth SQL round-trip per page render.
- **Backfill pending check uses reverse-map presence** instead of per-obs field: if a tx has any indexed pubkeys, its observations are considered resolved.

Closes #807

---------

Co-authored-by: you <you@example.com>
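
For orientation, a stripped-down sketch of the forward/reverse index insert and the eviction-time removal. The real struct in `cmd/server/resolved_index.go` also carries the LRU and locking, and forward-index hits are only candidates until the SQL collision check confirms them:

```go
package server

import "hash/fnv"

type resolvedIndex struct {
	forward map[uint64][]int // FNV-1a(pubkey) -> tx IDs containing that pubkey
	reverse map[int][]uint64 // tx ID -> hashes, so eviction can remove cleanly
}

func pubkeyHash(pk string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(pk))
	return h.Sum64()
}

// add records every resolved pubkey of a transmission in both maps.
func (ix *resolvedIndex) add(txID int, resolvedPubkeys []string) {
	for _, pk := range resolvedPubkeys {
		k := pubkeyHash(pk)
		ix.forward[k] = append(ix.forward[k], txID)
		ix.reverse[txID] = append(ix.reverse[txID], k)
	}
}

// remove undoes add() for an evicted transmission using the reverse map.
func (ix *resolvedIndex) remove(txID int) {
	for _, k := range ix.reverse[txID] {
		ids := ix.forward[k]
		for i, id := range ids {
			if id == txID {
				ix.forward[k] = append(ids[:i], ids[i+1:]...)
				break
			}
		}
		if len(ix.forward[k]) == 0 {
			delete(ix.forward, k)
		}
	}
	delete(ix.reverse, txID)
}
```
|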
||
|
|
a8e1cea683 |
fix: use payload type bits only in content hash (not full header byte) (#787)
## Problem
The firmware computes packet content hash as:
```
SHA256(payload_type_byte + [path_len for TRACE] + payload)
```
Where `payload_type_byte = (header >> 2) & 0x0F` — just the payload type bits (2-5).

CoreScope was using the **full header byte** in its hash computation, which includes route type bits (0-1) and version bits (6-7). This meant the same logical packet produced different content hashes depending on route type — breaking dedup and packet lookup.

**Firmware reference:** `Packet.cpp::calculatePacketHash()` uses `getPayloadType()` which returns `(header >> PH_TYPE_SHIFT) & PH_TYPE_MASK`.

## Fix
- Extract only payload type bits: `payloadType := (headerByte >> 2) & 0x0F`
- Include `path_len` byte in hash for TRACE packets (matching firmware behavior)
- Applied to both `cmd/server/decoder.go` and `cmd/ingestor/decoder.go`

## Tests Added
- **Route type independence:** Same payload with FLOOD vs DIRECT route types produces identical hash
- **TRACE path_len inclusion:** TRACE packets with different `path_len` produce different hashes
- **Firmware compatibility:** Hash output matches manual computation of firmware algorithm

## Migration Impact
Existing packets in the DB have content hashes computed with the old (incorrect) formula. Options:
1. **Recompute hashes** via migration (recommended for clean state)
2. **Dual lookup** — check both old and new hash on queries (backward compat)
3. **Accept the break** — old hashes become stale, new packets get correct hashes

Recommend option 1 (migration) as a follow-up. The volume of affected packets depends on how many distinct route types were seen for the same logical packet.

Fixes #786

---------

Co-authored-by: you <you@example.com>
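
A sketch of the corrected hash input; whether the packet is TRACE is decided by the caller from the payload type it just extracted, since the TRACE type constant is not restated here:

```go
package decoder

import "crypto/sha256"

// contentHash hashes only the payload type bits (2-5), plus path_len for TRACE
// packets, matching the firmware formula quoted above. Route type bits (0-1)
// and version bits (6-7) are excluded.
func contentHash(headerByte, pathLen byte, payload []byte, isTrace bool) [32]byte {
	payloadType := (headerByte >> 2) & 0x0F
	input := make([]byte, 0, 2+len(payload))
	input = append(input, payloadType)
	if isTrace {
		input = append(input, pathLen)
	}
	input = append(input, payload...)
	return sha256.Sum256(input)
}
```
|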
||
|
|
bf674ebfa2 |
feat: validate advert signatures on ingest, reject corrupt packets (#794)
## Summary
Validates ed25519 signatures on ADVERT packets during MQTT ingest.
Packets with invalid signatures are rejected before storage, preventing
corrupt/truncated adverts from polluting the database.
## Changes
### Ingestor (`cmd/ingestor/`)
- **Signature validation on ingest**: After decoding an ADVERT, checks
`SignatureValid` from the decoder. Invalid signatures → packet dropped,
never stored.
- **Config flag**: `validateSignatures` (default `true`). Set to `false`
to disable validation for backward compatibility with existing installs.
- **`dropped_packets` table**: New SQLite table recording every rejected
packet with full attribution:
- `hash`, `raw_hex`, `reason`, `observer_id`, `observer_name`,
`node_pubkey`, `node_name`, `dropped_at`
- Indexed on `observer_id` and `node_pubkey` for investigation queries
- **`SignatureDrops` counter**: New atomic counter in `DBStats`, logged
in periodic stats output as `sig_drops=N`
- **Retention**: `dropped_packets` pruned alongside metrics on the same
`retention.metricsDays` schedule
### Server (`cmd/server/`)
- **`GET /api/dropped-packets`** (API key required): Returns recent
drops with optional `?observer=` and `?pubkey=` filters, `?limit=`
(default 100, max 500)
- **`signatureDrops`** field added to `/api/stats` response (count from
`dropped_packets` table)
### Tests (8 new)
| Test | What it verifies |
|------|-----------------|
| `TestSigValidation_ValidAdvertStored` | Valid advert passes validation
and is stored |
| `TestSigValidation_TamperedSignatureDropped` | Tampered signature →
dropped, recorded in `dropped_packets` with correct fields |
| `TestSigValidation_TruncatedAppdataDropped` | Truncated appdata
invalidates signature → dropped |
| `TestSigValidation_DisabledByConfig` | `validateSignatures: false`
skips validation, stores tampered packet |
| `TestSigValidation_DropCounterIncrements` | Counter increments
correctly across multiple drops |
| `TestSigValidation_LogContainsFields` | `dropped_packets` row contains
hash, reason, observer, pubkey, name |
| `TestPruneDroppedPackets` | Old entries pruned, recent entries
retained |
| `TestShouldValidateSignatures_Default` | Config helper returns correct
defaults |
### Config example
```json
{
"validateSignatures": true
}
```
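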
Fixes #793
---------
Co-authored-by: you <you@example.com>
|
||
|
|
d596becca3 |
feat: bounded cold load — limit Load() by memory budget (#790)
## Implements #748 M1 — Bounded Cold Load

### Problem
`Load()` pulls the ENTIRE database into RAM before eviction runs. On a 1GB database, this means 3+ GB peak memory at startup, regardless of `maxMemoryMB`. This is the root cause of #743 (OOM on 2GB VMs).

### Solution
Calculate the maximum number of transmissions that fit within the `maxMemoryMB` budget and use a SQL subquery LIMIT to load only the newest packets.

**Two-phase approach** (avoids the JOIN-LIMIT row count problem):
```sql
SELECT ... FROM transmissions t
LEFT JOIN observations o ON ...
WHERE t.id IN (SELECT id FROM transmissions ORDER BY first_seen DESC LIMIT ?)
ORDER BY t.first_seen ASC, o.timestamp DESC
```

### Changes
- **`estimateStoreTxBytesTypical(numObs)`** — estimates memory cost of a typical transmission without needing an actual `StoreTx` instance. Used for budget calculation.
- **Budget calculation in `Load()`** — `maxPackets = (maxMemoryMB * 1048576) / avgBytesPerPacket` with a floor of 1000 packets.
- **Subquery LIMIT** — loads only the newest N transmissions when bounded.
- **`oldestLoaded` tracking** — records the oldest packet timestamp in memory so future SQL fallback queries (M2+) know where in-memory data ends.
- **Perf stats** — `oldestLoaded` exposed in `/api/perf/store-stats`.
- **Logging** — bounded loads show `Loaded X/Y transmissions (limited by ZMB budget)`.

### When `maxMemoryMB=0` (unlimited)
Behavior is completely unchanged — no LIMIT clause, all packets loaded.

### Tests (6 new)
| Test | Validates |
|------|-----------|
| `TestBoundedLoad_LimitedMemory` | With 1MB budget, loads fewer than total (hits 1000 minimum) |
| `TestBoundedLoad_NewestFirst` | Loaded packets are the newest, not oldest |
| `TestBoundedLoad_OldestLoadedSet` | `oldestLoaded` matches first packet's `FirstSeen` |
| `TestBoundedLoad_UnlimitedWithZero` | `maxMemoryMB=0` loads all packets |
| `TestBoundedLoad_AscendingOrder` | Packets remain in ascending `first_seen` order after bounded load |
| `TestEstimateStoreTxBytesTypical` | Estimate grows with observation count, exceeds floor |

Plus benchmarks: `BenchmarkLoad_Bounded` vs `BenchmarkLoad_Unlimited`.

### Perf justification
On a 5000-transmission test DB with 1MB budget:
- Bounded: loads 1000 packets (the minimum) in ~1.3s
- The subquery uses SQLite's index on `first_seen` — O(N log N) for the LIMIT, then indexed JOIN for observations
- No full table scan needed when bounded

### Next milestones
- **M2**: Packet list/search SQL fallback (uses `oldestLoaded` boundary)
- **M3**: Node analytics SQL fallback
- **M4-M5**: Remaining endpoint fallbacks + live-only memory store

---------

Co-authored-by: you <you@example.com>
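
The budget arithmetic, as a sketch (names assumed from the Changes list above, not the actual `Load()` code):

```go
package store

// maxPacketsForBudget converts a memory budget into a transmission LIMIT.
// avgBytesPerPacket comes from estimateStoreTxBytesTypical; 0 means unlimited.
func maxPacketsForBudget(maxMemoryMB int, avgBytesPerPacket int64) int {
	if maxMemoryMB <= 0 || avgBytesPerPacket <= 0 {
		return 0 // unlimited: caller skips the subquery LIMIT entirely
	}
	n := int(int64(maxMemoryMB) << 20 / avgBytesPerPacket)
	if n < 1000 {
		n = 1000 // floor: even tiny budgets load a usable working set
	}
	return n
}
```
|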
||
|
|
b9ba447046 |
feat: add nodeBlacklist config to hide abusive/troll nodes (#742)
## Problem
Some mesh participants set offensive names, report deliberately false
GPS positions, or otherwise troll the network. Instance operators
currently have no way to hide these nodes from public-facing APIs
without deleting the underlying data.
## Solution
Add a `nodeBlacklist` array to `config.json` containing public keys of
nodes to exclude from all API responses.
### Blacklisted nodes are filtered from:
- `GET /api/nodes` — list endpoint
- `GET /api/nodes/search` — search results
- `GET /api/nodes/{pubkey}` — detail (returns 404)
- `GET /api/nodes/{pubkey}/health` — returns 404
- `GET /api/nodes/{pubkey}/paths` — returns 404
- `GET /api/nodes/{pubkey}/analytics` — returns 404
- `GET /api/nodes/{pubkey}/neighbors` — returns 404
- `GET /api/nodes/bulk-health` — filtered from results
### Config example
```json
{
"nodeBlacklist": [
"aabbccdd...",
"11223344..."
]
}
```
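
A sketch of the lookup this config drives (nil-safe, case-insensitive, whitespace-trimming, lazily cached), with only the relevant `Config` fields shown; the real struct carries many more fields:

```go
package server

import (
	"strings"
	"sync"
)

type Config struct {
	NodeBlacklist []string

	blacklistOnce sync.Once
	blacklistSet  map[string]bool
}

// IsBlacklisted builds the lowercase lookup set on first use, ignores empty
// entries, and returns false for a nil Config.
func (c *Config) IsBlacklisted(pubkey string) bool {
	if c == nil {
		return false
	}
	c.blacklistOnce.Do(func() {
		c.blacklistSet = make(map[string]bool, len(c.NodeBlacklist))
		for _, pk := range c.NodeBlacklist {
			pk = strings.ToLower(strings.TrimSpace(pk))
			if pk != "" {
				c.blacklistSet[pk] = true
			}
		}
	})
	return c.blacklistSet[strings.ToLower(strings.TrimSpace(pubkey))]
}
```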
### Design decisions
- **Case-insensitive** — public keys normalized to lowercase
- **Whitespace trimming** — leading/trailing whitespace handled
- **Empty entries ignored** — `""` or `" "` do not cause false positives
- **Nil-safe** — `IsBlacklisted()` on nil Config returns false
- **Backward-compatible** — empty/missing `nodeBlacklist` has zero
effect
- **Lazy-cached set** — blacklist converted to `map[string]bool` on
first lookup
### What this does NOT do (intentionally)
- Does **not** delete or modify database data — only filters API
responses
- Does **not** block packet ingestion — data still flows for analytics
- Does **not** filter `/api/packets` — only node-facing endpoints are
affected
## Testing
- Unit tests for `Config.IsBlacklisted()` (case sensitivity, whitespace,
empty entries, nil config)
- Integration tests for `/api/nodes`, `/api/nodes/{pubkey}`,
`/api/nodes/search`
- Full test suite passes with no regressions
|
||
|
|
fa3f623bd6 |
feat: add observer retention — remove stale observers after configurable days (#764)
## Summary
Observers that stop actively sending data now get removed after a
configurable retention period (default 14 days).
Previously, observers remained in the `observers` table forever. This
meant nodes that were once observers for an instance but are no longer
connected (even if still active in the mesh elsewhere) would continue
appearing in the observer list indefinitely.
## Key Design Decisions
- **Active data requirement**: `last_seen` is only updated when the
observer itself sends packets (via `stmtUpdateObserverLastSeen`). Being
seen by another node does NOT update this field. So an observer must
actively send data to stay listed.
- **Default: 14 days** — observers not seen in 14 days are removed
- **`-1` = keep forever** — for users who want observers to never be
removed
- **`0` = use default (14 days)** — same as not setting the field
- **Runs on startup + daily ticker** — staggered 3 minutes after metrics
prune to avoid DB contention
## Changes
| File | Change |
|------|--------|
| `cmd/ingestor/config.go` | Add `ObserverDays` to `RetentionConfig`,
add `ObserverDaysOrDefault()` |
| `cmd/ingestor/db.go` | Add `RemoveStaleObservers()` — deletes
observers with `last_seen` before cutoff |
| `cmd/ingestor/main.go` | Wire up startup + daily ticker for observer
retention |
| `cmd/server/config.go` | Add `ObserverDays` to `RetentionConfig`, add
`ObserverDaysOrDefault()` |
| `cmd/server/db.go` | Add `RemoveStaleObservers()` (server-side, uses
read-write connection) |
| `cmd/server/main.go` | Wire up startup + daily ticker, shutdown
cleanup |
| `cmd/server/routes.go` | Admin prune API now also removes stale
observers |
| `config.example.json` | Add `observerDays: 14` with documentation |
| `cmd/ingestor/coverage_boost_test.go` | 4 tests: basic removal, empty
store, keep forever (-1), default (0→14) |
| `cmd/server/config_test.go` | 4 tests: `ObserverDaysOrDefault` edge
cases |
## Config Example
```json
{
"retention": {
"nodeDays": 7,
"observerDays": 14,
"packetDays": 30,
"_comment": "observerDays: -1 = keep forever, 0 = use default (14)"
}
}
```
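
A sketch of the two helpers named in the table above, assuming `last_seen` is stored as a comparable timestamp; the real implementations live in `cmd/ingestor/db.go` and `cmd/server/db.go` and may differ in detail:

```go
package server

import (
	"database/sql"
	"time"
)

// RetentionConfig shows only the field added by this change.
type RetentionConfig struct {
	ObserverDays int `json:"observerDays"`
}

// ObserverDaysOrDefault: 0 means the 14-day default, -1 means keep forever.
func (r *RetentionConfig) ObserverDaysOrDefault() int {
	if r == nil || r.ObserverDays == 0 {
		return 14
	}
	return r.ObserverDays
}

// removeStaleObservers deletes observers that have not actively sent data
// within the retention window.
func removeStaleObservers(db *sql.DB, days int) (int64, error) {
	if days < 0 {
		return 0, nil // keep forever
	}
	cutoff := time.Now().UTC().AddDate(0, 0, -days)
	res, err := db.Exec(`DELETE FROM observers WHERE last_seen < ?`, cutoff)
	if err != nil {
		return 0, err
	}
	return res.RowsAffected()
}
```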
## Admin API
The `/api/admin/prune` endpoint now also removes stale observers (using
`observerDays` from config) and reports `observers_removed` in the
response alongside `packets_deleted`.
## Test Plan
- [x] `TestRemoveStaleObservers` — old observer removed, recent observer
kept
- [x] `TestRemoveStaleObserversNone` — empty store, no errors
- [x] `TestRemoveStaleObserversKeepForever` — `-1` keeps even year-old
observers
- [x] `TestRemoveStaleObserversDefault` — `0` defaults to 14 days
- [x] `TestObserverDaysOrDefault` (ingestor) —
nil/zero/positive/keep-forever
- [x] `TestObserverDaysOrDefault` (server) —
nil/zero/positive/keep-forever
- [x] Both binaries compile cleanly (`go build`)
- [ ] Manual: verify observer count decreases after retention period on
a live instance
|
||
|
|
ceea136e97 |
feat: observer graph representation (M1+M2) (#774)
## Summary Fixes #753 — Milestones M1 and M2: Observer nodes in the neighbor graph are now correctly labeled, colored, and filterable. ### M1: Label + color observers **Backend** (`cmd/server/neighbor_api.go`): - `buildNodeInfoMap()` now queries the `observers` table after building from `nodes` - Observer-only pubkeys (not already in the map as repeaters etc.) get `role: "observer"` and their name from the observers table - Observer-repeaters keep their repeater role (not overwritten) **Frontend**: - CSS variable `--role-observer: #8b5cf6` added to `:root` - `ROLE_COLORS.observer` was already defined in `roles.js` ### M2: Observer filter checkbox (default unchecked) **Frontend** (`public/analytics.js`): - Observer checkbox added to the role filter section, **unchecked by default** - Observers create hub-and-spoke patterns (one observer can have 100+ edges) that drown out the actual repeater topology — hiding them by default keeps the graph clean - Fixed `applyNGFilters()` which previously always showed observers regardless of checkbox state ### Tests - Backend: `TestBuildNodeInfoMap_ObserverEnrichment` — verifies observer-only pubkeys get name+role from observers table, and observer-repeaters keep their repeater role - All existing Go tests pass - All frontend helper tests pass (544/544) --------- Co-authored-by: you <you@example.com> |
||
|
|
ba7cd0fba7 |
fix: clock skew sanity checks — filter epoch-0, cap drift, min samples (#769)
Nodes with dead RTCs show -690d skew and -3 billion s/day drift. Fix:
1. **No Clock severity**: |skew| > 365d → `no_clock`, skip drift
2. **Drift cap**: |drift| > 86400 s/day → nil (physically impossible)
3. **Min samples**: < 5 samples → no drift regression
4. **Frontend**: 'No Clock' badge, '–' for unreliable drift

Fixes the crazy stats on the Clock Health fleet view.

---------

Co-authored-by: you <you@example.com>
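
Roughly, the gates look like this (thresholds as listed above; the existing severity ladder is elided and the names are illustrative):

```go
package server

import "math"

const (
	noClockThresholdSec      = 365 * 24 * 3600 // |skew| beyond a year means a dead RTC
	maxReasonableDriftPerDay = 86400.0
	minDriftSamples          = 5
)

// severityForSkew returns "no_clock" for dead-RTC nodes so drift is skipped entirely.
func severityForSkew(absSkewSec float64) string {
	if absSkewSec > noClockThresholdSec {
		return "no_clock"
	}
	return "ok" // placeholder for the existing ok/warning/critical/absurd ladder
}

// sanitizedDrift drops physically impossible or under-sampled drift estimates.
func sanitizedDrift(driftPerDay float64, samples int) (float64, bool) {
	if samples < minDriftSamples {
		return 0, false
	}
	if math.Abs(driftPerDay) > maxReasonableDriftPerDay {
		return 0, false
	}
	return driftPerDay, true
}
```
|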
||
|
|
6a648dea11 |
fix: multi-byte adopters — all node types, role column, advert precedence (#754) (#767)
## Fix: Multi-Byte Adopters Table — Three Bugs (#754)

### Bug 1: Companions in "Unknown"
`computeMultiByteCapability()` was repeater-only. Extended to classify **all node types** (companions, rooms, sensors). A companion advertising with 2-byte hash is now correctly "Confirmed".

### Bug 2: No Role Column
Added a **Role** column to the merged Multi-Byte Hash Adopters table, color-coded using `ROLE_COLORS` from `roles.js`. Users can now distinguish repeaters from companions without clicking through to node detail.

### Bug 3: Data Source Disagreement
When adopter data (from `computeAnalyticsHashSizes`) shows `hashSize >= 2` but capability only found path evidence ("Suspected"), the advert-based adopter data now takes precedence → "Confirmed". The adopter hash sizes are passed into `computeMultiByteCapability()` as an additional confirmed evidence source.

### Changes
- `cmd/server/store.go`: Extended capability to all node types, accept adopter hash sizes, prioritize advert evidence
- `public/analytics.js`: Added Role column with color-coded badges
- `cmd/server/multibyte_capability_test.go`: 3 new tests (companion confirmed, role populated, adopter precedence)

### Tests
- All 10 multi-byte capability tests pass
- All 544 frontend helper tests pass
- All 62 packet filter tests pass
- All 29 aging tests pass

---------

Co-authored-by: you <you@example.com>
|
||
|
|
29157742eb |
feat: show collision details in Hash Usage Matrix for all hash sizes (#758)
## Summary Shows which prefixes are colliding in the Hash Usage Matrix, making the "PREFIX COLLISIONS: N" count actionable. Fixes #757 ## Changes ### Frontend (`public/analytics.js`) - **Clickable collision count**: When collisions > 0, the stat card is clickable and scrolls to the collision details section. Shows a `▼` indicator. - **3-byte collision table**: The collision risk section and `renderCollisionsFromServer` now render for all hash sizes including 3-byte (was previously hidden/skipped for 3-byte). - **Helpful hint**: 3-byte panel now says "See collision details below" when collisions exist. ### Backend (`cmd/server/collision_details_test.go`) - Test that collision details include correct prefix and node name/pubkey pairs - Test that collision details are empty when no collisions exist ### Frontend Tests (`test-frontend-helpers.js`) - Test clickable stat card renders `onclick` and `cursor:pointer` when collisions > 0 - Test non-clickable card when collisions = 0 - Test collision table renders correct node links (`#/nodes/{pubkey}`) - Test no-collision message renders correctly ## What was already there The backend already returned full collision details (prefix, nodes with pubkeys/names/coords, distance classification) in the `hash-collisions` API. The frontend already had `renderCollisionsFromServer` rendering a rich table with node links. The gap was: 1. The 3-byte tab hid the collision risk section entirely 2. No visual affordance to navigate from the stat count to the details ## Perf justification No new computation — collision data was already computed and returned by the API. The only change is rendering it for 3-byte (same as 1-byte/2-byte). The collision list is already limited by the backend sort+slice pattern. --------- Co-authored-by: you <you@example.com> |
||
|
|
0e286d85fd |
fix: channel query performance — add channel_hash column, SQL-level filtering (#762) (#763)
## Problem Channel API endpoints scan entire DB — 2.4s for channel list, 30s for messages. ## Fix - Added `channel_hash` column to transmissions (populated on ingest, backfilled on startup) - `GetChannels()` rewrites to GROUP BY channel_hash (one row per channel vs scanning every packet) - `GetChannelMessages()` filters by channel_hash at SQL level with proper LIMIT/OFFSET - 60s cache for channel list - Index: `idx_tx_channel_hash` for fast lookups Expected: 2.4s → <100ms for list, 30s → <500ms for messages. Fixes #762 --------- Co-authored-by: you <you@example.com> |
||
|
|
3bdf72b4cf |
feat: clock skew UI — node badges, detail sparkline, fleet analytics (#690 M2+M3) (#752)
## Summary
Frontend visualizations for clock skew detection. Implements #690 M2 and M3. Does NOT close #690 — M4+M5 remain.

### M2: Node badges + detail sparkline
- Severity badges (⏰ green/yellow/orange/red) on node list next to each node
- Node detail: Clock Skew section with current value, severity, drift rate
- Inline SVG sparkline showing skew history, color-coded by severity zones

### M3: Fleet analytics view
- 'Clock Health' section on Analytics page
- Sortable table: Name | Skew | Severity | Drift | Last Advert
- Filter buttons by severity (OK/Warning/Critical/Absurd)
- Summary stats: X nodes OK, Y warning, Z critical
- Color-coded rows

### Changes
- `public/nodes.js` — badge rendering + detail section
- `public/analytics.js` — fleet clock health view
- `public/roles.js` — severity color helpers
- `public/style.css` — badge + sparkline + fleet table styles
- `cmd/server/clock_skew.go` — added fleet summary endpoint
- `cmd/server/routes.go` — wired fleet endpoint
- `test-frontend-helpers.js` — 11 new tests

---------

Co-authored-by: you <you@example.com>
|
## Summary Frontend visualizations for clock skew detection. Implements #690 M2 and M3. Does NOT close #690 — M4+M5 remain. ### M2: Node badges + detail sparkline - Severity badges (⏰ green/yellow/orange/red) on node list next to each node - Node detail: Clock Skew section with current value, severity, drift rate - Inline SVG sparkline showing skew history, color-coded by severity zones ### M3: Fleet analytics view - 'Clock Health' section on Analytics page - Sortable table: Name | Skew | Severity | Drift | Last Advert - Filter buttons by severity (OK/Warning/Critical/Absurd) - Summary stats: X nodes OK, Y warning, Z critical - Color-coded rows ### Changes - `public/nodes.js` — badge rendering + detail section - `public/analytics.js` — fleet clock health view - `public/roles.js` — severity color helpers - `public/style.css` — badge + sparkline + fleet table styles - `cmd/server/clock_skew.go` — added fleet summary endpoint - `cmd/server/routes.go` — wired fleet endpoint - `test-frontend-helpers.js` — 11 new tests --------- Co-authored-by: you <you@example.com> |