Commit Graph

201 Commits

Author SHA1 Message Date
Kpa-clawbot
ceea136e97 feat: observer graph representation (M1+M2) (#774)
## Summary

Fixes #753 — Milestones M1 and M2: Observer nodes in the neighbor graph
are now correctly labeled, colored, and filterable.

### M1: Label + color observers

**Backend** (`cmd/server/neighbor_api.go`):
- `buildNodeInfoMap()` now queries the `observers` table after building
from `nodes`
- Observer-only pubkeys (not already in the map as repeaters etc.) get
`role: "observer"` and their name from the observers table
- Observer-repeaters keep their repeater role (not overwritten)

**Frontend**:
- CSS variable `--role-observer: #8b5cf6` added to `:root`
- `ROLE_COLORS.observer` was already defined in `roles.js`

### M2: Observer filter checkbox (default unchecked)

**Frontend** (`public/analytics.js`):
- Observer checkbox added to the role filter section, **unchecked by
default**
- Observers create hub-and-spoke patterns (one observer can have 100+
edges) that drown out the actual repeater topology — hiding them by
default keeps the graph clean
- Fixed `applyNGFilters()` which previously always showed observers
regardless of checkbox state

### Tests

- Backend: `TestBuildNodeInfoMap_ObserverEnrichment` — verifies
observer-only pubkeys get name+role from observers table, and
observer-repeaters keep their repeater role
- All existing Go tests pass
- All frontend helper tests pass (544/544)

---------

Co-authored-by: you <you@example.com>
2026-04-16 21:35:14 -07:00
Kpa-clawbot
ba7cd0fba7 fix: clock skew sanity checks — filter epoch-0, cap drift, min samples (#769)
Nodes with dead RTCs show -690d skew and -3 billion s/day drift. Fix:

1. **No Clock severity**: |skew| > 365d → `no_clock`, skip drift
2. **Drift cap**: |drift| > 86400 s/day → nil (physically impossible)
3. **Min samples**: < 5 samples → no drift regression
4. **Frontend**: 'No Clock' badge, '–' for unreliable drift

Fixes the crazy stats on the Clock Health fleet view.

---------

Co-authored-by: you <you@example.com>
2026-04-16 08:10:47 -07:00
Kpa-clawbot
6a648dea11 fix: multi-byte adopters — all node types, role column, advert precedence (#754) (#767)
## Fix: Multi-Byte Adopters Table — Three Bugs (#754)

### Bug 1: Companions in "Unknown"
`computeMultiByteCapability()` was repeater-only. Extended to classify
**all node types** (companions, rooms, sensors). A companion advertising
with 2-byte hash is now correctly "Confirmed".

### Bug 2: No Role Column
Added a **Role** column to the merged Multi-Byte Hash Adopters table,
color-coded using `ROLE_COLORS` from `roles.js`. Users can now
distinguish repeaters from companions without clicking through to node
detail.

### Bug 3: Data Source Disagreement
When adopter data (from `computeAnalyticsHashSizes`) shows `hashSize >=
2` but capability only found path evidence ("Suspected"), the
advert-based adopter data now takes precedence → "Confirmed". The
adopter hash sizes are passed into `computeMultiByteCapability()` as an
additional confirmed evidence source.

### Changes
- `cmd/server/store.go`: Extended capability to all node types, accept
adopter hash sizes, prioritize advert evidence
- `public/analytics.js`: Added Role column with color-coded badges
- `cmd/server/multibyte_capability_test.go`: 3 new tests (companion
confirmed, role populated, adopter precedence)

### Tests
- All 10 multi-byte capability tests pass
- All 544 frontend helper tests pass
- All 62 packet filter tests pass
- All 29 aging tests pass

---------

Co-authored-by: you <you@example.com>
2026-04-16 00:51:38 -07:00
Kpa-clawbot
29157742eb feat: show collision details in Hash Usage Matrix for all hash sizes (#758)
## Summary

Shows which prefixes are colliding in the Hash Usage Matrix, making the
"PREFIX COLLISIONS: N" count actionable.

Fixes #757

## Changes

### Frontend (`public/analytics.js`)
- **Clickable collision count**: When collisions > 0, the stat card is
clickable and scrolls to the collision details section. Shows a `▼`
indicator.
- **3-byte collision table**: The collision risk section and
`renderCollisionsFromServer` now render for all hash sizes including
3-byte (was previously hidden/skipped for 3-byte).
- **Helpful hint**: 3-byte panel now says "See collision details below"
when collisions exist.

### Backend (`cmd/server/collision_details_test.go`)
- Test that collision details include correct prefix and node
name/pubkey pairs
- Test that collision details are empty when no collisions exist

### Frontend Tests (`test-frontend-helpers.js`)
- Test clickable stat card renders `onclick` and `cursor:pointer` when
collisions > 0
- Test non-clickable card when collisions = 0
- Test collision table renders correct node links (`#/nodes/{pubkey}`)
- Test no-collision message renders correctly

## What was already there

The backend already returned full collision details (prefix, nodes with
pubkeys/names/coords, distance classification) in the `hash-collisions`
API. The frontend already had `renderCollisionsFromServer` rendering a
rich table with node links. The gap was:
1. The 3-byte tab hid the collision risk section entirely
2. No visual affordance to navigate from the stat count to the details

## Perf justification

No new computation — collision data was already computed and returned by
the API. The only change is rendering it for 3-byte (same as
1-byte/2-byte). The collision list is already limited by the backend
sort+slice pattern.

---------

Co-authored-by: you <you@example.com>
2026-04-16 00:18:25 -07:00
Kpa-clawbot
0e286d85fd fix: channel query performance — add channel_hash column, SQL-level filtering (#762) (#763)
## Problem
Channel API endpoints scan entire DB — 2.4s for channel list, 30s for
messages.

## Fix
- Added `channel_hash` column to transmissions (populated on ingest,
backfilled on startup)
- `GetChannels()` rewrites to GROUP BY channel_hash (one row per channel
vs scanning every packet)
- `GetChannelMessages()` filters by channel_hash at SQL level with
proper LIMIT/OFFSET
- 60s cache for channel list
- Index: `idx_tx_channel_hash` for fast lookups

Expected: 2.4s → <100ms for list, 30s → <500ms for messages.

Fixes #762

---------

Co-authored-by: you <you@example.com>
2026-04-16 00:09:36 -07:00
Kpa-clawbot
3bdf72b4cf feat: clock skew UI — node badges, detail sparkline, fleet analytics (#690 M2+M3) (#752)
## Summary

Frontend visualizations for clock skew detection.

Implements #690 M2 and M3. Does NOT close #690 — M4+M5 remain.

### M2: Node badges + detail sparkline
- Severity badges ( green/yellow/orange/red) on node list next to each
node
- Node detail: Clock Skew section with current value, severity, drift
rate
- Inline SVG sparkline showing skew history, color-coded by severity
zones

### M3: Fleet analytics view
- 'Clock Health' section on Analytics page
- Sortable table: Name | Skew | Severity | Drift | Last Advert
- Filter buttons by severity (OK/Warning/Critical/Absurd)
- Summary stats: X nodes OK, Y warning, Z critical
- Color-coded rows

### Changes
- `public/nodes.js` — badge rendering + detail section
- `public/analytics.js` — fleet clock health view
- `public/roles.js` — severity color helpers
- `public/style.css` — badge + sparkline + fleet table styles
- `cmd/server/clock_skew.go` — added fleet summary endpoint
- `cmd/server/routes.go` — wired fleet endpoint
- `test-frontend-helpers.js` — 11 new tests

---------

Co-authored-by: you <you@example.com>
2026-04-15 15:25:50 -07:00
Kpa-clawbot
401fd070f8 fix: improve trackedBytes accuracy for memory estimation (#751)
## Problem

Fixes #743 — High memory usage / OOM with relatively small dataset.

`trackedBytes` severely undercounted actual per-packet memory because it
only tracked base struct sizes and string field lengths, missing major
allocations:

| Structure | Untracked Cost | Scale Impact |
|-----------|---------------|--------------|
| `spTxIndex` (O(path²) subpath entries) | 40 bytes × path combos |
50-150MB |
| `ResolvedPath` on observations | 24 bytes × elements | ~25MB |
| Per-tx maps (`obsKeys`, `observerSet`) | 200 bytes/tx flat | ~11MB |
| `byPathHop` index entries | 50 bytes/hop | 20-40MB |

This caused eviction to trigger too late (or not at all), leading to
OOM.

## Fix

Expanded `estimateStoreTxBytes` and `estimateStoreObsBytes` to account
for:

- **Per-tx maps**: +200 bytes flat for `obsKeys` + `observerSet` map
headers
- **Path hop index**: +50 bytes per hop in `byPathHop`
- **Subpath index**: +40 bytes × `hops*(hops-1)/2` combinations for
`spTxIndex`
- **Resolved paths**: +24 bytes per `ResolvedPath` element on
observations

Updated the existing `TestEstimateStoreTxBytes` to match new formula.
All existing eviction tests continue to pass — the eviction logic itself
is unchanged.

Also exposed `avgBytesPerPacket` in the perf API (`/api/perf`) so
operators can monitor per-packet memory costs.

## Performance

Benchmark confirms negligible overhead (called on every insert):

```
BenchmarkEstimateStoreTxBytes    159M ops    7.5 ns/op    0 B/op    0 allocs
BenchmarkEstimateStoreObsBytes   1B ops      1.0 ns/op    0 B/op    0 allocs
```

## Tests

- 6 new tests in `tracked_bytes_test.go`:
  - Reasonable value ranges for different packet sizes
  - 10-hop packets estimate significantly more than 2-hop (subpath cost)
  - Observations with `ResolvedPath` estimate more than without
  - 15 observations estimate >10x a single observation
- `trackedBytes` matches sum of individual estimates after batch insert
  - Eviction triggers correctly with improved estimates
- 2 benchmarks confirming sub-10ns estimate cost
- Updated existing `TestEstimateStoreTxBytes` for new formula
- Full test suite passes

---------

Co-authored-by: you <you@example.com>
2026-04-15 07:53:32 -07:00
Kpa-clawbot
a815e70975 feat: Clock skew detection — backend computation (M1) (#746)
## Summary

Implements **Milestone 1** of #690 — backend clock skew computation for
nodes and observers.

## What's New

### Clock Skew Engine (`clock_skew.go`)

**Phase 1 — Raw Skew Calculation:**
For every ADVERT observation: `raw_skew = advert_timestamp -
observation_timestamp`

**Phase 2 — Observer Calibration:**
Same packet seen by multiple observers → compute each observer's clock
offset as the median deviation from the per-packet median observation
timestamp. This identifies observers with their own clock drift.

**Phase 3 — Corrected Node Skew:**
`corrected_skew = raw_skew + observer_offset` — compensates for observer
clock error.

**Phase 4 — Trend Analysis:**
Linear regression over time-ordered skew samples estimates drift rate in
seconds/day. Detects crystal drift vs stable offset vs sudden jumps.

### Severity Classification

| Level | Threshold | Meaning |
|-------|-----------|---------|
|  OK | < 5 min | Normal |
| ⚠️ Warning | 5 min – 1 hour | Clock drifting |
| 🔴 Critical | 1 hour – 30 days | Likely no time source |
| 🟣 Absurd | > 30 days | Firmware default or epoch 0 |

### New API Endpoints

- `GET /api/nodes/{pubkey}/clock-skew` — per-node skew data (mean,
median, last, drift, severity)
- `GET /api/observers/clock-skew` — observer calibration offsets
- Clock skew also included in `GET /api/nodes/{pubkey}/analytics`
response as `clockSkew` field

### Performance

- 30-second compute cache avoids reprocessing on every request
- Operates on in-memory `byPayloadType[ADVERT]` index — no DB queries
- O(n) in total ADVERT observations, O(m log m) for median calculations

## Tests

15 unit tests covering:
- Severity classification at all thresholds
- Median/mean math helpers
- ISO timestamp parsing
- Timestamp extraction from decoded JSON (nested and top-level)
- Observer calibration with single and multi-observer scenarios
- Observer offset correction direction (verified the sign is
`+obsOffset`)
- Drift estimation: stable, linear, insufficient data, short time span
- JSON number extraction edge cases

## What's NOT in This PR

- No UI changes (M2–M4)
- No customizer integration (M5)
- Thresholds are hardcoded constants (will be configurable in M5)

Implements #690 M1.

---------

Co-authored-by: you <you@example.com>
2026-04-14 23:22:35 -07:00
Kpa-clawbot
aa84ce1e6a fix: correct hash_size detection for transport routes and zero-hop adverts (#747)
## Summary

Fixes #744
Fixes #722

Three bugs in hash_size computation caused zero-hop adverts to
incorrectly report `hash_size=1`, masking nodes that actually use
multi-byte hashes.

## Bugs Fixed

### 1. Wrong path byte offset for transport routes
(`computeNodeHashSizeInfo`)

Transport routes (types 0 and 3) have 4 transport code bytes before the
path byte. The code read the path byte from offset 1 (byte index
`RawHex[2:4]`) for all route types. For transport routes, the correct
offset is 5 (`RawHex[10:12]`).

### 2. Missing RouteTransportDirect skip (`computeNodeHashSizeInfo`)

Zero-hop adverts from `RouteDirect` (type 2) were correctly skipped, but
`RouteTransportDirect` (type 3) zero-hop adverts were not. Both have
locally-generated path bytes with unreliable hash_size bits.

### 3. Zero-hop adverts not skipped in analytics
(`computeAnalyticsHashSizes`)

`computeAnalyticsHashSizes()` unconditionally overwrote a node's
`hashSize` with whatever the latest advert reported. A zero-hop direct
advert with `hash_size=1` could overwrite a previously-correct
`hash_size=2` from a multi-hop flood advert.

Fix: skip hash_size update for zero-hop direct/transport-direct adverts
while still counting the packet and updating `lastSeen`.

## Tests Added

- `TestHashSizeTransportRoutePathByteOffset` — verifies transport routes
read path byte at offset 5, regular flood reads at offset 1
- `TestHashSizeTransportDirectZeroHopSkipped` — verifies both
RouteDirect and RouteTransportDirect zero-hop adverts are skipped
- `TestAnalyticsHashSizesZeroHopSkip` — verifies analytics hash_size is
not overwritten by zero-hop adverts
- Fixed 3 existing tests (`FlipFlop`, `Dominant`, `LatestWins`) that
used route_type 0 (TransportFlood) header bytes without proper transport
code padding

## Complexity

All changes are O(1) per packet — no new loops or data structures. The
additional offset computation and zero-hop check are constant-time
operations within the existing packet scan loop.

Co-authored-by: you <you@example.com>
2026-04-14 23:04:26 -07:00
Kpa-clawbot
84f03f4f41 fix: hide undecryptable channel messages by default (#727) (#728)
## Problem

Channels page shows 53K 'Unknown' messages — undecryptable GRP_TXT
packets with no content. Pure noise.

## Fix

- Backend: channels API filters out undecrypted messages by default
- `?includeEncrypted=true` param to include them
- Frontend: 'Show encrypted' toggle in channels sidebar
- Unknown channels grayed out with '(no key)' label
- Toggle persists in localStorage

Fixes #727

---------

Co-authored-by: you <you@example.com>
2026-04-13 19:40:20 +00:00
Kpa-clawbot
14367488e2 fix: TRACE path_json uses path_sz from flags byte, not header hash_size (#732)
## Summary

TRACE packets encode their route hash size in the flags byte (`flags &
0x03`), not the header path byte. The decoder was using `path.HashSize`
from the header, which could be wrong or zero for direct-route TRACEs,
producing incorrect hop counts in `path_json`.

## Protocol Note

Per firmware, TRACE packets are **always direct-routed** (route_type 2 =
DIRECT, or 3 = TRANSPORT_DIRECT). FLOOD-routed TRACEs (route_type 1) are
anomalous — firmware explicitly rejects TRACE via flood. The decoder
handles these gracefully without crashing.

## Changes

**`cmd/server/decoder.go` and `cmd/ingestor/decoder.go`:**
- Read `pathSz` from TRACE flags byte: `(traceFlags & 0x03) + 1`
(0→1byte, 1→2byte, 2→3byte)
- Use `pathSz` instead of `path.HashSize` for splitting TRACE payload
path data into hops
- Update `path.HashSize` to reflect the actual TRACE path size
- Added `HopsCompleted` field to ingestor `Path` struct for parity with
server
- Updated comments to clarify TRACE is always direct-routed per firmware

**`cmd/server/decoder_test.go` — 5 new tests:**
- `TraceFlags1_TwoBytePathSz`: flags=1 → 2-byte hashes via DIRECT route
- `TraceFlags2_ThreeBytePathSz`: flags=2 → 3-byte hashes via DIRECT
route
- `TracePathSzUnevenPayload`: payload not evenly divisible by path_sz
- `TraceTransportDirect`: route_type=3 with transport codes + TRACE path
parsing
- `TraceFloodRouteGraceful`: anomalous FLOOD+TRACE handled without crash

All existing TRACE tests (flags=0, 1-byte hashes) continue to pass.

Fixes #731

---------

Co-authored-by: you <you@example.com>
2026-04-13 08:20:09 -07:00
Kpa-clawbot
71be54f085 feat: DB-backed channel messages for full history (#725 M1) (#726)
## Summary

Switches channel API endpoints to query SQLite instead of the in-memory
packet store, giving users access to the full message history.

Implements #725 (M1 only — DB-backed channel messages). Does NOT close
#725 — M2-M5 (custom channels, PSK, persistence, retroactive decryption)
remain.

## Problem

Channel endpoints (`/api/channels`, `/api/channels/{hash}/messages`)
preferred the in-memory packet store when available. The store is
bounded by `packetStore.maxMemoryMB` — typically showing only recent
messages. The SQLite database has the complete history (weeks/months of
channel messages) but was only used as a fallback when the store was nil
(never in production).

## Fix

Reversed the preference order: DB first, in-memory store fallback.
Region filtering added to the DB path.

Co-authored-by: you <you@example.com>
2026-04-12 23:22:52 -07:00
Kpa-clawbot
c233c14156 feat: CLI tool to decrypt and export hashtag channel messages (#724)
## Summary

Adds `corescope-decrypt` — a standalone CLI tool that decrypts and
exports MeshCore hashtag channel messages from a CoreScope SQLite
database.

### What it does

MeshCore hashtag channels use symmetric encryption with keys derived
from the channel name. The CoreScope ingestor stores **all** GRP_TXT
packets, even those it can't decrypt. This tool enables retroactive
decryption — decrypt historical messages for any channel whose name you
learn after the fact.

### Architecture

- **`internal/channel/`** — Shared crypto package extracted from
ingestor logic:
  - `DeriveKey()` — `SHA-256("#name")[:16]`
  - `ChannelHash()` — 1-byte packet filter (`SHA-256(key)[0]`)
  - `Decrypt()` — HMAC-SHA256 MAC verify + AES-128-ECB
  - `ParsePlaintext()` — timestamp + flags + "sender: message" parsing

- **`cmd/decrypt/`** — CLI binary with three output formats:
  - `--format json` — Full metadata (observers, path, raw hex)
  - `--format html` — Self-contained interactive viewer with search/sort
  - `--format irc` (or `log`) — Plain-text IRC-style log, greppable

### Usage

```bash
# JSON export
corescope-decrypt --channel "#wardriving" --db meshcore.db

# Interactive HTML viewer
corescope-decrypt --channel wardriving --db meshcore.db --format html --output wardriving.html

# Greppable log
corescope-decrypt --channel "#wardriving" --db meshcore.db --format irc | grep "KE6QR"

# From Docker
docker exec corescope-prod /app/corescope-decrypt --channel "#wardriving" --db /app/data/meshcore.db
```

### Build & deployment

- Statically linked (`CGO_ENABLED=0`) — zero dependencies
- Added to Dockerfile (available at `/app/corescope-decrypt` in
container)
- CI: builds and tests in go-test job
- CI: attaches linux/amd64 and linux/arm64 binaries to GitHub Releases
on tags

### Testing

- `internal/channel/` — 9 tests: key derivation, encrypt/decrypt
round-trip, MAC rejection, wrong-channel rejection, plaintext parsing
- `cmd/decrypt/` — 7 tests: payload extraction, channel hash
consistency, all 3 output formats, JSON parseability, fixture DB
integration
- Verified against real fixture DB: successfully decrypts 17
`#wardriving` messages

### Limitations

- Hashtag channels only (name-derived keys). Custom PSK channels not
supported.
- No DM decryption (asymmetric, per-peer keys).
- Read-only database access.

Fixes #723

---------

Co-authored-by: you <you@example.com>
2026-04-12 22:07:41 -07:00
Kpa-clawbot
65482ff6f6 fix: cache invalidation tuning — 7% → 50-80% hit rate (#721)
## Cache Invalidation Tuning — 7% → 50-80% Hit Rate

Fixes #720

### Problem

Server-side cache hit rate was 7% (48 hits / 631 misses over 4.7 days).
Root causes from the [cache audit
report](https://github.com/Kpa-clawbot/CoreScope/issues/720):

1. **`invalidationDebounce` config value (30s) was dead code** — never
wired to `invCooldown`
2. **`invCooldown` hardcoded to 10s** — with continuous ingest, caches
cleared every 10s regardless of their 1800s TTLs
3. **`collisionCache` cleared on every `hasNewTransmissions`** — hash
collisions are structural (depend on node count), not per-packet

### Changes

| Change | File | Impact |
|--------|------|--------|
| Wire `invalidationDebounce` from config → `invCooldown` | `store.go` |
Config actually works now |
| Default `invCooldown` 10s → 300s (5 min) | `store.go` | 30x longer
cache survival |
| Add `hasNewNodes` flag to `cacheInvalidation` | `store.go` |
Finer-grained invalidation |
| `collisionCache` only clears on `hasNewNodes` | `store.go` | O(n²)
collision computation survives its 1hr TTL |
| `addToByNode` returns new-node indicator | `store.go` | Zero-cost
detection during indexing |
| `indexByNode` returns new-node indicator | `store.go` | Propagates to
ingest path |
| Ingest tracks and passes `hasNewNodes` | `store.go` | End-to-end
wiring |

### Tests Added

| Test | What it verifies |
|------|-----------------|
| `TestInvCooldownFromConfig` | Config value wired to `invCooldown`;
default is 300s |
| `TestCollisionCacheNotClearedByTransmissions` | `hasNewTransmissions`
alone does NOT clear `collisionCache` |
| `TestCollisionCacheClearedByNewNodes` | `hasNewNodes` DOES clear
`collisionCache` |
| `TestCacheSurvivesMultipleIngestCyclesWithinCooldown` | 5 rapid ingest
cycles don't clear any caches during cooldown |
| `TestNewNodesAccumulatedDuringCooldown` | `hasNewNodes` accumulated in
`pendingInv` and applied after cooldown |
| `BenchmarkAnalyticsLatencyCacheHitVsMiss` | 100% hit rate with
rate-limited invalidation |

All 200+ existing tests pass. Both benchmarks show 100% hit rate.

### Performance Justification

- **Before:** Effective cache lifetime = `min(TTL, invCooldown)` = 10s.
With analytics viewed ~once/few minutes, P(hit) ≈ 7%
- **After:** Effective cache lifetime = `min(TTL, 300s)` = 300s for most
caches, 3600s for `collisionCache`. Expected hit rate 50-80%
- **Complexity:** All changes are O(1) — `addToByNode` already checked
`nodeHashes[pubkey] == nil`, we just return the result
- **Benchmark proof:** `BenchmarkAnalyticsLatencyCacheHitVsMiss` → 100%
hit rate, 269ns/op

Co-authored-by: you <you@example.com>
2026-04-12 18:09:23 -07:00
Kpa-clawbot
7af91f7ef6 fix: perf page shows tracked memory instead of heap allocation (#718)
## Summary

The perf page "Memory Used" tile displayed `estimatedMB` (Go
`runtime.HeapAlloc`), which includes all Go runtime allocations — not
just packet store data. This made the displayed value misleading: it
showed ~2.4GB heap when only ~833MB was actual tracked packet data.

## Changes

### Frontend (`public/perf.js`)
- Primary tile now shows `trackedMB` as **"Tracked Memory"** — the
self-accounted packet store memory
- Added separate **"Heap (debug)"** tile showing `estimatedMB` for
runtime visibility

### Backend
- **`types.go`**: Added `TrackedMB` field to `HealthPacketStoreStats`
struct
- **`routes.go`**: Populate `TrackedMB` in `/health` endpoint response
from `GetPerfStoreStatsTyped()`
- **`routes_test.go`**: Assert `trackedMB` exists in health endpoint's
`packetStore`
- **`testdata/golden/shapes.json`**: Updated shape fixture with new
field

### What was already correct
- `/api/perf/stats` already exposed both `estimatedMB` and `trackedMB`
- `trackedMemoryMB()` method already existed in store.go
- Eviction logic already used `trackedBytes` (not HeapAlloc)

## Testing
- All Go tests pass (`go test ./... -count=1`)
- No frontend logic changes beyond template string field swap

Fixes #717

Co-authored-by: you <you@example.com>
2026-04-12 12:40:17 -07:00
Kpa-clawbot
f95aa49804 fix: exclude TRACE packets from multi-byte capability suspected detection (#715)
## Summary

Exclude TRACE packets (payload_type 8) from the "suspected" multi-byte
capability inference logic. TRACE packets carry hash size in their own
flags — forwarding repeaters read it from the TRACE header, not their
compile-time `PATH_HASH_SIZE`. Pre-1.14 repeaters can forward multi-byte
TRACEs without actually supporting multi-byte hashes, creating false
positives.

Fixes #714

## Changes

### `cmd/server/store.go`
- In `computeMultiByteCapability()`, skip packets with `payload_type ==
8` (TRACE) when scanning `byPathHop` for suspected multi-byte nodes
- "Confirmed" detection (from adverts) is unaffected

### `cmd/server/multibyte_capability_test.go`
- `TestMultiByteCapability_TraceExcluded`: TRACE packet with 2-byte path
does NOT mark repeater as suspected
- `TestMultiByteCapability_NonTraceStillSuspected`: Non-TRACE packet
with 2-byte path still marks as suspected
- `TestMultiByteCapability_ConfirmedUnaffectedByTraceExclusion`:
Confirmed status from advert unaffected by TRACE exclusion

## Testing

All 7 multi-byte capability tests pass. Full `cmd/server` and
`cmd/ingestor` test suites pass.

Co-authored-by: you <you@example.com>
2026-04-12 00:11:20 -07:00
Kpa-clawbot
4a7e20a8cb fix: redesign memory eviction — self-accounting trackedBytes, watermarks, safety cap (#711)
## Problem

`HeapAlloc`-based eviction cascades on large databases — evicts down to
near-zero packets because Go runtime overhead exceeds `maxMemoryMB` even
with an empty packet store.

## Fix (per Carmack spec on #710)

1. **Self-accounting `trackedBytes`** — running counter maintained on
insert/evict, computed from actual struct sizes. No
`runtime.ReadMemStats`.
2. **High/low watermark hysteresis** (100%/85%) — evict to 85% of
budget, don't re-trigger until 100% crossed again.
3. **25% per-pass safety cap** — never evict more than a quarter of
packets in one cycle.
4. **Oldest-first** — evict from sorted head, O(1) candidate selection.

`maxMemoryMB` now means packet store budget, not total process heap.

Fixes #710

Co-authored-by: you <you@example.com>
2026-04-11 23:06:48 -07:00
Kpa-clawbot
e893a1b3c4 fix: index relay hops in byNode for liveness tracking (#708)
## Problem

Nodes that only appear as relay hops in packet paths (via
`resolved_path`) were never indexed in `byNode`, so `last_heard` was
never computed for them. This made relay-only nodes show as dead/stale
even when actively forwarding traffic.

Fixes #660

## Root Cause

`indexByNode()` only indexed pubkeys from decoded JSON fields (`pubKey`,
`destPubKey`, `srcPubKey`). Relay nodes appearing in `resolved_path`
were ignored entirely.

## Fix

`indexByNode()` now also iterates:
1. `ResolvedPath` entries from each observation
2. `tx.ResolvedPath` (best observation's resolved path, used for
DB-loaded packets)

A per-call `indexed` set prevents double-indexing when the same pubkey
appears in both decoded JSON and resolved path.

Extracted `addToByNode()` helper to deduplicate the nodeHashes/byNode
append logic.

## Scope

**Phase 1 only** — server-side in-memory indexing. No DB changes, no
ingestor changes. This makes `last_heard` reflect relay activity with
zero risk to persistence.

## Tests

5 new test cases in `TestIndexByNodeResolvedPath`:
- Resolved path pubkeys from observations get indexed
- Null entries in resolved path are skipped
- Relay-only nodes (no decoded JSON match) appear in `byNode`
- Dedup between decoded JSON and resolved path
- `tx.ResolvedPath` indexed when observations are empty

All existing tests pass unchanged.

## Complexity

O(observations × path_length) per packet — typically 1-3 observations ×
1-3 hops. No hot-path regression.

---------

Co-authored-by: you <you@example.com>
2026-04-11 21:25:42 -07:00
Kpa-clawbot
fcba2a9f3d fix: set PRAGMA busy_timeout on all RW SQLite connections (#707)
## Problem

`SQLITE_BUSY` contention between the ingestor and server's async
persistence goroutine drops `resolved_path` and `neighbor_edges`
updates. The DSN parameter `_busy_timeout=10000` may not be honored by
the modernc/sqlite driver.

## Fix

- **`openRW()` now sets `PRAGMA busy_timeout = 5000`** after opening the
connection, guaranteeing SQLite retries for up to 5 seconds before
returning `SQLITE_BUSY`
- **Refactored `PruneOldPackets` and `PruneOldMetrics`** to use
`openRW()` instead of duplicating connection setup — all RW connections
now get consistent busy_timeout handling
- Added test verifying the pragma is set correctly

## Changes

| File | Change |
|------|--------|
| `cmd/server/neighbor_persist.go` | `openRW()` sets `PRAGMA
busy_timeout = 5000` after open |
| `cmd/server/db.go` | `PruneOldPackets` and `PruneOldMetrics` use
`openRW()` instead of inline `sql.Open` |
| `cmd/server/neighbor_persist_test.go` | `TestOpenRW_BusyTimeout`
verifies pragma is set |

## Performance

No performance impact — `PRAGMA busy_timeout` is a connection-level
setting with zero overhead on uncontended writes. Under contention, it
converts immediate `SQLITE_BUSY` failures into brief retries (up to 5s),
which is strictly better than dropping data.

Fixes #705

---------

Co-authored-by: you <you@example.com>
2026-04-11 21:25:23 -07:00
Kpa-clawbot
ef8bce5002 feat: repeater multi-byte capability inference table (#706)
## Summary

Adds a new "Repeater Multi-Byte Capability" section to the Hash Stats
analytics tab that classifies each repeater's ability to handle
multi-byte hash prefixes (firmware >= v1.14).

Fixes #689

## What Changed

### Backend (`cmd/server/store.go`)
- New `computeMultiByteCapability()` method that infers capability for
each repeater using two evidence sources:
- **Confirmed** (100% reliable): node has advertised with `hash_size >=
2`, leveraging existing `computeNodeHashSizeInfo()` data
- **Suspected** (<100%): node's prefix appears as a hop in packets with
multi-byte path headers, using the `byPathHop` index. Prefix collisions
mean this isn't definitive.
- **Unknown**: no multi-byte evidence — could be pre-1.14 or 1.14+ with
default settings
- Extended `/api/analytics/hash-sizes` response with
`multiByteCapability` array

### Frontend (`public/analytics.js`)
- New `renderMultiByteCapability()` function on the Hash Stats tab
- Color-coded table: green confirmed, yellow suspected, gray unknown
- Filter buttons to show all/confirmed/suspected/unknown
- Column sorting by name, role, status, evidence, max hash size, last
seen
- Clickable rows link to node detail pages

### Tests (`cmd/server/multibyte_capability_test.go`)
- `TestMultiByteCapability_Confirmed`: advert with hash_size=2 →
confirmed
- `TestMultiByteCapability_Suspected`: path appearance only → suspected
- `TestMultiByteCapability_Unknown`: 1-byte advert only → unknown
- `TestMultiByteCapability_PrefixCollision`: two nodes sharing prefix,
one confirmed via advert, other correctly marked suspected (not
confirmed)

## Performance

- `computeMultiByteCapability()` runs once per cache cycle (15s TTL via
hash-sizes cache)
- Leverages existing `GetNodeHashSizeInfo()` cache (also 15s TTL) — no
redundant advert scanning
- Path hop scan is O(repeaters × prefix lengths) lookups in the
`byPathHop` map, with early break on first match per prefix
- Only computed for global (non-regional) requests to avoid unnecessary
work

---------

Co-authored-by: you <you@example.com>
2026-04-11 21:02:54 -07:00
copelaje
922ebe54e7 BYOP Advert signature validation (#686)
For BYOP mode in the packet analyzer, perform signature validation on
advert packets and display whether successful or not. This is added as
we observed many corrupted advert packets that would be easily
detectable as such if signature validation checks were performed.

At present this MR is just to add this status in BYOP mode so there is
minimal impact to the application and no performance penalty for having
to perform these checks on all packets. Moving forward it probably makes
sense to do these checks on all advert packets so that corrupt packets
can be ignored in several contexts (like node lists for example).

Let me know what you think and I can adjust as needed.

---------

Co-authored-by: you <you@example.com>
2026-04-12 04:02:17 +00:00
Kpa-clawbot
9917d50622 fix: resolve neighbor graph duplicate entries from different prefix lengths (#699)
## Problem

The neighbor graph creates separate entries for the same physical node
when observed with different prefix lengths. For example, a 1-byte
prefix `B0` (ambiguous, unresolved) and a 2-byte prefix `B05B` (resolved
to Busbee) appear as two separate neighbors of the same node.

Fixes #698

## Solution

### Part 1: Post-build resolution pass (Phase 1.5)

New function `resolveAmbiguousEdges(pm, graph)` in `neighbor_graph.go`:
- Called after `BuildFromStore()` completes the full graph, before any
API use
- Iterates all ambiguous edges and attempts resolution via
`resolveWithContext` with full graph context
- Only accepts high-confidence resolutions (`neighbor_affinity`,
`geo_proximity`, `unique_prefix`) — rejects
`first_match`/`gps_preference` fallbacks to avoid false positives
- Merges with existing resolved edges (count accumulation, max LastSeen)
or updates in-place
- Phase 1 edge collection loop is **unchanged**

### Part 2: API-layer dedup (defense-in-depth)

New function `dedupPrefixEntries()` in `neighbor_api.go`:
- Scans neighbor response for unresolved prefix entries matching
resolved pubkey entries
- Merges counts, timestamps, and observers; removes the unresolved entry
- O(n²) on ~50 neighbors per node — negligible cost

### Performance

Phase 1.5 runs O(ambiguous_edges × candidates). Per Carmack's analysis:
~50ms at 2K nodes on the 5-min rebuild cycle. Hot ingest path untouched.

## Tests

9 new tests in `neighbor_dedup_test.go`:

1. **Geo proximity resolution** — ambiguous edge resolved when candidate
has GPS near context node
2. **Merge with existing** — ambiguous edge merged into existing
resolved edge (count accumulation)
3. **No-match preservation** — ambiguous edge left as-is when prefix has
no candidates
4. **API dedup** — unresolved prefix merged with resolved pubkey in
response
5. **Integration** — node with both 1-byte and 2-byte prefix
observations shows single neighbor entry
6. **Phase 1 regression** — non-ambiguous edge collection unchanged
7. **LastSeen preservation** — merge keeps higher LastSeen timestamp
8. **No-match dedup** — API dedup doesn't merge non-matching prefixes
9. **Benchmark** — Phase 1.5 with 500+ edges

All existing tests pass (server + ingestor).

---------

Co-authored-by: you <you@example.com>
2026-04-10 11:19:54 -07:00
Kpa-clawbot
2e1a4a2e0d fix: handle companion nodes without adverts in My Mesh health cards (#696)
## Summary

Fixes #665 — companion nodes claimed in "My Mesh" showed "Could not load
data" because they never sent an advert, so they had no `nodes` table
entry, causing the health API to return 404.

## Three-Layer Fix

### 1. API Resilience (`cmd/server/store.go`)
`GetNodeHealth()` now falls back to building a partial response from the
in-memory packet store when `GetNodeByPubkey()` returns nil. Returns a
synthetic node stub (`role: "unknown"`, `name: "Unknown"`) with whatever
stats exist from packets, instead of returning nil → 404.

### 2. Ingestor Cleanup (`cmd/ingestor/main.go`)
Removed phantom sender node creation that used `"sender-" + name` as the
pubkey. Channel messages don't carry the sender's real pubkey, so these
synthetic entries were unreachable from the claiming/health flow — they
just polluted the nodes table with unmatchable keys.

### 3. Frontend UX (`public/home.js`)
The catch block in `loadMyNodes()` now distinguishes 404 (node not in DB
yet) from other errors:
- **404**: Shows 📡 "Waiting for first advert — this node has been seen
in channel messages but hasn't advertised yet"
- **Other errors**: Shows  "Could not load data" (unchanged)

## Tests
- Added `TestNodeHealthPartialFromPackets` — verifies a node with
packets but no DB entry returns 200 with synthetic node stub and stats
- Updated `TestHandleMessageChannelMessage` — verifies channel messages
no longer create phantom sender nodes
- All existing tests pass (`cmd/server`, `cmd/ingestor`)

Co-authored-by: you <you@example.com>
2026-04-09 20:03:52 -07:00
Kpa-clawbot
fcad49594b fix: include path.hopsCompleted in TRACE WebSocket broadcasts (#695)
## Summary

Fixes #683 — TRACE packets on the live map were showing the full path
instead of distinguishing completed vs remaining hops.

## Root Cause

Both WebSocket broadcast builders in `store.go` constructed the
`decoded` map with only `header` and `payload` keys — `path` was never
included. The frontend reads `decoded.path.hopsCompleted` to split trace
routes into solid (completed) and dashed (remaining) segments, but that
field was always `undefined`.

## Fix

For TRACE packets (payload type 9), call `DecodePacket()` on the raw hex
during broadcast and include the resulting `Path` struct in
`decoded["path"]`. This populates `hopsCompleted` which the frontend
already knows how to consume.

Both broadcast builders are patched:
- `IngestNewFromDB()` — new transmissions path (~line 1419)
- `IngestNewObservations()` — new observations path (~line 1680)

TRACE packets are infrequent, so the per-packet decode overhead is
negligible.

## Testing

- Added `TestIngestTraceBroadcastIncludesPath` — verifies that TRACE
broadcast maps include `decoded.path` with correct `hopsCompleted` value
- All existing tests pass (`cmd/server` + `cmd/ingestor`)

Co-authored-by: you <you@example.com>
2026-04-09 20:02:46 -07:00
efiten
34e7366d7c test: add RouteTransportDirect zero-hop cases to ingestor decoder tests (#684)
## Summary

Closes the symmetry gap flagged as a nit in PR #653 review:

> The ingestor decoder tests omit `RouteTransportDirect` zero-hop tests
— only the server decoder has those. Since the logic is identical, this
is not a blocker, but adding them would make the test suites symmetric.

- Adds `TestZeroHopTransportDirectHashSize` — `pathByte=0x00`, expects
`HashSize=0`
- Adds `TestZeroHopTransportDirectHashSizeWithNonZeroUpperBits` —
`pathByte=0xC0` (hash_size bits set, hash_count=0), expects `HashSize=0`

Both mirror the equivalent tests already present in
`cmd/server/decoder_test.go`.

## Test plan

- [ ] `cd cmd/ingestor && go test -run TestZeroHopTransportDirect -v` →
both new tests pass
- [ ] `cd cmd/ingestor && go test ./...` → no regressions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 17:36:34 -07:00
Kpa-clawbot
22bf33700e Fix: filter path-hop candidates by resolved_path to prevent prefix collisions (#658)
## Problem

The "Paths Through This Node" API endpoint (`/api/nodes/{pubkey}/paths`)
returns unrelated packets when two nodes share a hex prefix. For
example, querying paths for "Kpa Roof Solar" (`c0dedad4...`) returns 316
packets that actually belong to "C0ffee SF" (`C0FFEEC7...`) because both
share the `c0` prefix in the `byPathHop` index.

Fixes #655

## Root Cause

`handleNodePaths()` in `routes.go` collects candidates from the
`byPathHop` index using 2-char and 4-char hex prefixes for speed, but
never verifies that the target node actually appears in each candidate's
resolved path. The broad index lookup is intentional, but the
**post-filter was missing**.

## Fix

Added `nodeInResolvedPath()` helper in `store.go` that checks whether a
transmission's `resolved_path` (from the neighbor affinity graph via
`resolveWithContext`) contains the target node's full pubkey. The
filter:

- **Includes** packets where `resolved_path` contains the target node's
full pubkey
- **Excludes** packets where `resolved_path` resolved to a different
node (prefix collision)
- **Excludes** packets where `resolved_path` is nil/empty (ambiguous —
avoids false positives)

The check examines both the best observation's resolved_path
(`tx.ResolvedPath`) and all individual observations, so packets are
included if *any* observation resolved the target.

## Tests

- `TestNodeInResolvedPath` — unit test for the helper with 5 cases
(match, different node, nil, all-nil elements, match in observation
only)
- `TestNodePathsPrefixCollisionFilter` — integration test: two nodes
sharing `aa` prefix, verifies the collision packet is excluded from one
and included for the other
- Updated test DB schema to include `resolved_path` column and seed data
with resolved pubkeys
- All existing tests pass (165 additions, 8 modifications)

## Performance

No impact on hot paths. The filter runs once per API call on the
already-collected candidate set (typically small). `nodeInResolvedPath`
is O(observations × hops) per candidate — negligible since observations
per transmission are typically 1–5.

---------

Co-authored-by: you <you@example.com>
2026-04-07 21:24:00 -07:00
Kpa-clawbot
7d71dc857b feat: expose hopsCompleted for TRACE packets, show real path on live map (#656)
## Summary

TRACE packets on the live map previously animated the **full intended
route** regardless of how far the trace actually reached. This made it
impossible to distinguish a completed route from a failed one —
undermining the primary diagnostic purpose of trace packets.

## Changes

### Backend — `cmd/server/decoder.go`

- Added `HopsCompleted *int` field to the `Path` struct
- For TRACE packets, the header path contains SNR bytes (one per hop
that actually forwarded). Before overwriting `path.Hops` with the full
intended route from the payload, we now capture the header path's
`HashCount` as `hopsCompleted`
- This field is included in API responses and WebSocket broadcasts via
the existing JSON serialization

### Frontend — `public/live.js`

- For TRACE packets with `hopsCompleted < totalHops`:
  - Animate only the **completed** portion (solid line + pulse)
- Draw the **unreached** remainder as a dashed/ghosted line (25%
opacity, `6,8` dash pattern) with ghost markers
  - Dashed lines and ghost markers auto-remove after 10 seconds
- When `hopsCompleted` is absent or equals total hops, behavior is
unchanged

### Tests — `cmd/server/decoder_test.go`

- `TestDecodePacket_TraceHopsCompleted` — partial completion (2 of 4
hops)
- `TestDecodePacket_TraceNoSNR` — zero completion (trace not forwarded
yet)
- `TestDecodePacket_TraceFullyCompleted` — all hops completed

## How it works

The MeshCore firmware appends an SNR byte to `pkt->path[]` at each hop
that forwards a TRACE packet. The count of these SNR bytes (`path_len`)
indicates how far the trace actually got. CoreScope's decoder already
parsed the header path, but the TRACE-specific code overwrote it with
the payload hops (full intended route) without preserving the progress
information. Now we save that count first.

Fixes #651

---------

Co-authored-by: you <you@example.com>
2026-04-07 21:19:45 -07:00
Kpa-clawbot
088b4381c3 Fix: Hash Stats 'By Repeaters' includes non-repeater nodes (#654)
## Summary

The "By Repeaters" section on the Hash Stats analytics page was counting
**all** node types (companions, room servers, sensors, etc.) instead of
only repeaters. This made the "By Repeaters" distribution identical to
"Multi-Byte Hash Adopters", defeating the purpose of the breakdown.

Fixes #652

## Root Cause

`computeAnalyticsHashSizes()` in `cmd/server/store.go` built its
`byNode` map from advert packet data without cross-referencing node
roles from the node store. Both `distributionByRepeaters` and
`multiByteNodes` consumed this unfiltered map.

## Changes

### `cmd/server/store.go`
- Build a `nodeRoleByPK` lookup map from `getCachedNodesAndPM()` at the
start of the function
- Store `role` in each `byNode` entry when processing advert packets
- **`distributionByRepeaters`**: filter to only count nodes whose role
contains "repeater"
- **`multiByteNodes`**: include `role` field in output so the frontend
can filter/group by node type

### `cmd/server/coverage_test.go`
- Add `TestHashSizesDistributionByRepeatersFiltersRole`: verifies that
companion nodes are excluded from `distributionByRepeaters` but included
in `multiByteNodes` with correct role

### `cmd/server/routes_test.go`
- Fix `TestHashAnalyticsZeroHopAdvert`: invalidate node cache after DB
insert so role lookup works
- Fix `TestAnalyticsHashSizeSameNameDifferentPubkey`: insert node
records as repeaters + invalidate cache

## Testing

All `cmd/server` tests pass (68 insertions, 3 deletions across 3 files).

Co-authored-by: you <you@example.com>
2026-04-07 21:00:03 -07:00
efiten
144e98bcdf fix: hide hash size for zero-hop direct adverts (#649) (#653)
## Fix: Zero-hop DIRECT packets report bogus hash_size

Closes #649

### Problem
When a DIRECT packet has zero hops (pathByte lower 6 bits = 0), the
generic `hash_size = (pathByte >> 6) + 1` formula produces a bogus value
(1-4) instead of 0/unknown. This causes incorrect hash size displays and
analytics for zero-hop direct adverts.

### Solution

**Frontend (JS):**
- `packets.js` and `nodes.js` now check `(pathByte & 0x3F) === 0` to
detect zero-hop packets and suppress bogus hash_size display.

**Backend (Go):**
- Both `cmd/server/decoder.go` and `cmd/ingestor/decoder.go` reset
`HashSize=0` for DIRECT packets where `pathByte & 0x3F == 0` (hash_count
is zero).
- TRACE packets are excluded since they use hashSize to parse hop data
from the payload.
- The condition uses `pathByte & 0x3F == 0` (not `pathByte == 0x00`) to
correctly handle the case where hash_size bits are non-zero but
hash_count is zero — matching the JS frontend approach.

### Testing

**Backend:**
- Added 4 tests each in `cmd/server/decoder_test.go` and
`cmd/ingestor/decoder_test.go`:
  - DIRECT + pathByte 0x00 → HashSize=0 
- DIRECT + pathByte 0x40 (hash_size bits set, hash_count=0) → HashSize=0

  - Non-DIRECT + pathByte 0x00 → HashSize=1 (unchanged) 
  - DIRECT + pathByte 0x01 (1 hop) → HashSize=1 (unchanged) 
- All existing tests pass (`go test ./...` in both cmd/server and
cmd/ingestor)

**Frontend:**
- Verified hash size display is suppressed for zero-hop direct adverts

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: you <you@example.com>
2026-04-07 19:39:15 -07:00
Kpa-clawbot
0f5e2db5cf feat: auto-generated OpenAPI 3.0 spec endpoint + Swagger UI (#530) (#632)
## Summary

Auto-generated OpenAPI 3.0.3 spec endpoint (`/api/spec`) and Swagger UI
(`/api/docs`) for the CoreScope API.

## What

- **`cmd/server/openapi.go`** — Route metadata map
(`routeDescriptions()`) + spec builder that walks the mux router to
generate a complete OpenAPI 3.0.3 spec at runtime. Includes:
- All 47 API endpoints grouped by tag (admin, analytics, channels,
config, nodes, observers, packets)
- Query parameter documentation for key endpoints (packets, nodes,
search, resolve-hops)
  - Path parameter extraction from mux `{name}` patterns
  - `ApiKeyAuth` security scheme for API-key-protected endpoints
  - Swagger UI served as a self-contained HTML page using unpkg CDN

- **`cmd/server/openapi_test.go`** — Tests for spec endpoint (validates
JSON structure, required fields, path count, security schemes,
self-exclusion of `/api/spec` and `/api/docs`), Swagger UI endpoint, and
`extractPathParams` helper.

- **`cmd/server/routes.go`** — Stores router reference on `Server`
struct for spec generation; registers `/api/spec` and `/api/docs`
routes.

## Design Decisions

- **Runtime spec generation** vs static YAML: The spec walks the actual
router, so it can never drift from registered routes. Route metadata
(summaries, descriptions, tags, auth flags) is maintained in a parallel
map — the test enforces minimum path count to catch drift.
- **No external dependencies**: Uses only stdlib + existing gorilla/mux.
Swagger UI loaded from unpkg CDN (no vendored assets).
- **Security tagging**: Auth-protected endpoints (those behind
`requireAPIKey` middleware) are tagged with `security: [{ApiKeyAuth:
[]}]` in the spec, matching the actual middleware configuration.

## Testing

- `go test -run TestOpenAPI` — validates spec structure, field presence,
path count ≥ 20, security schemes
- `go test -run TestSwagger` — validates HTML response with swagger-ui
references
- `go test -run TestExtractPathParams` — unit tests for path parameter
extraction

---------

Co-authored-by: you <you@example.com>
2026-04-05 15:05:20 -07:00
Kpa-clawbot
a068e3e086 feat: zero-config defaults + deployment docs (M3-M4, #610) (#631)
## Zero-Config Defaults + Deployment Docs

Make CoreScope start with zero configuration — no `config.json`
required. The ingestor falls back to sensible defaults (local MQTT
broker, standard topics, default DB path) when no config file exists.

### What changed

**`cmd/ingestor/config.go`** — `LoadConfig` no longer errors on missing
config file. Instead it logs a message and uses defaults. If no MQTT
sources are configured (from file or env), defaults to
`mqtt://localhost:1883` with `meshcore/#` topic.

**`cmd/ingestor/main.go`** — Removed redundant "no MQTT sources" fatal
(now handled in config layer). Improved the "no connections established"
fatal with actionable hints.

**`README.md`** — Replaced "Docker (Recommended)" section with a
one-command quickstart using the pre-built image. No build step, no
config file, just `docker run`.

**`docs/deployment.md`** — New comprehensive deployment guide covering
Docker, Compose, config reference, MQTT setup, TLS/HTTPS, monitoring,
backup, and troubleshooting.

### Zero-config flow

```
docker run -d -p 80:80 -v corescope-data:/app/data ghcr.io/kpa-clawbot/corescope:latest
```

1. No config.json found → defaults used, log message printed
2. No MQTT sources → defaults to `mqtt://localhost:1883`
3. Internal Mosquitto broker already running in container → connection
succeeds
4. Dashboard shows empty, ready for packets

### Review fixes (commit 13b89bb)

- Removed `DISABLE_CADDY` references from all docs — this env var was
never implemented in the entrypoint
- Fixed `/api/stats` example in deployment guide — showed nonexistent
fields (`mqttConnected`, `uptimeSeconds`, `activeNodes`)
- Improved MQTT connection failure message with actionable
troubleshooting hints

Closes #610

---------

Co-authored-by: you <you@example.com>
2026-04-05 15:04:49 -07:00
Kpa-clawbot
dc5b5ce9a0 fix: reject weak/default API keys + startup warning (#532) (#628)
## Summary

Hardens API key security for write endpoints (fixes #532):

1. **Constant-time comparison** — uses
`crypto/subtle.ConstantTimeCompare` to prevent timing attacks on API key
validation
2. **Weak key blocklist** — rejects known default/example keys (`test`,
`password`, `change-me`, `your-secret-api-key-here`, etc.)
3. **Minimum length enforcement** — keys shorter than 16 characters are
rejected
4. **Startup warning** — logs a clear warning if the configured key is
weak or a known default
5. **Generic error messages** — HTTP 403 response uses opaque
"forbidden" message to prevent information leakage about why a key was
rejected

### Security Model
- **Empty key** → all write endpoints disabled (403)
- **Weak/default key** → all write endpoints disabled (403), startup
warning logged
- **Wrong key** → 401 unauthorized
- **Strong correct key** → request proceeds

### Files Changed
- `cmd/server/config.go` — `IsWeakAPIKey()` function + blocklist
- `cmd/server/routes.go` — constant-time comparison via
`constantTimeEqual()`, weak key rejection
- `cmd/server/main.go` — startup warning for weak keys
- `cmd/server/apikey_security_test.go` — comprehensive test coverage
- `cmd/server/routes_test.go` — existing tests updated to use strong
keys

### Reviews
-  Self-review: all security properties verified
-  djb Final Review: timing fix correct, blocklist pragmatic, error
messages opaque, tests comprehensive. **Verdict: Ship it.**

### Test Results
All existing + new tests pass. Coverage includes: weak key detection
(blocklist + length + case-insensitive), empty key handling, strong key
acceptance, wrong key rejection, and constant-time comparison.

---------

Co-authored-by: you <you@example.com>
2026-04-05 14:50:40 -07:00
Kpa-clawbot
30e7e9ae3c docs: document lock ordering for cacheMu and channelsCacheMu (#624)
## Summary

Documents the lock ordering for all five mutexes in `PacketStore`
(`store.go`) to prevent future deadlocks.

## What changed

Added a comment block above the `PacketStore` struct documenting:

- All 5 mutexes (`mu`, `cacheMu`, `channelsCacheMu`, `groupedCacheMu`,
`regionObsMu`)
- What each mutex guards
- The required acquisition order (numbered 1–5)
- The nesting relationships that exist today (`cacheMu →
channelsCacheMu` in `invalidateCachesFor` and `rebuildAnalyticsCaches`)
- Confirmation that no reverse ordering exists (no deadlock risk)

## Verification

- Grepped all lock acquisition sites to confirm no reverse nesting
exists
- `go build ./...` passes — documentation-only change

Fixes #413

---------

Co-authored-by: you <you@example.com>
2026-04-05 13:00:35 -07:00
Kpa-clawbot
05fbcb09dd fix: wire cacheTTL.analyticsHashSizes config to collision cache (#420) (#622)
## Summary

Fixes #420 — wires `cacheTTL` config values to server-side cache
durations that were previously hardcoded.

## Problem

`collisionCacheTTL` was hardcoded at 60s in `store.go`. The config has
`cacheTTL.analyticsHashSizes: 3600` (1 hour) but it was never read — the
`/api/config/cache` endpoint just passed the raw map to the client
without applying values server-side.

## Changes

- **`store.go`**: Add `cacheTTLSec()` helper to safely extract duration
values from the `cacheTTL` config map. `NewPacketStore` now accepts an
optional `cacheTTL` map (variadic, backward-compatible) and wires:
  - `cacheTTL.analyticsHashSizes` → `collisionCacheTTL`
  - `cacheTTL.analyticsRF` → `rfCacheTTL`
- **Default changed**: `collisionCacheTTL` default raised from 60s →
3600s (1 hour). Hash collision computation is expensive and data changes
rarely — 60s was causing unnecessary recomputation.
- **`main.go`**: Pass `cfg.CacheTTL` to `NewPacketStore`.
- **Tests**: Added `TestCacheTTLFromConfig` and `TestCacheTTLDefaults`
in eviction_test.go. Updated existing `TestHashCollisionsCacheTTL` for
the new default.

## Audit of other cacheTTL values

The remaining `cacheTTL` keys (`stats`, `nodeDetail`, `nodeHealth`,
`nodeList`, `bulkHealth`, `networkStatus`, `observers`, `channels`,
`channelMessages`, `analyticsTopology`, `analyticsChannels`,
`analyticsSubpaths`, `analyticsSubpathDetail`, `nodeAnalytics`,
`nodeSearch`, `invalidationDebounce`) are **client-side only** — served
via `/api/config/cache` and consumed by the frontend. They don't have
corresponding server-side caches to wire to. The only server-side caches
(`rfCache`, `topoCache`, `hashCache`, `chanCache`, `distCache`,
`subpathCache`, `collisionCache`) all use either `rfCacheTTL` or
`collisionCacheTTL`, both now configurable.

## Complexity

O(1) config lookup at store init time. No hot-path impact.

Co-authored-by: you <you@example.com>
2026-04-05 12:49:46 -07:00
efiten
b587f20d1c feat: add distance column to neighbor table in node details (#617)
Closes #616

## What

Adds a **Distance** column to the neighbor table on the node detail
page.

When both the viewed node and a neighbor have GPS coordinates recorded,
the table shows the haversine distance between them (e.g. `3.2 km`).
When either node lacks GPS, the cell shows `—`.

## Changes

**Backend** (`cmd/server/neighbor_api.go`):
- Added `distance_km *float64` (omitempty) to `NeighborEntry`
- In `handleNodeNeighbors`: look up source node coords from `nodeMap`,
then for each resolved (non-ambiguous) neighbor with GPS, compute
`haversineKm` and set the field

**Frontend** (`public/nodes.js`):
- Added `Distance` column header between Last Seen and Conf
- Cell renders `X.X km` or `—` (muted) when unavailable

**Tests** (`cmd/server/neighbor_api_test.go`):
- `TestNeighborAPI_DistanceKm_WithGPS`: two nodes with real coords →
`distance_km` is positive
- `TestNeighborAPI_DistanceKm_NoGPS`: two nodes at 0,0 → `distance_km`
is nil

## Verification

Test at **https://staging.on8ar.eu** — navigate to any node detail page
and scroll to the Neighbors section. Nodes with GPS coordinates show a
distance; those without show `—`.
2026-04-05 12:33:23 -07:00
Kpa-clawbot
767c8a5a3e perf: async chunked backfill — HTTP serves within 2 minutes (#612) (#614)
## Summary

Adds two config knobs for controlling backfill scope and neighbor graph
data retention, plus removes the dead synchronous backfill function.

## Changes

### Config knobs

#### `resolvedPath.backfillHours` (default: 24)
Controls how far back (in hours) the async backfill scans for
observations with NULL `resolved_path`. Transmissions with `first_seen`
older than this window are skipped, reducing startup time for instances
with large historical datasets.

#### `neighborGraph.maxAgeDays` (default: 30)
Controls the maximum age of `neighbor_edges` entries. Edges with
`last_seen` older than this are pruned from both SQLite and the
in-memory graph. Pruning runs on startup (after a 4-minute stagger) and
every 24 hours thereafter.

### Dead code removal
- Removed the synchronous `backfillResolvedPaths` function that was
replaced by the async version.

### Implementation details
- `backfillResolvedPathsAsync` now accepts a `backfillHours` parameter
and filters by `tx.FirstSeen`
- `NeighborGraph.PruneOlderThan(cutoff)` removes stale edges from the
in-memory graph
- `PruneNeighborEdges(conn, graph, maxAgeDays)` prunes both DB and
in-memory graph
- Periodic pruning ticker follows the same pattern as metrics pruning
(24h interval, staggered start)
- Graceful shutdown stops the edge prune ticker

### Config example
Both knobs added to `config.example.json` with `_comment` fields.

## Tests
- Config default/override tests for both knobs
- `TestGraphPruneOlderThan` — in-memory edge pruning
- `TestPruneNeighborEdgesDB` — SQLite + in-memory pruning together
- `TestBackfillRespectsHourWindow` — verifies old transmissions are
excluded by backfill window

---------

Co-authored-by: you <you@example.com>
2026-04-05 09:49:39 -07:00
Kpa-clawbot
232770a858 feat(rf-health): M2 — airtime, error rate, battery charts with delta computation (#605)
## M2: Airtime + Channel Quality + Battery Charts

Implements M2 of #600 — server-side delta computation and three new
charts in the RF Health detail view.

### Backend Changes

**Delta computation** for cumulative counters (`tx_air_secs`,
`rx_air_secs`, `recv_errors`):
- Computes per-interval deltas between consecutive samples
- **Reboot handling:** detects counter reset (current < previous), skips
that delta, records reboot timestamp
- **Gap handling:** if time between samples > 2× interval, inserts null
(no interpolation)
- Returns `tx_airtime_pct` and `rx_airtime_pct` as percentages
(delta_secs / interval_secs × 100)
- Returns `recv_error_rate` as delta_errors / (delta_recv +
delta_errors) × 100

**`resolution` query param** on `/api/observers/{id}/metrics`:
- `5m` (default) — raw samples
- `1h` — hourly aggregates (GROUP BY hour with AVG/MAX)
- `1d` — daily aggregates

**Schema additions:**
- `packets_sent` and `packets_recv` columns added to `observer_metrics`
(migration)
- Ingestor parses these fields from MQTT stats messages

**API response** now includes:
- `tx_airtime_pct`, `rx_airtime_pct`, `recv_error_rate` (computed
deltas)
- `reboots` array with timestamps of detected reboots
- `is_reboot_sample` flag on affected samples

### Frontend Changes

Three new charts in the RF Health detail view, stacked vertically below
noise floor:

1. **Airtime chart** — TX (red) + RX (blue) as separate SVG lines,
Y-axis 0-100%, direct labels at endpoints
2. **Error Rate chart** — `recv_error_rate` line, shown only when data
exists
3. **Battery chart** — voltage line with 3.3V low reference, shown only
when battery_mv > 0

All charts:
- Share X-axis and time range (aligned vertically)
- Reboot markers as vertical hairlines spanning all charts
- Direct labels on data (no legends)
- Resolution auto-selected: `1h` for 7d/30d ranges
- Charts hidden when no data exists

### Tests

- `TestComputeDeltas`: normal deltas, reboot detection, gap detection
- `TestGetObserverMetricsResolution`: 5m/1h/1d downsampling verification
- Updated `TestGetObserverMetrics` for new API signature

---------

Co-authored-by: you <you@example.com>
2026-04-04 23:17:17 -07:00
you
747aea37b7 fix(rf-health): add region filter support to metrics summary
Frontend passes RegionFilter query string to summary API.
Backend filters results by observer IATA region.
Added iata field to MetricsSummaryRow.
2026-04-05 06:00:42 +00:00
Kpa-clawbot
6f35d4d417 feat: RF Health Dashboard M1 — observer metrics + small multiples grid (#604)
## RF Health Dashboard — M1: Observer Metrics Storage, API & Small
Multiples Grid

Implements M1 of #600.

### What this does

Adds a complete RF health monitoring pipeline: MQTT stats ingestion →
SQLite storage → REST API → interactive dashboard with small multiples
grid.

### Backend Changes

**Ingestor (`cmd/ingestor/`)**
- New `observer_metrics` table via migration system (`_migrations`
pattern)
- Parse `tx_air_secs`, `rx_air_secs`, `recv_errors` from MQTT status
messages (same pattern as existing `noise_floor` and `battery_mv`)
- `INSERT OR REPLACE` with timestamps rounded to nearest 5-min interval
boundary (using ingestor wall clock, not observer timestamps)
- Missing fields stored as NULLs — partial data is always better than no
data
- Configurable retention pruning: `retention.metricsDays` (default 30),
runs on startup + every 24h

**Server (`cmd/server/`)**
- `GET /api/observers/{id}/metrics?since=...&until=...` — per-observer
time-series data
- `GET /api/observers/metrics/summary?window=24h` — fleet summary with
current NF, avg/max NF, sample count
- `parseWindowDuration()` supports `1h`, `24h`, `3d`, `7d`, `30d` etc.
- Server-side metrics retention pruning (same config, staggered 2min
after packet prune)

### Frontend Changes

**RF Health tab (`public/analytics.js`, `public/style.css`)**
- Small multiples grid showing all observers simultaneously — anomalies
pop out visually
- Per-observer cell: name, current NF value, battery voltage, sparkline,
avg/max stats
- NF status coloring: warning (amber) at ≥-100 dBm, critical (red) at
≥-85 dBm — text color only, no background fills
- Click any cell → expanded detail view with full noise floor line chart
- Reference lines with direct text labels (`-100 warning`, `-85
critical`) — not color bands
- Min/max points labeled directly on the chart
- Time range selector: preset buttons (1h/3h/6h/12h/24h/3d/7d/30d) +
custom from/to datetime picker
- Deep linking: `#/analytics?tab=rf-health&observer=...&range=...`
- All charts use SVG, matching existing analytics.js patterns
- Responsive: 3-4 columns on desktop, 1 on mobile

### Design Decisions (from spec)
- Labels directly on data, not in legends
- Reference lines with text labels, not color bands
- Small multiples grid, not card+accordion (Tufte: instant visual fleet
comparison)
- Ingestor wall clock for all timestamps (observer clocks may drift)

### Tests Added

**Ingestor tests:**
- `TestRoundToInterval` — 5 cases for rounding to 5-min boundaries
- `TestInsertMetrics` — basic insertion with all fields
- `TestInsertMetricsIdempotent` — INSERT OR REPLACE deduplication
- `TestInsertMetricsNullFields` — partial data with NULLs
- `TestPruneOldMetrics` — retention pruning
- `TestExtractObserverMetaNewFields` — parsing tx_air_secs, rx_air_secs,
recv_errors

**Server tests:**
- `TestGetObserverMetrics` — time-series query with since/until filters,
NULL handling
- `TestGetMetricsSummary` — fleet summary aggregation
- `TestObserverMetricsAPIEndpoints` — DB query verification
- `TestMetricsAPIEndpoints` — HTTP endpoint response shape
- `TestParseWindowDuration` — duration parsing for h/d formats

### Test Results
```
cd cmd/ingestor && go test ./... → PASS (26s)
cd cmd/server && go test ./... → PASS (5s)
```

### What's NOT in this PR (deferred to M2+)
- Server-side delta computation for cumulative counters
- Airtime charts (TX/RX percentage lines)
- Channel quality chart (recv_error_rate)
- Battery voltage chart
- Reboot detection and chart annotations
- Resolution downsampling (1h, 1d aggregates)
- Pattern detection / automated diagnosis

---------

Co-authored-by: you <you@example.com>
2026-04-04 22:21:35 -07:00
Kpa-clawbot
6ae62ce535 perf: make txToMap observations lazy via ExpandObservations flag (#595)
## Summary

`txToMap()` previously always allocated observation sub-maps for every
packet, even though the `/api/packets` handler immediately stripped them
via `delete(p, "observations")` unless `expand=observations` was
requested. A typical page of 50 packets with ~5 observations each caused
300+ unnecessary map allocations per request.

## Changes

- **`txToMap`**: Add variadic `includeObservations bool` parameter.
Observations are only built when `true` is passed, eliminating
allocations when they'd just be discarded.
- **`PacketQuery`**: Add `ExpandObservations bool` field to thread the
caller's intent through the query pipeline.
- **`routes.go`**: Set `ExpandObservations` based on
`expand=observations` query param. Removed the post-hoc `delete(p,
"observations")` loop — observations are simply never created when not
requested.
- **Single-packet lookups** (`GetPacketByID`, `GetPacketByHash`): Always
pass `true` since detail views need observations.
- **Multi-node/analytics queries**: Default (no flag) = no observations,
matching prior behavior.

## Testing

- Added `TestTxToMapLazyObservations` covering all three cases: no flag,
`false`, and `true`.
- All existing tests pass (`go test ./...`).

## Perf Impact

Eliminates ~250 observation map allocations per /api/packets request (at
default page size of 50 with ~5 observations each). This is a
constant-factor improvement per request — no algorithmic complexity
change.

Fixes #374

Co-authored-by: you <you@example.com>
2026-04-04 10:39:30 -07:00
Kpa-clawbot
6e2f79c0ad perf: optimize QueryGroupedPackets — cache observer count, defer map construction (#594)
## Summary

Optimizes `QueryGroupedPackets()` in `store.go` to eliminate two major
inefficiencies on every grouped packet list request:

### Changes

1. **Cache `UniqueObserverCount` on `StoreTx`** — Instead of iterating
all observations to count unique observers on every query
(O(total_observations) per request), we now track unique observers at
ingest time via an `observerSet` map and pre-computed
`UniqueObserverCount` field. This is updated incrementally as
observations arrive.

2. **Defer map construction until after pagination** — Previously,
`map[string]interface{}` was built for ALL 30K+ filtered results before
sorting and paginating. Now the grouped cache stores sorted `[]*StoreTx`
pointers (lightweight), and `groupedTxsToPage()` builds maps only for
the requested page (typically 50 items). This eliminates ~30K map
allocations per cache miss.

3. **Lighter cache footprint** — The grouped cache now stores
`[]*StoreTx` instead of `*PacketResult` with pre-built maps, reducing
memory pressure and GC work.

### Complexity

- Observer counting: O(1) per query (was O(total_observations))
- Map construction: O(page_size) per query (was O(n) where n = all
filtered results)
- Sort remains O(n log n) on cache miss, but the cache (3s TTL) absorbs
repeated requests

### Testing

- `cd cmd/server && go test ./...` — all tests pass
- `cd cmd/ingestor && go build ./...` — builds clean

Fixes #370

---------

Co-authored-by: you <you@example.com>
2026-04-04 10:39:04 -07:00
Kpa-clawbot
b0862f7a41 fix: replace time.Tick with NewTicker in prune goroutine for graceful shutdown (#593)
## Summary

Replace `time.Tick()` with `time.NewTicker()` in the auto-prune
goroutine so it stops cleanly during graceful shutdown.

## Problem

`time.Tick` creates a ticker that can never be garbage collected or
stopped. While the prune goroutine runs for the process lifetime, it
won't stop during graceful shutdown — the goroutine leaks past the
shutdown sequence.

## Fix

- Create a `time.NewTicker` and a done channel
- Use `select` to listen on both the ticker and done channel
- Stop the ticker and close the done channel in the shutdown path (after
`poller.Stop()`)
- Pattern matches the existing `StartEvictionTicker()` approach

## Testing

- `go build ./...` — compiles cleanly
- `go test ./...` — all tests pass

Fixes #377

Co-authored-by: you <you@example.com>
2026-04-04 10:38:37 -07:00
Kpa-clawbot
45991eca09 perf: combine chained filterPackets passes into single scan (#592)
## Summary

Combines the chained `filterTxSlice` calls in `filterPackets()` into a
single pass over the packet slice.

## Problem

When multiple filter parameters are specified (e.g.,
`type=4&route=1&since=...&until=...`), each filter created a new
intermediate `[]*StoreTx` slice. With N filters, this meant N separate
scans and N-1 unnecessary allocations.

## Fix

All filter predicates (type, route, observer, hash, since, until,
region, node) are pre-computed before the loop, then evaluated in a
single `filterTxSlice` call. This eliminates all intermediate
allocations.

**Preserved behavior:**
- Fast-path index lookups for hash-only and observer-only queries remain
unchanged
- Node-only fast-path via `byNode` index preserved
- All existing filter semantics maintained (same comparison operators,
same null checks)

**Complexity:** Single `O(n)` pass regardless of how many filters are
active, vs previous `O(n * k)` where k = number of active filters (each
pass is O(n) but allocates).

## Testing

All existing tests pass (`cd cmd/server && go test ./...`).

Fixes #373

Co-authored-by: you <you@example.com>
2026-04-04 10:38:10 -07:00
Kpa-clawbot
76c42556a2 perf: sort snrVals/rssiVals once in computeAnalyticsRF (#591)
## Summary

Sort `snrVals` and `rssiVals` once upfront in `computeAnalyticsRF()` and
read min/max/median directly from the sorted slices, instead of copying
and sorting per stat call.

## Changes

- Sort both slices once before computing stats (2 sorts total instead of
4+ copy+sorts)
- Read `min` from `sorted[0]`, `max` from `sorted[len-1]`, `median` from
`sorted[len/2]`
- Remove the now-unused `sortedF64` and `medianF64` helper closures

## Performance impact

With 100K+ observations, this eliminates multiple O(n log n) copy+sort
operations. Previously each call to `medianF64` did a full copy + sort,
and `minF64`/`maxF64` did O(n) scans on the unsorted array. Now: 2
in-place sorts total, O(1) lookups for min/max/median.

Fixes #366

Co-authored-by: you <you@example.com>
2026-04-04 10:37:42 -07:00
Kpa-clawbot
6f8378a31c perf: batch-remove from secondary indexes in EvictStale (#590)
## Summary

`EvictStale()` was doing O(n) linear scans per evicted item to remove
from secondary indexes (`byObserver`, `byPayloadType`, `byNode`).
Evicting 1000 packets from an observer with 50K observations meant 1000
× 50K = 50M comparisons — all under a write lock.

## Fix

Replace per-item removal with batch single-pass filtering:

1. **Collect phase**: Walk evicted packets once, building sets of
evicted tx IDs, observation IDs, and affected index keys
2. **Filter phase**: For each affected index slice, do a single pass
keeping only non-evicted entries

**Before**: O(evicted_count × index_slice_size) per index — quadratic in
practice
**After**: O(evicted_count + index_slice_size) per affected key — linear

## Changes

- `cmd/server/store.go`: Restructured `EvictStale()` eviction loop into
collect + batch-filter pattern

## Testing

- All existing tests pass (`cd cmd/server && go test ./...`)

Fixes #368

Co-authored-by: you <you@example.com>
2026-04-04 10:37:27 -07:00
Kpa-clawbot
56115ee0a4 perf: use byNode index in QueryMultiNodePackets instead of full scan (#589)
## Summary

`QueryMultiNodePackets()` was scanning ALL packets with
`strings.Contains` on JSON blobs — O(packets × pubkeys × json_length).
With 30K+ packets and multiple pubkeys, this caused noticeable latency
on `/api/packets?nodes=...`.

## Fix

Replace the full scan with lookups into the existing `byNode` index,
which already maps pubkeys to their transmissions. Merge results with
hash-based deduplication, then apply time filters.

**Before:** O(N × P × J) where N=all packets, P=pubkeys, J=avg JSON
length
**After:** O(M × P) where M=packets per pubkey (typically small), plus
O(R log R) sort for pagination correctness

Results are sorted by `FirstSeen` after merging to maintain the
oldest-first ordering expected by the pagination logic.

Fixes #357

Co-authored-by: you <you@example.com>
2026-04-04 10:36:59 -07:00
Kpa-clawbot
321d1cf913 perf: apply time filter early in GetNodeAnalytics to avoid full packet scan (#588)
## Problem

`GetNodeAnalytics()` in `store.go` scans ALL 30K+ packets doing
`strings.Contains` on every JSON blob when the node has a name, then
filters by time range *after* the full scan. This is `O(packets ×
json_length)` on every `/api/nodes/{pubkey}/analytics` request.

## Fix

Move the `fromISO` time check inside the scan loop so old packets are
skipped **before** the expensive `strings.Contains` matching. For the
non-name path (indexed-only), the time filter is also applied inline,
eliminating the separate `allPkts` intermediate slice.

### Before
1. Scan all packets → collect matches (including old ones) → `allPkts`
2. Filter `allPkts` by time → `packets`

### After
1. Scan packets, skip `tx.FirstSeen <= fromISO` immediately → `packets`

This avoids `strings.Contains` calls on packets outside the requested
time window (typically 7 days out of months of data).

## Complexity
- **Before:** `O(total_packets × avg_json_length)` for name matching
- **After:** `O(recent_packets × avg_json_length)` — only packets within
the time window are string-matched

## Testing
- `cd cmd/server && go test ./...` — all tests pass

Fixes #367

Co-authored-by: you <you@example.com>
2026-04-04 10:36:49 -07:00
Kpa-clawbot
790a713ba9 perf: combine 4 subpath API calls into single bulk endpoint (#587)
## Summary

Consolidates the 4 parallel `/api/analytics/subpaths` calls in the Route
Patterns tab into a single `/api/analytics/subpaths-bulk` endpoint,
eliminating 3 redundant server-side scans of the subpath index on cache
miss.

## Changes

### Backend (`cmd/server/routes.go`, `cmd/server/store.go`)
- New `GET
/api/analytics/subpaths-bulk?groups=2-2:50,3-3:30,4-4:20,5-8:15`
endpoint
- Groups format: `minLen-maxLen:limit` comma-separated
- `GetAnalyticsSubpathsBulk()` iterates `spIndex` once, bucketing
entries into per-group accumulators by hop length
- Hop name resolution is done once per raw hop and shared across groups
- Results are cached per-group for compatibility with existing
single-key cache lookups
- Region-filtered queries fall back to individual
`GetAnalyticsSubpaths()` calls (region filtering requires
per-transmission observer checks)

### Frontend (`public/analytics.js`)
- `renderSubpaths()` now makes 1 API call instead of 4
- Response shape: `{ results: [{ subpaths, totalPaths }, ...] }` —
destructured into the same `[d2, d3, d4, d5]` variables

### Tests (`cmd/server/routes_test.go`)
- `TestAnalyticsSubpathsBulk`: validates 3-group response shape, missing
params error, invalid format error

## Performance

- **Before:** 4 API calls → 4 scans of `spIndex` + 4× hop resolution on
cache miss
- **After:** 1 API call → 1 scan of `spIndex` + 1× hop resolution
(shared cache)
- Cache miss cost reduced by ~75% for this tab
- No change on cache hit (individual group caching still works)

Fixes #398

Co-authored-by: you <you@example.com>
2026-04-04 10:19:18 -07:00
Kpa-clawbot
cd470dffbe perf: batch observation fetching to eliminate N+1 API calls on sort change (#586)
## Summary

Fixes the N+1 API call pattern when changing observation sort mode on
the packets page. Previously, switching sort to Path or Time fired
individual `/api/packets/{hash}` requests for **every**
multi-observation group without cached children — potentially 100+
concurrent requests.

## Changes

### Backend: Batch observations endpoint
- **New endpoint:** `POST /api/packets/observations` accepts `{"hashes":
["h1", "h2", ...]}` and returns all observations keyed by hash in a
single response
- Capped at 200 hashes per request to prevent abuse
- 4 test cases covering empty input, invalid JSON, too-many-hashes, and
valid requests

### Frontend: Use batch endpoint
- `packets.js` sort change handler now collects all hashes needing
observation data and sends a single POST request instead of N individual
GETs
- Same behavior, single round-trip

## Performance

- **Before:** Changing sort with 100 visible groups → 100 concurrent API
requests, browser connection queueing (6 per host), several seconds of
lag
- **After:** Single POST request regardless of group count, response
time proportional to store lookup (sub-millisecond per hash in memory)

Fixes #389

---------

Co-authored-by: you <you@example.com>
2026-04-04 10:18:40 -07:00
Kpa-clawbot
45d8116880 perf: query only matching node locations in handleObservers (#579)
## Summary

`handleObservers()` in `routes.go` was calling `GetNodeLocations()`
which fetches ALL nodes from the DB just to match ~10 observer IDs
against node public keys. With 500+ nodes this is wasteful.

## Changes

- **`db.go`**: Added `GetNodeLocationsByKeys(keys []string)` — queries
only the rows matching the given public keys using a parameterized
`WHERE LOWER(public_key) IN (?, ?, ...)` clause.
- **`routes.go`**: `handleObservers` now collects observer IDs and calls
the targeted method instead of the full-table scan.
- **`coverage_test.go`**: Added `TestGetNodeLocationsByKeys` covering
known key, empty keys, and unknown key cases.

## Performance

With ~10 observers and 500+ nodes, the query goes from scanning all 500
rows to fetching only ~10. The original `GetNodeLocations()` is
preserved for any other callers.

Fixes #378

Co-authored-by: you <you@example.com>
2026-04-04 10:14:37 -07:00