Removes the separate config.json file bind mount from both compose
files. The data directory mount already covers it, and the Go server
searches /app/data/config.json via LoadConfig.
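For context, a minimal sketch of the lookup this relies on; only the /app/data/config.json path is confirmed by this change, while the helper name, fallback path, and error text are illustrative assumptions:

```go
// Hedged sketch: resolve the config path the way LoadConfig is described to,
// preferring the data directory. The fallback entry is an assumption.
func findConfig() (string, error) {
    candidates := []string{
        "/app/data/config.json", // covered by the data directory bind mount
        "config.json",           // local-dev fallback (assumption)
    }
    for _, p := range candidates {
        if _, err := os.Stat(p); err == nil {
            return p, nil // first existing candidate wins
        }
    }
    return "", fmt.Errorf("config.json not found in any candidate path")
}
```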
- Entrypoint symlinks /app/data/config.json for ingestor compatibility
- manage.sh setup creates config in data dir, prompts admin if missing
- manage.sh start checks config exists before starting, offers to create
- deploy.yml simplified — no more sudo rm or directory cleanup
- Backup/restore updated to use data dir path
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add 1/2/3-byte selector to Hash Issues analytics page
- 1-byte and 2-byte modes show 16×16 matrix with stat cards (nodes
tracked, using N-byte ID, prefix space used, prefix collisions)
- 3-byte mode shows summary stat cards instead of unrenderable grid
- Fix "Nodes tracked" to always show total node count across all modes
- Use CSS variable colours for matrix cells (light/dark mode compatible)
- Replace native title tooltips with custom styled popovers
- Hide collision risk card when 3-byte mode is selected
- Fix double-tooltip bug on mode switch via _matrixTipInit guard
- Fix tooltip persisting outside matrix grid on mouseleave
https://dev.ve7kod.ca/#/analytics (Hash Issues)
---------
Co-authored-by: Jesse <your@email.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes #276
## Root cause
TRACE packets store hop IDs in the payload (bytes 9+) rather than in the
header path field. The header path field is overloaded in TRACE packets
to carry RSSI values instead of repeater IDs (as noted in the issue
comments). This meant `Path.Hops` was always empty for TRACE packets —
the raw bytes ended up as an opaque `PathData` hex string with no
structure.
The hashSize encoded in the header path byte (bits 6–7) is still valid
for TRACE and is used to split the payload path bytes into individual
hop prefixes.
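As a hedged illustration (not the actual decoder code), extracting that two-bit field could look like the following; how the field value maps to a 1, 2, or 3 byte prefix width is left to the real decoder:

```go
// Hedged sketch: the hash-size field sits in bits 6-7 of the header path byte.
// The mapping from this two-bit value to a prefix width in bytes is not
// assumed here.
func hashSizeField(pathByte byte) byte {
    return (pathByte >> 6) & 0x03
}
```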
## Fix
After decoding a TRACE payload, if `PathData` is non-empty, parse it
into individual hops using `path.HashSize`:
```go
if header.PayloadType == PayloadTRACE && payload.PathData != "" {
    pathBytes, err := hex.DecodeString(payload.PathData)
    if err == nil && path.HashSize > 0 {
        for i := 0; i+path.HashSize <= len(pathBytes); i += path.HashSize {
            // each slice of HashSize bytes is one hop prefix
            path.Hops = append(path.Hops, ...)
        }
    }
}
```
Applied to both `cmd/ingestor/decoder.go` and `cmd/server/decoder.go`.
## Verification
Packet from the issue: `260001807dca00000000007d547d`
| Field | Before | After |
|---|---|---|
| `Path.Hops` | `[]` | `["7D", "54", "7D"]` |
| `Path.HashCount` | `0` | `3` |
New test `TestDecodeTracePathParsing` covers this exact packet.
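For illustration, a hedged sketch of the kind of assertion such a test makes; `DecodePacket` and the exact field layout are assumptions, not the project's real API:

```go
func TestDecodeTracePathParsing(t *testing.T) {
    raw, err := hex.DecodeString("260001807dca00000000007d547d")
    if err != nil {
        t.Fatal(err)
    }
    pkt, err := DecodePacket(raw) // hypothetical decoder entry point
    if err != nil {
        t.Fatal(err)
    }
    if want := []string{"7D", "54", "7D"}; !reflect.DeepEqual(pkt.Path.Hops, want) {
        t.Fatalf("Path.Hops = %v, want %v", pkt.Path.Hops, want)
    }
    if pkt.Path.HashCount != 3 {
        t.Fatalf("Path.HashCount = %d, want 3", pkt.Path.HashCount)
    }
}
```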
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
The in-memory `PacketStore` had **no eviction or aging** — it grew
unbounded until OOM killed the process. At ~3K packets/hour and ~5KB per
packet (not the 450 bytes previously estimated), an 8GB VM would OOM in
a few days.
## Changes
### Time-based eviction
- Configurable via `config.json`: `"packetStore": { "retentionHours": 24
}`
- Packets older than the retention window are evicted from the head of
the sorted slice
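A rough sketch of the time-based pass, assuming the store keeps transmissions oldest-first; struct and field names here are assumptions, not the actual implementation:

```go
// Hedged sketch: drop everything older than the retention window from the
// head of the oldest-first slice, then clean up the per-packet indexes.
func (s *PacketStore) evictExpired(retention time.Duration) {
    if retention <= 0 {
        return // 0 means unlimited, matching the config default
    }
    cutoff := time.Now().Add(-retention)
    n := 0
    for n < len(s.txs) && s.txs[n].ReceivedAt.Before(cutoff) {
        n++
    }
    for _, tx := range s.txs[:n] {
        s.removeFromIndexes(tx) // hypothetical helper, see index cleanup
    }
    s.txs = s.txs[n:]
    s.evicted += uint64(n)
}
```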
### Memory-based cap
- Configurable via `"packetStore": { "maxMemoryMB": 1024 }`
- Hard ceiling — evicts oldest packets when estimated memory exceeds the
cap
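A matching sketch for the memory cap, again with assumed names; the size estimate mirrors the stats fix below rather than the real accounting:

```go
// Hedged sketch: evict oldest transmissions until the estimated footprint
// drops under the configured ceiling.
func (s *PacketStore) evictOverMemory(maxMemoryMB int) {
    if maxMemoryMB <= 0 {
        return // 0 means no cap
    }
    limit := int64(maxMemoryMB) * 1024 * 1024
    for len(s.txs) > 0 && s.estimatedBytes() > limit {
        tx := s.txs[0] // oldest entry
        s.removeFromIndexes(tx)
        s.txs = s.txs[1:]
        s.evicted++
    }
}
```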
### Index cleanup
When a `StoreTx` is evicted, ALL associated data is removed from:
- `byHash`, `byTxID`, `byObsID`, `byObserver`, `byNode`, `byPayloadType`
- `nodeHashes`, `distHops`, `distPaths`, `spIndex`
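A hedged sketch of what that cleanup amounts to; the key fields used for each map are assumptions:

```go
// Hedged sketch: remove every index entry that points at the evicted StoreTx.
func (s *PacketStore) removeFromIndexes(tx *StoreTx) {
    delete(s.byHash, tx.Hash)
    delete(s.byTxID, tx.TxID)
    for _, obs := range tx.Observations {
        delete(s.byObsID, obs.ID)
    }
    // byObserver, byNode, byPayloadType, nodeHashes, distHops, distPaths and
    // spIndex hold slices or counters keyed by other values, so entries that
    // reference tx are filtered out rather than deleted wholesale.
}
```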
### Periodic execution
- Background ticker runs eviction every 60 seconds
- Analytics caches and hash size cache are invalidated after eviction
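Roughly, with method names assumed for illustration:

```go
// Hedged sketch of the background loop; shutdown handling is omitted.
go func() {
    ticker := time.NewTicker(60 * time.Second)
    defer ticker.Stop()
    for range ticker.C {
        store.runEviction()         // time-based + memory-based passes
        store.invalidateAnalytics() // drop analytics and hash size caches
    }
}()
```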
### Stats fixes
- `estimatedMB` now uses ~5KB/packet + ~500B/observation (was 430B +
200B)
- `evicted` counter reflects actual evictions (was hardcoded to 0)
- Removed fake `maxPackets: 2386092` and `maxMB: 1024` from stats
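In other words, the estimate now works out to roughly the following (exact constants are an assumption):

```go
// Hedged sketch of the revised estimate: ~5KB per packet, ~500B per observation.
estimatedBytes := int64(numPackets)*5*1024 + int64(numObservations)*500
estimatedMB := float64(estimatedBytes) / (1024 * 1024)
```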
### Config example
```json
{
  "packetStore": {
    "retentionHours": 24,
    "maxMemoryMB": 1024
  }
}
```
Both values default to 0 (unlimited) for backward compatibility.
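One way the config could be modeled on the Go side, as a sketch; the struct name is an assumption, only the JSON keys come from the example above:

```go
// Hedged sketch: zero values fall through to "unlimited" behaviour.
type PacketStoreConfig struct {
    RetentionHours int `json:"retentionHours"` // 0 = keep packets forever
    MaxMemoryMB    int `json:"maxMemoryMB"`    // 0 = no memory ceiling
}
```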
## Tests
- 7 new tests in `eviction_test.go` covering time-based, memory-based,
index cleanup, thread safety, config parsing, and no-op when disabled
- All existing tests pass unchanged
Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com>
## Problem
The RF analytics `packetsPerHour` chart was counting **observations**
instead of **unique transmissions** per hour. With ~34 observations per
transmission on average, the chart showed ~5,645 packets/hr instead of
the correct ~163/hr.
**Evidence from prod API:**
- `packetsPerHour` total: 1,580,620 (sum of all hourly counts)
- `totalPackets`: 45,764
- That's a ~34× inflation — exactly the observations-per-transmission
ratio
## Root Cause
In `store.go`, the `hourBuckets[hr]++` counter was inside the
observations loop (both regional and non-regional paths). Other counters
like `packetSizes` and `typeBuckets` already deduplicate by hash —
`hourBuckets` was the only one that didn't.
## Fix
Added a `seenHourHash` map (keyed by `hash|hour`) to deduplicate. Each
unique transmission is counted once per hour bucket, matching how packet
sizes and payload types already work.
Both the regional observer path and the non-regional path are fixed. The
legacy path (transmissions without observations) was already correct
since it iterates per-transmission.
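As a hedged sketch of the dedup (field names and the hour-bucket key format are assumptions; only `seenHourHash` and the `hash|hour` key come from the change):

```go
// Hedged sketch: seenHourHash lives for the whole aggregation pass so each
// transmission is counted at most once per hour bucket.
seenHourHash := make(map[string]bool)
for _, tx := range transmissions {
    for _, obs := range tx.Observations {
        hr := obs.ReceivedAt.Truncate(time.Hour).Format("2006-01-02T15") // assumed key format
        key := tx.Hash + "|" + hr
        if seenHourHash[key] {
            continue
        }
        seenHourHash[key] = true
        hourBuckets[hr]++
    }
}
```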
Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com>
Root causes from CI logs:
1. 'read /app/config.json: is a directory' — Docker creates a directory
   when bind-mounting a non-existent file. The entrypoint now detects a
   directory at config.json, removes it, and falls back to the example config.
2. 'unable to open database file: out of memory (14)' — old container
(3GB) not fully exited when new one starts. Deploy now uses
'docker compose down' with timeout and waits for memory reclaim.
3. Supervisor gave up after 3 fast retries (FATAL in ~6s). Increased
startretries to 10 and startsecs to 2 for server and ingestor.
Additional:
- Deploy step ensures staging config.json exists before starting
- Healthcheck: added start_period=60s, increased timeout and retries
- No longer uses manage.sh (CI working dir != repo checkout dir)
Root cause: on the 8GB VM, both prod (~2.5GB) and staging (~2GB) containers
run simultaneously. During deploy, manage.sh would rm the old staging container
and immediately start a new one. The old container's memory wasn't reclaimed
yet, so the new one got 'unable to open database file: out of memory (14)'
from SQLite and both corescope-server and corescope-ingestor entered FATAL.
Fix:
- manage.sh restart staging: wait up to 15s for old container to fully exit,
plus 3s for OS memory reclamation before starting new container
- manage.sh restart staging: verify config.json exists before starting
- docker-compose.staging.yml: add deploy.resources.limits.memory=3g to
prevent staging from consuming unbounded memory
## Summary
Adds `distributionByRepeaters` to the `/api/analytics/hash-sizes`
endpoint in the **Go server**.
### Problem
PR #263 implemented this feature in the deprecated Node.js server
(server.js). All backend changes should go in the Go server at
`cmd/server/`.
### Solution
- For each hash size (1, 2, 3), count how many unique repeaters (nodes)
advertise packets with that hash size
- Uses the existing `byNode` map already computed in
`computeAnalyticsHashSizes()`
- Added to both the live response and the empty/fallback response in
routes.go
- Frontend changes from PR #263 (`public/analytics.js`) already render
this field — no frontend changes needed
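A hedged sketch of that counting; `byNode`'s exact value type isn't shown here, so it is assumed to map each node to the set of hash sizes it has been seen with:

```go
// Hedged sketch: count unique repeaters per hash size from the assumed
// byNode map (node ID -> set of observed hash sizes).
distributionByRepeaters := map[string]int{"1": 0, "2": 0, "3": 0}
for _, sizes := range byNode {
    for size := range sizes {
        if size >= 1 && size <= 3 {
            distributionByRepeaters[strconv.Itoa(size)]++
        }
    }
}
```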
### Response shape
```json
{
"distributionByRepeaters": { "1": 42, "2": 7, "3": 2 },
...existing fields...
}
```
### Testing
- All Go server tests pass
- Replaces PR #263 (which modified the wrong server)
Closes #263
---------
Co-authored-by: you <you@example.com>
`docker compose down` tears down the entire compose project, including prod.
`docker compose rm -sf` stops and removes just the named service.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Version/Commit/BuildTime now populated from package.json, git, and
date. Exported as env vars so docker compose build picks them up.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
## Summary
Complete CI pipeline restructure. Sequential fail-fast chain, E2E tests
against Go server with real staging data, all deprecated Node.js server
tests removed.
### Pipeline (PR):
1. **Go unit tests** — fail-fast, coverage + badges
2. **Playwright E2E** — against Go server with fixture DB, frontend
coverage, fail-fast on first failure
3. **Docker build** — verify containers build
### Pipeline (master merge):
Same chain + deploy to staging + badge publishing
### Removed:
- All Node.js server-side unit tests (deprecated JS server)
- `npm ci` / `npm run test` steps
- JS server coverage collection (`COVERAGE=1 node server.js`)
- Changed-files detection logic
- Docs-only CI skip logic
- Cancel-workflow API hacks
### Added:
- `test-fixtures/e2e-fixture.db` — real data from staging (200 nodes, 31
observers, 500 packets)
- `scripts/capture-fixture.sh` — refresh fixture from staging API
- Go server launches with `-port 13581 -db test-fixtures/e2e-fixture.db
-public public-instrumented`
---------
Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com>
Co-authored-by: you <you@example.com>
Three optimizations to reduce wall-clock time:
1. Reduce safeClick timeout from 3000ms to 500ms
- Elements either exist immediately after navigation or don't exist at all
- ~75 safeClick calls; if ~30 miss, saves ~75s of dead wait time
2. Replace 18 page.goto() calls with SPA hash navigation
- After initial page load, the SPA shell is already in the DOM
- page.goto() reloads the entire page (network round-trip + parse)
- Hash navigation via location.hash triggers the SPA router instantly
- Only 3 page.goto() remain: initial load + 2 home page loads after localStorage.clear()
3. Remove redundant final route sweep
- All 10 routes were already visited during the page-specific sections
- The sweep just re-navigated to pages that had already been exercised
- Saves ~2s of redundant navigation
Also:
- Reduce inter-route wait from 200ms to 50ms (SPA router is synchronous)
- Merge utility function + packet filter exercises into single evaluate() call
- Use navHash() helper for consistent hash navigation with 150ms settle time
The test 'Node perf page should NOT show Go Runtime section' asserts
Node.js-specific behavior, but E2E tests now run against the Go server
(per this PR), so Go Runtime info is correctly present. Remove the
now-irrelevant assertion.
The Playwright E2E tests were starting `node server.js` (the deprecated
JS server) instead of the Go server, meaning E2E tests weren't testing
the production backend at all.
Changes:
- Add Go 1.22 setup and build steps to the node-test job
- Build the Go server binary before E2E tests run
- Replace `node server.js` with `./corescope-server` in both the
instrumented (coverage) and quick (no-coverage) E2E server starts
- Use `-port 13581` and `-public` flags to configure the Go server
- For coverage runs, serve from `public-instrumented/` directory
The Go server serves the same static files and exposes compatible
/api/* routes (stats, packets, health, perf) that the E2E tests hit.