10 Commits

Author SHA1 Message Date
Kpa-clawbot
f87eb3601c fix: graceful container shutdown for reliable deployments (#453)
## Summary

Fixes #450 — staging deployment flaky due to container not shutting down
cleanly.

## Root Causes

1. **Server never closed DB on shutdown** — SQLite WAL lock held
indefinitely, blocking new container startup
2. **`httpServer.Close()` instead of `Shutdown()`** — abruptly kills
connections instead of draining them
3. **No `stop_grace_period` in compose configs** — Docker sends SIGTERM
then immediately SIGKILL (default 10s is often not enough for WAL
checkpoint)
4. **Supervisor didn't forward SIGTERM** — missing
`stopsignal`/`stopwaitsecs` meant Go processes got SIGKILL instead of
graceful shutdown
5. **Deploy scripts used default `docker stop` timeout** — only 10s
grace period

## Changes

### Go Server (`cmd/server/`)
- **Graceful HTTP shutdown**: `httpServer.Shutdown(ctx)` with 15s
context timeout — drains in-flight requests before closing
- **WebSocket cleanup**: New `Hub.Close()` method sends `CloseGoingAway`
frames to all connected clients
- **DB close on shutdown**: Explicitly closes DB after HTTP server stops
(was never closed before)
- **WAL checkpoint**: `PRAGMA wal_checkpoint(TRUNCATE)` before DB close
— flushes WAL to main DB file and removes WAL/SHM lock files

### Go Ingestor (`cmd/ingestor/`)
- **WAL checkpoint on shutdown**: New `Store.Checkpoint()` method,
called before `Close()`
- **Longer MQTT disconnect timeout**: 5s (was 1s) to allow in-flight
messages to drain

### Docker Compose (all 4 variants)
- Added `stop_grace_period: 30s` and `stop_signal: SIGTERM`

### Supervisor Configs (both variants)
- Added `stopsignal=TERM` and `stopwaitsecs=20` to server and ingestor
programs

### Deploy Scripts
- `deploy-staging.sh`: `docker stop -t 30` with explicit grace period
- `deploy-live.sh`: `docker stop -t 30` with explicit grace period

## Shutdown Sequence (after fix)

1. Docker sends SIGTERM to supervisord (PID 1)
2. Supervisord forwards SIGTERM to server + ingestor (waits up to 20s
each)
3. Server: stops poller → drains HTTP (15s) → closes WS clients →
checkpoints WAL → closes DB
4. Ingestor: stops tickers → disconnects MQTT (5s) → checkpoints WAL →
closes DB
5. Docker waits up to 30s total before SIGKILL

## Tests

All existing tests pass:
- `cd cmd/server && go test ./...` 
- `cd cmd/ingestor && go test ./...` 

---------

Co-authored-by: you <you@example.com>
Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com>
2026-04-01 12:19:20 -07:00
Kpa-clawbot
738d5fef39 fix: poller uses store max IDs to prevent replaying entire DB
When GetMaxTransmissionID() fails silently (e.g., corrupted DB returns 0
from COALESCE), the poller starts from ID 0 and replays the entire
database over WebSocket — broadcasting thousands of old packets per second.

Fix: after querying the DB, use the in-memory store's MaxTransmissionID
and MaxObservationID as a floor. Since Load() already read the full DB
successfully, the store has the correct max IDs.

Root cause discovered on staging: DB corruption caused MAX(id) query to
fail, returning 0. Poller log showed 'starting from transmission ID 0'
followed by 1000-2000 broadcasts per tick walking through 76K rows.

Also adds MaxObservationID() to PacketStore for observation cursor safety.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-31 23:28:56 -07:00
Kpa-clawbot
ab03b142f5 fix: per-observation WS broadcast for live view starburst — fixes #237
IngestNewFromDB now broadcasts one message per observation (not per
transmission). IngestNewObservations also broadcasts late arrivals.
Tests verify multi-observer packets produce multiple WS messages.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-29 08:32:37 -07:00
Kpa-clawbot
df63efa78d fix: poll new observations for existing transmissions (fixes #174)
The poller only queried WHERE t.id > sinceID, which missed new
observations added to transmissions already in the store. The trace
page was correct because it always queries the DB directly.

Add IngestNewObservations() that polls observations by o.id watermark,
adds them to existing StoreTx entries, re-picks best observation, and
invalidates analytics caches. The Poller now tracks both lastTxID and
lastObsID watermarks.

Includes tests for v3, v2, dedup, best-path re-pick, and
GetMaxObservationID.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 17:26:26 -07:00
Kpa-clawbot
f55a3454aa feat(go): replace map[string]interface{} with typed Go structs in route handlers
Phase 1: Create cmd/server/types.go with ~80 typed response structs
matching all proto definitions. Every API response shape is now a
compile-time checked struct.

Phase 2: Rewire all route handlers in routes.go to construct typed
structs instead of map[string]interface{} for response building:
- /api/stats -> StatsResponse
- /api/health -> HealthResponse
- /api/perf -> PerfResponse
- /api/config/* -> typed config responses
- /api/nodes/* -> NodeListResponse, NodeDetailResponse, etc.
- /api/packets/* -> PacketListResponse, PacketDetailResponse
- /api/analytics/* -> RFAnalyticsResponse, TopologyResponse, etc.
- /api/observers/* -> ObserverListResponse, ObserverResp
- /api/channels/* -> ChannelListResponse, ChannelMessagesResponse
- /api/traces/* -> TraceResponse
- /api/resolve-hops -> ResolveHopsResponse
- /api/iata-coords -> IataCoordsResponse (typed IataCoord)
- /api/audio-lab/buckets -> AudioLabBucketsResponse
- WebSocket broadcast -> WSMessage struct
- SlowQuery tracking -> SlowQuery struct (was map)

Phase 3 (partial): Add typed store/db methods:
- PacketStore.GetCacheStatsTyped() -> CacheStats
- PacketStore.GetPerfStoreStatsTyped() -> PerfPacketStoreStats
- DB.GetDBSizeStatsTyped() -> SqliteStats

Remaining map usage is in store/db data flow (PacketResult.Packets
still uses maps) — these will be addressed in a follow-up.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 15:17:21 -07:00
Kpa-clawbot
fc494962d1 fix(go): add broadcast diagnostic logging, rebuild fixes stale deployment
The Go staging packets page wasn't live-updating because the deployed
binary was stale (built before the #162 fix). Rebuilding from current
source fixed the issue — broadcasts now fire correctly.

Added two permanent diagnostic log lines:
- [poller] IngestNewFromDB: logs when new transmissions are found
- [broadcast] sending N packets to M clients: logs each broadcast batch

These log lines make it easy to verify the broadcast pipeline is working
and would have caught this stale-deployment issue immediately.

Verified on VM: WS clients receive packets with nested 'packet' field,
all Go tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 14:15:19 -07:00
Kpa-clawbot
95afaf2f0d fix(go): add nested packet field to WS broadcast, fixes #162
The frontend packets.js filters WS messages with m.data?.packet and
extracts m.data.packet for live rendering. Node's server.js includes
a packet sub-object (packet: fullPacket) in the broadcast data, but
Go's IngestNewFromDB built the data flat without a nested packet field.

This caused the Go staging packets page to never live-update via WS
even though messages were being sent — they were silently filtered out
by packets.js.

Fix: build the packet fields map separately, then create the broadcast
map with both top-level fields (for live.js) and nested packet (for
packets.js). Also fixes the fallback DB-direct poller path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 13:31:37 -07:00
Kpa-clawbot
afe16db960 feat(go-server): in-memory packet store — port of packet-store.js
Streams transmissions + observations from SQLite at startup into
5 indexed in-memory structures. QueryPackets and QueryGroupedPackets
now serve from RAM (<10ms) instead of hitting SQLite (2.3s).

- store.go: PacketStore with byHash, byTxID, byObsID, byObserver, byNode indexes
- main.go: create + load store at startup
- routes.go: dispatch to store for packet/stats endpoints
- websocket.go: poller ingests new transmissions into store

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 08:52:07 -07:00
Kpa-clawbot
b2e6c8105b fix: handle WebSocket upgrade at root path (client connects to ws://host/)
Node.js upgrades WS at /, Go was only at /ws. Now the static file
handler checks for Upgrade header first and routes to WebSocket.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 01:25:35 -07:00
Kpa-clawbot
742ed86596 feat: add Go web server (cmd/server/) — full API + WebSocket + static files
35+ REST endpoints matching Node.js server, WebSocket broadcast,
static file serving with SPA fallback, config.json support.
Uses modernc.org/sqlite (pure Go, no CGO required).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 01:16:59 -07:00