meshcore-analyzer

mirror of https://github.com/Kpa-clawbot/meshcore-analyzer.git synced 2026-04-17 15:45:56 +00:00

Author	SHA1	Message	Date
Kpa-clawbot	f87eb3601c	fix: graceful container shutdown for reliable deployments (#453) ## Summary Fixes #450: staging deployment was flaky because the container did not shut down cleanly. ## Root Causes 1. Server never closed the DB on shutdown: the SQLite WAL lock was held indefinitely, blocking startup of the new container 2. `httpServer.Close()` was used instead of `Shutdown()`, abruptly killing connections instead of draining them 3. No `stop_grace_period` in compose configs: Docker sends SIGTERM, then SIGKILL once the default 10s grace period expires, which is often not enough for a WAL checkpoint 4. Supervisor didn't forward SIGTERM: missing `stopsignal`/`stopwaitsecs` meant Go processes got SIGKILL instead of a graceful shutdown 5. Deploy scripts used the default `docker stop` timeout, giving only a 10s grace period ## Changes ### Go Server (`cmd/server/`) - Graceful HTTP shutdown: `httpServer.Shutdown(ctx)` with a 15s context timeout drains in-flight requests before closing - WebSocket cleanup: new `Hub.Close()` method sends `CloseGoingAway` frames to all connected clients - DB close on shutdown: explicitly closes the DB after the HTTP server stops (it was never closed before) - WAL checkpoint: `PRAGMA wal_checkpoint(TRUNCATE)` before DB close flushes the WAL into the main DB file and truncates the WAL file ### Go Ingestor (`cmd/ingestor/`) - WAL checkpoint on shutdown: new `Store.Checkpoint()` method, called before `Close()` - Longer MQTT disconnect timeout: 5s (was 1s) to allow in-flight messages to drain ### Docker Compose (all 4 variants) - Added `stop_grace_period: 30s` and `stop_signal: SIGTERM` ### Supervisor Configs (both variants) - Added `stopsignal=TERM` and `stopwaitsecs=20` to the server and ingestor programs ### Deploy Scripts - `deploy-staging.sh`: `docker stop -t 30` with an explicit grace period - `deploy-live.sh`: `docker stop -t 30` with an explicit grace period ## Shutdown Sequence (after fix) 1. Docker sends SIGTERM to supervisord (PID 1) 2. Supervisord forwards SIGTERM to the server and ingestor (waiting up to 20s for each) 3. Server: stops poller → drains HTTP (15s) → closes WS clients → checkpoints WAL → closes DB 4. Ingestor: stops tickers → disconnects MQTT (5s) → checkpoints WAL → closes DB 5. Docker waits up to 30s in total before sending SIGKILL ## Tests All existing tests pass: - `cd cmd/server && go test ./...` ✅ - `cd cmd/ingestor && go test ./...` ✅ --------- Co-authored-by: you <you@example.com> Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com>	2026-04-01 12:19:20 -07:00
Kpa-clawbot	738d5fef39	fix: poller uses store max IDs to prevent replaying entire DB When GetMaxTransmissionID() fails silently (e.g., corrupted DB returns 0 from COALESCE), the poller starts from ID 0 and replays the entire database over WebSocket — broadcasting thousands of old packets per second. Fix: after querying the DB, use the in-memory store's MaxTransmissionID and MaxObservationID as a floor. Since Load() already read the full DB successfully, the store has the correct max IDs. Root cause discovered on staging: DB corruption caused MAX(id) query to fail, returning 0. Poller log showed 'starting from transmission ID 0' followed by 1000-2000 broadcasts per tick walking through 76K rows. Also adds MaxObservationID() to PacketStore for observation cursor safety. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-31 23:28:56 -07:00
Kpa-clawbot	ab03b142f5	fix: per-observation WS broadcast for live view starburst — fixes #237 IngestNewFromDB now broadcasts one message per observation (not per transmission). IngestNewObservations also broadcasts late arrivals. Tests verify multi-observer packets produce multiple WS messages. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-29 08:32:37 -07:00
Kpa-clawbot	df63efa78d	fix: poll new observations for existing transmissions (fixes #174) The poller only queried WHERE t.id > sinceID, which missed new observations added to transmissions already in the store. The trace page was correct because it always queries the DB directly. Add IngestNewObservations() that polls observations by o.id watermark, adds them to existing StoreTx entries, re-picks the best observation, and invalidates analytics caches. The Poller now tracks both lastTxID and lastObsID watermarks. Includes tests for v3, v2, dedup, best-path re-pick, and GetMaxObservationID. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 17:26:26 -07:00
Kpa-clawbot	f55a3454aa	feat(go): replace map[string]interface{} with typed Go structs in route handlers Phase 1: Create cmd/server/types.go with ~80 typed response structs matching all proto definitions. Every API response shape is now a compile-time checked struct. Phase 2: Rewire all route handlers in routes.go to construct typed structs instead of map[string]interface{} for response building: - /api/stats -> StatsResponse - /api/health -> HealthResponse - /api/perf -> PerfResponse - /api/config/* -> typed config responses - /api/nodes/* -> NodeListResponse, NodeDetailResponse, etc. - /api/packets/* -> PacketListResponse, PacketDetailResponse - /api/analytics/* -> RFAnalyticsResponse, TopologyResponse, etc. - /api/observers/* -> ObserverListResponse, ObserverResp - /api/channels/* -> ChannelListResponse, ChannelMessagesResponse - /api/traces/* -> TraceResponse - /api/resolve-hops -> ResolveHopsResponse - /api/iata-coords -> IataCoordsResponse (typed IataCoord) - /api/audio-lab/buckets -> AudioLabBucketsResponse - WebSocket broadcast -> WSMessage struct - SlowQuery tracking -> SlowQuery struct (was map) Phase 3 (partial): Add typed store/db methods: - PacketStore.GetCacheStatsTyped() -> CacheStats - PacketStore.GetPerfStoreStatsTyped() -> PerfPacketStoreStats - DB.GetDBSizeStatsTyped() -> SqliteStats Remaining map usage is in store/db data flow (PacketResult.Packets still uses maps) — these will be addressed in a follow-up. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 15:17:21 -07:00
Kpa-clawbot	fc494962d1	fix(go): add broadcast diagnostic logging, rebuild fixes stale deployment The Go staging packets page wasn't live-updating because the deployed binary was stale (built before the #162 fix). Rebuilding from current source fixed the issue — broadcasts now fire correctly. Added two permanent diagnostic log lines: - [poller] IngestNewFromDB: logs when new transmissions are found - [broadcast] sending N packets to M clients: logs each broadcast batch These log lines make it easy to verify the broadcast pipeline is working and would have caught this stale-deployment issue immediately. Verified on VM: WS clients receive packets with nested 'packet' field, all Go tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 14:15:19 -07:00
Kpa-clawbot	95afaf2f0d	fix(go): add nested packet field to WS broadcast, fixes #162 The frontend packets.js filters WS messages with m.data?.packet and extracts m.data.packet for live rendering. Node's server.js includes a packet sub-object (packet: fullPacket) in the broadcast data, but Go's IngestNewFromDB built the data flat without a nested packet field. This caused the Go staging packets page to never live-update via WS even though messages were being sent — they were silently filtered out by packets.js. Fix: build the packet fields map separately, then create the broadcast map with both top-level fields (for live.js) and nested packet (for packets.js). Also fixes the fallback DB-direct poller path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 13:31:37 -07:00
Kpa-clawbot	afe16db960	feat(go-server): in-memory packet store — port of packet-store.js Streams transmissions + observations from SQLite at startup into 5 indexed in-memory structures. QueryPackets and QueryGroupedPackets now serve from RAM (<10ms) instead of hitting SQLite (2.3s). - store.go: PacketStore with byHash, byTxID, byObsID, byObserver, byNode indexes - main.go: create + load store at startup - routes.go: dispatch to store for packet/stats endpoints - websocket.go: poller ingests new transmissions into store Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 08:52:07 -07:00
Kpa-clawbot	b2e6c8105b	fix: handle WebSocket upgrade at root path (client connects to ws://host/) Node.js upgrades WS at /, Go was only at /ws. Now the static file handler checks for Upgrade header first and routes to WebSocket. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 01:25:35 -07:00
Kpa-clawbot	742ed86596	feat: add Go web server (cmd/server/) — full API + WebSocket + static files 35+ REST endpoints matching Node.js server, WebSocket broadcast, static file serving with SPA fallback, config.json support. Uses modernc.org/sqlite (pure Go, no CGO required). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-27 01:16:59 -07:00

10 Commits
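The cursor floor described in 738d5fef39 reduces to taking the larger of the DB-reported and store-reported max IDs for each watermark. A minimal sketch, assuming the DB values have already been read (and are 0 when the `MAX(id)` query fails); `Cursor`, `atLeast`, and `initialCursor` are hypothetical names.

```go
package main

// Cursor holds the two watermarks the poller advances (hypothetical type).
type Cursor struct {
	LastTxID  int64 // highest transmission ID already ingested
	LastObsID int64 // highest observation ID already ingested
}

// atLeast returns fromDB unless the store's value is higher.
func atLeast(fromDB, fromStore int64) int64 {
	if fromStore > fromDB {
		return fromStore
	}
	return fromDB
}

// initialCursor uses the in-memory store's max IDs as a floor under the
// values read from SQLite. If the MAX(id) query fails and yields 0, the
// poller still resumes after the rows Load() already read instead of
// replaying the whole table over WebSocket.
func initialCursor(dbTx, dbObs, storeTx, storeObs int64) Cursor {
	return Cursor{
		LastTxID:  atLeast(dbTx, storeTx),
		LastObsID: atLeast(dbObs, storeObs),
	}
}
```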
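The observation polling from df63efa78d, combined with the one-message-per-observation broadcast from ab03b142f5, might look roughly like the sketch below. It rests on several assumptions: the types are trimmed-down stand-ins for `StoreTx` and the WS message, the input rows are what a `WHERE o.id > ?` query would return (the ID check in the loop is only a guard), the "best" observation is picked by highest SNR only because the commits do not state the ranking rule, and analytics-cache invalidation is left out. `ingestNewObservations` is a hypothetical name modelled on the commit's `IngestNewObservations()`.

```go
package main

// Observation is a hypothetical, trimmed-down observation row.
type Observation struct {
	ID       int64
	TxID     int64
	Observer string
	SNR      float64
}

// StoreTx is a hypothetical in-memory transmission with its observations.
type StoreTx struct {
	ID      int64
	Hash    string
	Obs     []Observation
	BestIdx int // index into Obs of the "best" observation
}

// WSMessage is one live-view broadcast; one is emitted per observation.
type WSMessage struct {
	Hash     string
	Observer string
	ObsID    int64
}

// Store indexes transmissions by ID.
type Store struct {
	byTxID map[int64]*StoreTx
}

// repickBest selects the observation with the highest SNR (assumed rule).
func repickBest(tx *StoreTx) {
	best := 0
	for i, o := range tx.Obs {
		if o.SNR > tx.Obs[best].SNR {
			best = i
		}
	}
	tx.BestIdx = best
}

// ingestNewObservations attaches rows with ID > lastObsID to transmissions
// already in the store, re-picks the best observation, and returns the new
// watermark plus one broadcast message per new observation.
func (s *Store) ingestNewObservations(rows []Observation, lastObsID int64) (int64, []WSMessage) {
	next := lastObsID
	var msgs []WSMessage
	for _, o := range rows {
		if o.ID <= lastObsID {
			continue // already ingested (dedup by watermark)
		}
		if o.ID > next {
			next = o.ID
		}
		tx, ok := s.byTxID[o.TxID]
		if !ok {
			// Assumed to be handled by the transmission-ID poll
			// (rows with t.id > sinceID), not here.
			continue
		}
		tx.Obs = append(tx.Obs, o)
		repickBest(tx)
		msgs = append(msgs, WSMessage{Hash: tx.Hash, Observer: o.Observer, ObsID: o.ID})
	}
	return next, msgs
}
```

Because each new observation yields its own `WSMessage`, a packet heard by three observers produces three broadcasts, which is what the live view's starburst needs.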
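The payload fix in 95afaf2f0d (flat fields for live.js, plus the same fields nested under `packet` for packets.js) can be illustrated with plain maps. `buildBroadcast` is a hypothetical helper, and any envelope fields besides `data` are left out because the commit does not list them.

```go
package main

// buildBroadcast returns the WS message body for one packet. The packet's
// fields are copied to the top level of "data" (what live.js reads), and
// the same map is also nested under data.packet (what packets.js filters
// on via m.data?.packet).
func buildBroadcast(pkt map[string]interface{}) map[string]interface{} {
	data := make(map[string]interface{}, len(pkt)+1)
	for k, v := range pkt {
		data[k] = v
	}
	data["packet"] = pkt
	return map[string]interface{}{"data": data}
}
```

Passing the result to `json.Marshal` yields a `data` object whose top-level keys serve live.js while `data.packet` passes the `m.data?.packet` filter in packets.js.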
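The five indexes named in afe16db960 (`byHash`, `byTxID`, `byObsID`, `byObserver`, `byNode`) can be modelled as maps that all point at the same records, so each lookup is a map read rather than a SQLite query. This is a simplified sketch: the `Packet` type and its `Nodes` field are assumptions, since the commit does not show the real value types.

```go
package main

// Packet is a hypothetical observation-level record.
type Packet struct {
	TxID     int64
	ObsID    int64
	Hash     string
	Observer string
	Nodes    []string // node IDs the packet references (assumed field)
}

// PacketStore keeps five maps that all point at the same *Packet values.
type PacketStore struct {
	byHash     map[string][]*Packet
	byTxID     map[int64][]*Packet
	byObsID    map[int64]*Packet
	byObserver map[string][]*Packet
	byNode     map[string][]*Packet
}

// NewPacketStore returns an empty store with all indexes allocated.
func NewPacketStore() *PacketStore {
	return &PacketStore{
		byHash:     make(map[string][]*Packet),
		byTxID:     make(map[int64][]*Packet),
		byObsID:    make(map[int64]*Packet),
		byObserver: make(map[string][]*Packet),
		byNode:     make(map[string][]*Packet),
	}
}

// Add indexes p under every key. Re-adding an ObsID already present is a no-op.
func (s *PacketStore) Add(p *Packet) {
	if _, seen := s.byObsID[p.ObsID]; seen {
		return
	}
	s.byObsID[p.ObsID] = p
	s.byHash[p.Hash] = append(s.byHash[p.Hash], p)
	s.byTxID[p.TxID] = append(s.byTxID[p.TxID], p)
	s.byObserver[p.Observer] = append(s.byObserver[p.Observer], p)
	for _, n := range p.Nodes {
		s.byNode[n] = append(s.byNode[n], p)
	}
}

// ByNode returns every indexed packet that references node n.
func (s *PacketStore) ByNode(n string) []*Packet { return s.byNode[n] }
```

A store shared between the poller goroutine and HTTP handlers would also need a `sync.RWMutex` around these maps; it is omitted here to keep the sketch short.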