mirror of
https://github.com/Kpa-clawbot/meshcore-analyzer.git
synced 2026-04-18 21:15:39 +00:00
## Summary

Fixes #450: staging deployment was flaky because the container did not shut down cleanly.

## Root Causes

1. **Server never closed DB on shutdown**: the SQLite WAL lock was held indefinitely, blocking new container startup
2. **`httpServer.Close()` instead of `Shutdown()`**: abruptly kills connections instead of draining them
3. **No `stop_grace_period` in compose configs**: Docker sends SIGTERM, then SIGKILL once the default 10s grace period expires, which is often not enough for a WAL checkpoint
4. **Supervisor didn't forward SIGTERM**: missing `stopsignal`/`stopwaitsecs` meant Go processes got SIGKILL instead of a graceful shutdown
5. **Deploy scripts used the default `docker stop` timeout**: only a 10s grace period

## Changes

### Go Server (`cmd/server/`)

- **Graceful HTTP shutdown**: `httpServer.Shutdown(ctx)` with a 15s context timeout drains in-flight requests before closing
- **WebSocket cleanup**: new `Hub.Close()` method sends `CloseGoingAway` frames to all connected clients
- **DB close on shutdown**: explicitly closes the DB after the HTTP server stops (it was never closed before)
- **WAL checkpoint**: `PRAGMA wal_checkpoint(TRUNCATE)` before DB close flushes the WAL to the main DB file and removes the WAL/SHM lock files

### Go Ingestor (`cmd/ingestor/`)

- **WAL checkpoint on shutdown**: new `Store.Checkpoint()` method, called before `Close()`
- **Longer MQTT disconnect timeout**: 5s (was 1s) to allow in-flight messages to drain

### Docker Compose (all 4 variants)

- Added `stop_grace_period: 30s` and `stop_signal: SIGTERM`

### Supervisor Configs (both variants)

- Added `stopsignal=TERM` and `stopwaitsecs=20` to the server and ingestor programs

### Deploy Scripts

- `deploy-staging.sh`: `docker stop -t 30` with explicit grace period
- `deploy-live.sh`: `docker stop -t 30` with explicit grace period

## Shutdown Sequence (after fix)

1. Docker sends SIGTERM to supervisord (PID 1)
2. Supervisord forwards SIGTERM to server + ingestor (waits up to 20s each)
3. Server: stops poller → drains HTTP (15s) → closes WS clients → checkpoints WAL → closes DB
4. Ingestor: stops tickers → disconnects MQTT (5s) → checkpoints WAL → closes DB
5. Docker waits up to 30s total before SIGKILL

## Tests

All existing tests pass:

- `cd cmd/server && go test ./...` ✅
- `cd cmd/ingestor && go test ./...` ✅

---------

Co-authored-by: you <you@example.com>
Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com>
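The shutdown sequence above nests three timeouts: the application's HTTP drain (15s), supervisord's `stopwaitsecs` (20s), and Docker's grace period (30s). The following is a minimal sanity-check sketch, assuming the intent is that each inner timeout is strictly shorter than the one wrapping it. The function name `check_shutdown_budget` is hypothetical and not part of the repository.

```bash
# Hypothetical helper: verify that the shutdown timeouts nest correctly.
# Usage: check_shutdown_budget DOCKER_GRACE SUPERVISOR_WAIT APP_TIMEOUT
# Assumption: each inner timeout must be strictly shorter than the outer
# one, otherwise the outer layer may send SIGKILL before cleanup finishes.
check_shutdown_budget() {
  local docker_grace=$1 supervisor_wait=$2 app_timeout=$3

  if (( app_timeout < supervisor_wait && supervisor_wait < docker_grace )); then
    echo "ok"
  else
    echo "too tight"
    return 1
  fi
}

# Values taken from the commit message: 30s Docker, 20s supervisord, 15s HTTP drain.
check_shutdown_budget 30 20 15   # prints "ok"
```

With the pre-fix default of a 10s Docker grace period, the same check reports `too tight`, which matches the failure described in the root causes.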
28 lines
699 B
Bash
#!/bin/bash
set -e

DEPLOY_DIR="$(cd "$(dirname "$0")" && pwd)"
MATOMO_COMMIT="38c30f9"

cd "$DEPLOY_DIR"

echo "[deploy] Fetching latest from origin..."
git fetch origin

echo "[deploy] Resetting to origin/master..."
git reset --hard origin/master

echo "[deploy] Building Docker image..."
docker build -t meshcore-analyzer .

echo "[deploy] Stopping old container (30s grace period)..."
docker stop -t 30 meshcore-analyzer && docker rm meshcore-analyzer
docker run -d --name meshcore-analyzer \
  --restart unless-stopped \
  -p 3000:3000 \
  -v "$(pwd)/config.json:/app/config.json:ro" \
  -v meshcore-data:/app/data \
  meshcore-analyzer

echo "[deploy] Done. Live at https://analyzer.on8ar.eu"
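The stop step in the script chains `docker stop` and `docker rm` with `&&`. Because a failure on the left-hand side of `&&` does not trigger `set -e`, a missing container silently skips the `rm` and the script carries on. A more explicit variant might look like the sketch below. The function name `stop_container` and the `DOCKER` override variable are hypothetical, introduced here only so the logic can be exercised without a Docker daemon; they are not part of the repository.

```bash
# Hypothetical sketch: stop and remove a container only if it exists.
# DOCKER can be overridden (for example with a stub) for dry runs.
DOCKER=${DOCKER:-docker}

# Usage: stop_container NAME [GRACE_SECONDS]
stop_container() {
  local name=$1 grace=${2:-30}

  # `docker container inspect` exits non-zero when the container is absent.
  if "$DOCKER" container inspect "$name" >/dev/null 2>&1; then
    # Give the container `grace` seconds to exit after SIGTERM.
    "$DOCKER" stop -t "$grace" "$name" >/dev/null
    "$DOCKER" rm "$name" >/dev/null
    echo "stopped $name"
  else
    echo "no container named $name"
  fi
}

# Example call, matching the values used in the deploy script:
# stop_container meshcore-analyzer 30
```

Separating the existence check from the stop makes a first-time deploy (no old container) distinguishable from a failed stop in the log output.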