Commit Graph

2 Commits

Author SHA1 Message Date
Kpa-clawbot f87eb3601c fix: graceful container shutdown for reliable deployments (#453)
## Summary

Fixes #450 — staging deployment flaky due to container not shutting down
cleanly.

## Root Causes

1. **Server never closed DB on shutdown** — SQLite WAL lock held
indefinitely, blocking new container startup
2. **`httpServer.Close()` instead of `Shutdown()`** — abruptly kills
connections instead of draining them
3. **No `stop_grace_period` in compose configs** — Docker sends SIGTERM
then immediately SIGKILL (default 10s is often not enough for WAL
checkpoint)
4. **Supervisor didn't forward SIGTERM** — missing
`stopsignal`/`stopwaitsecs` meant Go processes got SIGKILL instead of
graceful shutdown
5. **Deploy scripts used default `docker stop` timeout** — only 10s
grace period

## Changes

### Go Server (`cmd/server/`)
- **Graceful HTTP shutdown**: `httpServer.Shutdown(ctx)` with 15s
context timeout — drains in-flight requests before closing
- **WebSocket cleanup**: New `Hub.Close()` method sends `CloseGoingAway`
frames to all connected clients
- **DB close on shutdown**: Explicitly closes DB after HTTP server stops
(was never closed before)
- **WAL checkpoint**: `PRAGMA wal_checkpoint(TRUNCATE)` before DB close
— flushes WAL to main DB file and removes WAL/SHM lock files

### Go Ingestor (`cmd/ingestor/`)
- **WAL checkpoint on shutdown**: New `Store.Checkpoint()` method,
called before `Close()`
- **Longer MQTT disconnect timeout**: 5s (was 1s) to allow in-flight
messages to drain

### Docker Compose (all 4 variants)
- Added `stop_grace_period: 30s` and `stop_signal: SIGTERM`

### Supervisor Configs (both variants)
- Added `stopsignal=TERM` and `stopwaitsecs=20` to server and ingestor
programs

### Deploy Scripts
- `deploy-staging.sh`: `docker stop -t 30` with explicit grace period
- `deploy-live.sh`: `docker stop -t 30` with explicit grace period

## Shutdown Sequence (after fix)

1. Docker sends SIGTERM to supervisord (PID 1)
2. Supervisord forwards SIGTERM to server + ingestor (waits up to 20s
each)
3. Server: stops poller → drains HTTP (15s) → closes WS clients →
checkpoints WAL → closes DB
4. Ingestor: stops tickers → disconnects MQTT (5s) → checkpoints WAL →
closes DB
5. Docker waits up to 30s total before SIGKILL

## Tests

All existing tests pass:
- `cd cmd/server && go test ./...` 
- `cd cmd/ingestor && go test ./...` 

---------

Co-authored-by: you <you@example.com>
Co-authored-by: Kpa-clawbot <kpabap+clawdbot@gmail.com>
2026-04-01 12:19:20 -07:00
Kpa-clawbot 6922d63b1c Add DISABLE_MOSQUITTO support for external brokers (#309)
## Summary
- add `DISABLE_MOSQUITTO` support in container startup by switching
supervisord config when disabled
- add a no-mosquitto supervisord config
(`docker/supervisord-go-no-mosquitto.conf`)
- fix Compose port mapping regression so host ports map to fixed
internal listener ports (`80`, `443`, `1883`)
- add compose variants without MQTT port publishing
(`docker-compose.no-mosquitto.yml`,
`docker-compose.staging.no-mosquitto.yml`)
- update `manage.sh` setup flow to ask `Use built-in MQTT broker?
[Y/n]`, skip MQTT port prompt when disabled, persist
`DISABLE_MOSQUITTO`, and use no-mosquitto compose files when
starting/stopping/restarting
- align `.env.example` staging keys with compose
(`STAGING_GO_HTTP_PORT`, `STAGING_GO_MQTT_PORT`)
- fix staging Caddyfile generation to use `STAGING_GO_HTTP_PORT`
- fix `.env.example` staging default comments to match actual values
(82/1885)

## Validation performed
-  `bash -n manage.sh` passes.
-  With `DISABLE_MOSQUITTO=true`, no-mosquitto compose overrides are
selected, Mosquitto is not started, and MQTT port is not published.
-  With `DISABLE_MOSQUITTO=false`, standard compose files are used,
Mosquitto starts, and MQTT port mapping is present.
- ℹ️ Runtime Docker validation requires a running Docker host.

Fixes #267

---------

Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-30 21:36:29 -07:00