Files
meshcore-analyzer/cmd
Kpa-clawbot 6345c6fb05 fix(ingestor): observability + bounded backoff for MQTT reconnect (#947) (#949)
## Summary

Fixes #947 — MQTT ingestor silently stalls after `pingresp not received`
disconnect due to paho's default 10-minute reconnect backoff and zero
observability of reconnect attempts.

## Changes

### `cmd/ingestor/main.go`
- **Extract `buildMQTTOpts()`** — encapsulates MQTT client option
construction for testability
- **`SetMaxReconnectInterval(30s)`** — bounds paho's default 10-minute
exponential backoff (source: `options.go:137` in
`paho.mqtt.golang@v1.5.0`)
- **`SetConnectTimeout(10s)`** — prevents stuck connect attempts from
blocking reconnect cycle
- **`SetWriteTimeout(10s)`** — prevents stuck publish writes
- **`SetReconnectingHandler`** — logs `MQTT [<tag>] reconnecting to
<broker>` on every reconnect attempt, giving operators visibility into
retry behavior
- **Enhanced `SetConnectionLostHandler`** — now includes broker address
in log line for multi-source disambiguation

### `cmd/ingestor/mqtt_opts_test.go` (new)
- Tests verify `MaxReconnectInterval`, `ConnectTimeout`, `WriteTimeout`
are set correctly
- Tests verify credential and TLS configuration
- Anti-tautology: tests fail if timing settings are removed from
`buildMQTTOpts()`

## Operator impact

After this change, a pingresp disconnect produces:
```
MQTT [staging] disconnected from tcp://broker:1883: pingresp not received, disconnecting
MQTT [staging] reconnecting to tcp://broker:1883
MQTT [staging] reconnecting to tcp://broker:1883
MQTT [staging] connected to tcp://broker:1883
MQTT [staging] subscribed to meshcore/#
```

Max gap between disconnect and first reconnect attempt: ~30s (was up to
10 minutes).

---------

Co-authored-by: you <you@example.com>
2026-05-01 00:01:07 -07:00
..