- Move getService()/getCharacteristic() out of mutex-held paths in
writeCharacteristic(), read(), enableNotifications() by caching all
three char pointers (RX, TX, Identity) during discoverServices()
- Replace 5-second spin-wait in processPendingDisconnects() with
non-blocking deferral: break if GATT ops in flight, retry next loop
- Add WARNING logs to all read-path helpers on mutex timeout
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After a 574 connection failure, the NimBLE controller's scan state can
become corrupted (returning rc=530 / Invalid HCI Params) even after the
host re-syncs. This led to scan failure escalation and device reboots.
Key fixes:
- Add ble_gap_conn_cancel() to enterErrorRecovery() — stuck GAP master
connection operations were blocking all subsequent scans
- Add ble_hs_sched_reset(BLE_HS_ECONTROLLER) in error recovery to force
a full host-controller resynchronization after desync
- Proactively cancel stale GAP connections before scan start
- Reduce SCAN_FAIL_RECOVERY_THRESHOLD from 10 to 5 for faster recovery
- Enhanced scan failure logging with GAP state diagnostics
- Move ESP reset reason logging after WiFi init for UDP log visibility
- Suppress connection candidate log spam when at max connections
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add 30-second cooldown after NimBLE host desync recovery before
allowing new connection attempts. During desync, client->connect()
blocks waiting for a host-task completion event that never arrives,
causing WDT crashes. The cooldown skips connection attempts while
the host is desynced or recently recovered.
Also adds ESP reset reason logging at boot to diagnose crash types
(WDT, panic, brownout, etc.) in soak test logs.
Soak test results: Run 3 (before) had 17 reboots in ~4 hours with
a 12-crash-in-14-minutes loop. Run 4 (after) has 1 early reboot
then 19+ hours of continuous uptime with the same desync frequency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Critical fixes for NimBLE host task / BLE loop task concurrency:
- Defer all disconnect map cleanup from NimBLE callbacks to loop task via
SPSC ring buffer, preventing iterator invalidation and use-after-free
- Defer enterErrorRecovery() from callback context to loop task
- Add WDT feed in enterErrorRecovery() host-sync polling loop
Operational hardening:
- Cache NimBLERemoteCharacteristic* pointers in write() to avoid repeated
service/characteristic lookups per fragment
- Add isConnected() checks before GATT operations (read, enableNotifications)
- Validate peer address in notification callback to guard against handle reuse
- Skip stuck-state detector during CONNECTING/CONN_STARTING states
- Expire stale pending data entries after HANDSHAKE_TIMEOUT (30s)
- Read actual connection RSSI via ble_gap_conn_rssi() for peripheral connections
instead of hardcoding 0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix use-after-free crash on hangup: set _call_state=IDLE before deleting
_lxst_audio, preventing pump_call_tx() (runs without LVGL lock) from
accessing freed memory
- Replace single-slot _call_signal_pending with 8-element ring buffer queue
to prevent signal loss when CONNECTING+ESTABLISHED arrive in rapid succession
- Extract TX pump into pump_call_tx() called right after reticulum->loop()
for low-latency audio TX without LVGL lock dependency (was buried at step 10)
- Tune ES7210 mic gain to 21dB (was 15dB) to improve Codec2 input level
without ADC clipping that occurred at 24dB
- I2S capture: use APLL for accurate 8kHz clock, direct 8kHz sampling
(no more 16→8kHz decimation), DMA 16x64 for encode burst headroom
- Reduce Reticulum log verbosity to LOG_INFO (was LOG_TRACE)
- BLE: add ble_hs_sched_reset() tiered recovery before reboot on desync,
widen supervision timeout to 4.0s for WiFi coexistence
- Add UDP multicast log broadcasting and OTA flash support
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reduces crash rate from every 60-85s to 1 reboot per 6+ minutes.
Zero WDT triggers in 10-minute stability test.
BLE mutex fixes (BLEInterface.cpp):
- Release _mutex before blocking GATT ops in onConnected() and
onServicesDiscovered() — prevents 5-30s main-loop stalls during
service discovery, notification subscribe, identity exchange
- Non-blocking try_lock() for peerCount(), getConnectedPeerSummaries(),
get_stats() — returns empty/default if BLE task holds mutex
- Write-without-response in initiateHandshake()
WDT and persistence (main.cpp, sdkconfig.defaults, microReticulum):
- 30s WDT timeout (up from 10s) for SPIFFS flash I/O headroom
- Register Identity::set_persist_yield_callback() to feed WDT every
5 entries during save_known_destinations() (70+ entries = 30-50s)
- WDT feeds between reticulum and identity persist calls
BLE host desync recovery (NimBLEPlatform):
- Time-based desync tracking instead of aggressive counter-based reboot
- 60s tolerance without connections, 5 minutes with active connections
(data still flows over existing BLE mesh links)
- Remove immediate recoverBLEStack() from 574 handler and
enterErrorRecovery() — let startScan() manage reboot decision
- Increase CONNECTION_COOLDOWN from 3s to 10s to reduce 574 risk
- Increase SCAN_FAIL_RECOVERY_THRESHOLD from 5 to 10
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix systemic One Definition Rule violation where BLEInterface.h included
headers from deps/microReticulum/src/BLE/ while .cpp files compiled
against local lib/ble_interface/ versions, causing struct layout mismatches
(PeerInfo field shifting corrupted conn_handle/mtu) and class layout
mismatches (BLEPeerManager member differences caused LoadProhibited crash).
Key fixes:
- Include local BLE headers instead of deps versions in BLEInterface.h
- Sync PeerInfo keepalive tracking fields and BLETypes constants with deps
- Shutdown re-entrancy guard and proper client cleanup via deinit(true)
- Host sync checks before scan, advertise, and connect operations
- Avoid deadlock by deferring _on_connected from NimBLE host task
- Duplicate identity detection, stale handle cross-check in keepalives
- Bounds validation on conn_handle in setPeerHandle/promoteToIdentityKeyed
- Periodic persist_data() call for display name persistence across reboots
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split T-Deck firmware from microReticulum examples/lxmf_tdeck/ into its
own repo. microReticulum is consumed as a git submodule dependency pinned
to feat/t-deck. All include paths updated from relative symlinks to bare
includes resolved via library build flags.
Both tdeck (NimBLE) and tdeck-bluedroid environments compile successfully.
Licensed under AGPLv3.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>