6 Commits

Author SHA1 Message Date
torlando-tech
d6d4eb2c9c BLE stability: defer disconnect processing, fix data races, harden operations
Critical fixes for NimBLE host task / BLE loop task concurrency:
- Defer all disconnect map cleanup from NimBLE callbacks to loop task via
  SPSC ring buffer, preventing iterator invalidation and use-after-free
- Defer enterErrorRecovery() from callback context to loop task
- Add WDT feed in enterErrorRecovery() host-sync polling loop

Operational hardening:
- Cache NimBLERemoteCharacteristic* pointers in write() to avoid repeated
  service/characteristic lookups per fragment
- Add isConnected() checks before GATT operations (read, enableNotifications)
- Validate peer address in notification callback to guard against handle reuse
- Skip stuck-state detector during CONNECTING/CONN_STARTING states
- Expire stale pending data entries after HANDSHAKE_TIMEOUT (30s)
- Read actual connection RSSI via ble_gap_conn_rssi() for peripheral connections
  instead of hardcoding 0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 00:15:24 -05:00
torlando-tech
e343caf2d2 Stability: WDT yield, BLE mutex fixes, time-based desync recovery
Reduces crash rate from every 60-85s to 1 reboot per 6+ minutes.
Zero WDT triggers in 10-minute stability test.

BLE mutex fixes (BLEInterface.cpp):
- Release _mutex before blocking GATT ops in onConnected() and
  onServicesDiscovered() — prevents 5-30s main-loop stalls during
  service discovery, notification subscribe, identity exchange
- Non-blocking try_lock() for peerCount(), getConnectedPeerSummaries(),
  get_stats() — returns empty/default if BLE task holds mutex
- Write-without-response in initiateHandshake()

WDT and persistence (main.cpp, sdkconfig.defaults, microReticulum):
- 30s WDT timeout (up from 10s) for SPIFFS flash I/O headroom
- Register Identity::set_persist_yield_callback() to feed WDT every
  5 entries during save_known_destinations() (70+ entries = 30-50s)
- WDT feeds between reticulum and identity persist calls

BLE host desync recovery (NimBLEPlatform):
- Time-based desync tracking instead of aggressive counter-based reboot
- 60s tolerance without connections, 5 minutes with active connections
  (data still flows over existing BLE mesh links)
- Remove immediate recoverBLEStack() from 574 handler and
  enterErrorRecovery() — let startScan() manage reboot decision
- Increase CONNECTION_COOLDOWN from 3s to 10s to reduce 574 risk
- Increase SCAN_FAIL_RECOVERY_THRESHOLD from 5 to 10

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 12:30:30 -05:00
torlando-tech
43a7e1088f BLE P2P stability: fix ODR violation, shutdown safety, connection robustness
Fix systemic One Definition Rule violation where BLEInterface.h included
headers from deps/microReticulum/src/BLE/ while .cpp files compiled
against local lib/ble_interface/ versions, causing struct layout mismatches
(PeerInfo field shifting corrupted conn_handle/mtu) and class layout
mismatches (BLEPeerManager member differences caused LoadProhibited crash).

Key fixes:
- Include local BLE headers instead of deps versions in BLEInterface.h
- Sync PeerInfo keepalive tracking fields and BLETypes constants with deps
- Shutdown re-entrancy guard and proper client cleanup via deinit(true)
- Host sync checks before scan, advertise, and connect operations
- Avoid deadlock by deferring _on_connected from NimBLE host task
- Duplicate identity detection, stale handle cross-check in keepalives
- Bounds validation on conn_handle in setPeerHandle/promoteToIdentityKeyed
- Periodic persist_data() call for display name persistence across reboots

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 00:24:45 -05:00
torlando-tech
769c9952bd BLE P2P stability: PSRAM zero-init, pool sizing, stuck-state recovery
Root cause: Bytes objects stored in PSRAM-allocated BLEInterface had
corrupted shared_ptr members from uninitialized memory, causing crashes
in processDiscoveredPeers(). Fixed by using heap_caps_calloc instead of
heap_caps_malloc for PSRAM placement-new allocation.

Additional fixes:
- Reduce pool sizes to fit memory budget (reassembler 134KB→17KB,
  fragmenters 8→4, handshakes 32→4, pending data 64→8)
- Store local MAC as BLEAddress struct instead of Bytes to avoid
  heap allocation in PSRAM-resident object
- Move setLocalMac after platform start (NimBLE needs to be running
  for valid random address), add lazy MAC init fallback in loop()
- Add stuck-state detector: resets GAP state machine if hardware
  is idle but state machine thinks it's busy
- Enhance getLocalAddress with 3 fallback methods (NimBLE API,
  ble_hs_id_copy_addr RANDOM, esp_read_mac efuse)
- Fix C++17 structured binding to C++11 compatibility
- Increase BLE task stack 8KB→12KB for string ops in debug logs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 20:57:05 -05:00
torlando-tech
ec181e18f8 BLE interop fixes: pools, keepalive tracking, data buffering
- Replace std::map and std::vector with fixed-size pools in
  BLEInterface (fragmenters, pending handshakes, pending data)
- Track keepalive failures and disconnect after 3 consecutive
- Force-disconnect zombie peers detected by BLEPeerManager
- Add periodic advertising refresh (every 60s) to combat silent stops
- Buffer incoming data when identity not yet mapped instead of dropping
- Subtract ATT_OVERHEAD from MTU in NimBLEPlatform connection setup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 00:18:02 -05:00
torlando-tech
ac6ceca9f8 Initial commit: standalone Pyxis T-Deck firmware
Split T-Deck firmware from microReticulum examples/lxmf_tdeck/ into its
own repo. microReticulum is consumed as a git submodule dependency pinned
to feat/t-deck. All include paths updated from relative symlinks to bare
includes resolved via library build flags.

Both tdeck (NimBLE) and tdeck-bluedroid environments compile successfully.
Licensed under AGPLv3.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 19:48:33 -05:00