pyxis

mirror of https://github.com/torlando-tech/pyxis.git synced 2026-03-30 13:45:38 +00:00

Author	SHA1	Message	Date
torlando-tech	d6d4eb2c9c	BLE stability: defer disconnect processing, fix data races, harden operations Critical fixes for NimBLE host task / BLE loop task concurrency: - Defer all disconnect map cleanup from NimBLE callbacks to loop task via SPSC ring buffer, preventing iterator invalidation and use-after-free - Defer enterErrorRecovery() from callback context to loop task - Add WDT feed in enterErrorRecovery() host-sync polling loop Operational hardening: - Cache NimBLERemoteCharacteristic* pointers in write() to avoid repeated service/characteristic lookups per fragment - Add isConnected() checks before GATT operations (read, enableNotifications) - Validate peer address in notification callback to guard against handle reuse - Skip stuck-state detector during CONNECTING/CONN_STARTING states - Expire stale pending data entries after HANDSHAKE_TIMEOUT (30s) - Read actual connection RSSI via ble_gap_conn_rssi() for peripheral connections instead of hardcoding 0 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 00:15:24 -05:00
torlando-tech	e343caf2d2	Stability: WDT yield, BLE mutex fixes, time-based desync recovery Reduces crash rate from every 60-85s to 1 reboot per 6+ minutes. Zero WDT triggers in 10-minute stability test. BLE mutex fixes (BLEInterface.cpp): - Release _mutex before blocking GATT ops in onConnected() and onServicesDiscovered() — prevents 5-30s main-loop stalls during service discovery, notification subscribe, identity exchange - Non-blocking try_lock() for peerCount(), getConnectedPeerSummaries(), get_stats() — returns empty/default if BLE task holds mutex - Write-without-response in initiateHandshake() WDT and persistence (main.cpp, sdkconfig.defaults, microReticulum): - 30s WDT timeout (up from 10s) for SPIFFS flash I/O headroom - Register Identity::set_persist_yield_callback() to feed WDT every 5 entries during save_known_destinations() (70+ entries = 30-50s) - WDT feeds between reticulum and identity persist calls BLE host desync recovery (NimBLEPlatform): - Time-based desync tracking instead of aggressive counter-based reboot - 60s tolerance without connections, 5 minutes with active connections (data still flows over existing BLE mesh links) - Remove immediate recoverBLEStack() from 574 handler and enterErrorRecovery() — let startScan() manage reboot decision - Increase CONNECTION_COOLDOWN from 3s to 10s to reduce 574 risk - Increase SCAN_FAIL_RECOVERY_THRESHOLD from 5 to 10 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 12:30:30 -05:00
torlando-tech	3ca27f53f6	Task watchdog, BLE mutex fixes, NimBLE crash-safe recovery Subscribe loopTask and BLE task to the ESP32 Task Watchdog (10s timeout) to detect and recover from silent hangs. Per-step WDT feeds in the main loop prevent false triggers from cumulative slow operations. Fix BLE mutex starvation that blocked the main loop for 3-6s: - Move processDiscoveredPeers() out of performMaintenance() so _mutex is not held during blocking NimBLE connect calls - Use try_lock() in send_outgoing() to skip sends when BLE task has the mutex, rather than blocking (Reticulum retransmits) - Switch BLE data writes to write-without-response (non-blocking) - Add WDT feeds to all NimBLE blocking wait loops Replace NimBLE soft-reset recovery with immediate reboot — deinit() during sync failures caused CORRUPT HEAP panics. With atomic file persistence, data survives reboots reliably. Reduce loop task stack from 49KB to 16KB (measured peak ~6KB). Add NimBLE PHY update null guard to patch_nimble.py. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 10:45:43 -05:00
torlando-tech	43a7e1088f	BLE P2P stability: fix ODR violation, shutdown safety, connection robustness Fix systemic One Definition Rule violation where BLEInterface.h included headers from deps/microReticulum/src/BLE/ while .cpp files compiled against local lib/ble_interface/ versions, causing struct layout mismatches (PeerInfo field shifting corrupted conn_handle/mtu) and class layout mismatches (BLEPeerManager member differences caused LoadProhibited crash). Key fixes: - Include local BLE headers instead of deps versions in BLEInterface.h - Sync PeerInfo keepalive tracking fields and BLETypes constants with deps - Shutdown re-entrancy guard and proper client cleanup via deinit(true) - Host sync checks before scan, advertise, and connect operations - Avoid deadlock by deferring _on_connected from NimBLE host task - Duplicate identity detection, stale handle cross-check in keepalives - Bounds validation on conn_handle in setPeerHandle/promoteToIdentityKeyed - Periodic persist_data() call for display name persistence across reboots Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 00:24:45 -05:00
torlando-tech	ac7b0ac1f7	Remove debug heartbeat and loop_count from BLE interface loop Cleanup after BLE stability debugging - the Serial.printf heartbeat and loop_count were temporary instrumentation no longer needed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 21:46:35 -05:00
torlando-tech	869963c33c	BLE: use NimBLEClient for connections, fix service discovery and host reset Root cause: connectNative() used raw ble_gap_connect() which bypasses NimBLE's client management. The NimBLEClient created afterwards wasn't associated with the connection handle, causing service discovery to fail with "could not retrieve services". This led to a connect-disconnect loop where no BLE peers could complete handshakes. Fix: Replace raw ble_gap_connect() with NimBLEClient::connect() which properly manages the GAP event handler, connection handle tracking, MTU exchange, and service discovery. Connections now succeed with MTU 517 and identity handshakes complete. Also fixed: - Error recovery escalates to full stack reset (deinit/reinit) when NimBLE host fails to sync, instead of looping in a dead state - Added recursion guard in enterErrorRecovery() - Promoted key BLE logs (scan, connect, peer status) to INFO level for visibility during monitoring - Added 10-second serial heartbeat with connection/peer/heap stats Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 21:34:28 -05:00
torlando-tech	769c9952bd	BLE P2P stability: PSRAM zero-init, pool sizing, stuck-state recovery Root cause: Bytes objects stored in PSRAM-allocated BLEInterface had corrupted shared_ptr members from uninitialized memory, causing crashes in processDiscoveredPeers(). Fixed by using heap_caps_calloc instead of heap_caps_malloc for PSRAM placement-new allocation. Additional fixes: - Reduce pool sizes to fit memory budget (reassembler 134KB→17KB, fragmenters 8→4, handshakes 32→4, pending data 64→8) - Store local MAC as BLEAddress struct instead of Bytes to avoid heap allocation in PSRAM-resident object - Move setLocalMac after platform start (NimBLE needs to be running for valid random address), add lazy MAC init fallback in loop() - Add stuck-state detector: resets GAP state machine if hardware is idle but state machine thinks it's busy - Enhance getLocalAddress with 3 fallback methods (NimBLE API, ble_hs_id_copy_addr RANDOM, esp_read_mac efuse) - Fix C++17 structured binding to C++11 compatibility - Increase BLE task stack 8KB→12KB for string ops in debug logs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 20:57:05 -05:00
torlando-tech	ec181e18f8	BLE interop fixes: pools, keepalive tracking, data buffering - Replace std::map and std::vector with fixed-size pools in BLEInterface (fragmenters, pending handshakes, pending data) - Track keepalive failures and disconnect after 3 consecutive - Force-disconnect zombie peers detected by BLEPeerManager - Add periodic advertising refresh (every 60s) to combat silent stops - Buffer incoming data when identity not yet mapped instead of dropping - Subtract ATT_OVERHEAD from MTU in NimBLEPlatform connection setup Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 00:18:02 -05:00
torlando-tech	ac6ceca9f8	Initial commit: standalone Pyxis T-Deck firmware Split T-Deck firmware from microReticulum examples/lxmf_tdeck/ into its own repo. microReticulum is consumed as a git submodule dependency pinned to feat/t-deck. All include paths updated from relative symlinks to bare includes resolved via library build flags. Both tdeck (NimBLE) and tdeck-bluedroid environments compile successfully. Licensed under AGPLv3. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 19:48:33 -05:00

9 Commits
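
Commit d6d4eb2c9c describes deferring disconnect map cleanup from NimBLE callbacks to the loop task through an SPSC ring buffer. The log does not show the implementation, so the following is only a minimal sketch of that pattern under stated assumptions: `DisconnectEvent`, `SpscRing`, and `drain_disconnects` are hypothetical names, and a `std::map` keyed by connection handle stands in for whatever peer maps the firmware actually keeps.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical event record; the firmware's real fields are not shown in the log.
struct DisconnectEvent {
    std::uint16_t conn_handle;
    int reason;
};

// Single-producer / single-consumer ring buffer.
// Usable capacity is N - 1: one slot stays empty to tell "full" from "empty".
template <typename T, std::size_t N>
class SpscRing {
public:
    // Producer side (assumed: the BLE host callback). Never blocks.
    bool push(const T& item) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) {
            return false;  // full: the caller decides how to handle the overflow
        }
        slots_[head] = item;
        head_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side (assumed: the loop task).
    bool pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) {
            return false;  // empty
        }
        out = slots_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }

private:
    T slots_[N];
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};

// Runs only on the consumer task, so it is the sole mutator of `peers`.
// Returns how many map entries were actually erased.
template <std::size_t N>
std::size_t drain_disconnects(SpscRing<DisconnectEvent, N>& ring,
                              std::map<std::uint16_t, int>& peers) {
    std::size_t removed = 0;
    DisconnectEvent ev;
    while (ring.pop(ev)) {
        removed += peers.erase(ev.conn_handle);
    }
    return removed;
}
```

In this sketch the callback only calls `push()`, and every map mutation happens inside `drain_disconnects()` on one task. That is the property the commit message relies on to avoid iterator invalidation and use-after-free.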
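
Commits ec181e18f8 and 769c9952bd replace `std::map` and `std::vector` with fixed-size pools (pending data reduced to 8 entries), and d6d4eb2c9c adds expiry of stale pending data entries after HANDSHAKE_TIMEOUT (30s). A rough sketch of how such a pool with time-based expiry might look is given below. `PendingEntry`, `PendingPool`, and their members are hypothetical illustrations, not the firmware's types; only the pool size and the timeout value come from the commit messages.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical slot layout; the firmware's real entry type is not shown in the log.
struct PendingEntry {
    bool in_use = false;
    std::uint16_t conn_handle = 0;
    std::uint32_t created_ms = 0;
};

// Fixed-capacity pool: all storage lives inside the object, so there is
// no heap allocation after construction.
template <std::size_t N>
class PendingPool {
public:
    // Claims a free slot, or returns nullptr when the pool is exhausted.
    PendingEntry* acquire(std::uint16_t conn_handle, std::uint32_t now_ms) {
        for (PendingEntry& e : slots_) {
            if (!e.in_use) {
                e.in_use = true;
                e.conn_handle = conn_handle;
                e.created_ms = now_ms;
                return &e;
            }
        }
        return nullptr;
    }

    // Releases entries whose age is at least timeout_ms; returns the count freed.
    std::size_t expire(std::uint32_t now_ms, std::uint32_t timeout_ms) {
        std::size_t freed = 0;
        for (PendingEntry& e : slots_) {
            // Unsigned subtraction stays correct across millisecond-counter wraparound.
            if (e.in_use && now_ms - e.created_ms >= timeout_ms) {
                e = PendingEntry{};
                ++freed;
            }
        }
        return freed;
    }

    // Number of slots currently claimed.
    std::size_t used() const {
        std::size_t n = 0;
        for (const PendingEntry& e : slots_) {
            if (e.in_use) ++n;
        }
        return n;
    }

private:
    std::array<PendingEntry, N> slots_{};
};
```

With a `PendingPool<8>` and a periodic `expire(now_ms, 30000)` call, a peer that never finishes its handshake cannot hold a slot indefinitely. The linear scan is a deliberate simplification that seems reasonable at these pool sizes.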