pyxis

dandri/pyxis

mirror of https://github.com/torlando-tech/pyxis.git synced 2026-03-30 13:45:38 +00:00

Commit Graph

Author	SHA1	Message	Date

torlando-tech

74d832fb63

NimBLE patches: fix 574 stuck GAP state, add desync diagnostics

Patch 3 (ble_gap.c): Handle BLE_ERR_CONN_ESTABLISHMENT (574) unconditionally.
NimBLE only handled 574 under BLE_PERIODIC_ADV_WITH_RESPONSES (disabled on
ESP32), causing ble_gap_master_failed() to never be called. This left the
master GAP state stuck in BLE_GAP_OP_M_CONN, permanently blocking scan and
advertising. Also clean up master state in the default case instead of
assert(0).

Patch 4 (NimBLEDevice.cpp): Expose host reset reason via global volatile int.
NimBLE's onReset callback logs the reason code through ESP_LOG (serial UART
only). This patch adds nimble_host_reset_reason that the BLE loop polls to
capture the reason in UDP log output for remote soak test monitoring.
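The polling pattern described in Patch 4 can be sketched on a host machine as follows. Only the name nimble_host_reset_reason comes from the commit message; the callback stand-in, the "0 means no pending reset" convention, and the log line format are assumptions made for illustration. Portable C++ would normally use std::atomic<int> for a value shared between tasks, and the sketch keeps volatile only because the commit message names it.

```cpp
#include <cassert>
#include <string>

// Name taken from the commit message. Treating 0 as "no reset pending"
// is an assumption of this sketch.
volatile int nimble_host_reset_reason = 0;

// Hypothetical stand-in for the patched onReset callback: it only
// records the reason so another task can pick it up later.
void on_host_reset(int reason) {
    assert(reason != 0);  // 0 is reserved as "nothing pending" here
    nimble_host_reset_reason = reason;
}

// Hypothetical poll step for the BLE loop. Returns a log line when a
// reset reason is pending, or an empty string otherwise. The read and
// the clear are not one atomic step, so a reset landing between them
// would be lost; a real implementation would need to account for that.
std::string poll_reset_reason() {
    const int reason = nimble_host_reset_reason;
    if (reason == 0) {
        return "";
    }
    nimble_host_reset_reason = 0;
    return "NimBLE host reset, reason=" + std::to_string(reason);
}
```

The returned string would then be handed to whatever logger forwards output over UDP.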

NimBLEPlatform.cpp: Escalate persistent scan failures to full stack recovery.
After 3 consecutive enterErrorRecovery() rounds fail to restore scanning (30
total scan failures), escalate to recoverBLEStack() (clean reboot) instead
of looping indefinitely in a broken state.
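A minimal sketch of this escalation logic, runnable on a host machine. The class and enum names are hypothetical. The figure of 10 failures per round is inferred from "3 rounds, 30 total failures", and the exact point at which the round counter is checked is an assumption; the real NimBLEPlatform.cpp may differ.

```cpp
#include <cassert>

// Hypothetical names; only the two escalation levels come from the commit.
enum class RecoveryAction { None, ErrorRecovery, FullStackRecovery };

class ScanFailureEscalator {
public:
    ScanFailureEscalator(int failures_per_round, int max_rounds)
        : failures_per_round_(failures_per_round), max_rounds_(max_rounds) {
        assert(failures_per_round > 0 && max_rounds > 0);
    }

    // Call once per failed scan attempt. Returns the action the caller
    // should take, e.g. enterErrorRecovery() or recoverBLEStack().
    RecoveryAction on_scan_failure() {
        if (++failures_ < failures_per_round_) {
            return RecoveryAction::None;
        }
        failures_ = 0;  // one full round of failures has elapsed
        if (++rounds_ >= max_rounds_) {
            rounds_ = 0;
            return RecoveryAction::FullStackRecovery;
        }
        return RecoveryAction::ErrorRecovery;
    }

    // A successful scan breaks the "consecutive" streak.
    void on_scan_success() {
        failures_ = 0;
        rounds_ = 0;
    }

private:
    int failures_per_round_;
    int max_rounds_;
    int failures_ = 0;
    int rounds_ = 0;
};
```

With ScanFailureEscalator(10, 3), the 10th and 20th consecutive failures request the lighter recovery and the 30th requests the full stack recovery, so the device cannot loop in the lighter recovery indefinitely.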

Validated with 17+ hour soak test: device recovers from desyncs and maintains
3 active BLE connections with stable heap (~43K).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-03 12:49:41 -05:00

torlando-tech

3ca27f53f6

Task watchdog, BLE mutex fixes, NimBLE crash-safe recovery

Subscribe loopTask and BLE task to the ESP32 Task Watchdog (10s timeout)
to detect and recover from silent hangs. Per-step WDT feeds in the main
loop prevent false triggers from cumulative slow operations.

Fix BLE mutex starvation that blocked the main loop for 3-6s:
- Move processDiscoveredPeers() out of performMaintenance() so _mutex
  is not held during blocking NimBLE connect calls
- Use try_lock() in send_outgoing() to skip sends when BLE task has
  the mutex, rather than blocking (Reticulum retransmits)
- Switch BLE data writes to write-without-response (non-blocking)
- Add WDT feeds to all NimBLE blocking wait loops
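The try_lock() change above can be illustrated with a host-side sketch using std::mutex. The class name, the queue, and the helper that simulates a busy BLE task are all hypothetical; only the idea of skipping a send when the mutex is contended, and relying on Reticulum to retransmit, comes from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <future>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical interface; not the project's actual class.
class BleInterfaceSketch {
public:
    // Returns true if the packet was queued, false if it was skipped
    // because another task currently holds the mutex.
    bool send_outgoing(const std::string& packet) {
        std::unique_lock<std::mutex> lock(mutex_, std::try_to_lock);
        if (!lock.owns_lock()) {
            ++skipped_;  // drop instead of blocking the caller
            return false;
        }
        queue_.push_back(packet);
        return true;
    }

    std::mutex& mutex() { return mutex_; }
    std::size_t queued() const { return queue_.size(); }
    std::size_t skipped() const { return skipped_; }

private:
    std::mutex mutex_;
    std::deque<std::string> queue_;
    std::size_t skipped_ = 0;
};

// Demonstration helper: calls send_outgoing() while a second thread,
// standing in for the BLE task, holds the mutex.
bool send_while_ble_task_busy(BleInterfaceSketch& iface,
                              const std::string& packet) {
    std::promise<void> locked;
    std::promise<void> release;
    std::future<void> locked_f = locked.get_future();
    std::future<void> release_f = release.get_future();
    std::thread ble_task([&] {
        std::lock_guard<std::mutex> hold(iface.mutex());
        locked.set_value();  // tell the caller the mutex is now held
        release_f.wait();    // keep holding until told to let go
    });
    locked_f.wait();
    const bool sent = iface.send_outgoing(packet);
    release.set_value();
    ble_task.join();
    return sent;
}
```

The trade-off is that a skipped packet is lost at this layer, which is acceptable only because a higher layer retransmits; in exchange the caller never stalls behind a long-held lock.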

Replace NimBLE soft-reset recovery with immediate reboot — deinit()
during sync failures caused CORRUPT HEAP panics. With atomic file
persistence, data survives reboots reliably.

Reduce loop task stack from 49KB to 16KB (measured peak ~6KB).
Add NimBLE PHY update null guard to patch_nimble.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-23 10:45:43 -05:00

torlando-tech

a499a2b30a

Persistence reliability: NimBLE crash fix, atomic save, fast persist

NimBLE crash fix:
- Patch ble_hs.c assert(0) in BLE_HS_SYNC_STATE_BRINGUP timer handler
  via pre-build script (patch_nimble.py). The assert fires when a timer
  callback races with host re-sync — harmless, but kills the ESP32 and
  corrupts any file writes in progress.

Persistence fixes (in microReticulum submodule):
- Atomic save: write to temp file then rename, protecting existing data
- Fast persist: 5s after dirty flag instead of waiting 60s interval
- Corrupt file recovery: delete invalid files, recover from temp files
- INFO-level logging for load/save visibility
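The atomic save and fast persist items can be sketched together as follows. This is a host-side illustration under stated assumptions, not the microReticulum implementation: the function and class names and the ".tmp" suffix are hypothetical, and it relies on POSIX rename() semantics, where renaming over an existing file replaces it in one step. Embedded filesystems may behave differently. The 5 s delay is the figure from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Write to a temporary file, then rename it over the target. If the
// write fails or is interrupted, the previous target file is untouched.
bool atomic_save(const std::string& path, const std::string& data) {
    const std::string tmp = path + ".tmp";  // suffix is an assumption
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out.write(data.data(), static_cast<std::streamsize>(data.size()));
        out.flush();
        if (!out) {
            out.close();
            std::remove(tmp.c_str());
            return false;
        }
    }  // stream is closed here, before the rename
    if (std::rename(tmp.c_str(), path.c_str()) != 0) {
        std::remove(tmp.c_str());
        return false;
    }
    return true;
}

// Read a whole file into `out`; returns false if it cannot be opened.
bool load_file(const std::string& path, std::string& out) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    out.assign(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>());
    return true;
}

// Fast persist: request a save a fixed delay after data first becomes
// dirty, rather than waiting for a long periodic interval.
class PersistScheduler {
public:
    void mark_dirty(std::uint64_t now_ms) {
        if (!dirty_) {
            dirty_ = true;
            dirty_since_ms_ = now_ms;  // later changes do not restart the timer
        }
    }
    bool should_persist(std::uint64_t now_ms) const {
        return dirty_ && now_ms - dirty_since_ms_ >= kDelayMs;
    }
    void mark_saved() { dirty_ = false; }

private:
    static constexpr std::uint64_t kDelayMs = 5000;  // 5 s, per the commit
    bool dirty_ = false;
    std::uint64_t dirty_since_ms_ = 0;
};
```

A main loop would call should_persist() on each pass with the current time in milliseconds and, when it returns true, call atomic_save() followed by mark_saved().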

Other:
- Wrap LXMF announce in try/catch for crash safety
- Call Identity::should_persist_data() from main loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-23 01:44:38 -05:00

3 Commits