Commit Graph

25 Commits

Author SHA1 Message Date
Torlando
70b8df052d Merge pull request #12 from torlando-tech/feature/splash-screen
Add boot splash screen with Pyxis constellation logo
2026-03-04 18:38:43 -05:00
Torlando
3c81eeb5be Merge pull request #11 from torlando-tech/fix/ble-wdt-stability
Fix Task WDT crashes from LVGL priority starvation
2026-03-04 17:07:27 -05:00
torlando-tech
a4a1aacdd8 Show boot splash within 1s of power-on instead of after 20s+ init
Move Display::init_hardware_only() and POWER_EN to right after serial
banner, before GPS/WiFi/SD/Reticulum init. Add 150ms delay after
POWER_EN HIGH so ST7789V power rail stabilizes before SPI commands
(without this, SWRESET is sent to an unpowered chip and silently lost).

Splash now visible for entire boot period (~18s) until LVGL takes over.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 14:12:25 -05:00
torlando-tech
c80e63dee9 Fix Task WDT crashes: LVGL priority starvation + BLE WDT false positives
Two root causes for frequent device reboots:

1. LVGL task (priority 2) starved loopTask (priority 1) on core 1.
   During heavy screen rendering, loopTask couldn't run for 30+ seconds,
   triggering the Task WDT. Fixed by lowering LVGL to priority 1 so
   FreeRTOS round-robins both tasks fairly.

2. BLE task was registered with the 30s Task WDT, but blocking NimBLE
   GATT operations (connect + service discovery + subscribe + read) can
   legitimately take 30-60s total. Removed BLE task from WDT since
   NimBLE has its own internal ~30s timeouts per GATT operation.

Also added ble_hs_synced() guards to write(), read(), notify(),
writeCharacteristic(), discoverServices(), and enableNotifications()
to prevent use-after-free on stale NimBLE client pointers during
host resets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 14:11:51 -05:00
torlando-tech
b4afa6d3f7 Fix SD card SPI init: use FSPI before Display claims HSPI
SD card was unresponsive (MISO stuck 0xFF) because Display's HSPI
peripheral had already claimed the GPIO pins via the matrix, preventing
FSPI from routing MISO. Fix by initializing SD card BEFORE Display,
using the global SPI (FSPI) instance — matching LilyGo's reference code.

- Move SD card init before display init in boot sequence
- Use global SPI (FSPI) instead of Display's SPIClass(HSPI)
- Lower SPI frequency to 800kHz matching LilyGo example
- Drive all CS lines (display, LoRa, SD) high before SD init
- Add MISO=38 to Display's SPI.begin for post-init bus sharing
- Add Display::get_spi() accessor for future shared use

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 09:29:47 -05:00
torlando-tech
d03f0b308f Add shared SPI bus mutex for SD card, display, and LoRa coexistence
The T-Deck Plus shares HSPI across the display (CS=12), LoRa (CS=9),
and SD card (CS=39). Previously SD logging was disabled because
SD.begin() reconfigured the SPI bus and blanked the display.

This introduces a FreeRTOS mutex created in main.cpp and injected into
Display, SX1262Interface, and a new SDAccess class so all three
peripherals serialize their SPI transactions safely.

- Add SDAccess class wrapping SD.begin() and file ops with mutex
- Add set_spi_mutex() to Display and SX1262Interface
- Wrap Display flush, fill, draw, and power ops in mutex
- Refactor SDLogger to use SDAccess mutex instead of owning SD.begin()
- Wire up mutex creation and injection order in setup()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 00:19:10 -05:00
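Sketch for d03f0b308f: the mutex-injection pattern described above can be illustrated on a host with Python threads. This is a rough model, not the firmware code: the `Peripheral` class, `transact()`, and `run_demo()` are hypothetical names, and `threading.Lock` stands in for the FreeRTOS mutex. Only `set_spi_mutex()` and the CS pin numbers come from the commit message.

```python
import threading


class Peripheral:
    """Hypothetical stand-in for Display, SX1262Interface, or SDAccess."""

    def __init__(self, name, cs_pin):
        self.name = name
        self.cs_pin = cs_pin
        self._spi_mutex = None

    def set_spi_mutex(self, mutex):
        # The mutex is created elsewhere and injected, so all devices share one.
        self._spi_mutex = mutex

    def transact(self, bus_log):
        # Hold the shared mutex for the whole select/transfer/deselect sequence.
        with self._spi_mutex:
            bus_log.append((self.cs_pin, "select"))
            bus_log.append((self.cs_pin, "transfer"))
            bus_log.append((self.cs_pin, "deselect"))


def run_demo(transactions_per_device=50):
    spi_mutex = threading.Lock()  # stands in for the FreeRTOS mutex
    devices = [
        Peripheral("display", 12),
        Peripheral("lora", 9),
        Peripheral("sd", 39),
    ]
    for device in devices:
        device.set_spi_mutex(spi_mutex)

    bus_log = []

    def worker(device):
        for _ in range(transactions_per_device):
            device.transact(bus_log)

    threads = [threading.Thread(target=worker, args=(d,)) for d in devices]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return bus_log
```

Because every device takes the same lock, each select/transfer/deselect triple in the returned log belongs to a single CS pin, however the threads interleave.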
Torlando
d9411fb4bb Merge pull request #7 from torlando-tech/ble-stability-audit
BLE stability: fix desync crash loops and scan recovery
2026-03-03 23:39:38 -05:00
torlando-tech
46ce057a1e BLE stability: host-controller resync, stuck GAP conn cancel, scan diagnostics
After a 574 connection failure, the NimBLE controller's scan state can
become corrupted (returning rc=530 / Invalid HCI Params) even after the
host re-syncs. This led to scan failure escalation and device reboots.

Key fixes:
- Add ble_gap_conn_cancel() to enterErrorRecovery() — stuck GAP master
  connection operations were blocking all subsequent scans
- Add ble_hs_sched_reset(BLE_HS_ECONTROLLER) in error recovery to force
  a full host-controller resynchronization after desync
- Proactively cancel stale GAP connections before scan start
- Reduce SCAN_FAIL_RECOVERY_THRESHOLD from 10 to 5 for faster recovery
- Enhanced scan failure logging with GAP state diagnostics
- Move ESP reset reason logging after WiFi init for UDP log visibility
- Suppress connection candidate log spam when at max connections

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 19:57:55 -05:00
torlando-tech
2cc9441f0a BLE stability: desync connect cooldown prevents crash-on-connect
Add 30-second cooldown after NimBLE host desync recovery before
allowing new connection attempts. During desync, client->connect()
blocks waiting for a host-task completion event that never arrives,
causing WDT crashes. The cooldown skips connection attempts while
the host is desynced or recently recovered.

Also adds ESP reset reason logging at boot to diagnose crash types
(WDT, panic, brownout, etc.) in soak test logs.

Soak test results: Run 3 (before) had 17 reboots in ~4 hours with
a 12-crash-in-14-minutes loop. Run 4 (after) has 1 early reboot
then 19+ hours of continuous uptime with the same desync frequency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 18:34:40 -05:00
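Sketch for 2cc9441f0a: the cooldown rule (no connection attempts while the host is desynced or within 30 seconds of recovery) amounts to a small gate. The `ConnectGate` class and its method names are hypothetical, and timestamps are passed in explicitly so the logic can be run without a device. Only the 30-second figure comes from the commit message.

```python
class ConnectGate:
    """Hypothetical model of the post-desync connect cooldown."""

    COOLDOWN_S = 30.0  # from the commit message

    def __init__(self):
        self.host_synced = True
        self.last_recovery_at = None  # time of the most recent resync, if any

    def on_desync(self):
        self.host_synced = False

    def on_resync(self, now):
        self.host_synced = True
        self.last_recovery_at = now

    def may_connect(self, now):
        # Never attempt a connect while the host is desynced.
        if not self.host_synced:
            return False
        # Skip attempts until the cooldown after recovery has elapsed.
        if self.last_recovery_at is not None:
            if now - self.last_recovery_at < self.COOLDOWN_S:
                return False
        return True
```

A caller would check `may_connect(now)` before each connection attempt and skip the attempt when it returns `False`.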
davidcranor
827ff2eb42 Fix cross-platform build: replace ${PROJECT_DIR} with relative paths
platformio.ini:
- Replace -I${PROJECT_DIR}/lib, -I${PROJECT_DIR}/deps/... with relative
  paths (-Ilib, -Ideps/...) in both tdeck-bluedroid and tdeck environments;
  ${PROJECT_DIR} is mangled on Windows inside build_flags, causing include
  paths to resolve inside the PlatformIO builder directory instead of the
  project root
- Remove hardcoded -I.pio/libdeps/tdeck/TinyGPSPlus/src and
  -I.pio/libdeps/tdeck/NimBLE-Arduino/src; these paths reference generated
  cache, break on fresh clones, and are redundant with lib_ldf_mode = deep+
- Fix OTA upload_command: replace python3 with $PYTHONEXE so it resolves
  to PlatformIO's bundled Python on Windows, macOS, and Linux

src/main.cpp, lib/tdeck_ui/UI/LXMF/UIManager.cpp:
- Change #include "tone/Tone.h" to #include "Tone.h"; PlatformIO
  automatically adds -Ilib/tone for local libraries, making the
  subdirectory prefix unnecessary and broken when -Ilib is not effective

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 15:19:32 -05:00
torlando-tech
609a3bc62b LXMF propagation sync, manual node entry, and status improvements
Propagation sync (microReticulum submodule):
- Fix msgpack interop: send nil (not 0) for per_transfer_limit so
  Python server doesn't reject all messages as exceeding "0 KB limit"
- Fix Resource response routing: extract request_id from packed data
  when not present in Resource advertisement, route to pending request
  callback instead of generic concluded handler
- Fix Link::request() to manually build packed arrays, avoiding
  Bytes::to_msgpack() BIN-wrapping that breaks protocol interop

UI enhancements:
- PropagationNodesScreen: manual node entry via 32-char hex hash in
  search field, with paste support and radio button selection
- StatusScreen: display stamp cost from propagation node
- UIManager: NVS persistence for selected propagation node, proactive
  path request on node selection, sync state machine with timeout
  handling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 23:03:32 -05:00
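Sketch for 609a3bc62b: the nil-versus-0 distinction is visible at the byte level. In the MessagePack format, nil is the single byte 0xc0 while the integer 0 is the byte 0x00, so a receiver can tell "no limit given" apart from "limit of zero". The helpers below are hypothetical and hand-encode only the few types needed for the illustration. They are not the microReticulum encoder, and the example arrays are not the real request layout.

```python
def pack_small(value):
    """Encode None, ints 0..127, and lists of up to 15 such items."""
    if value is None:
        return b"\xc0"  # msgpack nil
    if isinstance(value, bool):
        raise TypeError("bool is not handled in this sketch")
    if isinstance(value, int) and 0 <= value <= 127:
        return bytes([value])  # msgpack positive fixint
    if isinstance(value, list) and len(value) <= 15:
        header = bytes([0x90 | len(value)])  # msgpack fixarray
        return header + b"".join(pack_small(item) for item in value)
    raise TypeError("unsupported value in this sketch")


def pack_bin(data):
    """Wrap raw bytes as msgpack bin 8 (payloads up to 255 bytes)."""
    if len(data) > 255:
        raise ValueError("payload too long for bin 8")
    return b"\xc4" + bytes([len(data)]) + data


# A nil limit and a zero limit produce different bytes on the wire.
nil_limit = pack_small([None])  # b"\x91\xc0"
zero_limit = pack_small([0])    # b"\x91\x00"

# Wrapping an already-packed array as bin yields a byte string, not an array.
wrapped = pack_bin(nil_limit)   # b"\xc4\x02\x91\xc0"
```

The last line shows the general effect of BIN-wrapping: a peer expecting a nested array receives an opaque byte string instead.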
torlando-tech
6744eb136d LXST voice call stability: fix hangup crash, signal queue, TX pump, mic tuning
- Fix use-after-free crash on hangup: set _call_state=IDLE before deleting
  _lxst_audio, preventing pump_call_tx() (runs without LVGL lock) from
  accessing freed memory
- Replace single-slot _call_signal_pending with 8-element ring buffer queue
  to prevent signal loss when CONNECTING+ESTABLISHED arrive in rapid succession
- Extract TX pump into pump_call_tx() called right after reticulum->loop()
  for low-latency audio TX without LVGL lock dependency (was buried at step 10)
- Tune ES7210 mic gain to 21dB (was 15dB) to improve Codec2 input level
  without ADC clipping that occurred at 24dB
- I2S capture: use APLL for accurate 8kHz clock, direct 8kHz sampling
  (no more 16→8kHz decimation), DMA 16x64 for encode burst headroom
- Reduce Reticulum log verbosity to LOG_INFO (was LOG_TRACE)
- BLE: add ble_hs_sched_reset() tiered recovery before reboot on desync,
  widen supervision timeout to 4.0s for WiFi coexistence
- Add UDP multicast log broadcasting and OTA flash support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 10:57:14 -05:00
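Sketch for 6744eb136d: replacing a single pending-signal slot with an 8-element ring buffer means two signals arriving back to back are both kept, in order. The `SignalQueue` class is a hypothetical illustration. What the firmware does when the queue is full is not stated in the commit message, so rejecting the newest signal here is an assumption.

```python
class SignalQueue:
    """Hypothetical fixed-capacity FIFO ring buffer for call signals."""

    def __init__(self, capacity=8):
        self._slots = [None] * capacity
        self._head = 0   # index of the next item to pop
        self._tail = 0   # index of the next free slot
        self._count = 0

    def push(self, signal):
        # Assumption: when full, reject the new signal instead of overwriting.
        if self._count == len(self._slots):
            return False
        self._slots[self._tail] = signal
        self._tail = (self._tail + 1) % len(self._slots)
        self._count += 1
        return True

    def pop(self):
        if self._count == 0:
            return None
        signal = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return signal
```

With a single slot, pushing CONNECTING and then ESTABLISHED before the consumer runs would lose the first signal. With the queue, both are popped in arrival order.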
torlando-tech
e263e1e7a6 Add OTA flashing and wireless UDP log broadcasting
ArduinoOTA enables wireless firmware uploads (pio run -e tdeck-ota -t upload).
UDP log callback via RNS::setLogCallback sends all log lines plus Serial.printf
diagnostics to multicast group 239.0.99.99:9999 for untethered monitoring.
Includes safety guards: UDP suspended during WiFi transitions, reentrancy
protection, and WiFi status check before each send.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 17:43:48 -05:00
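Sketch for e263e1e7a6: on the receiving side, a host on the same network could watch the log stream by joining the multicast group. This is a minimal listener under stated assumptions: the group and port come from the commit message, but the payload is assumed to be plain UTF-8 text with one log line per datagram, which the message does not confirm. The function names are hypothetical.

```python
import socket
import struct

GROUP = "239.0.99.99"  # from the commit message
PORT = 9999            # from the commit message


def make_membership_request(group):
    # ip_mreq layout: multicast group address, then local interface (any).
    return struct.pack(
        "4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0")
    )


def open_listener(group=GROUP, port=PORT):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(
        socket.IPPROTO_IP,
        socket.IP_ADD_MEMBERSHIP,
        make_membership_request(group),
    )
    return sock


def listen():
    # Prints received datagrams until interrupted (Ctrl+C).
    sock = open_listener()
    while True:
        data, (host, _port) = sock.recvfrom(2048)
        # Assumption: each datagram carries text; undecodable bytes are replaced.
        print(f"[{host}] {data.decode('utf-8', errors='replace').rstrip()}")
```

Calling `listen()` from a script or REPL prints each datagram prefixed with the sender's address.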
torlando-tech
e343caf2d2 Stability: WDT yield, BLE mutex fixes, time-based desync recovery
Reduces the crash rate from one reboot every 60-85s to one reboot per 6+ minutes.
Zero WDT triggers in 10-minute stability test.

BLE mutex fixes (BLEInterface.cpp):
- Release _mutex before blocking GATT ops in onConnected() and
  onServicesDiscovered() — prevents 5-30s main-loop stalls during
  service discovery, notification subscribe, identity exchange
- Non-blocking try_lock() for peerCount(), getConnectedPeerSummaries(),
  get_stats() — returns empty/default if BLE task holds mutex
- Write-without-response in initiateHandshake()

WDT and persistence (main.cpp, sdkconfig.defaults, microReticulum):
- 30s WDT timeout (up from 10s) for SPIFFS flash I/O headroom
- Register Identity::set_persist_yield_callback() to feed WDT every
  5 entries during save_known_destinations() (70+ entries = 30-50s)
- WDT feeds between reticulum and identity persist calls

BLE host desync recovery (NimBLEPlatform):
- Time-based desync tracking instead of aggressive counter-based reboot
- 60s tolerance without connections, 5 minutes with active connections
  (data still flows over existing BLE mesh links)
- Remove immediate recoverBLEStack() from 574 handler and
  enterErrorRecovery() — let startScan() manage reboot decision
- Increase CONNECTION_COOLDOWN from 3s to 10s to reduce 574 risk
- Increase SCAN_FAIL_RECOVERY_THRESHOLD from 5 to 10

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 12:30:30 -05:00
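Sketch for e343caf2d2: time-based desync tracking can be modeled as remembering when the desync started and comparing the elapsed time against a tolerance that depends on whether connections are active. The `DesyncTracker` class is hypothetical. The 60-second and 5-minute values come from the commit message, and treating the limit as inclusive is an assumption.

```python
NO_CONNECTION_TOLERANCE_S = 60       # from the commit message
ACTIVE_CONNECTION_TOLERANCE_S = 300  # 5 minutes, from the commit message


class DesyncTracker:
    """Hypothetical model of time-based (not counter-based) desync handling."""

    def __init__(self):
        self.desync_since = None

    def update(self, synced, now):
        if synced:
            # Any successful sync clears the timer.
            self.desync_since = None
        elif self.desync_since is None:
            # Record only the start of a desync period.
            self.desync_since = now

    def should_reboot(self, now, active_connections):
        if self.desync_since is None:
            return False
        # Existing links still carry data, so tolerate desync for longer.
        if active_connections > 0:
            limit = ACTIVE_CONNECTION_TOLERANCE_S
        else:
            limit = NO_CONNECTION_TOLERANCE_S
        return now - self.desync_since >= limit
```

Unlike a failure counter, repeated failed checks within the tolerance window do not bring the reboot any closer.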
torlando-tech
3ca27f53f6 Task watchdog, BLE mutex fixes, NimBLE crash-safe recovery
Subscribe loopTask and BLE task to the ESP32 Task Watchdog (10s timeout)
to detect and recover from silent hangs. Per-step WDT feeds in the main
loop prevent false triggers from cumulative slow operations.

Fix BLE mutex starvation that blocked the main loop for 3-6s:
- Move processDiscoveredPeers() out of performMaintenance() so _mutex
  is not held during blocking NimBLE connect calls
- Use try_lock() in send_outgoing() to skip sends when BLE task has
  the mutex, rather than blocking (Reticulum retransmits)
- Switch BLE data writes to write-without-response (non-blocking)
- Add WDT feeds to all NimBLE blocking wait loops

Replace NimBLE soft-reset recovery with immediate reboot — deinit()
during sync failures caused CORRUPT HEAP panics. With atomic file
persistence, data survives reboots reliably.

Reduce loop task stack from 49KB to 16KB (measured peak ~6KB).
Add NimBLE PHY update null guard to patch_nimble.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 10:45:43 -05:00
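Sketch for 3ca27f53f6: the `try_lock()` change in `send_outgoing()` trades a blocked main loop for a skipped send. A host-side model, with `threading.Lock` standing in for the firmware mutex and a hypothetical `Sender` class:

```python
import threading


class Sender:
    """Hypothetical model of a non-blocking send path."""

    def __init__(self):
        self._mutex = threading.Lock()
        self.sent = []
        self.skipped = 0

    def send_outgoing(self, packet):
        # Non-blocking acquire: give up at once if another task holds the lock.
        if not self._mutex.acquire(blocking=False):
            self.skipped += 1
            return False
        try:
            self.sent.append(packet)
            return True
        finally:
            self._mutex.release()
```

A skipped packet is simply dropped here. According to the commit message that is acceptable because Reticulum retransmits.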
torlando-tech
a499a2b30a Persistence reliability: NimBLE crash fix, atomic save, fast persist
NimBLE crash fix:
- Patch ble_hs.c assert(0) in BLE_HS_SYNC_STATE_BRINGUP timer handler
  via pre-build script (patch_nimble.py). The assert fires when a timer
  callback races with host re-sync — harmless, but kills the ESP32 and
  corrupts any file writes in progress.

Persistence fixes (in microReticulum submodule):
- Atomic save: write to temp file then rename, protecting existing data
- Fast persist: 5s after dirty flag instead of waiting 60s interval
- Corrupt file recovery: delete invalid files, recover from temp files
- INFO-level logging for load/save visibility

Other:
- Wrap LXMF announce in try/catch for crash safety
- Call Identity::should_persist_data() from main loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 01:44:38 -05:00
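Sketch for a499a2b30a: the atomic-save and corrupt-file-recovery ideas can be shown with ordinary files. This is a host-side illustration, not the microReticulum code: the function names, the `.tmp` suffix, the `state.bin` file name, and the validity check are all hypothetical.

```python
import os
import tempfile


def atomic_save(path, data):
    """Write to a temp file, then rename it over the target."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # The existing file stays intact until this rename succeeds.
    os.replace(tmp_path, path)


def load_with_recovery(path, is_valid):
    """Return valid data from the main file or, failing that, the temp file."""
    for candidate in (path, path + ".tmp"):
        try:
            with open(candidate, "rb") as f:
                data = f.read()
        except FileNotFoundError:
            continue
        if is_valid(data):
            return data
        os.remove(candidate)  # delete invalid files
    return None


def demo():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "state.bin")
        atomic_save(path, b"v1")
        atomic_save(path, b"v2")
        leftover_tmp = os.path.exists(path + ".tmp")
        # Simulate a crash that left a corrupt main file and a good temp file.
        with open(path, "wb") as f:
            f.write(b"")
        with open(path + ".tmp", "wb") as f:
            f.write(b"v3")
        recovered = load_with_recovery(path, is_valid=lambda d: len(d) > 0)
        return leftover_tmp, recovered, os.path.exists(path)
```

In `demo()`, the empty main file is rejected and deleted, and the data is recovered from the temp file.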
torlando-tech
43a7e1088f BLE P2P stability: fix ODR violation, shutdown safety, connection robustness
Fix systemic One Definition Rule violation where BLEInterface.h included
headers from deps/microReticulum/src/BLE/ while .cpp files compiled
against local lib/ble_interface/ versions, causing struct layout mismatches
(PeerInfo field shifting corrupted conn_handle/mtu) and class layout
mismatches (BLEPeerManager member differences caused LoadProhibited crash).

Key fixes:
- Include local BLE headers instead of deps versions in BLEInterface.h
- Sync PeerInfo keepalive tracking fields and BLETypes constants with deps
- Shutdown re-entrancy guard and proper client cleanup via deinit(true)
- Host sync checks before scan, advertise, and connect operations
- Avoid deadlock by deferring _on_connected from NimBLE host task
- Duplicate identity detection, stale handle cross-check in keepalives
- Bounds validation on conn_handle in setPeerHandle/promoteToIdentityKeyed
- Periodic persist_data() call for display name persistence across reboots

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 00:24:45 -05:00
torlando-tech
769c9952bd BLE P2P stability: PSRAM zero-init, pool sizing, stuck-state recovery
Root cause: Bytes objects stored in PSRAM-allocated BLEInterface had
corrupted shared_ptr members from uninitialized memory, causing crashes
in processDiscoveredPeers(). Fixed by using heap_caps_calloc instead of
heap_caps_malloc for PSRAM placement-new allocation.

Additional fixes:
- Reduce pool sizes to fit memory budget (reassembler 134KB→17KB,
  fragmenters 8→4, handshakes 32→4, pending data 64→8)
- Store local MAC as BLEAddress struct instead of Bytes to avoid
  heap allocation in PSRAM-resident object
- Move setLocalMac after platform start (NimBLE needs to be running
  for valid random address), add lazy MAC init fallback in loop()
- Add stuck-state detector: resets GAP state machine if hardware
  is idle but state machine thinks it's busy
- Enhance getLocalAddress with 3 fallback methods (NimBLE API,
  ble_hs_id_copy_addr RANDOM, esp_read_mac efuse)
- Replace C++17 structured bindings with C++11-compatible code
- Increase BLE task stack 8KB→12KB for string ops in debug logs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 20:57:05 -05:00
torlando-tech
f73fbb2568 Fix LXST voice call interop with Python LXST/Columba
- Fix thread safety: defer Reticulum link callbacks (packet, link_closed)
  to call_update() which runs under LVGL lock, preventing crashes from
  concurrent LVGL access across cores
- Fix outgoing call signal handling: store link reference and re-register
  packet/link_closed callbacks in on_call_link_established so signals
  are actually received
- Fix call answer screen freeze: update UI before blocking audio init
  (I2S/ES7210/Codec2 setup) so screen renders immediately
- Fix audio direction: use startPlayback() (speaker RX) instead of
  startCapture() (mic TX) so received audio is actually heard
- Add msgpack wire format for LXST signalling and audio frames
- Add LXST IN destination for receiving calls + announce support
- Add incoming call UI (Answer/Reject buttons) on CallScreen
- Add path request before outgoing call link establishment
- Add LXST announce handler registration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 21:00:07 -05:00
torlando-tech
beb122606d Screen wake fix, memory monitor poll, crash breadcrumbs
- Fix slow screen wake: track screen_off_time and wake on any activity
  newer than when screen turned off (was requiring activity <1s ago)
- Add MEMORY_MONITOR_POLL() to main loop for deferred logging
- Add NVS crash breadcrumb system for LXST call debugging
- Update microReticulum submodule (memory stability fixes)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 16:41:32 -05:00
Astra
c30dba58e6 Display blackout debugging
2026-02-11 21:41:09 +09:00
Astra
512cfb27ec Don't crash when no wifi network is configured
2026-02-11 14:37:39 +09:00
torlando-tech
6ffafb9242 fix: auto-start TCP interface when WiFi connects after boot
TCP interface was never created if WiFi wasn't connected at boot time.
The settings-changed handler tried to restart it but the objects didn't
exist. Extract TCP creation into start_tcp_interface() helper and call
it from boot, settings changes, and a WiFi state-change detector in
the main loop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 00:52:54 -05:00
torlando-tech
80baf98d52 feat: auto-inject firmware version from git tags at build time
Add version.py PlatformIO pre-build script that reads version from
`git describe --tags --always` and defines FIRMWARE_VERSION as a build
flag. Local builds without tags fall back to "dev". Also rename
FIRMWARE_NAME from "microReticulum" to "Pyxis".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 23:26:09 -05:00
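Sketch for 80baf98d52: a version lookup of this kind could look roughly like the following. It is not the repository's version.py. The function names are hypothetical, and since the commit message does not say how the real script detects the fallback case, this sketch falls back to "dev" whenever the git command fails or prints nothing. In a PlatformIO pre-build script the resulting pair would typically be handed to `env.Append(CPPDEFINES=[...])`, which is left out here so the code runs standalone.

```python
import subprocess


def firmware_version(cwd=None, fallback="dev"):
    """Return the output of `git describe --tags --always`, or the fallback."""
    try:
        result = subprocess.run(
            ["git", "describe", "--tags", "--always"],
            cwd=cwd,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git missing, directory missing, or not a git repository.
        return fallback
    return result.stdout.strip() or fallback


def version_define(version):
    """Build a (name, value) pair with the value quoted as a C string literal."""
    return ("FIRMWARE_VERSION", '\\"' + version + '\\"')
```

For example, `version_define(firmware_version())` yields the macro name together with an escaped, quoted version string suitable for a compiler define.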
torlando-tech
ac6ceca9f8 Initial commit: standalone Pyxis T-Deck firmware
Split T-Deck firmware from microReticulum examples/lxmf_tdeck/ into its
own repo. microReticulum is consumed as a git submodule dependency pinned
to feat/t-deck. All include paths updated from relative symlinks to bare
includes resolved via library build flags.

Both tdeck (NimBLE) and tdeck-bluedroid environments compile successfully.
Licensed under AGPLv3.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 19:48:33 -05:00