Hook equivalent to reticulum-kt's :rns-test:test and reticulum-swift's
Tests/Interop/. Each pyxis PR now runs the microReticulum conformance
bridge against canonical Python RNS, so a submodule-pin bump or any
change that breaks byte-equivalence with Python is caught at PR time
instead of only by the reticulum-conformance repo's own CI (which
runs against pyxis main, not the PR branch).
Mechanism:
- Check out THIS pyxis branch, reticulum-conformance, markqvist/Reticulum,
and markqvist/LXMF
- Build the microReticulumBridge with -DMICRORETICULUM_DIR pointing at
this branch's deps/microReticulum
- Run the same deselect set we lock in reticulum-conformance/.github/
workflows/microreticulum.yml so the two CI surfaces stay in sync
Locked baseline: 52 passing against the pyxis fork submodule
(feat/t-deck @ ca355e5). The conformance CI on the spike/graft branch
will report a different number once the graft progresses — that's the
intended signal.
Greptile review feedback on PR #21:
ACCEPT:
- test_patch_nimble.py:151 (P1) — replace dead `if False else True`
ternary with a real assertion that "already applied" is absent on
the first run.
- test_patch_nimble.py:247 (P1) — invoke the shim subprocess via
`sys.executable` instead of hardcoded `/usr/bin/python3` so CI's
setup-python interpreter is used consistently.
- workflows/test.yml:50 (P2) — include hash of
deps/microReticulum/platformio.ini in PlatformIO cache key so the
cache invalidates when dependencies change.
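The two test_patch_nimble.py items above can be illustrated with a minimal Python sketch. The helper names (dead_check, real_check, run_with_current_interpreter) are hypothetical and are not taken from the test file; the sketch only shows why the ternary was dead and what invoking a subprocess through `sys.executable` looks like.

```python
import subprocess
import sys


def dead_check(output: str) -> bool:
    # `X if False else True` always evaluates to True, so this check
    # can never fail, whatever `output` contains.
    return ("already applied" not in output) if False else True


def real_check(output: str) -> bool:
    # A real check: passes only when the marker text is absent.
    return "already applied" not in output


def run_with_current_interpreter(code: str) -> str:
    # sys.executable is the interpreter running this process, so the
    # child uses the same Python that CI's setup-python selected.
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With the dead ternary, a first run that wrongly printed "already applied" would still pass; the real check fails in that case.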
MODIFY (narrowed):
- test_ring_buffers.cpp:209 (P2) — keep both `write(data, 0)` and
`write(data, -1)` assertions, but add a comment clarifying that
EncodedRingBuffer::write() takes signed `int length` (not size_t),
so -1 hits the `length <= 0` branch — same as 0. Greptile's
premise (size_t wrap to SIZE_MAX) does not apply to this codebase.
The two assertions lock the contract in case the param is ever
migrated to size_t.
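The signed-versus-unsigned point can be modelled in Python with ctypes. This is only a sketch of the integer semantics, not the EncodedRingBuffer code; the function names are hypothetical, and the unsigned variant represents the possible future size_t migration mentioned above.

```python
import ctypes


def rejected_with_int_length(length: int) -> bool:
    # Models `int length`: -1 stays -1, so `length <= 0` is taken.
    return ctypes.c_int(length).value <= 0


def rejected_with_size_t_length(length: int) -> bool:
    # Models a size_t parameter: -1 wraps to SIZE_MAX, so the
    # `length <= 0` guard would no longer catch it.
    return ctypes.c_size_t(length).value <= 0
```

With the signed parameter both 0 and -1 are rejected; after a size_t migration only 0 would be, which is the regression the paired assertions are meant to expose.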
REJECT (silently — no public reply per agent policy):
- test_audio_filters.cpp:237 (P1) — VoiceFilterChain::process()
takes `numSamples = frames * channels` per the documented
contract in audio_filters.h:33-40, and the implementation does
`numFrames = numSamples / channels_` (audio_filters.cpp:63). The
multichannel test correctly passes `(int)samples.size() = 8000`
(4000 frames * 2 channels). No out-of-bounds read occurs.
- lib/lxst_audio/{packet,encoded}_ring_buffer.cpp use malloc/free without
including <cstdlib>. On macOS the declarations arrive transitively through
other headers, but Linux clang is stricter. This is a real portability bug
surfaced by the new pytest CI.
- microReticulum native17 tests link against system libbz2 via the fork's
pre:link_bz2.py script. Ubuntu runners need libbz2-dev installed.
Standalone C++ tests of pyxis-unique code (BLE fragmenter/reassembler,
peer manager, GATT op queue, LXST ring buffers, audio filters, HDLC
framing) plus Python tests of the patch_nimble.py build script.
Each C++ test is compiled directly by clang++/g++ with shims in
tests/native/ (Bytes.h, Log.h, Utilities/OS.h) so pyxis sources can build
without microReticulum's full Arduino/MsgPack dep tree. A pytest wrapper
per test compiles, runs, and parses the summary line — the whole suite
is one command: `pytest tests/build_scripts tests/native -v`.
Total: 13 pytest tests, ~72 underlying C++ assertions, 3.4s.
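A minimal sketch of what such a wrapper might look like follows. The summary-line format ("N passed, M failed"), the `-std=c++17` flag, and the function names are assumptions for illustration; only the `tests/native` include path comes from the description above.

```python
import re
import subprocess
from pathlib import Path

# Assumed summary format; the real test binaries may print something else.
SUMMARY_RE = re.compile(r"(\d+) passed, (\d+) failed")


def parse_summary(stdout: str) -> tuple[int, int]:
    """Return (passed, failed) from the last matching summary line."""
    match = None
    for line in stdout.splitlines():
        found = SUMMARY_RE.search(line)
        if found:
            match = found
    if match is None:
        raise ValueError("no summary line found")
    return int(match.group(1)), int(match.group(2))


def compile_and_run(source: Path, binary: Path, compiler: str = "clang++"):
    """Compile one standalone C++ test against the shims, run it, parse it."""
    subprocess.run(
        [compiler, "-std=c++17", "-Itests/native", str(source), "-o", str(binary)],
        check=True,
    )
    result = subprocess.run([str(binary)], capture_output=True, text=True, check=True)
    return parse_summary(result.stdout)
```

A pytest test would then assert that the failed count is zero and, if desired, that the passed count matches the expected number of assertions.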
Surfaced an HPF-formula bug in lxst_audio (mirrored upstream in
LXST-kt/native_audio_filters.cpp) — filed as LXST-kt#13 and tracked
in the corresponding test with a TODO link.
CI workflow runs the pyxis pytest suite plus the clean-passing
microReticulum native17 unit tests (94/114 of the existing fork
test/* suites) on push and PR.
The unprotected _clients.find() could race with
processPendingDisconnects() erasing from the map concurrently.
Mutex is released before the blocking getService() GATT call.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously, a mutex timeout left characteristic caches empty but
still signalled success to callers, making all GATT ops silently
fail for the connection.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Re-check hasActiveWriteOperations() after acquiring mutex in
processPendingDisconnects() to close race where write() registers
an op between the pre-mutex check and mutex acquisition
- Move cached char pointer writes inside connection-exists guard in
discoverServices() to prevent dangling pointers on handle reuse
- Add WARNING logs to both onConnect callbacks on mutex timeout
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move getService()/getCharacteristic() out of mutex-held paths in
writeCharacteristic(), read(), enableNotifications() by caching all
three char pointers (RX, TX, Identity) during discoverServices()
- Replace 5-second spin-wait in processPendingDisconnects() with
non-blocking deferral: break if GATT ops in flight, retry next loop
- Add WARNING logs to all read-path helpers on mutex timeout
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move beginWriteOperation() before xSemaphoreGive(_conn_mutex) in
write(), writeCharacteristic(), read(), and enableNotifications()
so the active-op counter is incremented while the mutex is still
held. This closes the window where processPendingDisconnects()
could observe hasActiveWriteOperations()==false and delete the
client before the GATT caller has registered its operation.
- Add _conn_mutex around _connections/_clients insertions in both
server and client onConnect() callbacks, preventing concurrent
map insertions from corrupting the red-black tree.
- Protect updateConnectionMTU() with _conn_mutex.
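The ordering described above (register the operation while the mutex is still held, and defer deletion while operations are in flight) can be sketched in Python with a threading.Lock standing in for _conn_mutex. The class and method names are hypothetical; this models the pattern only, not the NimBLEPlatform code.

```python
import threading


class ConnectionTable:
    def __init__(self):
        self._lock = threading.Lock()  # stands in for _conn_mutex
        self._clients = {}
        self._active_ops = 0
        self._pending = []

    def add_client(self, handle, client):
        with self._lock:
            self._clients[handle] = client

    def begin_op(self, handle):
        """Look up the client and register the op in one critical section."""
        with self._lock:
            client = self._clients.get(handle)
            if client is None:
                return None
            # Incremented before the lock is released, so the disconnect
            # path can never see "no active ops" for a client in use.
            self._active_ops += 1
            return client

    def end_op(self):
        with self._lock:
            self._active_ops -= 1

    def request_disconnect(self, handle):
        with self._lock:
            self._pending.append(handle)

    def process_pending_disconnects(self) -> int:
        """Delete pending clients, or defer if any op is in flight."""
        with self._lock:
            if self._active_ops > 0:
                return 0  # non-blocking deferral: retry on the next loop
            deleted = 0
            for handle in self._pending:
                if self._clients.pop(handle, None) is not None:
                    deleted += 1
            self._pending.clear()
            return deleted
```

The blocking call itself would run between begin_op() and end_op(), outside the lock, so the lock is never held across a long GATT timeout.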
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
writeCharacteristic(), read(), and enableNotifications() resolve
characteristic pointers under _conn_mutex then call blocking GATT
ops after releasing it — same pattern as write(). Without the
active-operation guard, processPendingDisconnects() could delete
the client (and its child characteristics) during the GATT call.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Defer NimBLEDevice::deleteClient() in processPendingDisconnects()
until after releasing _conn_mutex and waiting for any active write
operations to complete. Prevents use-after-free when write() holds
a child NimBLERemoteCharacteristic* pointer across the mutex boundary.
- Add _conn_mutex protection to getConnectionCount(), isConnectedTo(),
and isDeviceConnected() which read _connections without synchronization.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
send_outgoing() on loopTask (core 1) calls write() which reads
_connections, _clients, and _cached_rx_chars maps, while
processPendingDisconnects() on the BLE task (core 0) erases from
them — with no synchronization. This causes std::map red-black tree
corruption, manifesting as LoadProhibited crashes in map rotate/insert
operations (EXCVADDR=0x00000008).
Protect all map accesses in write(), writeCharacteristic(), read(),
enableNotifications(), getConnection(), getConnections(), and
processPendingDisconnects() with _conn_mutex. The mutex is released
before any blocking GATT operations (writeValue, readValue, subscribe)
to avoid holding it during 10-30s NimBLE timeouts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Consolidate #ifndef/#ifdef into single #ifdef/#else/#endif block.
Add warning comment to generated header about static linkage.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move BG_COLOR inline into #ifndef block to avoid unused variable
when HAS_SPLASH_IMAGE is defined. Make show_splash() private since
it's only called internally from init_hardware_only().
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Matches the guard already on notify() to prevent use-after-free
of _tx_char during a NimBLE host reset.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BLE task is no longer subscribed to WDT, so these 23 calls were
silently returning ESP_ERR_NOT_FOUND. Removes dead code and the
now-unused esp_task_wdt.h include.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prevents double PSRAM allocation and LVGL driver re-registration
if init() were called more than once.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Scale constellation to 80% and shift down to make room for title text.
Generate splash at full display resolution instead of 160x160 centered.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- pyxis-icon.svg: Pyxis constellation icon (3 stars with connecting lines)
- generate_splash.py: PlatformIO pre-build script that renders the SVG to
a 160x160 RGB565 PROGMEM header (SplashImage.h) using cairosvg + Pillow
- .gitignore: Exclude generated SplashImage.h
- platformio.ini: Add generate_splash.py to both build environments
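For reference, the RGB565 packing step can be sketched as follows. This is an illustration under assumptions, not the contents of generate_splash.py: the function names and the `splash_image` array name are hypothetical, and the SVG rendering via cairosvg and Pillow is omitted.

```python
def rgb888_to_rgb565(r: int, g: int, b: int) -> int:
    """Pack 8-bit RGB into 16 bits: 5 bits red, 6 bits green, 5 bits blue."""
    return ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3)


def format_progmem_header(pixels, width, height, name="splash_image") -> str:
    """Render RGB565 pixels as a C array declaration (hypothetical layout)."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    body = ", ".join(f"0x{p:04X}" for p in pixels)
    return (
        "// Generated file, do not edit\n"
        f"static const uint16_t {name}[{width * height}] PROGMEM = {{ {body} }};\n"
    )
```

Green keeps six bits because the eye is most sensitive to it, which is why the masks are 0xF8, 0xFC, and a 3-bit shift rather than three equal fields.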
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move Display::init_hardware_only() and POWER_EN to right after serial
banner, before GPS/WiFi/SD/Reticulum init. Add 150ms delay after
POWER_EN HIGH so ST7789V power rail stabilizes before SPI commands
(without this, SWRESET is sent to an unpowered chip and silently lost).
Splash now visible for entire boot period (~18s) until LVGL takes over.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two root causes for frequent device reboots:
1. LVGL task (priority 2) starved loopTask (priority 1) on core 1.
During heavy screen rendering, loopTask couldn't run for 30+ seconds,
triggering the Task WDT. Fixed by lowering LVGL to priority 1 so
FreeRTOS round-robins both tasks fairly.
2. BLE task was registered with the 30s Task WDT, but blocking NimBLE
GATT operations (connect + service discovery + subscribe + read) can
legitimately take 30-60s total. Removed BLE task from WDT since
NimBLE has its own internal ~30s timeouts per GATT operation.
Also added ble_hs_synced() guards to write(), read(), notify(),
writeCharacteristic(), discoverServices(), and enableNotifications()
to prevent use-after-free on stale NimBLE client pointers during
host resets.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Display and LoRa were creating separate SPIClass(HSPI) instances which
claimed GPIO pins via the matrix, preventing SD card (on FSPI) from
accessing MISO after Display init. Now all three peripherals use the
global SPI (FSPI) instance, eliminating GPIO routing conflicts.
- Display: use &SPI instead of new SPIClass(HSPI)
- SX1262Interface: use &SPI instead of new SPIClass(HSPI)
- SDAccess: enable format_if_empty for unformatted cards
Verified on device: SD (128GB SDHC), display, and LoRa all coexist.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SD card was unresponsive (MISO stuck 0xFF) because Display's HSPI
peripheral had already claimed the GPIO pins via the matrix, preventing
FSPI from routing MISO. Fix by initializing SD card BEFORE Display,
using the global SPI (FSPI) instance — matching LilyGo's reference code.
- Move SD card init before display init in boot sequence
- Use global SPI (FSPI) instead of Display's SPIClass(HSPI)
- Lower SPI frequency to 800kHz matching LilyGo example
- Drive all CS lines (display, LoRa, SD) high before SD init
- Add MISO=38 to Display's SPI.begin for post-init bus sharing
- Add Display::get_spi() accessor for future shared use
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The T-Deck Plus shares HSPI across the display (CS=12), LoRa (CS=9),
and SD card (CS=39). Previously SD logging was disabled because
SD.begin() reconfigured the SPI bus and blanked the display.
This introduces a FreeRTOS mutex created in main.cpp and injected into
Display, SX1262Interface, and a new SDAccess class so all three
peripherals serialize their SPI transactions safely.
- Add SDAccess class wrapping SD.begin() and file ops with mutex
- Add set_spi_mutex() to Display and SX1262Interface
- Wrap Display flush, fill, draw, and power ops in mutex
- Refactor SDLogger to use SDAccess mutex instead of owning SD.begin()
- Wire up mutex creation and injection order in setup()
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
grep -q 200 matches any line that contains 200, such as 2001 or other
superstrings; -qx matches only a line that is exactly 200, which is
correct for curl's %{http_code} output.
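The difference can be modelled in Python for a literal pattern. This is a sketch of the matching semantics only (substring versus whole-line), with hypothetical function names; it does not reproduce grep's regex handling.

```python
def matches_anywhere(pattern: str, text: str) -> bool:
    # Like `grep -q PATTERN` for a literal pattern: any line containing it.
    return any(pattern in line for line in text.splitlines())


def matches_whole_line(pattern: str, text: str) -> bool:
    # Like `grep -qx PATTERN` for a literal pattern: a line equal to it.
    return any(line == pattern for line in text.splitlines())
```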
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Runs both tdeck and tdeck-bluedroid builds on every PR to main.
Uploads firmware binaries as artifacts on the tdeck build for
easy testing by reviewers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
UIManager.cpp includes Tone.h, so tdeck_ui should declare this
dependency rather than relying on implicit global discovery.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Generates a styled HTML page from README.md and deploys it to the root
of the GitHub Pages site, fixing the 404 at torlando-tech.github.io/pyxis/.
Includes navigation links to the web flasher, GitHub repo, and releases.
Closes #6
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GitHub release download URLs redirect to release-assets.githubusercontent.com
which doesn't return Access-Control-Allow-Origin headers. The browser blocks
cross-origin fetches from the GitHub Pages flasher, causing "Failed to fetch"
for any versioned release while the latest dev build (same-origin) works fine.
Fix: Deploy versioned firmware binaries to GitHub Pages alongside the dev
build at firmware/releases/{tag}/, so all versions are fetched same-origin.
The CI workflow now downloads existing release assets and deploys them to
Pages with keep_files: true to preserve across deploys.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After a 574 (BLE_ERR_CONN_ESTABLISHMENT) connection failure, the NimBLE controller's scan state can
become corrupted (returning rc=530 / Invalid HCI Params) even after the
host re-syncs. This led to scan failure escalation and device reboots.
Key fixes:
- Add ble_gap_conn_cancel() to enterErrorRecovery() — stuck GAP master
connection operations were blocking all subsequent scans
- Add ble_hs_sched_reset(BLE_HS_ECONTROLLER) in error recovery to force
a full host-controller resynchronization after desync
- Proactively cancel stale GAP connections before scan start
- Reduce SCAN_FAIL_RECOVERY_THRESHOLD from 10 to 5 for faster recovery
- Enhanced scan failure logging with GAP state diagnostics
- Move ESP reset reason logging after WiFi init for UDP log visibility
- Suppress connection candidate log spam when at max connections
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add 30-second cooldown after NimBLE host desync recovery before
allowing new connection attempts. During desync, client->connect()
blocks waiting for a host-task completion event that never arrives,
causing WDT crashes. The cooldown skips connection attempts while
the host is desynced or recently recovered.
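The gating logic can be sketched as below. The class and method names are hypothetical, and the clock is passed in explicitly so the behaviour is easy to test; only the 30-second cooldown value comes from the description above.

```python
class ConnectGate:
    COOLDOWN_S = 30.0  # cooldown after host desync recovery

    def __init__(self):
        self._synced = True
        self._recovered_at = None

    def on_desync(self):
        self._synced = False

    def on_resync(self, now: float):
        self._synced = True
        self._recovered_at = now

    def may_connect(self, now: float) -> bool:
        """Skip connection attempts while desynced or recently recovered."""
        if not self._synced:
            return False
        if self._recovered_at is not None:
            if now - self._recovered_at < self.COOLDOWN_S:
                return False
        return True
```

The caller checks may_connect() before each attempt and simply skips the attempt when it returns False, rather than blocking.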
Also adds ESP reset reason logging at boot to diagnose crash types
(WDT, panic, brownout, etc.) in soak test logs.
Soak test results: Run 3 (before) had 17 reboots in ~4 hours with
a 12-crash-in-14-minutes loop. Run 4 (after) has 1 early reboot
then 19+ hours of continuous uptime with the same desync frequency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
platformio.ini:
- Replace -I${PROJECT_DIR}/lib, -I${PROJECT_DIR}/deps/... with relative
paths (-Ilib, -Ideps/...) in both tdeck-bluedroid and tdeck environments;
${PROJECT_DIR} is mangled on Windows inside build_flags, causing include
paths to resolve inside the PlatformIO builder directory instead of the
project root
- Remove hardcoded -I.pio/libdeps/tdeck/TinyGPSPlus/src and
-I.pio/libdeps/tdeck/NimBLE-Arduino/src; these paths reference generated
cache, break on fresh clones, and are redundant with lib_ldf_mode = deep+
- Fix OTA upload_command: replace python3 with $PYTHONEXE so it resolves
to PlatformIO's bundled Python on Windows, macOS, and Linux
src/main.cpp, lib/tdeck_ui/UI/LXMF/UIManager.cpp:
- Change #include "tone/Tone.h" to #include "Tone.h"; PlatformIO
automatically adds -Ilib/tone for local libraries, making the
subdirectory prefix unnecessary and broken when -Ilib is not effective
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Patch 3 (ble_gap.c): Handle BLE_ERR_CONN_ESTABLISHMENT (574) unconditionally.
NimBLE only handled 574 under BLE_PERIODIC_ADV_WITH_RESPONSES (disabled on
ESP32), causing ble_gap_master_failed() to never be called. This left the
master GAP state stuck in BLE_GAP_OP_M_CONN, permanently blocking scan and
advertising. Also clean up master state in the default case instead of
assert(0).
Patch 4 (NimBLEDevice.cpp): Expose host reset reason via global volatile int.
NimBLE's onReset callback logs the reason code through ESP_LOG (serial UART
only). This patch adds nimble_host_reset_reason that the BLE loop polls to
capture the reason in UDP log output for remote soak test monitoring.
NimBLEPlatform.cpp: Escalate persistent scan failures to full stack recovery.
After 3 consecutive enterErrorRecovery() rounds fail to restore scanning (30
total scan failures), escalate to recoverBLEStack() (clean reboot) instead
of looping indefinitely in a broken state.
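One way to read the escalation rule is sketched below. The per-round threshold of 10 is inferred from "3 rounds" and "30 total scan failures"; the exact bookkeeping (escalating on the third time the threshold is reached, and resetting on a successful scan) is an assumption, as are the class and return-value names.

```python
class ScanFailureEscalator:
    def __init__(self, fail_threshold: int = 10, max_rounds: int = 3):
        self.fail_threshold = fail_threshold
        self.max_rounds = max_rounds
        self.failures = 0  # consecutive scan failures in the current round
        self.rounds = 0    # recovery rounds that did not restore scanning

    def on_scan_success(self):
        self.failures = 0
        self.rounds = 0

    def on_scan_failure(self) -> str:
        """Return the action to take: none, error_recovery, or stack_recovery."""
        self.failures += 1
        if self.failures < self.fail_threshold:
            return "none"
        self.failures = 0
        self.rounds += 1
        if self.rounds >= self.max_rounds:
            self.rounds = 0
            return "stack_recovery"  # corresponds to recoverBLEStack()
        return "error_recovery"      # corresponds to enterErrorRecovery()
```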
Validated with 17+ hour soak test: device recovers from desyncs and maintains
3 active BLE connections with stable heap (~43K).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>