Repins microReticulum + microLXMF onto the upstream-0.4.1 graft and adapts
pyxis to the new src/microReticulum/ layout and 0.4.x APIs. The far-diverged
0.3.0 fork's Resource/Transport/Identity work is subsumed by upstream's
reimplementation; only the still-needed fixes ride on the pinned branches
(PKCS7/HMAC/X25519 crypto -- proven byte-identical to python RNS 1.3.1 --
Packet link-proof callback, Identity short-sig guard, and the bz2 layer +
decompress-on-receive in Resource::assemble()).
Consumer-side changes:
- platformio.ini: pin microReticulum @2f21fee (pyxis-fixes-on-0.4.1) and
microLXMF @33760d0 (chore/microreticulum-0.4.1-layout); bump microStore
ceea8f5 -> c5fb69d (0.4.x requires the new BasicFileStore::init API);
-std=gnu++11 -> gnu++17 (upstream requires C++17).
- Namespace all microReticulum includes (angle + quote) to <microReticulum/...>
for the relocated layout; shim-local Utilities/Stream.h|Print.h preserved.
- Interface::send_outgoing now returns bool: update TCP/BLE/SX1262/Auto
overrides with correct success/failure returns.
- SDArchiveFileSystem::init now takes (bool reformatOnFail=true) to match
the new microStore.
- Static Transport::get_path_table() -> path_table(); instance getter unchanged.
- Remove duplicate shim Cryptography/BZ2 (microReticulum provides it now; keep
lib/libbz2 as the ESP32 bzlib provider).
- patch_littlefs_paths.py: normalize microStore's LittleFS adapter paths to a
leading "/" -- ESP32 Arduino LittleFS rejects "./"-prefixed paths, which
silently broke the path store (no peer paths learned, all messaging blocked).
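
The normalization rule described in the last bullet can be sketched as
follows. This is only an illustration of the rule in C++; the actual fix
is a Python build script that patches microStore's adapter, and the
helper name normalize_littlefs_path is hypothetical.

```cpp
#include <string>

// Hypothetical helper (not part of microStore): rewrite "./foo" and
// "foo" to "/foo", and leave paths that already start with "/" alone.
std::string normalize_littlefs_path(const std::string& path) {
    std::string p = path;
    // Strip any number of leading "./" segments.
    while (p.compare(0, 2, "./") == 0) {
        p.erase(0, 2);
    }
    // Ensure the result starts with a slash.
    if (p.empty() || p[0] != '/') {
        p.insert(p.begin(), '/');
    }
    return p;
}
```

With this rule a store path such as "./paths/table" is opened as
"/paths/table", which is the form the bullet says LittleFS accepts.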
Validated on T-Deck Plus: builds (RAM 27.5% / Flash 77.7%), boots stable
(no WDT/panic), and a full on-device LXMF e2e (DIRECT + OPPORTUNISTIC +
bz2-compressed-Resource receive) passes 5/5.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UWZuYkHBRqNb6BZHV8sTG5
Greptile cleanup (greploop): universal_filesystem is no longer included by
main.cpp (migrated to microStore) -- drop the dead lib_dep so a clean build
doesn't pull its removed SPIFFS dependency. _stat_rx_packets_complete was
declared but never incremented or logged -- remove the unfinished counter.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UWZuYkHBRqNb6BZHV8sTG5
Previously, the 10s heartbeat reported running/scanning/connected/peers
state but nothing about whether data was actually flowing. With
counters added to BLEInterface and threaded into the heartbeat
snprintf, the line now also surfaces:
tx_pkt — outbound RNS packets attempted
tx_frag — BLE fragments actually written/notified
tx_b — total bytes written
tx_fail — platform write/notify returned false
rx_frag — BLE fragments handed to the reassembler
rx_b — total bytes received
That was enough to root-cause the Columba-side stalls observed
during the BLE end-to-end testing session: pyxis showed connected=1
but tx_pkt frozen, which revealed that the keepalive loop wasn't firing
for a peer whose handshake had completed but whose identity recording
had raced. The counters are cumulative since start and never reset;
they are cheap enough to leave on permanently.
Three API migrations to keep the graft moving against vanilla
attermann/microReticulum @ 0.3.0:
(A.5) Identity persistence migrated to OS::set_loop_callback.
Was: Identity::set_persist_yield_callback(cb) // fork-only
Identity::should_persist_data() // fork-only
Now: RNS::Utilities::OS::set_loop_callback(cb) // upstream global
reticulum->should_persist_data() // already used
The fork's split between Identity-specific 5s fast-flush and
Reticulum-level 60s full-persist is unified upstream into a single
Reticulum::should_persist_data() entry point. The fast cadence is
folded into microStore's dirty-tracking. If we observe excessive
lost-known-destinations after crashes, revisit microStore's flush
cadence rather than re-adding the fork-only Identity API.
(A.6) Transport stats diagnostics disabled — vanilla upstream doesn't
expose the *_count() getter family the fork added. Two [TABLES]
diagnostic blocks in main.cpp now print a placeholder. Restore by
porting to upstream's get_path_table().size() and friends, or PR the
getters back to upstream Transport. Tracked in
pyxis_microReticulum_graft_spike_findings.md.
(A.7) BLE/SX1262 Interface stat methods are no longer virtual overrides.
Vanilla upstream Interface base class doesn't declare get_stats /
get_rssi / get_snr. Kept the methods as plain (non-virtual)
BLEInterface / SX1262Interface members; callers needing stats access
must hold the concrete type, not the base Interface*. Propose
upstream PR adding to base API if polymorphic access matters.
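
The consequence of (A.7) for callers can be sketched with stand-in
types. InterfaceBase, RadioInterface, and LinkStats are hypothetical
names used only for illustration; they are not the upstream or pyxis
classes.

```cpp
// Stand-in for the upstream base class: it declares no stats API.
struct InterfaceBase {
    virtual ~InterfaceBase() = default;
};

// Hypothetical stats record.
struct LinkStats {
    float rssi;
    float snr;
};

// Stand-in for a concrete interface such as BLEInterface or
// SX1262Interface: get_stats() is a plain member, not an override.
class RadioInterface : public InterfaceBase {
public:
    RadioInterface(float rssi, float snr) : _rssi(rssi), _snr(snr) {}
    LinkStats get_stats() const { return LinkStats{_rssi, _snr}; }

private:
    float _rssi;
    float _snr;
};

// Callers that need stats must take the concrete type. A function that
// only receives InterfaceBase* has no get_stats() to call.
float report_rssi(const RadioInterface& iface) {
    return iface.get_stats().rssi;
}
```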
Also: setLogCallback -> set_log_callback (renamed in upstream commit
4d6f0b9 "Added dual-class PSRAM/TLSF allocator system").
Pyxis still doesn't build — next failures (4 distinct):
- OS::register_filesystem signature changed to microStore::FileSystem&.
Real microStore migration needed for UniversalFileSystem.
- LXMRouter::process_sync still missing despite vendored src-shim copy.
Include-order or shadowing — needs investigation.
- MEMORY_MONITOR_POLL macro not picked up despite -I src-shim/Instrumentation.
- Identity::should_persist_data appears to still be referenced via
LXMF or another vendored layer — would surface once the above land.
Critical fixes for NimBLE host task / BLE loop task concurrency:
- Defer all disconnect map cleanup from NimBLE callbacks to loop task via
SPSC ring buffer, preventing iterator invalidation and use-after-free
- Defer enterErrorRecovery() from callback context to loop task
- Add WDT feed in enterErrorRecovery() host-sync polling loop
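
The deferral pattern in the first bullet can be sketched as a
single-producer/single-consumer ring: the NimBLE callback only enqueues
an event, and the loop task drains the queue and does the map cleanup.
The class and event names below are assumptions, and std::atomic stands
in for whatever primitive the firmware actually uses.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical event recorded by the callback.
struct DisconnectEvent {
    uint16_t conn_handle;
    int reason;
};

// Lock-free ring for exactly one producer and one consumer.
// Holds at most N - 1 items; one slot is kept empty to tell
// "full" apart from "empty".
template <typename T, size_t N>
class SpscRing {
    static_assert(N >= 2, "ring needs at least two slots");

public:
    // Producer side (callback context). Never blocks.
    bool push(const T& v) {
        size_t head = _head.load(std::memory_order_relaxed);
        size_t next = (head + 1) % N;
        if (next == _tail.load(std::memory_order_acquire)) {
            return false;  // full: caller decides whether to drop or count
        }
        _buf[head] = v;
        _head.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side (loop task).
    bool pop(T& out) {
        size_t tail = _tail.load(std::memory_order_relaxed);
        if (tail == _head.load(std::memory_order_acquire)) {
            return false;  // empty
        }
        out = _buf[tail];
        _tail.store((tail + 1) % N, std::memory_order_release);
        return true;
    }

private:
    T _buf[N];
    std::atomic<size_t> _head{0};
    std::atomic<size_t> _tail{0};
};
```

Because the peer maps are then touched only from the loop task, the
callback can no longer invalidate an iterator the loop task is using.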
Operational hardening:
- Cache NimBLERemoteCharacteristic* pointers in write() to avoid repeated
service/characteristic lookups per fragment
- Add isConnected() checks before GATT operations (read, enableNotifications)
- Validate peer address in notification callback to guard against handle reuse
- Skip stuck-state detector during CONNECTING/CONN_STARTING states
- Expire stale pending data entries after HANDSHAKE_TIMEOUT (30s)
- Read actual connection RSSI via ble_gap_conn_rssi() for peripheral connections
instead of hardcoding 0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reduces crash rate from one crash every 60-85s to one reboot per 6+ minutes.
Zero WDT triggers in 10-minute stability test.
BLE mutex fixes (BLEInterface.cpp):
- Release _mutex before blocking GATT ops in onConnected() and
onServicesDiscovered() — prevents 5-30s main-loop stalls during
service discovery, notification subscribe, identity exchange
- Non-blocking try_lock() for peerCount(), getConnectedPeerSummaries(),
get_stats() — returns empty/default if BLE task holds mutex
- Write-without-response in initiateHandshake()
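
The non-blocking getters in the second bullet can be sketched with
std::mutex and std::try_to_lock. PeerTable and its members are
hypothetical; the firmware's mutex type and getter signatures may
differ.

```cpp
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for the state guarded by _mutex.
class PeerTable {
public:
    void add_peer() {
        std::lock_guard<std::mutex> guard(_mutex);
        ++_count;
    }

    // Returns the peer count, or 0 if another task currently holds the
    // mutex. The caller (e.g. a UI refresh) never waits on the BLE task.
    size_t peer_count_nonblocking() {
        std::unique_lock<std::mutex> lock(_mutex, std::try_to_lock);
        if (!lock.owns_lock()) {
            return 0;  // default value; the next poll will likely succeed
        }
        return _count;
    }

private:
    std::mutex _mutex;
    size_t _count = 0;
};
```

The trade-off is that a reader may occasionally see the default value
instead of the real one, which is acceptable for status display but not
for decisions that depend on an exact count.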
WDT and persistence (main.cpp, sdkconfig.defaults, microReticulum):
- 30s WDT timeout (up from 10s) for SPIFFS flash I/O headroom
- Register Identity::set_persist_yield_callback() to feed WDT every
5 entries during save_known_destinations() (70+ entries take 30-50s)
- WDT feeds between reticulum and identity persist calls
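
The "feed every 5 entries" idea can be sketched as a save loop that
invokes a yield callback at a fixed interval. save_entries and
YieldCallback are hypothetical names; on the device the callback would
presumably reset the task watchdog (for example via
esp_task_wdt_reset()), which is an assumption here.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

using YieldCallback = std::function<void()>;

// Writes every entry and calls yield_cb after each block of
// yield_every entries. Returns the number of entries written.
size_t save_entries(const std::vector<int>& entries,
                    const YieldCallback& yield_cb,
                    size_t yield_every = 5) {
    size_t written = 0;
    for (int entry : entries) {
        (void)entry;  // the slow flash write would happen here
        ++written;
        if (yield_cb && yield_every != 0 && written % yield_every == 0) {
            yield_cb();  // give the watchdog a chance to be fed
        }
    }
    return written;
}
```

For 70 entries the callback runs 14 times, so the longest stretch
without a feed is five writes rather than the whole save.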
BLE host desync recovery (NimBLEPlatform):
- Time-based desync tracking instead of aggressive counter-based reboot
- 60s tolerance without connections, 5 minutes with active connections
(data still flows over existing BLE mesh links)
- Remove immediate recoverBLEStack() from 574 handler and
enterErrorRecovery() — let startScan() manage reboot decision
- Increase CONNECTION_COOLDOWN from 3s to 10s to reduce 574 risk
- Increase SCAN_FAIL_RECOVERY_THRESHOLD from 5 to 10
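
The time-based tolerance can be sketched as a small tracker. The two
limits come from the bullets above; the struct, its method names, and
the use of a millisecond tick are assumptions.

```cpp
#include <cstdint>

// Limits taken from the commit text.
constexpr uint32_t DESYNC_TOLERANCE_IDLE_MS = 60u * 1000u;
constexpr uint32_t DESYNC_TOLERANCE_CONNECTED_MS = 5u * 60u * 1000u;

// Hypothetical tracker: remembers when the host first went out of
// sync and decides whether the tolerance window has elapsed.
struct DesyncTracker {
    bool desynced = false;
    uint32_t since_ms = 0;

    // Call periodically with the current host sync state.
    void on_host_state(bool synced, uint32_t now_ms) {
        if (synced) {
            desynced = false;
        } else if (!desynced) {
            desynced = true;
            since_ms = now_ms;  // start of the current desync episode
        }
    }

    bool should_reboot(uint32_t now_ms, bool has_connections) const {
        if (!desynced) {
            return false;
        }
        uint32_t limit = has_connections ? DESYNC_TOLERANCE_CONNECTED_MS
                                         : DESYNC_TOLERANCE_IDLE_MS;
        // Unsigned subtraction stays correct across tick wraparound.
        return (now_ms - since_ms) >= limit;
    }
};
```

Unlike a failure counter, a brief burst of failed operations does not
trigger a reboot unless the desync persists for the whole window.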
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix systemic One Definition Rule violation where BLEInterface.h included
headers from deps/microReticulum/src/BLE/ while .cpp files compiled
against local lib/ble_interface/ versions, causing struct layout mismatches
(PeerInfo field shifting corrupted conn_handle/mtu) and class layout
mismatches (BLEPeerManager member differences caused LoadProhibited crash).
Key fixes:
- Include local BLE headers instead of deps versions in BLEInterface.h
- Sync PeerInfo keepalive tracking fields and BLETypes constants with deps
- Shutdown re-entrancy guard and proper client cleanup via deinit(true)
- Host sync checks before scan, advertise, and connect operations
- Avoid deadlock by deferring _on_connected from NimBLE host task
- Duplicate identity detection, stale handle cross-check in keepalives
- Bounds validation on conn_handle in setPeerHandle/promoteToIdentityKeyed
- Periodic persist_data() call for display name persistence across reboots
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root cause: Bytes objects stored in PSRAM-allocated BLEInterface had
corrupted shared_ptr members from uninitialized memory, causing crashes
in processDiscoveredPeers(). Fixed by using heap_caps_calloc instead of
heap_caps_malloc for PSRAM placement-new allocation.
Additional fixes:
- Reduce pool sizes to fit memory budget (reassembler 134KB→17KB,
fragmenters 8→4, handshakes 32→4, pending data 64→8)
- Store local MAC as BLEAddress struct instead of Bytes to avoid
heap allocation in PSRAM-resident object
- Move setLocalMac after platform start (NimBLE needs to be running
for valid random address), add lazy MAC init fallback in loop()
- Add stuck-state detector: resets GAP state machine if hardware
is idle but state machine thinks it's busy
- Enhance getLocalAddress with 3 fallback methods (NimBLE API,
ble_hs_id_copy_addr RANDOM, esp_read_mac efuse)
- Replace C++17 structured bindings with C++11-compatible code
- Increase BLE task stack 8KB→12KB for string ops in debug logs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace std::map and std::vector with fixed-size pools in
BLEInterface (fragmenters, pending handshakes, pending data)
- Track keepalive failures and disconnect after 3 consecutive
- Force-disconnect zombie peers detected by BLEPeerManager
- Add periodic advertising refresh (every 60s) to combat silent stops
- Buffer incoming data when identity not yet mapped instead of dropping
- Subtract ATT_OVERHEAD from MTU in NimBLEPlatform connection setup
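
A fixed-size pool of the kind named in the first bullet can be sketched
as an array of slots keyed by connection handle. FixedPool and its
interface are hypothetical and only illustrate the pattern of trading
std::map for bounded, allocation-free storage.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical keyed pool with a compile-time capacity. No heap
// allocation happens after construction.
template <typename T, size_t N>
class FixedPool {
    struct Slot {
        bool in_use = false;
        uint16_t key = 0;
        T value{};
    };

public:
    // Returns the entry for key, or nullptr if there is none.
    T* find(uint16_t key) {
        for (auto& s : _slots) {
            if (s.in_use && s.key == key) return &s.value;
        }
        return nullptr;
    }

    // Returns the existing entry for key, or claims a free slot.
    // Returns nullptr when the pool is exhausted.
    T* acquire(uint16_t key) {
        if (T* existing = find(key)) return existing;
        for (auto& s : _slots) {
            if (!s.in_use) {
                s.in_use = true;
                s.key = key;
                s.value = T{};
                return &s.value;
            }
        }
        return nullptr;
    }

    // Frees the slot for key. Returns false if key was not present.
    bool release(uint16_t key) {
        for (auto& s : _slots) {
            if (s.in_use && s.key == key) {
                s.in_use = false;
                return true;
            }
        }
        return false;
    }

private:
    Slot _slots[N];
};
```

Callers have to handle the nullptr case explicitly, which makes the
memory ceiling visible instead of letting a container grow unbounded.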
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split T-Deck firmware from microReticulum examples/lxmf_tdeck/ into its
own repo. microReticulum is consumed as a git submodule dependency pinned
to feat/t-deck. All include paths updated from relative symlinks to bare
includes resolved via library build flags.
Both tdeck (NimBLE) and tdeck-bluedroid environments compile successfully.
Licensed under AGPLv3.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>