Commit Graph

10 Commits

Author SHA1 Message Date
torlando-agent[bot] 70d4aa6be9 feat: graft pyxis onto upstream microReticulum 0.4.1
Repins microReticulum + microLXMF onto the upstream-0.4.1 graft and adapts
pyxis to the new src/microReticulum/ layout and 0.4.x APIs. The far-diverged
0.3.0 fork's Resource/Transport/Identity work is subsumed by upstream's
reimplementation; only the still-needed fixes ride on the pinned branches
(PKCS7/HMAC/X25519 crypto -- proven byte-identical to python RNS 1.3.1 --
Packet link-proof callback, Identity short-sig guard, and the bz2 layer +
decompress-on-receive in Resource::assemble()).

Consumer-side changes:
- platformio.ini: pin microReticulum @2f21fee (pyxis-fixes-on-0.4.1) and
  microLXMF @33760d0 (chore/microreticulum-0.4.1-layout); bump microStore
  ceea8f5 -> c5fb69d (0.4.x requires the new BasicFileStore::init API);
  -std=gnu++11 -> gnu++17 (upstream requires C++17).
- Namespace all microReticulum includes (angle + quote) to <microReticulum/...>
  for the relocated layout; shim-local Utilities/Stream.h|Print.h preserved.
- Interface::send_outgoing now returns bool: update TCP/BLE/SX1262/Auto
  overrides with correct success/failure returns.
- SDArchiveFileSystem::init(bool reformatOnFail=true) to match new microStore.
- Static Transport::get_path_table() -> path_table(); instance getter unchanged.
- Remove duplicate shim Cryptography/BZ2 (microReticulum provides it now; keep
  lib/libbz2 as the ESP32 bzlib provider).
- patch_littlefs_paths.py: normalize microStore's LittleFS adapter paths to a
  leading "/" -- ESP32 Arduino LittleFS rejects "./"-prefixed paths, which
  silently broke the path store (no peer paths learned, all messaging blocked).

Validated on T-Deck Plus: builds (RAM 27.5% / Flash 77.7%), boots stable
(no WDT/panic), and a full on-device LXMF e2e (DIRECT + OPPORTUNISTIC +
bz2-compressed-Resource receive) passes 5/5.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UWZuYkHBRqNb6BZHV8sTG5
2026-06-19 15:49:44 -04:00
torlando-agent[bot] d98b6f0682 chore: remove dead universal_filesystem dep + unused BLE rx counter
Greptile cleanup (greploop): universal_filesystem is no longer included by
main.cpp (migrated to microStore) -- drop the dead lib_dep so a clean build
doesn't pull its removed SPIFFS dependency. _stat_rx_packets_complete was
declared but never incremented or logged -- remove the unfinished counter.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UWZuYkHBRqNb6BZHV8sTG5
2026-06-18 23:08:24 -04:00
torlando-tech 76ffd29b01 feat(ble): TX/RX fragment + byte counters in BLE heartbeat
Pre-this the 10s heartbeat reported running/scanning/connected/peers
state but nothing about whether data was actually flowing. With the
counters added to BLEInterface and threaded into the heartbeat
snprintf, the line now also surfaces:

  tx_pkt   — outbound RNS packets attempted
  tx_frag  — BLE fragments actually written/notified
  tx_b     — total bytes written
  tx_fail  — platform write/notify returned false
  rx_frag  — BLE fragments handed to the reassembler
  rx_b     — total bytes received

That was enough to root-cause the Columba-side stalls observed
during the BLE end-to-end testing session: pyxis showed connected=1
but tx_pkt frozen, surfacing that the keepalive loop wasn't firing
for a peer whose handshake had completed but identity recording
raced. Cumulative-since-start, no reset; cheap to keep on always.
2026-05-10 15:23:09 -04:00
torlando-tech a0ff631001 Track A.5/6/7: Identity persistence + Transport stats + Interface overrides
Three API migrations to keep the graft moving against vanilla
attermann/microReticulum @ 0.3.0:

(A.5) Identity persistence migrated to OS::set_loop_callback.
  Was:  Identity::set_persist_yield_callback(cb)        // fork-only
        Identity::should_persist_data()                 // fork-only
  Now:  RNS::Utilities::OS::set_loop_callback(cb)       // upstream global
        reticulum->should_persist_data()                // already used
  The fork's split between Identity-specific 5s fast-flush and
  Reticulum-level 60s full-persist is unified upstream into a single
  Reticulum::should_persist_data() entry point. The fast cadence is
  folded into microStore's dirty-tracking. If we observe excessive
  lost-known-destinations after crashes, revisit microStore's flush
  cadence rather than re-adding the fork-only Identity API.

(A.6) Transport stats diagnostics disabled — vanilla upstream doesn't
  expose the *_count() getter family the fork added. Two [TABLES]
  diagnostic blocks in main.cpp now print a placeholder. Restore by
  porting to upstream's get_path_table().size() and friends, or PR the
  getters back to upstream Transport. Tracked in
  pyxis_microReticulum_graft_spike_findings.md.

(A.7) BLE/SX1262 Interface stat methods are no longer virtual overrides.
  Vanilla upstream Interface base class doesn't declare get_stats /
  get_rssi / get_snr. Kept the methods as plain (non-virtual)
  BLEInterface / SX1262Interface members; callers needing stats access
  must hold the concrete type, not the base Interface*. Propose
  upstream PR adding to base API if polymorphic access matters.

Also: setLogCallback -> set_log_callback (renamed in upstream commit
4d6f0b9 "Added dual-class PSRAM/TLSF allocator system").

Pyxis still doesn't build — next failures (4 distinct):
  - OS::register_filesystem signature changed to microStore::FileSystem&.
    Real microStore migration needed for UniversalFileSystem.
  - LXMRouter::process_sync still missing despite vendored src-shim copy.
    Include-order or shadowing — needs investigation.
  - MEMORY_MONITOR_POLL macro not picked up despite -I src-shim/Instrumentation.
  - Identity::should_persist_data appears to still be referenced via
    LXMF or another vendored layer — would surface once the above land.
2026-05-04 20:25:02 -04:00
torlando-tech d6d4eb2c9c BLE stability: defer disconnect processing, fix data races, harden operations
Critical fixes for NimBLE host task / BLE loop task concurrency:
- Defer all disconnect map cleanup from NimBLE callbacks to loop task via
  SPSC ring buffer, preventing iterator invalidation and use-after-free
- Defer enterErrorRecovery() from callback context to loop task
- Add WDT feed in enterErrorRecovery() host-sync polling loop

Operational hardening:
- Cache NimBLERemoteCharacteristic* pointers in write() to avoid repeated
  service/characteristic lookups per fragment
- Add isConnected() checks before GATT operations (read, enableNotifications)
- Validate peer address in notification callback to guard against handle reuse
- Skip stuck-state detector during CONNECTING/CONN_STARTING states
- Expire stale pending data entries after HANDSHAKE_TIMEOUT (30s)
- Read actual connection RSSI via ble_gap_conn_rssi() for peripheral connections
  instead of hardcoding 0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 00:15:24 -05:00
torlando-tech e343caf2d2 Stability: WDT yield, BLE mutex fixes, time-based desync recovery
Reduces crash rate from every 60-85s to 1 reboot per 6+ minutes.
Zero WDT triggers in 10-minute stability test.

BLE mutex fixes (BLEInterface.cpp):
- Release _mutex before blocking GATT ops in onConnected() and
  onServicesDiscovered() — prevents 5-30s main-loop stalls during
  service discovery, notification subscribe, identity exchange
- Non-blocking try_lock() for peerCount(), getConnectedPeerSummaries(),
  get_stats() — returns empty/default if BLE task holds mutex
- Write-without-response in initiateHandshake()

WDT and persistence (main.cpp, sdkconfig.defaults, microReticulum):
- 30s WDT timeout (up from 10s) for SPIFFS flash I/O headroom
- Register Identity::set_persist_yield_callback() to feed WDT every
  5 entries during save_known_destinations() (70+ entries = 30-50s)
- WDT feeds between reticulum and identity persist calls

BLE host desync recovery (NimBLEPlatform):
- Time-based desync tracking instead of aggressive counter-based reboot
- 60s tolerance without connections, 5 minutes with active connections
  (data still flows over existing BLE mesh links)
- Remove immediate recoverBLEStack() from 574 handler and
  enterErrorRecovery() — let startScan() manage reboot decision
- Increase CONNECTION_COOLDOWN from 3s to 10s to reduce 574 risk
- Increase SCAN_FAIL_RECOVERY_THRESHOLD from 5 to 10

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 12:30:30 -05:00
torlando-tech 43a7e1088f BLE P2P stability: fix ODR violation, shutdown safety, connection robustness
Fix systemic One Definition Rule violation where BLEInterface.h included
headers from deps/microReticulum/src/BLE/ while .cpp files compiled
against local lib/ble_interface/ versions, causing struct layout mismatches
(PeerInfo field shifting corrupted conn_handle/mtu) and class layout
mismatches (BLEPeerManager member differences caused LoadProhibited crash).

Key fixes:
- Include local BLE headers instead of deps versions in BLEInterface.h
- Sync PeerInfo keepalive tracking fields and BLETypes constants with deps
- Shutdown re-entrancy guard and proper client cleanup via deinit(true)
- Host sync checks before scan, advertise, and connect operations
- Avoid deadlock by deferring _on_connected from NimBLE host task
- Duplicate identity detection, stale handle cross-check in keepalives
- Bounds validation on conn_handle in setPeerHandle/promoteToIdentityKeyed
- Periodic persist_data() call for display name persistence across reboots

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 00:24:45 -05:00
torlando-tech 769c9952bd BLE P2P stability: PSRAM zero-init, pool sizing, stuck-state recovery
Root cause: Bytes objects stored in PSRAM-allocated BLEInterface had
corrupted shared_ptr members from uninitialized memory, causing crashes
in processDiscoveredPeers(). Fixed by using heap_caps_calloc instead of
heap_caps_malloc for PSRAM placement-new allocation.

Additional fixes:
- Reduce pool sizes to fit memory budget (reassembler 134KB→17KB,
  fragmenters 8→4, handshakes 32→4, pending data 64→8)
- Store local MAC as BLEAddress struct instead of Bytes to avoid
  heap allocation in PSRAM-resident object
- Move setLocalMac after platform start (NimBLE needs to be running
  for valid random address), add lazy MAC init fallback in loop()
- Add stuck-state detector: resets GAP state machine if hardware
  is idle but state machine thinks it's busy
- Enhance getLocalAddress with 3 fallback methods (NimBLE API,
  ble_hs_id_copy_addr RANDOM, esp_read_mac efuse)
- Fix C++17 structured binding to C++11 compatibility
- Increase BLE task stack 8KB→12KB for string ops in debug logs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 20:57:05 -05:00
torlando-tech ec181e18f8 BLE interop fixes: pools, keepalive tracking, data buffering
- Replace std::map and std::vector with fixed-size pools in
  BLEInterface (fragmenters, pending handshakes, pending data)
- Track keepalive failures and disconnect after 3 consecutive
- Force-disconnect zombie peers detected by BLEPeerManager
- Add periodic advertising refresh (every 60s) to combat silent stops
- Buffer incoming data when identity not yet mapped instead of dropping
- Subtract ATT_OVERHEAD from MTU in NimBLEPlatform connection setup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 00:18:02 -05:00
torlando-tech ac6ceca9f8 Initial commit: standalone Pyxis T-Deck firmware
Split T-Deck firmware from microReticulum examples/lxmf_tdeck/ into its
own repo. microReticulum is consumed as a git submodule dependency pinned
to feat/t-deck. All include paths updated from relative symlinks to bare
includes resolved via library build flags.

Both tdeck (NimBLE) and tdeck-bluedroid environments compile successfully.
Licensed under AGPLv3.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 19:48:33 -05:00