
Startup Performance: Serve HTTP Within 2 Minutes on Any Database Size

Problem

CoreScope takes 30-45 minutes to start on large databases (325K transmissions, 7.3M observations, 1.4 GB SQLite). The HTTP server is completely unavailable during this time, so every restart costs operators 30+ minutes of downtime.

Where time goes (7.3M observation benchmark)

| Phase | Time | Blocking? |
|---|---|---|
| Load(): read SQLite → memory | ~90s | Yes |
| Build subpath index | ~20s | Yes |
| Build distance index | ~15s | Yes |
| Build path-hop index | <1s | Yes |
| Load neighbor edges from SQLite | <1s | Yes |
| Backfill resolved_path for NULL observations | 20-30+ min | Yes (the killer) |
| Re-pick best observations | ~10s | Yes |

The backfill calls resolvePathForObs for every observation with resolved_path IS NULL, then writes results back to SQLite and updates in-memory state. On first run (or after schema migration), this means resolving all 7.3M observations.

Root cause

backfillResolvedPaths() in neighbor_persist.go runs synchronously in main() before httpServer.ListenAndServe(). It:

  1. Collects all observations with ResolvedPath == nil under a read lock
  2. Resolves paths (CPU-bound, ~millions of calls to resolvePathForObs)
  3. Writes results to SQLite in a single transaction
  4. Updates in-memory state under a write lock

Steps 2-4 block the main goroutine for 20-30 minutes.

Solution: Async Chunked Backfill

Design

Move backfillResolvedPaths out of the startup critical path. Start the HTTP server immediately after loading data and building indexes. Run backfill in a background goroutine with chunked processing that yields between batches.

Startup sequence (new)

1. OpenDB, verify tables                    (~1s)
2. store.Load()                             (~90s)
3. ensureNeighborEdgesTable                  (<1s)
4. ensureResolvedPathColumn                  (<1s)
5. Load/build neighbor graph                 (<1s)
6. Build subpath/distance/path-hop indexes   (~35s)
7. pickBestObservation (with whatever        (~10s)
   resolved_path data exists)
8. *** START HTTP SERVER ***                 — serving at ~2min mark
9. Background: backfillResolvedPaths         (20-30 min, non-blocking)
   → chunked, yields between batches
   → updates in-memory + SQLite incrementally
   → re-picks best obs for affected txs

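Total time to first HTTP response: ~2 minutes on the 7.3M-observation benchmark. Load() and the index builds still scale with database size, but startup no longer waits on the backfill, which was the dominant cost.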

Implementation details

1. Background backfill goroutine

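// In main(): launch the backfill just before the blocking
// httpServer.ListenAndServe() call. Anything placed after that
// call only runs once the server has stopped.
go backfillResolvedPathsAsync(store, dbPath, 5000, 100*time.Millisecond)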

The async backfill processes observations in chunks of N (e.g., 5,000):

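func backfillResolvedPathsAsync(store *PacketStore, dbPath string, chunkSize int, yieldDuration time.Duration) {
    total := 0
    for {
        // backfillResolvedPathsChunk must not re-collect observations it already
        // tried and failed to resolve (e.g. advance a cursor); otherwise n never reaches 0.
        n := backfillResolvedPathsChunk(store, dbPath, chunkSize)
        if n == 0 {
            break // nothing left to process
        }
        total += n
        log.Printf("[store] backfilled resolved_path for %d observations (async)", n)
        time.Sleep(yieldDuration) // yield CPU and lock time to HTTP handlers
    }
    // If total > 0, rebuild the subpath/distance/path-hop indexes once here
    // (see "Index rebuilds"), before reporting ready.
    store.backfillComplete.Store(true) // responses now report X-CoreScope-Status: ready
    log.Printf("[store] async resolved_path backfill complete (%d observations)", total)
}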

Each chunk:

  1. Takes a read lock, collects up to chunkSize pending observations, releases lock
  2. Resolves paths (no lock held — resolvePathForObs only reads immutable data)
  3. Opens a separate RW SQLite connection, writes results in a transaction
  4. Takes a write lock, updates in-memory obs.ResolvedPath and re-picks best obs for affected transmissions, releases lock
  5. Sleeps briefly to yield CPU/lock time to HTTP handlers
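Below is a minimal sketch of one chunk. It uses simplified stand-in types: `Observation`, `Store`, and the `resolve` callback are hypothetical, and the real code would use PacketStore and resolvePathForObs. The sketch does not re-scan for `ResolvedPath == nil`. Instead it advances a cursor through the observation slice, so an observation that cannot be resolved is skipped on later chunks rather than collected forever. This assumes the slice is append-only and that newly ingested observations already carry a resolved path.

```go
package main

import "sync"

// Observation is a hypothetical stand-in for the real observation type.
type Observation struct {
	ID           int
	TxHash       string
	ResolvedPath []string // nil means "not yet resolved"
}

// Store is a hypothetical stand-in for PacketStore: an append-only slice
// of observations guarded by a RWMutex.
type Store struct {
	mu  sync.RWMutex
	obs []*Observation
}

type pending struct {
	o    *Observation
	path []string
}

// backfillChunk scans forward from *cursor, collecting up to chunkSize
// unresolved observations (chunkSize must be > 0). It returns how many
// slots it scanned (0 means the cursor reached the end) and how many
// paths it resolved.
func backfillChunk(s *Store, cursor *int, chunkSize int, resolve func(*Observation) []string) (scanned, resolved int) {
	// Step 1: collect candidates under a read lock.
	s.mu.RLock()
	var batch []*Observation
	i := *cursor
	for ; i < len(s.obs) && len(batch) < chunkSize; i++ {
		if s.obs[i].ResolvedPath == nil {
			batch = append(batch, s.obs[i])
		}
	}
	s.mu.RUnlock()
	scanned = i - *cursor
	*cursor = i // unresolvable observations are left behind the cursor

	// Step 2: resolve with no lock held (the resolver reads only immutable data).
	results := make([]pending, 0, len(batch))
	for _, o := range batch {
		if p := resolve(o); p != nil {
			results = append(results, pending{o, p})
		}
	}

	// Step 3: the SQLite write of results would go here.

	// Step 4: apply under the write lock; a best-observation re-pick for the
	// affected transmissions would also happen here.
	s.mu.Lock()
	for _, r := range results {
		if r.o.ResolvedPath == nil {
			r.o.ResolvedPath = r.path
			resolved++
		}
	}
	s.mu.Unlock()
	return scanned, resolved
}
```

A caller loops until `scanned` is 0 and sleeps between chunks, as backfillResolvedPathsAsync does.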

2. Readiness flag and API degraded-mode header

Add an atomic flag to PacketStore (sync/atomic.Bool, available since Go 1.19):

type PacketStore struct {
    // ...
    backfillComplete atomic.Bool
}

API responses include two headers during backfill:

X-CoreScope-Status: backfilling
X-CoreScope-Backfill-Remaining: 4523000

After backfill completes:

X-CoreScope-Status: ready

The frontend can read this header and show a subtle banner: "Resolving hop paths… some paths may show abbreviated pubkeys."

3. Index rebuilds

The subpath, distance, and path-hop indexes are built during startup from whatever data exists. As backfill resolves paths, these indexes fall behind the in-memory observations until they are updated.

Options (in order of preference):

Option A: Defer index updates to end of backfill. Indexes work fine with unresolved paths — they just produce slightly less precise results. After backfill completes, rebuild indexes once. Simple, correct, low risk.

Option B: Incremental index updates per chunk. After each chunk, update affected index entries. More complex, better real-time accuracy. Only worth it if index accuracy during backfill matters for production use.

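Recommendation: Option A. The indexes are usable with unresolved paths, and a single rebuild at the end (~35s) is cheap compared to the backfill duration. The API keeps working throughout, provided the rebuild does not hold the store's write lock for those 35 seconds (build the new indexes off-lock, then swap them in). Results simply improve once backfill finishes.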
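One way to keep the ~35s rebuild from blocking readers is to build the new indexes off-lock and publish them atomically. The sketch below uses sync/atomic.Pointer (Go 1.19+). `Indexes` and `indexHolder` are hypothetical names. The sketch assumes the builders can read a consistent snapshot without holding the store lock for the whole rebuild. It also assumes nothing else mutates the live indexes during the rebuild, since any such updates would be lost on the swap.

```go
package main

import "sync/atomic"

// Indexes is a hypothetical bundle standing in for the subpath, distance
// and path-hop indexes.
type Indexes struct {
	Subpath map[string]int
}

// indexHolder publishes the current Indexes through an atomic pointer so
// readers never need the store's write lock to see a complete index set.
type indexHolder struct {
	cur atomic.Pointer[Indexes]
}

// Current returns the published index set (nil before the first build).
// Callers must treat the returned value as read-only.
func (h *indexHolder) Current() *Indexes { return h.cur.Load() }

// Rebuild runs build with no lock held, then swaps the result in with a
// single atomic store. Readers holding the previous pointer keep a
// consistent, if older, snapshot until they finish.
func (h *indexHolder) Rebuild(build func() *Indexes) {
	next := build()
	h.cur.Store(next)
}
```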

4. SQLite contention

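The backfill opens a separate RW connection for writes, while the main server uses a read-only connection for polling. SQLite WAL mode (already in use) allows concurrent readers alongside one writer at a time. The ingestor also writes to the database, so the backfill connection should set a busy timeout (PRAGMA busy_timeout). That way a chunk waits for the write lock rather than failing with SQLITE_BUSY. Beyond that, contention risk is minimal: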

  • Write transactions are small (5,000 UPDATEs per chunk, batched in a single tx)
  • Read queries from HTTP handlers are unaffected by WAL writes
  • The 100ms yield between chunks prevents sustained write pressure

5. Lock contention

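The write lock is held only during the in-memory update phase of each chunk (~5,000 pointer assignments plus re-picks for the affected transmissions). This should take microseconds; confirm with pprof. HTTP handlers acquire read locks for API responses and should not be blocked for any perceptible duration. One caveat: once a writer is waiting, Go's sync.RWMutex blocks new RLock calls. A slow handler still holding a read lock therefore delays other readers until the chunk's update completes. Keeping handler read sections short keeps this negligible.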

6. Frontend handling

The hop-resolver.js module already handles unresolved (prefix) hops gracefully — it shows abbreviated pubkeys. No frontend changes are required for correctness.

Optional enhancement: read the X-CoreScope-Status header and show a transient info banner during backfill. This is cosmetic and can be done in a follow-up.

What about first-run specifically?

On first run with a pre-existing database (e.g., migrating from a version without resolved_path), ALL 7.3M observations need backfill. The async approach handles this identically — it just takes longer in the background while HTTP is already serving.

On subsequent restarts, resolved_path is already persisted in SQLite and loaded by store.Load(). The backfill loop finds zero pending observations and exits immediately.

What about new observations during backfill?

The poller ingests new packets continuously. New observations written by the ingestor already have resolved_path set at ingest time (this is already implemented). The backfill only processes observations with ResolvedPath == nil, so there's no conflict with new data.

Alternatives considered

Lazy resolution (resolve on API access)

Resolve resolved_path only when an observation is accessed via the API, and cache the result.

Rejected because:

  • Adds latency to every API call that touches unresolved observations
  • Cache invalidation complexity (when does a cached resolution become stale?)
  • Doesn't help with index accuracy — indexes still need full data
  • The backfill is a one-time cost; lazy resolution makes it a recurring cost

Progressive loading (recent data first)

Load only the last 24h into memory, start serving, load historical data in background.

Rejected because:

  • Significantly more complex — all store operations need "is this data loaded yet?" checks
  • Memory implications: need to track which time ranges are loaded
  • Historical queries return wrong results during loading (not just degraded — wrong)
  • The actual bottleneck is backfill, not Load(). Even loading all 7.3M observations takes only ~90s.

Chunked blocking backfill (yield to HTTP between chunks, but keep in main startup)

Process N observations per tick with runtime.Gosched() between chunks, but still in main() before ListenAndServe.

Rejected because:

  • HTTP still isn't available until all chunks complete
  • Adds complexity without solving the core problem

Carmack Review (Performance)

The approach is sound. Moving a 20-30 minute blocking operation to a background goroutine is the right call. Some notes:

  1. Chunk size tuning. 5,000 is a reasonable starting point. Monitor: if write lock contention shows up in pprof (unlikely with microsecond hold times), reduce chunk size. If backfill is too slow, increase it or reduce yield time.

  2. Memory is not a concern. The observations are already fully loaded in memory by Load(). The backfill only mutates the ResolvedPath field on existing objects — no additional memory allocation beyond temporary slices for the chunk.

  3. No hidden costs in resolvePathForObs. It reads nodePM (a PrefixMatcher, immutable after startup) and graph (neighbor graph, immutable after startup). No locks needed during resolution. This is embarrassingly parallelizable if needed, but single-goroutine processing with chunking is sufficient.

  4. The index rebuild at the end is O(n) and takes ~35s. It is a one-time cost after a backfill that actually resolved observations; restarts with nothing pending can skip it. Not worth optimizing further unless the profile shows otherwise.

  5. Risk: pickBestObservation during backfill. API responses may flip their "best" observation as resolved paths become available. This is cosmetically noisy but functionally correct. Document this as expected behavior.

  6. Future optimization if needed: The backfill loop could be parallelized across multiple goroutines (partition observations by transmission hash). The resolution step is CPU-bound and read-only. This would reduce backfill wall time from 30 min to ~5 min on 8 cores. Not needed for MVP — the goal is HTTP availability, not backfill speed.
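If item 6 is ever pursued, the fan-out could look like the sketch below. `obsRef`, `resolvedPath`, and the `resolve` callback are hypothetical stand-ins. The callback must be safe for concurrent use; item 3 argues this holds because resolvePathForObs reads only immutable data. Hashing the transmission hash with FNV-1a (an arbitrary choice here) keeps all observations of one transmission on the same worker, which makes a per-transmission re-pick step easy to attach later.

```go
package main

import (
	"hash/fnv"
	"sync"
)

// obsRef is a hypothetical stand-in for an observation awaiting resolution.
type obsRef struct {
	ID     int
	TxHash string
}

// resolvedPath is one resolution result.
type resolvedPath struct {
	ID   int
	Path []string
}

// partition maps a transmission hash to a worker index in [0, workers).
func partition(txHash string, workers int) int {
	h := fnv.New32a()
	h.Write([]byte(txHash))
	return int(h.Sum32() % uint32(workers))
}

// resolveParallel fans a chunk out to `workers` goroutines by transmission
// hash and concatenates the results (order is not preserved).
func resolveParallel(chunk []obsRef, workers int, resolve func(obsRef) []string) []resolvedPath {
	parts := make([][]obsRef, workers)
	for _, o := range chunk {
		p := partition(o.TxHash, workers)
		parts[p] = append(parts[p], o)
	}

	out := make([][]resolvedPath, workers) // one slice per worker: no shared writes
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for _, o := range parts[w] {
				if p := resolve(o); p != nil {
					out[w] = append(out[w], resolvedPath{o.ID, p})
				}
			}
		}(w)
	}
	wg.Wait()

	var all []resolvedPath
	for _, r := range out {
		all = append(all, r...)
	}
	return all
}
```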

Implementation plan

  1. Refactor backfillResolvedPaths into chunked async version — new function backfillResolvedPathsAsync that processes in chunks and yields
  2. Launch the backfill from main.go in a goroutine, immediately before the blocking ListenAndServe call (code placed after ListenAndServe would not run until the server exits)
  3. Add backfillComplete atomic flag to PacketStore — set after backfill finishes
  4. Add X-CoreScope-Status response header: middleware reads the flag (plus a remaining-observations counter for X-CoreScope-Backfill-Remaining)
  5. Rebuild indexes after backfill completes — single call to rebuild subpath/distance/path-hop
  6. Tests: unit test for chunked backfill (mock store with N unresolved obs, verify chunks process correctly)
  7. Frontend (follow-up): optional banner during backfill state

Estimated effort: 1-2 hours for steps 1-5, plus tests.