meshcore-analyzer/docs/specs/startup-performance.md

# Startup Performance: Serve HTTP Within 2 Minutes on Any Database Size

## Problem

CoreScope takes 30–45 minutes to start on large databases (325K transmissions, 7.3M observations, 1.4GB SQLite). The HTTP server is completely unavailable during this time. Operators cannot restart without 30+ minutes of downtime.

### Where time goes (7.3M observation benchmark)

| Phase | Time | Blocking? |
|---|---|---|
| `Load()` — read SQLite → memory | ~90s | Yes |
| Build subpath index | ~20s | Yes |
| Build distance index | ~15s | Yes |
| Build path-hop index | <1s | Yes |
| Load neighbor edges from SQLite | <1s | Yes |
| **Backfill `resolved_path` for NULL observations** | **20–30+ min** | **Yes — the killer** |
| Re-pick best observations | ~10s | Yes |

The backfill calls `resolvePathForObs` for every observation with `resolved_path IS NULL`, then writes results back to SQLite and updates in-memory state. On first run (or after schema migration), this means resolving all 7.3M observations.

### Root cause

`backfillResolvedPaths()` in `neighbor_persist.go` runs synchronously in `main()` before `httpServer.ListenAndServe()`. It:
1. Collects all observations with `ResolvedPath == nil` under a read lock
2. Resolves paths (CPU-bound, ~millions of calls to `resolvePathForObs`)
3. Writes results to SQLite in a single transaction
4. Updates in-memory state under a write lock

Steps 2–4 block the main goroutine for 20–30 minutes.

## Solution: Async Chunked Backfill

### Design

Move `backfillResolvedPaths` out of the startup critical path. Start the HTTP server immediately after loading data and building indexes. Run backfill in a background goroutine with chunked processing that yields between batches.

### Startup sequence (new)

```
1. OpenDB, verify tables                    (~1s)
2. store.Load()                             (~90s)
3. ensureNeighborEdgesTable                  (<1s)
4. ensureResolvedPathColumn                  (<1s)
5. Load/build neighbor graph                 (<1s)
6. Build subpath/distance/path-hop indexes   (~35s)
7. pickBestObservation (with whatever        (~10s)
   resolved_path data exists)
8. *** START HTTP SERVER ***                 — serving at ~2min mark
9. Background: backfillResolvedPaths         (20–30 min, non-blocking)
   → chunked, yields between batches
   → updates in-memory + SQLite incrementally
   → re-picks best obs for affected txs
```

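Time to first HTTP response on the benchmark database: **~2 minutes**, independent of how many observations still need backfill. `Load()` and the index builds still scale with database size.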
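
A minimal sketch of this ordering, under stated assumptions: `runStartup`, the `phases` slice, and the `serve` callback are hypothetical names used for illustration, not CoreScope's actual wiring. The point it shows is that a call like `ListenAndServe` blocks until the server stops, so the background goroutine has to be launched before that call rather than after it.

```go
package main

// runStartup executes the blocking startup phases in order, launches the
// backfill in the background, and only then calls serve. serve stands in for
// a blocking call such as http.Server.ListenAndServe (hypothetical wiring).
func runStartup(phases []func(), backfill func(), serve func() error) error {
	for _, phase := range phases {
		phase() // Load, index builds, pickBestObservation: still blocking
	}

	done := make(chan struct{})
	go func() {
		defer close(done)
		backfill() // runs concurrently with serve
	}()

	err := serve() // blocks for the lifetime of the server
	<-done         // in this sketch, let an in-flight backfill finish before returning
	return err
}
```

Waiting on `done` after `serve` returns is a design choice of the sketch; a real shutdown path might cancel the backfill instead.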

### Implementation details

#### 1. Background backfill goroutine

```go
// In main(), immediately before the blocking httpServer.ListenAndServe() call
// (code placed after ListenAndServe does not run until the server stops):
go backfillResolvedPathsAsync(store, dbPath, 5000, 100*time.Millisecond)
```

The async backfill processes observations in chunks of N (e.g., 5,000):

```go
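// backfillResolvedPathsAsync resolves pending observations one chunk at a
// time, sleeping between chunks so HTTP handlers get CPU and lock time.
func backfillResolvedPathsAsync(store *PacketStore, dbPath string, chunkSize int, yieldDuration time.Duration) {
    for {
        n := backfillResolvedPathsChunk(store, dbPath, chunkSize)
        if n == 0 {
            break // nothing left to resolve
        }
        log.Printf("[store] backfilled resolved_path for %d observations (async)", n)
        time.Sleep(yieldDuration) // yield to HTTP handlers
    }
    store.backfillComplete.Store(true) // readiness flag, see section 2
    log.Printf("[store] async resolved_path backfill complete")
}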
```

Each chunk:
1. Takes a read lock, collects up to `chunkSize` pending observations, releases lock
2. Resolves paths (no lock held — `resolvePathForObs` only reads immutable data)
3. Opens a separate RW SQLite connection, writes results in a transaction
4. Takes a write lock, updates in-memory `obs.ResolvedPath` and re-picks best obs for affected transmissions, releases lock
5. Returns the processed count to the caller, which sleeps briefly to yield CPU and lock time to HTTP handlers
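
The per-chunk steps above can be sketched as follows. This is a simplified illustration, not the real implementation: `observation`, `memStore`, and the `resolve` and `persist` callbacks are hypothetical stand-ins for the actual store types, `resolvePathForObs`, and the SQLite transaction, and the re-pick of best observations is omitted.

```go
package main

import "sync"

// observation and memStore are simplified stand-ins for the real types.
type observation struct {
	ID           int64
	Hops         []string // raw hop prefixes, assumed immutable after load
	ResolvedPath []string // nil means "not yet resolved"
}

type memStore struct {
	mu  sync.RWMutex
	obs []*observation
}

// backfillChunk resolves up to chunkSize pending observations and returns how
// many it processed. resolve and persist are hypothetical callbacks.
func backfillChunk(s *memStore, chunkSize int, resolve func([]string) []string,
	persist func(map[int64][]string) error) (int, error) {
	// Step 1: collect pending observations under a read lock.
	s.mu.RLock()
	var pending []*observation
	for _, o := range s.obs {
		if o.ResolvedPath == nil {
			pending = append(pending, o)
			if len(pending) == chunkSize {
				break
			}
		}
	}
	s.mu.RUnlock()
	if len(pending) == 0 {
		return 0, nil
	}

	// Step 2: resolve with no lock held (inputs are read-only).
	results := make(map[int64][]string, len(pending))
	for _, o := range pending {
		path := resolve(o.Hops)
		if path == nil {
			// Assumption of this sketch: store a non-nil empty path so an
			// unresolvable observation is not collected again forever.
			path = []string{}
		}
		results[o.ID] = path
	}

	// Step 3: persist the whole chunk (one transaction in the real design).
	if err := persist(results); err != nil {
		return 0, err
	}

	// Step 4: publish the results in memory under the write lock.
	s.mu.Lock()
	for _, o := range pending {
		o.ResolvedPath = results[o.ID]
	}
	s.mu.Unlock()
	return len(pending), nil
}
```

The empty-path marker matters for loop termination: if a failed resolution left the field `nil`, the outer loop would never see a zero count. Whether the real `resolvePathForObs` can return nothing is not stated here, so treat that branch as a precaution.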

#### 2. Readiness flag and API degraded-mode header

Add an atomic boolean to `PacketStore` (`atomic.Bool` requires Go 1.19 or later):

```go
type PacketStore struct {
    // ...
    backfillComplete atomic.Bool
}
```

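API responses include status headers during backfill. The remaining count needs its own counter (for example an `atomic.Int64`), since the boolean flag alone cannot supply it: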

```
X-CoreScope-Status: backfilling
X-CoreScope-Backfill-Remaining: 4523000
```

After backfill completes:
```
X-CoreScope-Status: ready
```

The frontend can read this header and show a subtle banner: *"Resolving hop paths… some paths may show abbreviated pubkeys."*

#### 3. Index rebuilds

The subpath, distance, and path-hop indexes are built during startup from whatever data exists. Paths resolved during backfill are not reflected in these indexes until the indexes are updated.

Options (in order of preference):

**Option A: Defer index updates to end of backfill.** Indexes work fine with unresolved paths — they just produce slightly less precise results. After backfill completes, rebuild indexes once. Simple, correct, low risk.

**Option B: Incremental index updates per chunk.** After each chunk, update affected index entries. More complex, better real-time accuracy. Only worth it if index accuracy during backfill matters for production use.

**Recommendation: Option A.** The indexes are usable with unresolved paths. A single rebuild at the end (~35s) is cheap compared to the backfill duration. The API works throughout — results just improve after backfill finishes.

#### 4. SQLite contention

The backfill opens a separate RW connection for writes. The main server uses a read-only connection for polling. SQLite WAL mode (already in use) allows concurrent readers and one writer. Contention risk is minimal:

- Write transactions are small (5,000 UPDATEs per chunk, batched in a single tx)
- Read queries from HTTP handlers are unaffected by WAL writes
- The 100ms yield between chunks prevents sustained write pressure

#### 5. Lock contention

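The write lock is held only during the in-memory update phase of each chunk (~5,000 pointer assignments plus re-picks for the affected transmissions). This is expected to take microseconds, though the re-pick cost depends on how many observations each affected transmission has, so confirm it with pprof. HTTP handlers acquire read locks for API responses and should not be blocked for any perceptible duration.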

#### 6. Frontend handling

The `hop-resolver.js` module already handles unresolved (prefix) hops gracefully — it shows abbreviated pubkeys. No frontend changes are required for correctness.

Optional enhancement: read the `X-CoreScope-Status` header and show a transient info banner during backfill. This is cosmetic and can be done in a follow-up.

### What about first-run specifically?

On first run with a pre-existing database (e.g., migrating from a version without `resolved_path`), ALL 7.3M observations need backfill. The async approach handles this identically — it just takes longer in the background while HTTP is already serving.

On subsequent restarts, `resolved_path` is already persisted in SQLite and loaded by `store.Load()`. The backfill loop finds zero pending observations and exits immediately.

### What about new observations during backfill?

The poller ingests new packets continuously. New observations written by the ingestor already have `resolved_path` set at ingest time (this is already implemented). The backfill only processes observations with `ResolvedPath == nil`, so there's no conflict with new data.

## Alternatives considered

### Lazy resolution (resolve on API access)

Resolve `resolved_path` only when an observation is accessed via API, cache the result.

**Rejected because:**
- Adds latency to every API call that touches unresolved observations
- Cache invalidation complexity (when does a cached resolution become stale?)
- Doesn't help with index accuracy — indexes still need full data
- The backfill is a one-time cost; lazy resolution makes it a recurring cost

### Progressive loading (recent data first)

Load only the last 24h into memory, start serving, load historical data in background.

**Rejected because:**
- Significantly more complex — all store operations need "is this data loaded yet?" checks
- Memory implications: need to track which time ranges are loaded
- Historical queries return wrong results during loading (not just degraded — wrong)
- The actual bottleneck is backfill, not `Load()`. Even loading all 7.3M observations takes only ~90s.

### Chunked blocking backfill (yield to HTTP between chunks, but keep in main startup)

Process N observations per tick with `runtime.Gosched()` between chunks, but still in `main()` before `ListenAndServe`.

**Rejected because:**
- HTTP still isn't available until all chunks complete
- Adds complexity without solving the core problem

## Carmack Review (Performance)

**The approach is sound.** Moving a 20–30 minute blocking operation to a background goroutine is the right call. Some notes:

1. **Chunk size tuning.** 5,000 is a reasonable starting point. Monitor: if write lock contention shows up in pprof (unlikely with microsecond hold times), reduce chunk size. If backfill is too slow, increase it or reduce yield time.

2. **Memory is not a concern.** The observations are already fully loaded in memory by `Load()`. The backfill only mutates the `ResolvedPath` field on existing objects — no additional memory allocation beyond temporary slices for the chunk.

3. **No hidden costs in `resolvePathForObs`.** It reads `nodePM` (a `PrefixMatcher`, immutable after startup) and `graph` (neighbor graph, immutable after startup). No locks needed during resolution. This is embarrassingly parallelizable if needed, but single-goroutine processing with chunking is sufficient.

4. **The index rebuild at the end is O(n) and takes ~35s.** This is a one-time cost after the first backfill. Not worth optimizing further unless the profile shows otherwise.

5. **Risk: `pickBestObservation` during backfill.** API responses may flip their "best" observation as resolved paths become available. This is cosmetically noisy but functionally correct. Document this as expected behavior.

6. **Future optimization if needed:** The backfill loop could be parallelized across multiple goroutines (partition observations by transmission hash). The resolution step is CPU-bound and read-only. This would reduce backfill wall time from 30 min to ~5 min on 8 cores. Not needed for MVP — the goal is HTTP availability, not backfill speed.
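
A minimal sketch of the parallel resolution step from note 6, under stated assumptions: `resolveParallel` is a hypothetical helper, and it partitions the input into contiguous index ranges rather than by transmission hash as the note suggests. Each worker writes only to its own range of the output slice, so no lock is needed as long as `resolve` is safe for concurrent use.

```go
package main

import "sync"

// resolveParallel applies resolve to every input using up to `workers`
// goroutines. Each goroutine owns a disjoint, contiguous index range, so the
// output slice can be filled without locking.
func resolveParallel(inputs [][]string, workers int, resolve func([]string) []string) [][]string {
	if workers < 1 {
		workers = 1
	}
	out := make([][]string, len(inputs))
	per := (len(inputs) + workers - 1) / workers // ceiling: items per worker

	var wg sync.WaitGroup
	for start := 0; start < len(inputs); start += per {
		end := start + per
		if end > len(inputs) {
			end = len(inputs)
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				out[i] = resolve(inputs[i])
			}
		}(start, end)
	}
	wg.Wait()
	return out
}
```

An ideal 8-way split of 30 minutes would be 3.75 minutes; the ~5 minute estimate in note 6 leaves room for the SQLite writes and in-memory updates, which stay serialized.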

## Implementation plan

1. **Refactor `backfillResolvedPaths` into chunked async version** — new function `backfillResolvedPathsAsync` that processes in chunks and yields
2. **Start the backfill goroutine in `main.go` just before the blocking `ListenAndServe` call**, since code placed after that call does not run until the server stops
3. **Add `backfillComplete` atomic flag to `PacketStore`** — set after backfill finishes
4. **Add `X-CoreScope-Status` response header** — middleware reads the flag
5. **Rebuild indexes after backfill completes** — single call to rebuild subpath/distance/path-hop
6. **Tests:** unit test for chunked backfill (mock store with N unresolved obs, verify chunks process correctly)
7. **Frontend (follow-up):** optional banner during backfill state

Estimated effort: 1–2 hours for steps 1–5, plus tests.