mirror of
https://github.com/Kpa-clawbot/meshcore-analyzer.git
synced 2026-05-12 07:54:43 +00:00
fb744d895f
Fixes #1143. ## Summary Replaces the structurally unsound `decoded_json LIKE '%pubkey%'` (and `OR LIKE '%name%'`) attribution path with an exact-match lookup on a dedicated, indexed `transmissions.from_pubkey` column. This closes both holes documented in #1143: - **Hole 1** — same-name false positives via `OR LIKE '%name%'` - **Hole 2a** — adversarial spoofing: a malicious node names itself with another node's pubkey and gets attributed to the victim - **Hole 2b** — accidental false positive when any free-text field (path elements, channel names, message bodies) contains a 64-char hex substring matching a real pubkey - **Perf** — query now uses an index instead of a full-table scan against `LIKE '%substring%'` ## TDD Two-commit history shows red-then-green: | Commit | Status | Purpose | |---|---|---| | `7f0f08e` | RED — tests assertion-fail on master behaviour | Adversarial fixtures + spec | | `59327db` | GREEN — schema + ingestor + server + migration | Implementation | The red commit's test schema includes the new column so the file compiles, but the production code still uses LIKE — the assertions fail because the malicious / same-name / free-text rows are returned. The green commit changes the query plus adds the migration/ingest path. ## Changes ### Schema - new column `transmissions.from_pubkey TEXT` - new index `idx_transmissions_from_pubkey` ### Ingestor (`cmd/ingestor/`) - `PacketData.FromPubkey` populated from decoded ADVERT `pubKey` at write time. Cheap — already parsing `decoded_json`. Non-ADVERTs stay NULL. - `stmtInsertTransmission` writes the column. - Migration `from_pubkey_v1` ALTERs legacy DBs to add the column + index. - Bonus: rewrote the recipe in the gated one-shot `advert_count_unique_v1` migration to use `from_pubkey` (already marked done on existing DBs; kept correct for fresh installs). ### Server (`cmd/server/`) - `ensureFromPubkeyColumn` mirrors the ingestor migration so the server can boot against a DB the ingestor has never touched (e2e fixture, fresh installs). - `backfillFromPubkeyAsync` runs **after** HTTP starts. Scans `WHERE from_pubkey IS NULL AND payload_type = 4` in 5000-row chunks with a 100ms yield between chunks. Cannot block boot even on prod-sized DBs (100K+ transmissions). Queries handle NULL gracefully (return empty for that pubkey, same as today's unknown-pubkey path). - All in-scope LIKE call sites switched to exact match: | Site | Before | After | |---|---|---| | `buildPacketWhere` (was db.go:582) | `decoded_json LIKE '%pubkey%'` | `from_pubkey = ?` | | `buildTransmissionWhere` (was db.go:626) | `t.decoded_json LIKE '%pubkey%'` | `t.from_pubkey = ?` | | `GetRecentTransmissionsForNode` (was db.go:910) | `LIKE '%pubkey%' OR LIKE '%name%'` | `t.from_pubkey = ?` | | `QueryMultiNodePackets` (was db.go:1785) | `decoded_json LIKE '%pubkey%' OR ...` | `t.from_pubkey IN (?, ?, ...)` | | `advert_count_unique_v1` (was ingestor/db.go:257) | `decoded_json LIKE '%' \|\| nodes.public_key \|\| '%'` | `t.from_pubkey = nodes.public_key` | `GetRecentTransmissionsForNode` signature simplifies: the `name` parameter is gone (it was only ever used for the legacy `OR LIKE '%name%'` fallback). Sole caller in `routes.go:1243` updated. ### Tests - `cmd/server/from_pubkey_attribution_test.go` — adversarial fixtures + Hole 1/2a/2b/QueryMultiNodePackets exact-match assertions, EXPLAIN QUERY PLAN index check, migration backfill correctness. - `cmd/ingestor/from_pubkey_test.go` — write-time correctness (BuildPacketData populates FromPubkey for ADVERT only; InsertTransmission persists it; non-ADVERTs stay NULL). - Existing test schemas (server v2, server v3, coverage) get the new column **plus a SQLite trigger** that auto-populates `from_pubkey` from `decoded_json` on ADVERT inserts. This means existing fixtures (which only seed `decoded_json`) keep attributing correctly without per-test edits. - `seedTestData`'s ADVERTs explicitly set `from_pubkey`. ## Performance — index is used ``` $ EXPLAIN QUERY PLAN SELECT id FROM transmissions WHERE from_pubkey = ? SEARCH transmissions USING INDEX idx_transmissions_from_pubkey (from_pubkey=?) ``` Asserted in `TestFromPubkeyIndexUsed`. ## Migration approach - **Sync at boot**: `ALTER TABLE transmissions ADD COLUMN from_pubkey TEXT` is a metadata-only operation in SQLite — microseconds regardless of table size. `CREATE INDEX IF NOT EXISTS idx_transmissions_from_pubkey` is **not** metadata-only: it scans the table once. Empirically a few hundred ms on a 100K-row table; expect a few seconds on a 10M-row table (one-time cost, blocking boot during that window). Subsequent boots no-op via `IF NOT EXISTS`. If this boot delay becomes an operational concern at prod scale we can defer the `CREATE INDEX` to a goroutine — for now a few-second one-time delay is acceptable. - **Async**: row-level backfill of legacy NULL ADVERTs (chunked 5000 / 100ms yield). On a 100K-ADVERT prod DB, this completes in seconds in the background; HTTP is fully available throughout. - **Safety**: queries handle NULL gracefully — a node whose ADVERTs haven't backfilled yet returns empty, identical to today's behaviour for unknown pubkeys. No half-state regression. ## Out of scope (intentionally) The free-text `LIKE` paths the issue explicitly leaves alone (e.g. user-typed packet search) are untouched. Only the pubkey-attribution sites get the column treatment. ## Cycle-3 review fixes | Finding | Status | Commit | |---|---|---| | **M1c** — async-contract test was tautological (test's own `go`, not production's) | Fixed | `23ace71` (red) → `a05b50c` (green) | | **m1c** — package-global atomic resets unsafe under `t.Parallel()` | Fixed (`// DO NOT t.Parallel` comment + `Reset()` helper) | rolled into `23ace71` / `241ec69` | | **m2c** — `/api/healthz` read 3 atomics non-atomically (torn snapshot) | Fixed (single RWMutex-guarded snapshot + race test) | `241ec69` | | **n3c.m1** — vestigial OR-scaffolding in `QueryMultiNodePackets` | Fixed (cleanup) | `5a53ceb` | | **n3c.m2** — verify PR body language about `ALTER` vs `CREATE INDEX` | Verified accurate (already corrected in cycle 2) | (no change) | | **n3c.m3** — `json.Unmarshal` per row in backfill → could use SQL `json_extract` | **Deferred as known followup** — pure perf optimization (current per-row Unmarshal is correct, just slower); SQL rewrite would unwind the chunked-yield architecture and is non-trivial. Acceptable for one-time backfill at boot on legacy DBs. | ### M1c implementation detail `startFromPubkeyBackfill(dbPath, chunkSize, yieldDuration)` is now the single production entry point used by `main.go`. It internally does `go backfillFromPubkeyAsync(...)`. The test calls `startFromPubkeyBackfill` (no `go` prefix) and asserts the dispatch returns within 50ms — so if anyone removes the `go` keyword inside the wrapper, the test fails. **Manually verified**: removing the `go` keyword causes `TestBackfillFromPubkey_DoesNotBlockBoot` to fail with "backfill dispatch took ~1s (>50ms): not async — would block boot." ### m2c implementation detail `fromPubkeyBackfillTotal/Processed/Done` are now plain `int64`/`bool` package globals guarded by a single `sync.RWMutex`. `fromPubkeyBackfillSnapshot()` returns all three under one RLock. `TestHealthzFromPubkeyBackfillConsistentSnapshot` races a writer (lock-step total/processed updates with periodic done flips) against 8 readers hammering `/api/healthz`, asserting `processed<=total` and `(done => processed==total)` on every response. Verified the test catches torn reads (manually injected a 3-RLock implementation; test failed within milliseconds with "processed>total" and "done=true but processed!=total" errors). --------- Co-authored-by: openclaw-bot <bot@openclaw.local> Co-authored-by: openclaw-bot <bot@openclaw.dev>
592 lines
18 KiB
Go
592 lines
18 KiB
Go
package main
|
|
|
|
import (
|
|
"context"
|
|
"database/sql"
|
|
"flag"
|
|
"fmt"
|
|
"log"
|
|
"net/http"
|
|
_ "net/http/pprof"
|
|
"os"
|
|
"os/exec"
|
|
"os/signal"
|
|
"path/filepath"
|
|
"strings"
|
|
"sync"
|
|
"syscall"
|
|
"time"
|
|
|
|
"github.com/gorilla/mux"
|
|
)
|
|
|
|
// Set via -ldflags at build time
|
|
var Version string
|
|
var Commit string
|
|
var BuildTime string
|
|
|
|
func resolveCommit() string {
|
|
if Commit != "" {
|
|
return Commit
|
|
}
|
|
// Try .git-commit file (baked by Docker / CI)
|
|
if data, err := os.ReadFile(".git-commit"); err == nil {
|
|
if c := strings.TrimSpace(string(data)); c != "" && c != "unknown" {
|
|
return c
|
|
}
|
|
}
|
|
// Try git rev-parse at runtime
|
|
if out, err := exec.Command("git", "rev-parse", "--short", "HEAD").Output(); err == nil {
|
|
return strings.TrimSpace(string(out))
|
|
}
|
|
return "unknown"
|
|
}
|
|
|
|
func resolveVersion() string {
|
|
if Version != "" {
|
|
return Version
|
|
}
|
|
return "unknown"
|
|
}
|
|
|
|
func resolveBuildTime() string {
|
|
if BuildTime != "" {
|
|
return BuildTime
|
|
}
|
|
return "unknown"
|
|
}
|
|
|
|
func main() {
|
|
// pprof profiling — off by default, enable with ENABLE_PPROF=true
|
|
if os.Getenv("ENABLE_PPROF") == "true" {
|
|
pprofPort := os.Getenv("PPROF_PORT")
|
|
if pprofPort == "" {
|
|
pprofPort = "6060"
|
|
}
|
|
go func() {
|
|
log.Printf("[pprof] profiling UI at http://localhost:%s/debug/pprof/", pprofPort)
|
|
if err := http.ListenAndServe(":"+pprofPort, nil); err != nil {
|
|
log.Printf("[pprof] failed to start: %v (non-fatal)", err)
|
|
}
|
|
}()
|
|
}
|
|
|
|
var (
|
|
configDir string
|
|
port int
|
|
dbPath string
|
|
publicDir string
|
|
pollMs int
|
|
)
|
|
|
|
flag.StringVar(&configDir, "config-dir", ".", "Directory containing config.json")
|
|
flag.IntVar(&port, "port", 0, "HTTP port (overrides config)")
|
|
flag.StringVar(&dbPath, "db", "", "SQLite database path (overrides config/env)")
|
|
flag.StringVar(&publicDir, "public", "public", "Directory to serve static files from")
|
|
flag.IntVar(&pollMs, "poll-ms", 1000, "SQLite poll interval for WebSocket broadcast (ms)")
|
|
flag.Parse()
|
|
|
|
// Load config
|
|
cfg, err := LoadConfig(configDir)
|
|
if err != nil {
|
|
log.Printf("[config] warning: %v (using defaults)", err)
|
|
}
|
|
|
|
// CLI flags override config
|
|
if port > 0 {
|
|
cfg.Port = port
|
|
}
|
|
if cfg.Port == 0 {
|
|
cfg.Port = 3000
|
|
}
|
|
if dbPath != "" {
|
|
cfg.DBPath = dbPath
|
|
}
|
|
if cfg.APIKey == "" {
|
|
log.Printf("[security] WARNING: no apiKey configured — write endpoints are BLOCKED (set apiKey in config.json to enable them)")
|
|
} else if IsWeakAPIKey(cfg.APIKey) {
|
|
log.Printf("[security] WARNING: API key is weak or a known default — write endpoints are vulnerable")
|
|
}
|
|
|
|
// Apply Go runtime soft memory limit (#836).
|
|
// Honors GOMEMLIMIT if set; otherwise derives from packetStore.maxMemoryMB.
|
|
{
|
|
_, envSet := os.LookupEnv("GOMEMLIMIT")
|
|
maxMB := 0
|
|
if cfg.PacketStore != nil {
|
|
maxMB = cfg.PacketStore.MaxMemoryMB
|
|
}
|
|
limit, source := applyMemoryLimit(maxMB, envSet)
|
|
switch source {
|
|
case "env":
|
|
log.Printf("[memlimit] using GOMEMLIMIT from environment (%s)", os.Getenv("GOMEMLIMIT"))
|
|
case "derived":
|
|
log.Printf("[memlimit] derived from packetStore.maxMemoryMB=%d → %d MiB (1.5x headroom)", maxMB, limit/(1024*1024))
|
|
default:
|
|
log.Printf("[memlimit] no soft memory limit set (GOMEMLIMIT unset, packetStore.maxMemoryMB=0); recommend setting one to avoid container OOM-kill")
|
|
}
|
|
}
|
|
|
|
// Resolve DB path
|
|
resolvedDB := cfg.ResolveDBPath(configDir)
|
|
log.Printf("[config] port=%d db=%s public=%s", cfg.Port, resolvedDB, publicDir)
|
|
if len(cfg.NodeBlacklist) > 0 {
|
|
log.Printf("[config] nodeBlacklist: %d node(s) will be hidden from API", len(cfg.NodeBlacklist))
|
|
for _, pk := range cfg.NodeBlacklist {
|
|
if trimmed := strings.ToLower(strings.TrimSpace(pk)); trimmed != "" {
|
|
log.Printf("[config] blacklisted: %s", trimmed)
|
|
}
|
|
}
|
|
}
|
|
|
|
// Open database
|
|
database, err := OpenDB(resolvedDB)
|
|
if err != nil {
|
|
log.Fatalf("[db] failed to open %s: %v", resolvedDB, err)
|
|
}
|
|
var dbCloseOnce sync.Once
|
|
dbClose := func() error {
|
|
var err error
|
|
dbCloseOnce.Do(func() { err = database.Close() })
|
|
return err
|
|
}
|
|
defer dbClose()
|
|
|
|
// Verify DB has expected tables
|
|
var tableName string
|
|
err = database.conn.QueryRow("SELECT name FROM sqlite_master WHERE type='table' AND name='transmissions'").Scan(&tableName)
|
|
if err == sql.ErrNoRows {
|
|
log.Fatalf("[db] table 'transmissions' not found — is this a CoreScope database?")
|
|
}
|
|
|
|
stats, err := database.GetStats()
|
|
if err != nil {
|
|
log.Printf("[db] warning: could not read stats: %v", err)
|
|
} else {
|
|
log.Printf("[db] transmissions=%d observations=%d nodes=%d observers=%d",
|
|
stats.TotalTransmissions, stats.TotalObservations, stats.TotalNodes, stats.TotalObservers)
|
|
}
|
|
|
|
// Check auto_vacuum mode and optionally migrate (#919)
|
|
checkAutoVacuum(database, cfg, resolvedDB)
|
|
|
|
// In-memory packet store
|
|
store := NewPacketStore(database, cfg.PacketStore, cfg.CacheTTL)
|
|
if err := store.Load(); err != nil {
|
|
log.Fatalf("[store] failed to load: %v", err)
|
|
}
|
|
|
|
// Initialize persisted neighbor graph
|
|
dbPath = database.path
|
|
if err := ensureNeighborEdgesTable(dbPath); err != nil {
|
|
log.Printf("[neighbor] warning: could not create neighbor_edges table: %v", err)
|
|
}
|
|
// Add resolved_path column if missing.
|
|
// NOTE on startup ordering (review item #10): ensureResolvedPathColumn runs AFTER
|
|
// OpenDB/detectSchema, so db.hasResolvedPath will be false on first run with a
|
|
// pre-existing DB. This means Load() won't SELECT resolved_path from SQLite.
|
|
// Async backfill runs after HTTP starts (see backfillResolvedPathsAsync below)
|
|
// AND to SQLite. On next restart, detectSchema finds the column and Load() reads it.
|
|
if err := ensureResolvedPathColumn(dbPath); err != nil {
|
|
log.Printf("[store] warning: could not add resolved_path column: %v", err)
|
|
} else {
|
|
database.hasResolvedPath = true // detectSchema ran before column was added; fix the flag
|
|
}
|
|
|
|
// Ensure observers.inactive column exists (PR #954 filters on it; ingestor migration
|
|
// adds it but server may run against DBs ingestor never touched, e.g. e2e fixture).
|
|
if err := ensureObserverInactiveColumn(dbPath); err != nil {
|
|
log.Printf("[store] warning: could not add observers.inactive column: %v", err)
|
|
}
|
|
|
|
// Ensure observers.last_packet_at column exists (PR #905 reads it; ingestor migration
|
|
// adds it but server may run against DBs ingestor never touched, e.g. e2e fixture).
|
|
if err := ensureLastPacketAtColumn(dbPath); err != nil {
|
|
log.Printf("[store] warning: could not add observers.last_packet_at column: %v", err)
|
|
}
|
|
|
|
// Ensure nodes.foreign_advert column exists (#730 reads it on every /api/nodes
|
|
// scan; ingestor migration foreign_advert_v1 adds it but server may run against
|
|
// DBs ingestor never touched, e.g. e2e fixture).
|
|
if err := ensureForeignAdvertColumn(dbPath); err != nil {
|
|
log.Printf("[store] warning: could not add nodes.foreign_advert column: %v", err)
|
|
}
|
|
|
|
// Ensure transmissions.from_pubkey column + index exists (#1143). Backfill
|
|
// for legacy NULL rows runs async after HTTP starts so it can't block boot
|
|
// even on prod-sized DBs (100K+ transmissions).
|
|
if err := ensureFromPubkeyColumn(dbPath); err != nil {
|
|
log.Printf("[store] warning: could not add transmissions.from_pubkey column: %v", err)
|
|
}
|
|
|
|
// Soft-delete observers that are in the blacklist (mark inactive=1) so
|
|
// historical data from a prior unblocked window is hidden too.
|
|
if len(cfg.ObserverBlacklist) > 0 {
|
|
softDeleteBlacklistedObservers(dbPath, cfg.ObserverBlacklist)
|
|
}
|
|
|
|
// WaitGroup for background init steps that gate /api/healthz readiness.
|
|
var initWg sync.WaitGroup
|
|
|
|
// Load or build neighbor graph
|
|
if neighborEdgesTableExists(database.conn) {
|
|
store.graph = loadNeighborEdgesFromDB(database.conn)
|
|
log.Printf("[neighbor] loaded persisted neighbor graph")
|
|
} else {
|
|
log.Printf("[neighbor] no persisted edges found, will build in background...")
|
|
store.graph = NewNeighborGraph() // empty graph — gets populated by background goroutine
|
|
initWg.Add(1)
|
|
go func() {
|
|
defer initWg.Done()
|
|
defer func() {
|
|
if r := recover(); r != nil {
|
|
log.Printf("[neighbor] graph build panic recovered: %v", r)
|
|
}
|
|
}()
|
|
rw, rwErr := cachedRW(dbPath)
|
|
if rwErr == nil {
|
|
edgeCount := buildAndPersistEdges(store, rw)
|
|
log.Printf("[neighbor] persisted %d edges", edgeCount)
|
|
}
|
|
built := BuildFromStore(store)
|
|
store.mu.Lock()
|
|
store.graph = built
|
|
store.mu.Unlock()
|
|
log.Printf("[neighbor] graph build complete")
|
|
}()
|
|
}
|
|
|
|
// Initial pickBestObservation runs in background — doesn't need to block HTTP.
|
|
// API serves best-effort data until this completes (~10s for 100K txs).
|
|
// Processes in chunks of 5000, releasing the lock between chunks so API
|
|
// handlers remain responsive.
|
|
initWg.Add(1)
|
|
go func() {
|
|
defer initWg.Done()
|
|
defer func() {
|
|
if r := recover(); r != nil {
|
|
log.Printf("[store] pickBestObservation panic recovered: %v", r)
|
|
}
|
|
}()
|
|
const chunkSize = 5000
|
|
store.mu.RLock()
|
|
totalPackets := len(store.packets)
|
|
store.mu.RUnlock()
|
|
|
|
for i := 0; i < totalPackets; i += chunkSize {
|
|
end := i + chunkSize
|
|
if end > totalPackets {
|
|
end = totalPackets
|
|
}
|
|
store.mu.Lock()
|
|
for j := i; j < end && j < len(store.packets); j++ {
|
|
pickBestObservation(store.packets[j])
|
|
}
|
|
store.mu.Unlock()
|
|
if end < totalPackets {
|
|
time.Sleep(10 * time.Millisecond) // yield to API handlers
|
|
}
|
|
}
|
|
log.Printf("[store] initial pickBestObservation complete (%d transmissions)", totalPackets)
|
|
}()
|
|
|
|
// Mark server ready once all background init completes.
|
|
go func() {
|
|
initWg.Wait()
|
|
readiness.Store(1)
|
|
log.Printf("[server] readiness: ready=true (background init complete)")
|
|
}()
|
|
|
|
// WebSocket hub
|
|
hub := NewHub()
|
|
|
|
// HTTP server
|
|
srv := NewServer(database, cfg, hub)
|
|
srv.store = store
|
|
router := mux.NewRouter()
|
|
srv.RegisterRoutes(router)
|
|
|
|
// WebSocket endpoint
|
|
router.HandleFunc("/ws", hub.ServeWS)
|
|
|
|
// Static files + SPA fallback
|
|
absPublic, _ := filepath.Abs(publicDir)
|
|
if _, err := os.Stat(absPublic); err == nil {
|
|
fs := http.FileServer(http.Dir(absPublic))
|
|
router.PathPrefix("/").Handler(wsOrStatic(hub, spaHandler(absPublic, fs)))
|
|
log.Printf("[static] serving %s", absPublic)
|
|
} else {
|
|
log.Printf("[static] directory %s not found — API-only mode", absPublic)
|
|
router.PathPrefix("/").HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
|
w.Header().Set("Content-Type", "text/html")
|
|
w.Write([]byte(`<!DOCTYPE html><html><body><h1>CoreScope</h1><p>Frontend not found. API available at /api/</p></body></html>`))
|
|
})
|
|
}
|
|
|
|
// Start SQLite poller for WebSocket broadcast
|
|
poller := NewPoller(database, hub, time.Duration(pollMs)*time.Millisecond)
|
|
poller.store = store
|
|
go poller.Start()
|
|
|
|
// Start periodic eviction
|
|
stopEviction := store.StartEvictionTicker()
|
|
defer stopEviction()
|
|
|
|
// Auto-prune old packets if retention.packetDays is configured
|
|
vacuumPages := cfg.IncrementalVacuumPages()
|
|
var stopPrune func()
|
|
if cfg.Retention != nil && cfg.Retention.PacketDays > 0 {
|
|
days := cfg.Retention.PacketDays
|
|
pruneTicker := time.NewTicker(24 * time.Hour)
|
|
pruneDone := make(chan struct{})
|
|
stopPrune = func() {
|
|
pruneTicker.Stop()
|
|
close(pruneDone)
|
|
}
|
|
go func() {
|
|
defer func() {
|
|
if r := recover(); r != nil {
|
|
log.Printf("[prune] panic recovered: %v", r)
|
|
}
|
|
}()
|
|
time.Sleep(1 * time.Minute)
|
|
if n, err := database.PruneOldPackets(days); err != nil {
|
|
log.Printf("[prune] error: %v", err)
|
|
} else {
|
|
log.Printf("[prune] deleted %d transmissions older than %d days", n, days)
|
|
if n > 0 {
|
|
runIncrementalVacuum(resolvedDB, vacuumPages)
|
|
}
|
|
}
|
|
for {
|
|
select {
|
|
case <-pruneTicker.C:
|
|
if n, err := database.PruneOldPackets(days); err != nil {
|
|
log.Printf("[prune] error: %v", err)
|
|
} else {
|
|
log.Printf("[prune] deleted %d transmissions older than %d days", n, days)
|
|
if n > 0 {
|
|
runIncrementalVacuum(resolvedDB, vacuumPages)
|
|
}
|
|
}
|
|
case <-pruneDone:
|
|
return
|
|
}
|
|
}
|
|
}()
|
|
log.Printf("[prune] auto-prune enabled: packets older than %d days will be removed daily", days)
|
|
}
|
|
|
|
// Auto-prune old metrics
|
|
var stopMetricsPrune func()
|
|
{
|
|
metricsDays := cfg.MetricsRetentionDays()
|
|
metricsPruneTicker := time.NewTicker(24 * time.Hour)
|
|
metricsPruneDone := make(chan struct{})
|
|
stopMetricsPrune = func() {
|
|
metricsPruneTicker.Stop()
|
|
close(metricsPruneDone)
|
|
}
|
|
go func() {
|
|
defer func() {
|
|
if r := recover(); r != nil {
|
|
log.Printf("[metrics-prune] panic recovered: %v", r)
|
|
}
|
|
}()
|
|
time.Sleep(2 * time.Minute) // stagger after packet prune
|
|
database.PruneOldMetrics(metricsDays)
|
|
runIncrementalVacuum(resolvedDB, vacuumPages)
|
|
for {
|
|
select {
|
|
case <-metricsPruneTicker.C:
|
|
database.PruneOldMetrics(metricsDays)
|
|
runIncrementalVacuum(resolvedDB, vacuumPages)
|
|
case <-metricsPruneDone:
|
|
return
|
|
}
|
|
}
|
|
}()
|
|
log.Printf("[metrics-prune] auto-prune enabled: metrics older than %d days", metricsDays)
|
|
}
|
|
|
|
// Auto-prune stale observers
|
|
var stopObserverPrune func()
|
|
{
|
|
observerDays := cfg.ObserverDaysOrDefault()
|
|
if observerDays <= -1 {
|
|
// -1 means keep forever, skip
|
|
} else {
|
|
observerPruneTicker := time.NewTicker(24 * time.Hour)
|
|
observerPruneDone := make(chan struct{})
|
|
stopObserverPrune = func() {
|
|
observerPruneTicker.Stop()
|
|
close(observerPruneDone)
|
|
}
|
|
go func() {
|
|
defer func() {
|
|
if r := recover(); r != nil {
|
|
log.Printf("[observer-prune] panic recovered: %v", r)
|
|
}
|
|
}()
|
|
time.Sleep(3 * time.Minute) // stagger after metrics prune
|
|
database.RemoveStaleObservers(observerDays)
|
|
runIncrementalVacuum(resolvedDB, vacuumPages)
|
|
for {
|
|
select {
|
|
case <-observerPruneTicker.C:
|
|
database.RemoveStaleObservers(observerDays)
|
|
runIncrementalVacuum(resolvedDB, vacuumPages)
|
|
case <-observerPruneDone:
|
|
return
|
|
}
|
|
}
|
|
}()
|
|
log.Printf("[observer-prune] auto-prune enabled: observers not seen in %d days will be removed", observerDays)
|
|
}
|
|
}
|
|
|
|
// Auto-prune old neighbor edges
|
|
var stopEdgePrune func()
|
|
{
|
|
maxAgeDays := cfg.NeighborMaxAgeDays()
|
|
edgePruneTicker := time.NewTicker(24 * time.Hour)
|
|
edgePruneDone := make(chan struct{})
|
|
stopEdgePrune = func() {
|
|
edgePruneTicker.Stop()
|
|
close(edgePruneDone)
|
|
}
|
|
go func() {
|
|
defer func() {
|
|
if r := recover(); r != nil {
|
|
log.Printf("[neighbor-prune] panic recovered: %v", r)
|
|
}
|
|
}()
|
|
time.Sleep(4 * time.Minute) // stagger after metrics prune
|
|
store.mu.RLock()
|
|
g := store.graph
|
|
store.mu.RUnlock()
|
|
PruneNeighborEdges(dbPath, g, maxAgeDays)
|
|
runIncrementalVacuum(resolvedDB, vacuumPages)
|
|
for {
|
|
select {
|
|
case <-edgePruneTicker.C:
|
|
store.mu.RLock()
|
|
g := store.graph
|
|
store.mu.RUnlock()
|
|
PruneNeighborEdges(dbPath, g, maxAgeDays)
|
|
runIncrementalVacuum(resolvedDB, vacuumPages)
|
|
case <-edgePruneDone:
|
|
return
|
|
}
|
|
}
|
|
}()
|
|
log.Printf("[neighbor-prune] auto-prune enabled: edges older than %d days", maxAgeDays)
|
|
}
|
|
|
|
// Graceful shutdown
|
|
httpServer := &http.Server{
|
|
Addr: fmt.Sprintf(":%d", cfg.Port),
|
|
Handler: router,
|
|
ReadTimeout: 30 * time.Second,
|
|
WriteTimeout: 60 * time.Second,
|
|
IdleTimeout: 120 * time.Second,
|
|
}
|
|
|
|
go func() {
|
|
sigCh := make(chan os.Signal, 1)
|
|
signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
|
|
sig := <-sigCh
|
|
log.Printf("[server] received %v, shutting down...", sig)
|
|
|
|
// 1. Stop accepting new WebSocket/poll data
|
|
poller.Stop()
|
|
|
|
// 1b. Stop auto-prune ticker
|
|
if stopPrune != nil {
|
|
stopPrune()
|
|
}
|
|
if stopMetricsPrune != nil {
|
|
stopMetricsPrune()
|
|
}
|
|
if stopObserverPrune != nil {
|
|
stopObserverPrune()
|
|
}
|
|
if stopEdgePrune != nil {
|
|
stopEdgePrune()
|
|
}
|
|
|
|
// 2. Gracefully drain HTTP connections (up to 15s)
|
|
ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
|
|
defer cancel()
|
|
if err := httpServer.Shutdown(ctx); err != nil {
|
|
log.Printf("[server] HTTP shutdown error: %v", err)
|
|
}
|
|
|
|
// 3. Close WebSocket hub
|
|
hub.Close()
|
|
|
|
// 4. Close database (release SQLite WAL lock)
|
|
if err := dbClose(); err != nil {
|
|
log.Printf("[server] DB close error: %v", err)
|
|
}
|
|
log.Println("[server] shutdown complete")
|
|
}()
|
|
|
|
log.Printf("[server] CoreScope (Go) listening on http://localhost:%d", cfg.Port)
|
|
|
|
// Start async backfill in background — HTTP is now available.
|
|
go backfillResolvedPathsAsync(store, dbPath, 5000, 100*time.Millisecond, cfg.BackfillHours())
|
|
// #1143: backfill from_pubkey for legacy ADVERT rows. Async so even
|
|
// 100K+ rows can't block boot; queries handle NULL gracefully.
|
|
// startFromPubkeyBackfill wraps the goroutine dispatch so the async
|
|
// contract is testable (see TestBackfillFromPubkey_DoesNotBlockBoot).
|
|
startFromPubkeyBackfill(dbPath, 5000, 100*time.Millisecond)
|
|
|
|
// Migrate old content hashes in background (one-time, idempotent).
|
|
go migrateContentHashesAsync(store, 5000, 100*time.Millisecond)
|
|
|
|
if err := httpServer.ListenAndServe(); err != http.ErrServerClosed {
|
|
log.Fatalf("[server] %v", err)
|
|
}
|
|
}
|
|
|
|
// spaHandler serves static files, falling back to index.html for SPA routes.
|
|
// It reads index.html once at creation time and replaces the __BUST__ placeholder
|
|
// with a Unix timestamp so browsers fetch fresh JS/CSS after each server restart.
|
|
func spaHandler(root string, fs http.Handler) http.Handler {
|
|
// Pre-process index.html: replace __BUST__ with a cache-bust timestamp
|
|
indexPath := filepath.Join(root, "index.html")
|
|
rawHTML, err := os.ReadFile(indexPath)
|
|
if err != nil {
|
|
log.Printf("[static] warning: could not read index.html for cache-bust: %v", err)
|
|
rawHTML = []byte("<!DOCTYPE html><html><body><h1>CoreScope</h1><p>index.html not found</p></body></html>")
|
|
}
|
|
bustValue := fmt.Sprintf("%d", time.Now().Unix())
|
|
indexHTML := []byte(strings.ReplaceAll(string(rawHTML), "__BUST__", bustValue))
|
|
log.Printf("[static] cache-bust value: %s", bustValue)
|
|
|
|
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
|
// Serve pre-processed index.html for root and /index.html
|
|
if r.URL.Path == "/" || r.URL.Path == "/index.html" {
|
|
w.Header().Set("Content-Type", "text/html; charset=utf-8")
|
|
w.Header().Set("Cache-Control", "no-cache, no-store, must-revalidate")
|
|
w.Write(indexHTML)
|
|
return
|
|
}
|
|
|
|
path := filepath.Join(root, r.URL.Path)
|
|
if _, err := os.Stat(path); os.IsNotExist(err) {
|
|
// SPA fallback — serve pre-processed index.html
|
|
w.Header().Set("Content-Type", "text/html; charset=utf-8")
|
|
w.Header().Set("Cache-Control", "no-cache, no-store, must-revalidate")
|
|
w.Write(indexHTML)
|
|
return
|
|
}
|
|
// Disable caching for JS/CSS/HTML
|
|
if filepath.Ext(path) == ".js" || filepath.Ext(path) == ".css" || filepath.Ext(path) == ".html" {
|
|
w.Header().Set("Cache-Control", "no-cache, no-store, must-revalidate")
|
|
}
|
|
fs.ServeHTTP(w, r)
|
|
})
|
|
}
|