Files
meshcore-analyzer/cmd/server/from_pubkey_attribution_test.go
T
Kpa-clawbot fb744d895f fix(#1143): structural pubkey attribution via from_pubkey column (#1152)
Fixes #1143.

## Summary

Replaces the structurally unsound `decoded_json LIKE '%pubkey%'` (and
`OR LIKE '%name%'`) attribution path with an exact-match lookup on a
dedicated, indexed `transmissions.from_pubkey` column.

This closes both holes documented in #1143:
- **Hole 1** — same-name false positives via `OR LIKE '%name%'`
- **Hole 2a** — adversarial spoofing: a malicious node names itself with
another node's pubkey and gets attributed to the victim
- **Hole 2b** — accidental false positive when any free-text field (path
elements, channel names, message bodies) contains a 64-char hex
substring matching a real pubkey
- **Perf** — query now uses an index instead of a full-table scan
against `LIKE '%substring%'`

## TDD

Two-commit history shows red-then-green:

| Commit | Status | Purpose |
|---|---|---|
| `7f0f08e` | RED — tests assertion-fail on master behaviour |
Adversarial fixtures + spec |
| `59327db` | GREEN — schema + ingestor + server + migration |
Implementation |

The red commit's test schema includes the new column so the file
compiles, but the production code still uses LIKE — the assertions fail
because the malicious / same-name / free-text rows are returned. The
green commit changes the query plus adds the migration/ingest path.

## Changes

### Schema
- new column `transmissions.from_pubkey TEXT`
- new index `idx_transmissions_from_pubkey`

### Ingestor (`cmd/ingestor/`)
- `PacketData.FromPubkey` populated from decoded ADVERT `pubKey` at
write time. Cheap — already parsing `decoded_json`. Non-ADVERTs stay
NULL.
- `stmtInsertTransmission` writes the column.
- Migration `from_pubkey_v1` ALTERs legacy DBs to add the column +
index.
- Bonus: rewrote the recipe in the gated one-shot
`advert_count_unique_v1` migration to use `from_pubkey` (already marked
done on existing DBs; kept correct for fresh installs).

### Server (`cmd/server/`)
- `ensureFromPubkeyColumn` mirrors the ingestor migration so the server
can boot against a DB the ingestor has never touched (e2e fixture, fresh
installs).
- `backfillFromPubkeyAsync` runs **after** HTTP starts. Scans `WHERE
from_pubkey IS NULL AND payload_type = 4` in 5000-row chunks with a
100ms yield between chunks. Cannot block boot even on prod-sized DBs
(100K+ transmissions). Queries handle NULL gracefully (return empty for
that pubkey, same as today's unknown-pubkey path).
- All in-scope LIKE call sites switched to exact match:

| Site | Before | After |
|---|---|---|
| `buildPacketWhere` (was db.go:582) | `decoded_json LIKE '%pubkey%'` |
`from_pubkey = ?` |
| `buildTransmissionWhere` (was db.go:626) | `t.decoded_json LIKE
'%pubkey%'` | `t.from_pubkey = ?` |
| `GetRecentTransmissionsForNode` (was db.go:910) | `LIKE '%pubkey%' OR
LIKE '%name%'` | `t.from_pubkey = ?` |
| `QueryMultiNodePackets` (was db.go:1785) | `decoded_json LIKE
'%pubkey%' OR ...` | `t.from_pubkey IN (?, ?, ...)` |
| `advert_count_unique_v1` (was ingestor/db.go:257) | `decoded_json LIKE
'%' \|\| nodes.public_key \|\| '%'` | `t.from_pubkey = nodes.public_key`
|

`GetRecentTransmissionsForNode` signature simplifies: the `name`
parameter is gone (it was only ever used for the legacy `OR LIKE
'%name%'` fallback). Sole caller in `routes.go:1243` updated.

### Tests
- `cmd/server/from_pubkey_attribution_test.go` — adversarial fixtures +
Hole 1/2a/2b/QueryMultiNodePackets exact-match assertions, EXPLAIN QUERY
PLAN index check, migration backfill correctness.
- `cmd/ingestor/from_pubkey_test.go` — write-time correctness
(BuildPacketData populates FromPubkey for ADVERT only;
InsertTransmission persists it; non-ADVERTs stay NULL).
- Existing test schemas (server v2, server v3, coverage) get the new
column **plus a SQLite trigger** that auto-populates `from_pubkey` from
`decoded_json` on ADVERT inserts. This means existing fixtures (which
only seed `decoded_json`) keep attributing correctly without per-test
edits.
- `seedTestData`'s ADVERTs explicitly set `from_pubkey`.

## Performance — index is used

```
$ EXPLAIN QUERY PLAN SELECT id FROM transmissions WHERE from_pubkey = ?
SEARCH transmissions USING INDEX idx_transmissions_from_pubkey (from_pubkey=?)
```

Asserted in `TestFromPubkeyIndexUsed`.

## Migration approach

- **Sync at boot**: `ALTER TABLE transmissions ADD COLUMN from_pubkey
TEXT` is a metadata-only operation in SQLite — microseconds regardless
of table size. `CREATE INDEX IF NOT EXISTS
idx_transmissions_from_pubkey` is **not** metadata-only: it scans the
table once. Empirically a few hundred ms on a 100K-row table; expect a
few seconds on a 10M-row table (one-time cost, blocking boot during that
window). Subsequent boots no-op via `IF NOT EXISTS`. If this boot delay
becomes an operational concern at prod scale we can defer the `CREATE
INDEX` to a goroutine — for now a few-second one-time delay is
acceptable.
- **Async**: row-level backfill of legacy NULL ADVERTs (chunked 5000 /
100ms yield). On a 100K-ADVERT prod DB, this completes in seconds in the
background; HTTP is fully available throughout.
- **Safety**: queries handle NULL gracefully — a node whose ADVERTs
haven't backfilled yet returns empty, identical to today's behaviour for
unknown pubkeys. No half-state regression.

## Out of scope (intentionally)

The free-text `LIKE` paths the issue explicitly leaves alone (e.g.
user-typed packet search) are untouched. Only the pubkey-attribution
sites get the column treatment.



## Cycle-3 review fixes

| Finding | Status | Commit |
|---|---|---|
| **M1c** — async-contract test was tautological (test's own `go`, not
production's) | Fixed | `23ace71` (red) → `a05b50c` (green) |
| **m1c** — package-global atomic resets unsafe under `t.Parallel()` |
Fixed (`// DO NOT t.Parallel` comment + `Reset()` helper) | rolled into
`23ace71` / `241ec69` |
| **m2c** — `/api/healthz` read 3 atomics non-atomically (torn snapshot)
| Fixed (single RWMutex-guarded snapshot + race test) | `241ec69` |
| **n3c.m1** — vestigial OR-scaffolding in `QueryMultiNodePackets` |
Fixed (cleanup) | `5a53ceb` |
| **n3c.m2** — verify PR body language about `ALTER` vs `CREATE INDEX` |
Verified accurate (already corrected in cycle 2) | (no change) |
| **n3c.m3** — `json.Unmarshal` per row in backfill → could use SQL
`json_extract` | **Deferred as known followup** — pure perf optimization
(current per-row Unmarshal is correct, just slower); SQL rewrite would
unwind the chunked-yield architecture and is non-trivial. Acceptable for
one-time backfill at boot on legacy DBs. |

### M1c implementation detail

`startFromPubkeyBackfill(dbPath, chunkSize, yieldDuration)` is now the
single production entry point used by `main.go`. It internally does `go
backfillFromPubkeyAsync(...)`. The test calls `startFromPubkeyBackfill`
(no `go` prefix) and asserts the dispatch returns within 50ms — so if
anyone removes the `go` keyword inside the wrapper, the test fails.
**Manually verified**: removing the `go` keyword causes
`TestBackfillFromPubkey_DoesNotBlockBoot` to fail with "backfill
dispatch took ~1s (>50ms): not async — would block boot."

### m2c implementation detail

`fromPubkeyBackfillTotal/Processed/Done` are now plain `int64`/`bool`
package globals guarded by a single `sync.RWMutex`.
`fromPubkeyBackfillSnapshot()` returns all three under one RLock.
`TestHealthzFromPubkeyBackfillConsistentSnapshot` races a writer
(lock-step total/processed updates with periodic done flips) against 8
readers hammering `/api/healthz`, asserting `processed<=total` and
`(done => processed==total)` on every response. Verified the test
catches torn reads (manually injected a 3-RLock implementation; test
failed within milliseconds with "processed>total" and "done=true but
processed!=total" errors).

---------

Co-authored-by: openclaw-bot <bot@openclaw.local>
Co-authored-by: openclaw-bot <bot@openclaw.dev>
2026-05-06 23:50:44 -07:00

435 lines
14 KiB
Go

package main
// Tests for issue #1143: pubkey attribution must use exact-match on a
// dedicated `from_pubkey` column, not `decoded_json LIKE '%pubkey%'`.
//
// These tests demonstrate the structural holes documented in #1143:
// Hole 1: name-LIKE fallback surfaces same-name nodes
// Hole 2a: an attacker can name themselves with someone else's pubkey
// and get their transmissions attributed to the victim
// Hole 2b: any 64-char hex substring inside decoded_json (path elements,
// channel names, message bodies) produces false positives
import (
"database/sql"
"fmt"
"strings"
"testing"
"time"
_ "modernc.org/sqlite"
)
const (
pkVictim = "f7181c468dfe7c55aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
pkAttacker = "deadbeefdeadbeefcccccccccccccccccccccccccccccccccccccccccccccccc"
pkOther = "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
)
// seedAttribution inserts the standard adversarial fixture used by the
// issue #1143 tests. It returns the victim pubkey for convenience.
func seedAttribution(t *testing.T, db *DB) string {
t.Helper()
now := time.Now().UTC().Format(time.RFC3339)
// (1) Legitimate ADVERT from the victim.
mustExec(t, db, `INSERT INTO transmissions
(raw_hex, hash, first_seen, route_type, payload_type, decoded_json, from_pubkey)
VALUES ('AA','h_victim_advert',?,1,4,
'{"type":"ADVERT","pubKey":"`+pkVictim+`","name":"VictimNode"}',
?)`, now, pkVictim)
// (2) Hole 1: a different node sharing the *display name* "VictimNode".
mustExec(t, db, `INSERT INTO transmissions
(raw_hex, hash, first_seen, route_type, payload_type, decoded_json, from_pubkey)
VALUES ('BB','h_namespoof_advert',?,1,4,
'{"type":"ADVERT","pubKey":"`+pkOther+`","name":"VictimNode"}',
?)`, now, pkOther)
// (3) Hole 2a: malicious node whose *name* is the victim's pubkey.
// decoded_json contains pkVictim as a substring (in the name field),
// but the actual originator is pkAttacker.
mustExec(t, db, `INSERT INTO transmissions
(raw_hex, hash, first_seen, route_type, payload_type, decoded_json, from_pubkey)
VALUES ('CC','h_spoof_advert',?,1,4,
'{"type":"ADVERT","pubKey":"`+pkAttacker+`","name":"`+pkVictim+`"}',
?)`, now, pkAttacker)
// (4) Hole 2b: free-text packet (e.g. channel message) whose body
// coincidentally contains the victim's pubkey as a substring.
// Real originator is pkAttacker; from_pubkey reflects that.
mustExec(t, db, `INSERT INTO transmissions
(raw_hex, hash, first_seen, route_type, payload_type, decoded_json, from_pubkey)
VALUES ('DD','h_freetext_msg',?,1,5,
'{"type":"GRP_TXT","text":"hello `+pkVictim+` how are you"}',
?)`, now, pkAttacker)
return pkVictim
}
func mustExec(t *testing.T, db *DB, q string, args ...interface{}) {
t.Helper()
if _, err := db.conn.Exec(q, args...); err != nil {
t.Fatalf("exec failed: %v\nquery: %s", err, q)
}
}
func hashesOf(rows []map[string]interface{}) []string {
out := make([]string, 0, len(rows))
for _, r := range rows {
if h, ok := r["hash"].(string); ok {
out = append(out, h)
}
}
return out
}
func TestRecentTransmissions_Hole1_SameNameDifferentPubkey(t *testing.T) {
db := setupTestDB(t)
defer db.Close()
victim := seedAttribution(t, db)
got, err := db.GetRecentTransmissionsForNode(victim, 20)
if err != nil {
t.Fatal(err)
}
hashes := hashesOf(got)
for _, h := range hashes {
if h == "h_namespoof_advert" {
t.Fatalf("Hole 1: same-name node was attributed to the victim. got hashes=%v", hashes)
}
}
}
func TestRecentTransmissions_Hole2a_PubkeyAsNameSpoof(t *testing.T) {
db := setupTestDB(t)
defer db.Close()
victim := seedAttribution(t, db)
got, err := db.GetRecentTransmissionsForNode(victim, 20)
if err != nil {
t.Fatal(err)
}
hashes := hashesOf(got)
for _, h := range hashes {
if h == "h_spoof_advert" {
t.Fatalf("Hole 2a: attacker who named themselves with victim's pubkey "+
"was attributed to the victim. got hashes=%v", hashes)
}
}
}
func TestRecentTransmissions_Hole2b_FreeTextHexFalsePositive(t *testing.T) {
db := setupTestDB(t)
defer db.Close()
victim := seedAttribution(t, db)
got, err := db.GetRecentTransmissionsForNode(victim, 20)
if err != nil {
t.Fatal(err)
}
hashes := hashesOf(got)
for _, h := range hashes {
if h == "h_freetext_msg" {
t.Fatalf("Hole 2b: free-text containing the victim's pubkey as a "+
"substring produced a false positive. got hashes=%v", hashes)
}
}
}
func TestRecentTransmissions_LegitimateAdvertReturned(t *testing.T) {
db := setupTestDB(t)
defer db.Close()
victim := seedAttribution(t, db)
got, err := db.GetRecentTransmissionsForNode(victim, 20)
if err != nil {
t.Fatal(err)
}
hashes := hashesOf(got)
found := false
for _, h := range hashes {
if h == "h_victim_advert" {
found = true
break
}
}
if !found {
t.Fatalf("expected legitimate victim advert (h_victim_advert) in result, got %v", hashes)
}
}
// --- Multi-pubkey OR query (#1143 — db.go:1785) ---
func TestQueryMultiNodePackets_ExactMatchOnly(t *testing.T) {
db := setupTestDB(t)
defer db.Close()
seedAttribution(t, db)
// Query the victim's pubkey via the multi-node API. The malicious
// "name = victim pubkey" row and the free-text row must NOT show up.
res, err := db.QueryMultiNodePackets([]string{pkVictim}, 50, 0, "DESC", "", "")
if err != nil {
t.Fatal(err)
}
hashes := hashesOf(res.Packets)
for _, bad := range []string{"h_spoof_advert", "h_freetext_msg", "h_namespoof_advert"} {
for _, h := range hashes {
if h == bad {
t.Fatalf("QueryMultiNodePackets returned spurious match %q (pubkey %s as substring); hashes=%v",
bad, pkVictim, hashes)
}
}
}
// The legitimate one must still be present.
if !contains(hashes, "h_victim_advert") {
t.Fatalf("expected h_victim_advert in QueryMultiNodePackets result, got %v", hashes)
}
}
func contains(haystack []string, needle string) bool {
for _, s := range haystack {
if s == needle {
return true
}
}
return false
}
// --- Index sanity check (#1143 perf): verify EXPLAIN QUERY PLAN uses the
// new index, not a SCAN. ---
func TestFromPubkeyIndexUsed(t *testing.T) {
db := setupTestDB(t)
defer db.Close()
mustExec(t, db, `CREATE INDEX IF NOT EXISTS idx_transmissions_from_pubkey ON transmissions(from_pubkey)`)
rows, err := db.conn.Query(
`EXPLAIN QUERY PLAN SELECT id FROM transmissions WHERE from_pubkey = ?`,
pkVictim)
if err != nil {
t.Fatal(err)
}
defer rows.Close()
plan := ""
for rows.Next() {
var id, parent, notused int
var detail string
if err := rows.Scan(&id, &parent, &notused, &detail); err == nil {
plan += detail + "\n"
}
}
if !strings.Contains(plan, "idx_transmissions_from_pubkey") {
t.Fatalf("expected EXPLAIN QUERY PLAN to use idx_transmissions_from_pubkey, got:\n%s", plan)
}
}
// TestFromPubkeyIndexUsedForInClause verifies the index is used for the
// IN (?, ?, ...) query path used by QueryMultiNodePackets (db.go ~1787).
// Coverage extension — the equality path is covered above; this asserts
// the multi-node path doesn't silently regress to a full scan when the
// planner can't use the index for set membership.
func TestFromPubkeyIndexUsedForInClause(t *testing.T) {
db := setupTestDB(t)
defer db.Close()
mustExec(t, db, `CREATE INDEX IF NOT EXISTS idx_transmissions_from_pubkey ON transmissions(from_pubkey)`)
rows, err := db.conn.Query(
`EXPLAIN QUERY PLAN SELECT id FROM transmissions WHERE from_pubkey IN (?, ?)`,
pkVictim, pkOther)
if err != nil {
t.Fatal(err)
}
defer rows.Close()
plan := ""
for rows.Next() {
var id, parent, notused int
var detail string
if err := rows.Scan(&id, &parent, &notused, &detail); err == nil {
plan += detail + "\n"
}
}
if !strings.Contains(plan, "idx_transmissions_from_pubkey") {
t.Fatalf("expected EXPLAIN QUERY PLAN for IN(...) to use idx_transmissions_from_pubkey, got:\n%s", plan)
}
}
// --- Migration / backfill ---
func TestBackfillFromPubkey_AdvertRowsPopulated(t *testing.T) {
dir := t.TempDir()
dbPath := dir + "/test.db"
// Create a legacy-style DB: transmissions table WITHOUT from_pubkey,
// then run ensureFromPubkeyColumn to ALTER it in.
rw, err := sql.Open("sqlite", dbPath)
if err != nil {
t.Fatal(err)
}
if _, err := rw.Exec(`CREATE TABLE transmissions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
raw_hex TEXT, hash TEXT UNIQUE, first_seen TEXT,
route_type INTEGER, payload_type INTEGER, payload_version INTEGER,
decoded_json TEXT, created_at TEXT
)`); err != nil {
t.Fatal(err)
}
// Two ADVERTs (different pubkeys) and a non-ADVERT.
if _, err := rw.Exec(`INSERT INTO transmissions (raw_hex, hash, first_seen, payload_type, decoded_json) VALUES
('AA','m1','2026-01-01T00:00:00Z',4,'{"type":"ADVERT","pubKey":"`+pkVictim+`","name":"V"}'),
('BB','m2','2026-01-01T00:00:00Z',4,'{"type":"ADVERT","pubKey":"`+pkOther+`","name":"O"}'),
('CC','m3','2026-01-01T00:00:00Z',5,'{"type":"GRP_TXT","text":"hi"}')`); err != nil {
t.Fatal(err)
}
rw.Close()
if err := ensureFromPubkeyColumn(dbPath); err != nil {
t.Fatalf("ensureFromPubkeyColumn: %v", err)
}
// Run synchronously by calling the function directly.
backfillFromPubkeyAsync(dbPath, 100, 0)
// Verify backfill populated the ADVERT rows.
rw2, err := sql.Open("sqlite", dbPath)
if err != nil {
t.Fatal(err)
}
defer rw2.Close()
rows, err := rw2.Query("SELECT hash, from_pubkey FROM transmissions ORDER BY hash")
if err != nil {
t.Fatal(err)
}
defer rows.Close()
got := map[string]string{}
for rows.Next() {
var h string
var pk sql.NullString
if err := rows.Scan(&h, &pk); err != nil {
t.Fatal(err)
}
got[h] = pk.String
}
if got["m1"] != pkVictim {
t.Errorf("m1 from_pubkey = %q, want %q", got["m1"], pkVictim)
}
if got["m2"] != pkOther {
t.Errorf("m2 from_pubkey = %q, want %q", got["m2"], pkOther)
}
// Non-ADVERT row was not in the backfill scope; from_pubkey stays NULL.
if got["m3"] != "" {
t.Errorf("m3 from_pubkey = %q, want empty (NULL)", got["m3"])
}
}
// TestBackfillFromPubkey_DoesNotBlockBoot exercises the async contract:
// main.go (cmd/server/main.go) calls startFromPubkeyBackfill, which is the
// SAME entry point used at production startup. The wrapper must dispatch
// the backfill in a goroutine; if anyone removes the `go` keyword inside
// startFromPubkeyBackfill, this test fails because the call no longer
// returns within the 50ms boot dispatch budget. The test does NOT use `go`
// itself — that would test only the test's own scheduler, not the
// production code path (cycle-3 M1c).
//
// DO NOT t.Parallel — uses package-global atomics
// (fromPubkeyBackfillTotal/Processed/Done). Concurrent tests would clobber
// the resets (cycle-3 m1c).
func TestBackfillFromPubkey_DoesNotBlockBoot(t *testing.T) {
dir := t.TempDir()
dbPath := dir + "/async_boot.db"
rw, err := sql.Open("sqlite", dbPath)
if err != nil {
t.Fatal(err)
}
if _, err := rw.Exec(`CREATE TABLE transmissions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
raw_hex TEXT, hash TEXT UNIQUE, first_seen TEXT,
route_type INTEGER, payload_type INTEGER, payload_version INTEGER,
decoded_json TEXT, created_at TEXT
)`); err != nil {
t.Fatal(err)
}
// Insert N=1000 legacy ADVERT rows. With chunkSize=100 + yield=100ms
// between chunks, sync would be ~900ms; we assert dispatch is <50ms.
tx, err := rw.Begin()
if err != nil {
t.Fatal(err)
}
stmt, err := tx.Prepare(`INSERT INTO transmissions
(raw_hex, hash, first_seen, payload_type, decoded_json) VALUES (?, ?, ?, 4, ?)`)
if err != nil {
t.Fatal(err)
}
const N = 1000
for i := 0; i < N; i++ {
hash := fmt.Sprintf("h_async_boot_%d", i)
dj := fmt.Sprintf(`{"type":"ADVERT","pubKey":"%s","name":"N%d"}`, pkVictim, i)
if _, err := stmt.Exec("AA", hash, "2026-01-01T00:00:00Z", dj); err != nil {
t.Fatal(err)
}
}
stmt.Close()
if err := tx.Commit(); err != nil {
t.Fatal(err)
}
rw.Close()
if err := ensureFromPubkeyColumn(dbPath); err != nil {
t.Fatalf("ensureFromPubkeyColumn: %v", err)
}
// Reset all backfill state — other tests may have set it.
fromPubkeyBackfillReset()
defer fromPubkeyBackfillReset()
// Dispatch via the production wrapper. startFromPubkeyBackfill is the
// same entry point main.go calls at boot; it must launch the backfill
// in a goroutine internally. We deliberately do NOT prefix `go` here —
// if the wrapper is ever made synchronous, the dispatch budget below
// fires first.
t0 := time.Now()
startFromPubkeyBackfill(dbPath, 100, 100*time.Millisecond)
dispatchElapsed := time.Since(t0)
// (a) Boot-time dispatch budget: must return ~immediately.
if dispatchElapsed > 50*time.Millisecond {
t.Fatalf("backfill dispatch took %v (>50ms): not async — would block boot", dispatchElapsed)
}
// (b) Eventual completion via the fromPubkeyBackfill snapshot.
deadline := time.Now().Add(30 * time.Second)
for time.Now().Before(deadline) {
if _, _, done := fromPubkeyBackfillSnapshot(); done {
break
}
time.Sleep(50 * time.Millisecond)
}
if _, _, done := fromPubkeyBackfillSnapshot(); !done {
t.Fatalf("backfill never flipped Done within 30s; dispatched=%v", dispatchElapsed)
}
// (c) Backfill actually populated rows.
rw2, err := sql.Open("sqlite", dbPath)
if err != nil {
t.Fatal(err)
}
defer rw2.Close()
var nullCount int
if err := rw2.QueryRow(
`SELECT COUNT(*) FROM transmissions WHERE payload_type = 4 AND from_pubkey IS NULL`,
).Scan(&nullCount); err != nil {
t.Fatal(err)
}
if nullCount > 0 {
t.Errorf("backfill left %d ADVERT rows with NULL from_pubkey", nullCount)
}
if _, processed, _ := fromPubkeyBackfillSnapshot(); processed != int64(N) {
t.Errorf("fromPubkeyBackfillProcessed = %d, want %d", processed, N)
}
}