# Packet Deduplication — Normalized Schema Migration Plan

## Overview

Split the monolithic `packets` table into two tables:

- `packets` — one row per unique physical transmission (keyed by content hash)
- `observations` — one row per observer sighting (SNR, RSSI, path, observer, timestamp)

This fixes inflated packet counts across the entire app and enables proper "1 transmission seen N times" semantics.
## Current State

- `packets` table: 1 row per observation. ~61 MB, ~30K+ rows. The same hash appears N times (once per observer). Fields mix transmission data (raw_hex, payload_type, decoded_json, hash) with observation data (observer_id, snr, rssi, path_json).
- `packet-store.js`: in-memory mirror of the `packets` table. Indexes: `byId`, `byHash` (hash → [packets]), `byObserver`, `byNode`. All reads are served from RAM; SQLite is write-only for packets.
- Touch surface: ~66 SQL queries across db.js/server.js/packet-store.js. ~12 frontend files consume packet data.
## Milestone 1: Schema Migration (Backend Only)

**Goal:** New tables exist, data migrated, old table preserved as backup. No behavioral changes yet.

### Tasks

1. Create the new schema in `db.js` init:

   ```sql
   CREATE TABLE IF NOT EXISTS transmissions (
     id              INTEGER PRIMARY KEY AUTOINCREMENT,
     raw_hex         TEXT NOT NULL,
     hash            TEXT NOT NULL UNIQUE,
     first_seen      TEXT NOT NULL,
     route_type      INTEGER,
     payload_type    INTEGER,
     payload_version INTEGER,
     decoded_json    TEXT,
     created_at      TEXT DEFAULT (datetime('now'))
   );

   CREATE TABLE IF NOT EXISTS observations (
     id              INTEGER PRIMARY KEY AUTOINCREMENT,
     transmission_id INTEGER NOT NULL REFERENCES transmissions(id),
     hash            TEXT NOT NULL,
     observer_id     TEXT,
     observer_name   TEXT,
     direction       TEXT,
     snr             REAL,
     rssi            REAL,
     score           INTEGER,
     path_json       TEXT,
     timestamp       TEXT NOT NULL,
     created_at      TEXT DEFAULT (datetime('now'))
   );

   CREATE INDEX idx_transmissions_hash ON transmissions(hash);
   CREATE INDEX idx_transmissions_first_seen ON transmissions(first_seen);
   CREATE INDEX idx_transmissions_payload_type ON transmissions(payload_type);
   CREATE INDEX idx_observations_hash ON observations(hash);
   CREATE INDEX idx_observations_transmission_id ON observations(transmission_id);
   CREATE INDEX idx_observations_observer_id ON observations(observer_id);
   CREATE INDEX idx_observations_timestamp ON observations(timestamp);
   ```

2. Write a migration script (`scripts/migrate-dedup.js`):
   - Read all rows from `packets` ordered by timestamp
   - Group by hash
   - For each unique hash: INSERT into `transmissions` (use the first observation's raw_hex, decoded_json, etc.)
   - For each row: INSERT into `observations` with a foreign key to the transmission
   - Verify counts: `SELECT COUNT(*) FROM observations` = old packets count
   - Verify: `SELECT COUNT(*) FROM transmissions` < observations count
   - Do NOT drop the old `packets` table — rename it to `packets_backup`

3. Print migration stats: total packets, unique transmissions, dedup ratio, time taken
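The grouping step can be sketched in plain JavaScript. This is a simplified in-memory model (the real script would stream rows from SQLite); `migrateRows` and the row field selection are hypothetical names for illustration:

```javascript
// Sketch: one transmissions record per unique hash, one observations record
// per original packet row. Rows are sorted by timestamp so the earliest
// sighting supplies the transmission fields and first_seen.
function migrateRows(packetRows) {
  const transmissions = new Map(); // hash -> transmission record
  const observations = [];
  let nextTxId = 1;

  const sorted = [...packetRows].sort((a, b) =>
    a.timestamp < b.timestamp ? -1 : 1
  );

  for (const row of sorted) {
    let tx = transmissions.get(row.hash);
    if (!tx) {
      // First observation of this hash: it defines the transmission.
      tx = {
        id: nextTxId++,
        hash: row.hash,
        raw_hex: row.raw_hex,
        decoded_json: row.decoded_json,
        first_seen: row.timestamp,
      };
      transmissions.set(row.hash, tx);
    }
    observations.push({
      transmission_id: tx.id,
      hash: row.hash,
      observer_id: row.observer_id,
      snr: row.snr,
      timestamp: row.timestamp,
    });
  }

  return { transmissions: [...transmissions.values()], observations };
}
```

The invariants to check afterwards match the validation below: `observations.length` equals the original row count, and `transmissions.length` equals the number of distinct hashes.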
### Validation

- `COUNT(*) FROM observations` = `COUNT(*) FROM packets_backup`
- `COUNT(*) FROM transmissions` = `COUNT(DISTINCT hash) FROM packets_backup`
- Spot-check: pick 5 known multi-observer packets, verify the transmission + observations match

**Risk: LOW** — additive only, old data preserved
## Milestone 2: Dual-Write Ingest

**Goal:** New packets are written to both old and new tables. Read path unchanged. Zero downtime.

### Tasks
1. Update `db.js` `insertPacket()`:
   - On new packet: check if a `transmissions` row exists for the hash
   - If not: INSERT into `transmissions`, get its id
   - If yes: UPDATE `first_seen` if this timestamp is earlier
   - INSERT into `observations` with the transmission_id
   - Still also write to the old `packets` table (dual-write for safety)

2. Update `packet-store.js` `insert()`: mirror the dual-write in the memory model
   - Maintain both the old flat array AND a new `byTransmission` Map
### Validation

- Send test packets, verify they appear in both old and new tables
- Verify a multi-observer packet creates 1 transmission + N observations

**Risk: LOW** — old read path still works as a fallback
## Milestone 3: In-Memory Store Restructure

**Goal:** `packet-store.js` switches from a flat packet array to a transmission-centric model.

### Tasks
1. New in-memory data model:
   - `transmissions: Map<hash, {id, raw_hex, hash, first_seen, payload_type, decoded_json, observations: []}>`
   - Each observation: `{id, observer_id, observer_name, snr, rssi, path_json, timestamp}`

2. Update indexes:
   - `byHash`: hash → transmission object (1:1 instead of 1:N)
   - `byObserver`: observer_id → [observation references]
   - `byNode`: pubkey → [transmission references] (deduped!)
   - `byId`: observation.id → observation (for backward compat with packet detail links)

3. Update `load()`: read from `transmissions` JOIN `observations` instead of `packets`

4. Update query methods:
   - `findPackets()` — returns transmissions by default, with `.observations` attached
   - `findPacketsForNode()` — returns transmissions where the node appears in ANY observation's path/decoded_json
   - `getSiblings()` — becomes `getObservations(hash)` — trivial: just return `transmission.observations`
   - `countForNode()` — returns `{transmissions: N, observations: M}`
### Validation

- All existing API endpoints return valid data
- Packet counts decrease (correctly!) for multi-observer nodes
- `/api/perf` shows no regression

**Risk: MEDIUM** — the core read path changes. Test thoroughly.
## Milestone 4: API Response Changes

**Goal:** APIs return deduped data with observation counts.

### Tasks
1. `GET /api/packets`:
   - Default: return transmissions (1 row per unique packet)
   - Each transmission includes `observation_count` and optionally `observations[]`
   - `?expand=observations` to include the full observation list
   - `?groupByHash` becomes the default behavior (deprecate the param)
   - Preserve the `observer` filter: return transmissions where at least one observation matches

2. `GET /api/nodes/:pubkey/health`:
   - `stats.totalPackets` → `stats.totalTransmissions` (distinct hashes)
   - Add `stats.totalObservations` (the old count, for reference)
   - `recentPackets` → returns transmissions with observation_count

3. `GET /api/nodes/bulk-health`: same changes as health

4. `GET /api/nodes/network-status`: use transmission counts

5. `GET /api/nodes/:pubkey/analytics`: all throughput charts use transmission counts

6. WebSocket broadcast: include `observation_count` when sibling observations exist for the same hash
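The `GET /api/packets` response shape described above could be produced by a serializer along these lines (`serializeTransmission` is a hypothetical helper; the real handler lives in `server.js`):

```javascript
// Shape one transmission for GET /api/packets.
// expand=true corresponds to the ?expand=observations query param.
function serializeTransmission(tx, { expand = false } = {}) {
  const out = {
    hash: tx.hash,
    first_seen: tx.first_seen,
    payload_type: tx.payload_type,
    observation_count: tx.observations.length,
  };
  if (expand) {
    // Full observation list only when the client asks for it.
    out.observations = tx.observations.map(o => ({
      observer_id: o.observer_id,
      snr: o.snr,
      rssi: o.rssi,
      timestamp: o.timestamp,
    }));
  }
  return out;
}
```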
### Backward Compatibility

- Add a `?legacy=1` param that returns old-style flat observations (for any external consumers)
- Include both `totalTransmissions` and `totalObservations` in health responses during the transition
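The `?legacy=1` fallback amounts to flattening transmissions back into one row per observation. A sketch (`toLegacyRows` and the merged field set are illustrative, not the exact legacy schema):

```javascript
// Flatten deduped transmissions back into old-style flat rows for ?legacy=1.
// Each output row merges transmission fields with one observation's fields.
function toLegacyRows(transmissions) {
  const rows = [];
  for (const tx of transmissions) {
    for (const obs of tx.observations) {
      rows.push({
        hash: tx.hash,
        raw_hex: tx.raw_hex,
        payload_type: tx.payload_type,
        observer_id: obs.observer_id,
        snr: obs.snr,
        timestamp: obs.timestamp,
      });
    }
  }
  return rows;
}
```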
**Risk: MEDIUM** — the frontend expects certain shapes. May need a coordinated deploy with Milestone 5.
## Milestone 5: Frontend Updates

**Goal:** The UI shows correct counts and leverages observation data.

### Tasks
1. Packets page:
   - Default view shows transmissions (it already has a groupByHash mode — make it the default)
   - Expand a row to see individual observations with their paths/SNR/RSSI
   - Badge: "×3 observers" on grouped rows

2. Node detail panel (nodes.js + live.js):
   - Show "X transmissions", not "X packets"
   - Or "X packets (seen Y times)" to show both

3. Home page: network stats use transmission counts

4. Node analytics: throughput charts use transmissions

5. Observer detail: keep observation counts (the correct metric for observers)

6. Analytics page: topology/RF analysis uses all observations (SNR per observation is valid data)
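The observer badge on grouped rows reduces to counting distinct observers on a transmission. A minimal sketch (`observerBadge` is a hypothetical helper, not an existing frontend function):

```javascript
// Label for a grouped packets-page row: count distinct observers,
// not raw observation rows (one observer may report twice).
function observerBadge(tx) {
  const distinct = new Set(tx.observations.map(o => o.observer_id)).size;
  return distinct > 1 ? `×${distinct} observers` : '';
}
```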
**Risk: LOW-MEDIUM** — mostly display changes
## Milestone 6: Cleanup

**Goal:** Remove the dual-write, drop the old table, clean up.

### Tasks

- Remove the dual-write from `insertPacket()`
- Drop the `packets_backup` table (after confirming everything works for 1+ week)
- Remove `?legacy=1` support if unused
- Update DEDUP-DESIGN.md → mark as complete
- VACUUM the database
- Tag a release (v2.3.0?)

**Risk: LOW** — cleanup only; all functional changes already proven
## Estimated Scope
| Milestone | Files Modified | Complexity | Can Deploy Independently? |
|---|---|---|---|
| 1. Schema Migration | db.js, new script | Low | Yes — additive only |
| 2. Dual-Write | db.js, packet-store.js | Low | Yes — old reads unchanged |
| 3. Memory Store | packet-store.js | Medium | No — must deploy with M4 |
| 4. API Changes | server.js, db.js | Medium | No — must deploy with M5 |
| 5. Frontend | 8+ public/*.js files | Medium | No — must deploy with M4 |
| 6. Cleanup | db.js, server.js | Low | Yes — after bake period |
- Milestones 1–2: safe to deploy independently, no user-visible changes.
- Milestones 3–5: must ship together (API shape changes + the frontend expects the new shape).
- Milestone 6: ship after a 1-week bake.
## Open Questions

- Table naming: `transmissions` + `observations`? Or keep `packets` and add `observations`? The word "transmission" is more accurate, but "packet" is what the whole UI calls them.
- Packet detail URLs: `#/packet/123` currently uses the observation ID. Keep observation IDs as the URL key, or switch to the hash?
- Path dedup in the paths table: the `paths` table also has per-observation entries. Normalize that too, or leave it as-is?
- Migration on prod: run the migration script before deploying the new code, or make the new code handle both old and new schemas?