# Packet Deduplication Design

## The Problem

A single physical RF transmission gets recorded as N rows in the DB, where N = number of observers that heard it. Each row has the same `hash` but different `path_json` and `observer_id`.

### Example

```
Pkt 1 repeat 1: Path: A→B→C→D→E (observer E)
Pkt 1 repeat 2: Path: A→B→F→G (observer G)
Pkt 1 repeat 3: Path: A→C→H→J→K (observer K)
```

- Repeater A sent 1 packet, not 3
- Repeater B sent 1 packet, not 2 (C and F both heard the same broadcast)
- The hash is identical across all 3 rows

### Why the hash works

`computeContentHash()` = `SHA256(header_byte + payload)`, skipping path hops. Two observations of the same original packet through different paths produce the same hash. This is the dedup key.

## What's inflated (and what's not)

| Context | Current (inflated?) | Correct behavior |
|---------|---------------------|------------------|
| Node "total packets" | `COUNT(*)` (inflated) | `COUNT(DISTINCT hash)` for transmissions |
| Packets/hour on observer page | Raw count | Correct: each observer DID receive it |
| Node analytics throughput | Inflated | DISTINCT hash |
| Live map animations | N animations per physical packet | 1 animation? Or 1 per path? TBD |
| "Heard By" table | Observations per observer | Correct as-is |
| RF analytics (SNR/RSSI) | Mixes observations | Each observation has its own SNR; all valid |
| Topology/path analysis | All paths shown | All paths are valuable; don't discard |
| Packet list (grouped mode) | Groups by hash already | Probably fine |
| Packet list (ungrouped) | Shows every observation | Maybe show distinct, expand for repeats? |

## Key Principle

**Observations are valuable data; never discard them.** The paths tell you about mesh topology, coverage, and redundancy. But **counts displayed to users should reflect reality** (1 transmission = 1 count).

## Design Decisions Needed

1. **What does "packets" mean in node detail?** Unique transmissions? Total observations? Both?
2. **Live map**: 1 animation with multiple path lines? Or 1 per observation?
3. **Analytics charts**: Should throughput charts show transmissions or observations?
4. **Packet list default view**: Group by hash by default?
5. **New metric: "observation ratio"?** Avg observations per transmission tells you about mesh redundancy/coverage.

## Work Items

- [ ] **DB/API: Add distinct counts**: `findPacketsForNode()` and health endpoint should return both `totalTransmissions` (`COUNT(DISTINCT hash)`) and `totalObservations` (`COUNT(*)`)
- [ ] **Node detail UI**: show "X transmissions seen Y times" or similar
- [ ] **Bulk health / network status**: use distinct hash counts
- [ ] **Node analytics charts**: throughput should use distinct hashes
- [ ] **Packets page default**: consider grouping by hash by default
- [ ] **Live map**: decide on animation strategy for repeated observations
- [ ] **Observer page**: observation count is correct, but could add "unique packets" column
- [ ] **In-memory store**: add hash→[packets] index if not already there (check `pktStore.byHash`)
- [ ] **API: packet siblings**: `/api/packets/:id/siblings` or `?groupByHash=true` (may already exist)
- [ ] **RF analytics**: keep all observations for SNR/RSSI (each is a real measurement) but label counts correctly
- [ ] **"Coverage ratio" metric**: avg(observations per unique hash) per node/observer; measures mesh redundancy

## Live Map Animation Design

### Current behavior

Every observation triggers a separate animation. Same packet heard by 3 observers = 3 independent route animations. Looks like 3 packets when it was 1.

### Options considered

**Option A: Single animation, all paths simultaneously (PREFERRED)**

When a hash first arrives, buffer briefly (500ms-2s) for sibling observations, then animate all paths at once. One pulse from origin, multiple route lines fanning out simultaneously. Most accurate, because this IS what physically happened: one RF burst propagating through the mesh along multiple paths at once.

Timing challenge: observations don't arrive simultaneously (seconds apart). Need to buffer the first observation, wait for siblings, then render all together. Adds slight latency to "live" feel.

**Option B: Single animation, "best" path only** (REJECTED)

Pick shortest/highest-SNR path, animate only that. Clean but loses coverage/redundancy info.

**Option C: Single origin pulse, staggered path reveals** (REJECTED)

Origin pulses once, paths draw in sequence with delay. Dramatic but busy, and doesn't reflect reality (the propagation is simultaneous).

**Option D: Animate first, suppress siblings** (REJECTED: pragmatic but inaccurate)

First observation gets animation, subsequent same-hash observations silently logged. Simple but you never see alternate paths on the live map.

### Implementation notes (for when we build this)

- Need a client-side hash buffer: `Map`
- On first WS packet with new hash: start timer (configurable, ~1-2s)
- On subsequent packets with same hash: add to buffer, reset/extend timer
- On timer expiry: animate all buffered paths for that hash simultaneously
- Feed sidebar could show consolidated entry: "1 packet, 3 paths" with expand
- Buffer window should be configurable (config.json)

## Status

**Discussion phase**: no code changes yet. User wants to finalize design before implementation. Live map changes tabled for later.
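
## Sketch: transmissions vs. observations

A minimal sketch of the distinct-count work item, assuming observation rows shaped like `{ hash, observer_id, path_json }` are already in memory. The function name `summarizeObservations`, the sample hash `h1`, and the JSON-array encoding of `path_json` are illustrative assumptions, not the project's actual code. Grouping by hash also produces the kind of hash→[packets] index mentioned for the in-memory store.

```javascript
// Hypothetical helper: separates "how many transmissions" from
// "how many times they were heard". Row shape is an assumption.
function summarizeObservations(rows) {
  const byHash = new Map(); // hash -> all observations of that transmission
  for (const row of rows) {
    if (!byHash.has(row.hash)) byHash.set(row.hash, []);
    byHash.get(row.hash).push(row);
  }
  const totalObservations = rows.length;   // equivalent to COUNT(*)
  const totalTransmissions = byHash.size;  // equivalent to COUNT(DISTINCT hash)
  // Average observations per transmission; 0 when there is no data.
  const observationRatio =
    totalTransmissions === 0 ? 0 : totalObservations / totalTransmissions;
  return { totalTransmissions, totalObservations, observationRatio, byHash };
}

// The three rows from the Example section (path encoding is assumed).
const rows = [
  { hash: 'h1', observer_id: 'E', path_json: '["A","B","C","D","E"]' },
  { hash: 'h1', observer_id: 'G', path_json: '["A","B","F","G"]' },
  { hash: 'h1', observer_id: 'K', path_json: '["A","C","H","J","K"]' },
];
const summary = summarizeObservations(rows);
console.log(summary.totalTransmissions, summary.totalObservations, summary.observationRatio);
// prints: 1 3 3
```

No observation is dropped here: every row stays reachable through `byHash`, so path and SNR/RSSI analysis can still use all of them while the displayed count reflects one transmission.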
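
## Sketch: live map hash buffer

The buffering steps in the implementation notes could look roughly like this. `HashBuffer`, `onFlush`, `windowMs`, and the injectable `timers` object are hypothetical names, and the observation shape `{ hash, path }` is assumed. Each new observation for a hash restarts that hash's timer; when the timer expires, all buffered observations go to a single callback so the map can animate every path at once.

```javascript
// Hypothetical client-side buffer for Option A. Not existing project code.
class HashBuffer {
  constructor(
    onFlush,          // called as onFlush(hash, observations) once per hash
    windowMs = 1500,  // assumed default; intended to come from config
    timers = {
      set: (fn, ms) => setTimeout(fn, ms),
      clear: (id) => clearTimeout(id),
    }
  ) {
    this.onFlush = onFlush;
    this.windowMs = windowMs;
    this.timers = timers;
    this.pending = new Map(); // hash -> { observations, timer }
  }

  // Called for every incoming observation (e.g. from the WS handler).
  add(obs) {
    let entry = this.pending.get(obs.hash);
    if (!entry) {
      // First observation of this hash: open a new buffer entry.
      entry = { observations: [], timer: null };
      this.pending.set(obs.hash, entry);
    } else {
      // Sibling observation: cancel the running timer so it can be restarted.
      this.timers.clear(entry.timer);
    }
    entry.observations.push(obs);
    entry.timer = this.timers.set(() => this.flush(obs.hash), this.windowMs);
  }

  // Emits all buffered observations for a hash exactly once.
  flush(hash) {
    const entry = this.pending.get(hash);
    if (!entry) return; // already flushed, or never seen
    this.timers.clear(entry.timer);
    this.pending.delete(hash);
    this.onFlush(hash, entry.observations);
  }
}
```

A caller would construct it with a render callback, for example `new HashBuffer((hash, obs) => animateAllPaths(obs))`, where `animateAllPaths` is a placeholder for whatever the live map uses. Because every sibling restarts the timer, a slow trickle of late observations can postpone the animation, so a cap on total wait time may be worth considering alongside the configurable window.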