Packet Deduplication Design
The Problem
A single physical RF transmission gets recorded as N rows in the DB, where N = number of observers that heard it. Each row has the same hash but different path_json and observer_id.
Example
Pkt 1 repeat 1: Path: A→B→C→D→E (observer E)
Pkt 1 repeat 2: Path: A→B→F→G (observer G)
Pkt 1 repeat 3: Path: A→C→H→J→K (observer K)
- Repeater A sent 1 packet, not 3
- Repeater B sent 1 packet, not 2 (C and F both heard the same broadcast)
- The hash is identical across all 3 rows
Why the hash works
computeContentHash() = SHA256(header_byte + payload), skipping path hops. Two observations of the same original packet through different paths produce the same hash. This is the dedup key.
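A minimal sketch of why the path drops out of the key, assuming Node's built-in crypto module. contentHashSketch, the packet fields, and the header value are hypothetical stand-ins; the real computeContentHash() and wire format may differ.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical stand-in for computeContentHash(): hashes the header byte
// followed by the payload. Path hops are never fed into the digest.
function contentHashSketch(headerByte, payload) {
  return createHash('sha256')
    .update(Buffer.from([headerByte]))
    .update(payload)
    .digest('hex');
}

// Two observations of the same transmission, heard via different paths.
// Field names and values here are made up for illustration.
const payload = Buffer.from('hello mesh');
const viaE = { headerByte: 0x11, path: ['A', 'B', 'C', 'D', 'E'], payload };
const viaG = { headerByte: 0x11, path: ['A', 'B', 'F', 'G'], payload };

const hashE = contentHashSketch(viaE.headerByte, viaE.payload);
const hashG = contentHashSketch(viaG.headerByte, viaG.payload);

console.log(hashE === hashG); // true: only header + payload were hashed
```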
What's inflated (and what's not)
| Context | Current (inflated?) | Correct behavior |
|---|---|---|
| Node "total packets" | COUNT(*) — inflated | COUNT(DISTINCT hash) for transmissions |
| Packets/hour on observer page | Raw count | Correct — each observer DID receive it |
| Node analytics throughput | Inflated | DISTINCT hash |
| Live map animations | N animations per physical packet | 1 animation? Or 1 per path? TBD |
| "Heard By" table | Observations per observer | Correct as-is |
| RF analytics (SNR/RSSI) | Mixes observations | Each observation has its own SNR — all valid |
| Topology/path analysis | All paths shown | All paths are valuable — don't discard |
| Packet list (grouped mode) | Groups by hash already | Probably fine |
| Packet list (ungrouped) | Shows every observation | Maybe show distinct, expand for repeats? |
Key Principle
Observations are valuable data — never discard them. The paths tell you about mesh topology, coverage, and redundancy. But counts displayed to users should reflect reality (1 transmission = 1 count).
Design Decisions Needed
- What does "packets" mean in node detail? Unique transmissions? Total observations? Both?
- Live map: 1 animation with multiple path lines? Or 1 per observation?
- Analytics charts: Should throughput charts show transmissions or observations?
- Packet list default view: Group by hash by default?
- New metric: "observation ratio"? — avg observations per transmission tells you about mesh redundancy/coverage
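To make the transmissions vs. observations distinction concrete, here is a rough sketch that computes both counts and the proposed observation ratio from a list of rows. summarizeObservations and the sample rows are hypothetical; only hash and observer_id come from the schema described above.

```javascript
// One entry per DB row (one per observer that heard the packet).
function summarizeObservations(observations) {
  const uniqueHashes = new Set(observations.map((o) => o.hash));
  const totalObservations = observations.length; // COUNT(*)
  const totalTransmissions = uniqueHashes.size;  // COUNT(DISTINCT hash)
  // Guard against an empty input so the ratio is never NaN.
  const observationRatio =
    totalTransmissions === 0 ? 0 : totalObservations / totalTransmissions;
  return { totalTransmissions, totalObservations, observationRatio };
}

// Made-up sample: packet h1 heard by three observers, h2 by one.
const rows = [
  { hash: 'h1', observer_id: 'E' },
  { hash: 'h1', observer_id: 'G' },
  { hash: 'h1', observer_id: 'K' },
  { hash: 'h2', observer_id: 'E' },
];

const summary = summarizeObservations(rows);
console.log(summary);
// { totalTransmissions: 2, totalObservations: 4, observationRatio: 2 }
```

Returning both counts keeps the "Both?" option open: the UI can pick which one to label as "packets" without another query.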
Work Items
- DB/API: Add distinct counts. findPacketsForNode() and the health endpoint should return both totalTransmissions (DISTINCT hash) and totalObservations (COUNT(*))
- Node detail UI: show "X transmissions seen Y times" or similar
- Bulk health / network status — use distinct hash counts
- Node analytics charts — throughput should use distinct hashes
- Packets page default — consider grouping by hash by default
- Live map — decide on animation strategy for repeated observations
- Observer page — observation count is correct, but could add "unique packets" column
- In-memory store: add hash→[packets] index if not already there (check pktStore.byHash)
- API: packet siblings via /api/packets/:id/siblings or ?groupByHash=true (may already exist)
- RF analytics: keep all observations for SNR/RSSI (each is a real measurement) but label counts correctly
- "Coverage ratio" metric (the same quantity as the "observation ratio" proposed under Design Decisions Needed): avg(observations per unique hash) per node/observer, a measure of mesh redundancy
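The hash→[packets] index and the siblings lookup from the work items could look roughly like this. buildHashIndex, siblingsOf, and the id field are assumptions for illustration; the existing pktStore.byHash may already cover this and may be shaped differently.

```javascript
// Build a Map from content hash to every observation sharing that hash.
function buildHashIndex(packets) {
  const byHash = new Map();
  for (const pkt of packets) {
    const group = byHash.get(pkt.hash);
    if (group) group.push(pkt);
    else byHash.set(pkt.hash, [pkt]);
  }
  return byHash;
}

// Other observations of the same transmission, excluding the packet itself.
function siblingsOf(byHash, pkt) {
  return (byHash.get(pkt.hash) ?? []).filter((p) => p.id !== pkt.id);
}

// Made-up sample data: ids and hashes are illustrative only.
const packets = [
  { id: 1, hash: 'h1', observer_id: 'E' },
  { id: 2, hash: 'h1', observer_id: 'G' },
  { id: 3, hash: 'h1', observer_id: 'K' },
  { id: 4, hash: 'h2', observer_id: 'E' },
];

const byHash = buildHashIndex(packets);
console.log(byHash.size);                                      // 2
console.log(siblingsOf(byHash, packets[0]).map((p) => p.id));  // [ 2, 3 ]
```

Nothing is discarded here: every observation stays reachable through its hash, and byHash.size gives the distinct-transmission count directly.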
Live Map Animation Design
Current behavior
Every observation triggers a separate animation. Same packet heard by 3 observers = 3 independent route animations. Looks like 3 packets when it was 1.
Options considered
Option A: Single animation, all paths simultaneously (PREFERRED)
When a hash first arrives, buffer briefly (500ms-2s) for sibling observations, then animate all paths at once. One pulse from origin, multiple route lines fanning out simultaneously. Most accurate: this IS what physically happened, one RF burst propagating through the mesh along multiple paths at once.
Timing challenge: observations don't arrive simultaneously (seconds apart). Need to buffer the first observation, wait for siblings, then render all together. Adds slight latency to "live" feel.
Option B: Single animation, "best" path only (REJECTED)
Pick shortest/highest-SNR path, animate only that. Clean but loses coverage/redundancy info.
Option C: Single origin pulse, staggered path reveals (REJECTED)
Origin pulses once, paths draw in sequence with delay. Dramatic but busy, and doesn't reflect reality (the propagation is simultaneous).
Option D: Animate first, suppress siblings (REJECTED: pragmatic but inaccurate)
First observation gets animation, subsequent same-hash observations silently logged. Simple but you never see alternate paths on the live map.
Implementation notes (for when we build this)
- Need a client-side hash buffer: Map<hash, {timer, packets[]}>
- On first WS packet with new hash: start timer (configurable, ~1-2s)
- On subsequent packets with same hash: add to buffer, reset/extend timer
- On timer expiry: animate all buffered paths for that hash simultaneously
- Feed sidebar could show consolidated entry: "1 packet, 3 paths" with expand
- Buffer window should be configurable (config.json)
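The buffering steps above can be sketched as a small class. This is only an illustration under stated assumptions: HashBuffer, its method names, the onFlush callback, and the 1500 ms default are all hypothetical, and the real client would feed it from the WS handler and read the window from config.json.

```javascript
class HashBuffer {
  // onFlush(hash, packets) is called once per hash when its window expires.
  constructor(onFlush, windowMs = 1500) {
    this.onFlush = onFlush;
    this.windowMs = windowMs;
    this.pending = new Map(); // hash -> { timer, packets[] }
  }

  // Buffer one observation; each new sibling resets the timer for its hash.
  add(packet) {
    let entry = this.pending.get(packet.hash);
    if (entry) {
      clearTimeout(entry.timer);
      entry.packets.push(packet);
    } else {
      entry = { timer: null, packets: [packet] };
      this.pending.set(packet.hash, entry);
    }
    entry.timer = setTimeout(() => this.flush(packet.hash), this.windowMs);
  }

  // Hand all buffered observations for a hash to the renderer in one call.
  flush(hash) {
    const entry = this.pending.get(hash);
    if (!entry) return;
    clearTimeout(entry.timer);
    this.pending.delete(hash);
    this.onFlush(hash, entry.packets);
  }
}

// Usage with made-up packets: three observations of one transmission.
const flushed = [];
const buffer = new HashBuffer((hash, packets) => flushed.push({ hash, packets }));
buffer.add({ hash: 'h1', path: ['A', 'B', 'C', 'D', 'E'] });
buffer.add({ hash: 'h1', path: ['A', 'B', 'F', 'G'] });
buffer.add({ hash: 'h1', path: ['A', 'C', 'H', 'J', 'K'] });
// After windowMs with no new sibling, onFlush receives all three paths at once.
```

One thing to decide when building this: because every sibling resets the timer, a long trickle of observations keeps delaying the animation. Capping the total wait per hash would be one way to bound that latency.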
Status
Discussion phase — no code changes yet. Iavor wants to finalize design before implementation. Live map changes tabled for later.