Packet Deduplication Design

The Problem

A single physical RF transmission gets recorded as N rows in the DB, where N = number of observers that heard it. Each row has the same hash but different path_json and observer_id.

Example

Pkt 1 repeat 1: Path: A→B→C→D→E   (observer E)
Pkt 1 repeat 2: Path: A→B→F→G      (observer G)
Pkt 1 repeat 3: Path: A→C→H→J→K    (observer K)

Repeater A sent 1 packet, not 3
Repeater B sent 1 packet, not 2 (C and F both heard the same broadcast)
The hash is identical across all 3 rows

Why the hash works

computeContentHash() = SHA256(header_byte + payload), skipping path hops. Two observations of the same original packet through different paths produce the same hash. This is the dedup key.

What's inflated (and what's not)

Context	Current (inflated?)	Correct behavior
Node "total packets"	COUNT(*) — inflated	COUNT(DISTINCT hash) for transmissions
Packets/hour on observer page	Raw count	Correct — each observer DID receive it
Node analytics throughput	Inflated	DISTINCT hash
Live map animations	N animations per physical packet	1 animation? Or 1 per path? TBD
"Heard By" table	Observations per observer	Correct as-is
RF analytics (SNR/RSSI)	Mixes observations	Each observation has its own SNR — all valid
Topology/path analysis	All paths shown	All paths are valuable — don't discard
Packet list (grouped mode)	Groups by hash already	Probably fine
Packet list (ungrouped)	Shows every observation	Maybe show distinct, expand for repeats?

Key Principle

Observations are valuable data — never discard them. The paths tell you about mesh topology, coverage, and redundancy. But counts displayed to users should reflect reality (1 transmission = 1 count).

Design Decisions Needed

What does "packets" mean in node detail? Unique transmissions? Total observations? Both?
Live map: 1 animation with multiple path lines? Or 1 per observation?
Analytics charts: Should throughput charts show transmissions or observations?
Packet list default view: Group by hash by default?
New metric: "observation ratio"? — avg observations per transmission tells you about mesh redundancy/coverage

Work Items

DB/API: Add distinct counts — findPacketsForNode() and health endpoint should return both totalTransmissions (DISTINCT hash) and totalObservations (COUNT(*))
Node detail UI — show "X transmissions seen Y times" or similar
Bulk health / network status — use distinct hash counts
Node analytics charts — throughput should use distinct hashes
Packets page default — consider grouping by hash by default
Live map — decide on animation strategy for repeated observations
Observer page — observation count is correct, but could add "unique packets" column
In-memory store — add hash→[packets] index if not already there (check pktStore.byHash)
API: packet siblings — /api/packets/:id/siblings or ?groupByHash=true (may already exist)
RF analytics — keep all observations for SNR/RSSI (each is a real measurement) but label counts correctly
"Coverage ratio" metric — avg(observations per unique hash) per node/observer — measures mesh redundancy

Live Map Animation Design

Current behavior

Every observation triggers a separate animation. Same packet heard by 3 observers = 3 independent route animations. Looks like 3 packets when it was 1.

Options considered

Option A: Single animation, all paths simultaneously (PREFERRED) When a hash first arrives, buffer briefly (500ms-2s) for sibling observations, then animate all paths at once. One pulse from origin, multiple route lines fanning out simultaneously. Most accurate — this IS what physically happened: one RF burst propagating through the mesh along multiple paths at once.

Timing challenge: observations don't arrive simultaneously (seconds apart). Need to buffer the first observation, wait for siblings, then render all together. Adds slight latency to "live" feel.

Option B: Single animation, "best" path only — REJECTED Pick shortest/highest-SNR path, animate only that. Clean but loses coverage/redundancy info.

Option C: Single origin pulse, staggered path reveals — REJECTED Origin pulses once, paths draw in sequence with delay. Dramatic but busy, and doesn't reflect reality (the propagation is simultaneous).

Option D: Animate first, suppress siblings — REJECTED (pragmatic but inaccurate) First observation gets animation, subsequent same-hash observations silently logged. Simple but you never see alternate paths on the live map.

Implementation notes (for when we build this)

Need a client-side hash buffer: Map<hash, {timer, packets[]}>
On first WS packet with new hash: start timer (configurable, ~1-2s)
On subsequent packets with same hash: add to buffer, reset/extend timer
On timer expiry: animate all buffered paths for that hash simultaneously
Feed sidebar could show consolidated entry: "1 packet, 3 paths" with expand
Buffer window should be configurable (config.json)

Status

Discussion phase — no code changes yet. Iavor wants to finalize design before implementation. Live map changes tabled for later.

5.4 KiB Raw Blame History