mirror of
https://github.com/simplex-chat/simplex-chat.git
synced 2026-05-11 21:55:04 +00:00
docs: channel subscriber profiles plan (#6918)
@@ -0,0 +1,237 @@

# Plan: Member Profile Sending in Channels

## Context

In channels (relayed groups), subscribers don't know the profiles of other subscribers. When subscriber A sends a reaction/message that gets forwarded to subscriber B, B creates an "unknown member" record with a synthesized name. This degrades UX: subscribers see "unknown member" instead of real profiles.

We can't eagerly send all subscriber profiles to all subscribers (this doesn't scale to channels with 100K+ subscribers). We need on-demand, deduplicated profile delivery: the relay tracks which subscribers have received which sender's profile, and prepends profile info when forwarding a message from a sender the recipient doesn't know.

## Approach: Vector-tracked profile delivery

### Core idea

Each member record on the relay stores a `sent_profile_vector BLOB`: a byte vector where position `i` represents the recipient at `index_in_group = i`. Value 0 = profile not sent, non-zero = sent.

When the relay forwards a batch (possibly from multiple senders):
1. Collect distinct senders in the batch. Load each sender's `sent_profile_vector`.
2. For each cursor-batch of recipients, partition into two groups:
   - **Knows all**: recipient's index is marked as sent in every sender's vector → gets bare batch
   - **Needs profiles**: recipient's index is unmarked in at least one sender's vector → gets batch with all sender profiles prepended as `XGrpMemNew` elements
3. Update all senders' vectors to mark recipients who were delivered to.

When a sender updates their profile (relay receives `XInfo`): clear that sender's `sent_profile_vector`, so the updated profile is re-sent on next forwarded message.

In steady state, most long-standing subscribers have received all active senders' profiles from previous deliveries. The "knows all" group dominates; the "needs profiles" group consists mainly of newcomers and is small. The partition converges quickly to near-zero redundancy.

### Why this approach

**Considered alternatives:**
- **Include profile in every FwdSender**: Wastes bandwidth sending profile on every message.
- **Subscriber requests profile from relay**: Adds latency (round-trip) and new request-response protocol complexity.
- **Separate delivery worker** (using commented-out `DWSMemberProfileUpdate` stubs): Harder to guarantee ordering (profile must arrive before message).
- **Bloom filters / epoch-based**: Same storage complexity as vectors, more complex to implement, probabilistic (false positives).

**Advantages of prepend-to-batch approach:**
- Profile + forwarded message arrive in a single SMP message (no extra 16KB block overhead)
- SMP guarantees in-order processing within a batch
- No protocol changes: `XGrpMemNew` is already handled by subscribers
- No subscriber-side code changes for receiving

### Design decisions to discuss

**1. Bit-level vs byte-level vector**

Byte-per-position is consistent with `member_relations_vector` but uses 8x more space. For 100K members: byte=100KB/sender, bit=12.5KB/sender. With 1000 active senders: byte=100MB, bit=12.5MB. Byte is simpler; bit is more space-efficient. **Recommend: byte-level for consistency, optimize to bit-level later if needed.**

**2. Multi-sender batch profile strategy**

Channels batch tasks from multiple senders into one job (`singleSenderGMId_ = Nothing`). Profile tracking requires knowing which senders' profiles each recipient has seen. Three approaches:

**Option A: per-sender precise targeting (rejected).** For a batch with senders {A, B, C}, construct a separate batch variant for each combination of missing profiles: recipients missing only A get `profile(A) + batch`, those missing A and C get `profile(A) + profile(C) + batch`, etc. This produces up to 2^k batch variants for k senders, a combinatorial explosion that is fundamentally at odds with batching efficiency. Constructing nearly per-recipient blobs is worse than not batching at all. **Rejected.**

**Option B: all-or-nothing profile sidecar (probably preferable).** Partition recipients into two groups: those who know ALL senders (get bare batch) and those missing ANY sender profile (get all sender profiles prepended). Only 2 batch variants regardless of sender count. Preserves current multi-sender batching, with no changes to `getNextDeliveryTasks`. Some recipients may receive profiles they already know, but `XGrpMemNew` is idempotent and small (~200-500 bytes per profile), and this redundancy only occurs at the rare intersection of a multi-sender batch AND a partially-informed recipient. In steady state, long-standing subscribers know all active senders, so the "needs profiles" group shrinks to just newcomers.
- Pros: preserves current batching, smaller diff (no `Store/Delivery.hs` changes), 2 variants only, fast convergence to zero-redundancy steady state
- Cons: slight redundancy for partially-informed recipients in multi-sender batches (rare and transient)

**Option C: force single-sender jobs.** Add `sender_group_member_id` filter to `getNextDeliveryTasks` for channels, same as fully connected groups. Each delivery job has exactly one sender, so profile sidecar is always one `XGrpMemNew`. Clean binary partition with zero redundancy.
- Pros: zero redundant profiles, simplest per-job logic
- Cons: changes delivery task query logic, slightly less batching efficiency (separate jobs per sender), though multi-sender batches are rare anyway
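
The difference between Options A and B can be made concrete by counting the batch variants the relay would have to build for one job. The sketch below is illustrative only: `Missing`, `variantsOptionA` and `variantsOptionB` are hypothetical names, and each recipient is modelled simply as the set of senders whose profile it has not yet received.

```haskell
import Data.Set (Set)
import qualified Data.Set as Set

-- | For one recipient: the senders in the batch whose profile it has not received.
-- (Hypothetical model type, not part of the codebase.)
type Missing sender = Set sender

-- | Option A: one batch variant per distinct set of missing profiles
-- (the empty set corresponds to the bare batch).
variantsOptionA :: Ord sender => [Missing sender] -> Int
variantsOptionA = Set.size . Set.fromList

-- | Option B: at most two variants, the bare batch (someone misses nothing)
-- and the batch with all sender profiles prepended (someone misses something).
variantsOptionB :: [Missing sender] -> Int
variantsOptionB missing =
  length (filter id [any Set.null missing, not (all Set.null missing)])
```
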

---

## Detailed changes

The code below assumes Option B (all-or-nothing sidecar). Option C would simplify section 4 (always one sender) and add a query change in `Store/Delivery.hs`.

### 1. Database migration

New migration file: `M{date}_sent_profile_vector.hs`

```sql
ALTER TABLE group_members ADD COLUMN sent_profile_vector BLOB;
```

**Files:**
- `src/Simplex/Chat/Store/SQLite/Migrations/M{date}_sent_profile_vector.hs` (new)
- `src/Simplex/Chat/Store/SQLite/Migrations.hs` (register migration)
- `src/Simplex/Chat/Store/Postgres/Migrations/M{date}_sent_profile_vector.hs` (new)
- `src/Simplex/Chat/Store/Postgres/Migrations.hs` (register migration)
- `simplex-chat.cabal` (add module)

### 2. Sent profile vector operations

New functions in `src/Simplex/Chat/Store/Groups.hs`:

```haskell
getSentProfileVector :: DB.Connection -> GroupMemberId -> IO ByteString

-- Expands vector if needed (same expand-on-write pattern as setRelation in Types/MemberRelations.hs)
markProfilesSentToMembers :: DB.Connection -> GroupMemberId -> [Int64] -> IO ()

clearSentProfileVector :: DB.Connection -> GroupMemberId -> IO ()
```

Pure helpers:
```haskell
isProfileSentTo :: ByteString -> Int64 -> Bool
isProfileSentTo vec idx
  | idx < 0 || fromIntegral idx >= B.length vec = False
  | otherwise = B.index vec (fromIntegral idx) /= 0

markSentPositions :: [Int64] -> ByteString -> ByteString
```
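
`markSentPositions` is only declared above. A minimal sketch of the expand-on-write behaviour it is described as having could look as follows. This is an assumption-laden illustration, not the planned implementation: the marker value `1` (`sentMarker`) is a choice made here (the plan only requires non-zero), negative indices are silently ignored, and the real function may instead reuse the `setRelation` machinery.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Data.Int (Int64)
import qualified Data.IntSet as IS
import Data.Word (Word8)

-- Assumption: 1 is used as the "sent" marker; any non-zero byte would do.
sentMarker :: Word8
sentMarker = 1

-- | Mark the given recipient indices as "sent". The vector grows with zero
-- bytes when an index lies beyond its current length; bytes that are already
-- set are preserved. Negative indices are ignored.
markSentPositions :: [Int64] -> ByteString -> ByteString
markSentPositions idxs vec
  | IS.null marks = vec
  | otherwise = fst (B.unfoldrN newLen step 0)
  where
    marks = IS.fromList [fromIntegral i | i <- idxs, i >= 0]
    newLen = max (B.length vec) (IS.findMax marks + 1)
    -- Produce byte i of the new vector.
    step i
      | i `IS.member` marks = Just (sentMarker, i + 1)
      | i < B.length vec = Just (B.index vec i, i + 1)
      | otherwise = Just (0, i + 1)
```
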

### 3. Profile batch element encoding

New functions in `src/Simplex/Chat/Messages/Batch.hs`:

```haskell
-- Prepend an element to an existing binary batch body
-- batchBody format: '=' <count:Word16> (<len:Word16> <element>)*
-- Increments count and inserts element at front without parsing/re-encoding existing elements
prependBatchElement :: ByteString -> ByteString -> ByteString

-- Encode XGrpMemNew as a batch-ready element for a given member
-- Constructs ChatMessage with XGrpMemNew (memberToMemberInfo member) Nothing
encodeMemberProfileElement :: VersionRangeChat -> GroupMember -> ByteString
```
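
The prepend operation can be sketched directly from the format comment above. The following is a hedged illustration rather than the planned code: it assumes both `Word16` fields are big-endian, takes the element as the first argument and the batch body as the second, and uses a hypothetical name and a `Maybe` result (`tryPrependBatchElement`) so that malformed input, a full count, or an oversized element is reported instead of producing a corrupt batch.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Data.Word (Word16)

-- Assumption: Word16 fields are encoded big-endian.
encodeWord16 :: Word16 -> ByteString
encodeWord16 w = B.pack [fromIntegral (w `shiftR` 8), fromIntegral w]

decodeWord16 :: ByteString -> Maybe Word16
decodeWord16 s = case B.unpack (B.take 2 s) of
  [hi, lo] -> Just ((fromIntegral hi `shiftL` 8) .|. fromIntegral lo)
  _ -> Nothing

-- | Insert one element at the front of a batch body of the form
-- '=' <count:Word16> (<len:Word16> <element>)*
-- by rewriting only the count; existing elements are copied untouched.
tryPrependBatchElement :: ByteString -> ByteString -> Maybe ByteString
tryPrependBatchElement element body = do
  (tag, rest) <- B.uncons body
  count <- decodeWord16 rest
  if tag /= 0x3D || count == maxBound || B.length element > 0xFFFF
    then Nothing
    else
      Just $
        B.concat
          [ B.singleton tag,
            encodeWord16 (count + 1),
            encodeWord16 (fromIntegral (B.length element)),
            element,
            B.drop 2 rest
          ]
```
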

Check whether `memberInfo` or similar helper already exists for constructing `MemberInfo` from `GroupMember`.

### 4. Delivery job worker changes

**File:** `src/Simplex/Chat/Library/Subscriber.hs`, in `processDeliveryJob` / `sendBodyToMembers`

In the channel path (`useRelays' gInfo`, `DJSGroup {}`):

**Before the cursor loop**, collect distinct senders from delivery tasks and load their profile data:
```haskell
senderProfiles <- forM (nub senderGMIds) $ \senderGMId -> do
  sender <- withStore $ \db -> getGroupMemberById db vr user senderGMId
  vec <- withStore' $ \db -> getSentProfileVector db senderGMId
  pure (senderGMId, sender, vec)

let profileElements = map (\(_, sender, _) -> encodeMemberProfileElement vr sender) senderProfiles
    extBody = foldl' (flip prependBatchElement) body profileElements
```

**In the cursor loop**, partition recipients:
```haskell
sendLoop bucketSize cursorGMId_ = do
  mems <- withStore' $ \db -> getGroupMembersByCursor ...
  unless (null mems) $ do
    if null senderProfiles
      then deliver body mems
      else do
        let knowsAll m = all (\(_, _, vec) -> isProfileSentTo vec (indexInGroup' m)) senderProfiles
            (hasAllProfiles, needsProfiles) = partition knowsAll mems
        unless (null needsProfiles) $ deliver extBody needsProfiles
        unless (null hasAllProfiles) $ deliver body hasAllProfiles
        -- deliveredMems: only the members of mems that were actually delivered to
        -- (ready connections), not the whole cursor batch
        forM_ senderProfiles $ \(senderGMId, _, _) ->
          withStore' $ \db -> markProfilesSentToMembers db senderGMId
            (map indexInGroup' deliveredMems)
    ...
```
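
The partition-then-mark behaviour of one delivery round can also be modelled purely, which may help when writing tests for convergence. This is a simplified sketch under stated assumptions: recipients are represented only by their `index_in_group` as `Int`, the `isReady` predicate stands in for the `readyMemberConn` filter, and `isSent`, `markSent` and `deliverRound` are hypothetical names with a deliberately naive vector update.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Data.List (partition)

-- | True when the vector marks index i as "profile sent".
isSent :: ByteString -> Int -> Bool
isSent vec i = i >= 0 && i < B.length vec && B.index vec i /= 0

-- | Naive expand-on-write marking (assumes non-negative indices).
markSent :: [Int] -> ByteString -> ByteString
markSent idxs vec = B.pack [if i `elem` idxs then 1 else at i | i <- [0 .. newLen - 1]]
  where
    newLen = maximum (B.length vec : map (+ 1) idxs)
    at i = if i < B.length vec then B.index vec i else 0

-- | One delivery round over a cursor batch of recipients.
-- Returns (recipients given the bare batch, recipients given the batch with
-- profiles, updated sender vectors). Only ready recipients are delivered to,
-- and only those are marked.
deliverRound :: (Int -> Bool) -> [ByteString] -> [Int] -> ([Int], [Int], [ByteString])
deliverRound isReady senderVecs recipients =
  (bare, withProfiles, map (markSent delivered) senderVecs)
  where
    delivered = filter isReady recipients
    (bare, withProfiles) = partition knowsAll delivered
    knowsAll i = all (`isSent` i) senderVecs
```
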

Only mark vector entries for members who were actually delivered to (those with `readyMemberConn`), not all members in the cursor batch; otherwise members without ready connections get marked as "profile sent" without receiving it.

### 5. Clear vector on profile update

**File:** `src/Simplex/Chat/Library/Subscriber.hs`, in `xInfoMember`

After `processMemberProfileUpdate`, if the group uses relays and the user is the relay, clear the sender's vector:

```haskell
xInfoMember gInfo m p' msg brokerTs = do
  void $ processMemberProfileUpdate gInfo m p' (Just (msg, brokerTs))
  when (useRelays' gInfo && isRelay (membership gInfo)) $
    withStore' $ \db -> clearSentProfileVector db (groupMemberId' m)
  pure $ memberEventDeliveryScope m
```

When the vector is cleared and `XInfo` is forwarded, the delivery prepends `XGrpMemNew` before the forwarded `XInfo`. Recipients process both: `XGrpMemNew` creates/updates the member record, then `XInfo` updates it again. Slightly redundant but correct and harmless.

### 6. Mark vector entries when relay announces members at join time

When a new subscriber joins and the relay sends `XGrpMemNew` for owners/existing announced members, mark the new subscriber's index in those members' `sent_profile_vector`. The exact location needs to be identified during implementation; look for where the relay processes new member joins and sends `XGrpMemNew` announcements.

### 7. Update channel tests

**File:** `tests/ChatTests/Groups.hs`

Update `testChannels1RelayDeliver` and related tests:
- After cath sends a reaction, dan and eve should no longer see "forwarded a message from an unknown member, creating unknown member record cath"
- Instead, they receive cath's profile via `XGrpMemNew` (processed silently before the reaction)
- Test assertions for dan and eve should show the reaction with cath's name

Add new tests:
- Profile update triggers re-announcement (clear vector → re-send on next message)
- New subscriber joining after a sender has been active gets the profile on first forwarded message
- Multiple senders: each sender's profile is independently tracked

---

## Files to modify

| File | Change |
|------|--------|
| `src/Simplex/Chat/Store/SQLite/Migrations/M{date}_sent_profile_vector.hs` | New migration |
| `src/Simplex/Chat/Store/SQLite/Migrations.hs` | Register migration |
| `src/Simplex/Chat/Store/Postgres/Migrations/M{date}_sent_profile_vector.hs` | New migration |
| `src/Simplex/Chat/Store/Postgres/Migrations.hs` | Register migration |
| `simplex-chat.cabal` | Add migration module |
| `src/Simplex/Chat/Store/Groups.hs` | Vector CRUD operations |
| `src/Simplex/Chat/Messages/Batch.hs` | `prependBatchElement`, `encodeMemberProfileElement` |
| `src/Simplex/Chat/Library/Subscriber.hs` | Delivery job worker profile logic, xInfoMember vector clear |
| `src/Simplex/Chat/Store/Delivery.hs` | Only if Option C chosen (single-sender jobs) |
| `tests/ChatTests/Groups.hs` | Update channel tests |

## Subscriber-side impact

**None required for receiving.** The subscriber already handles:
- `XGrpMemNew` from relay → creates member record with full profile
- `XGrpMsgForward` → finds existing member record
- Mixed batch elements (direct + forwarded) processed in order

The only subscriber-side change is the test expectations.

## Verification

1. **Build**: `cabal build --ghc-options=-O0`
2. **Run channel tests**: `cabal test simplex-chat-test --test-options='-m "channels"'`
3. **Verification scenarios**:
   - New subscriber sends reaction → other subscribers receive profile + reaction (no "unknown member")
   - Subscriber updates profile → next message re-sends updated profile
   - New subscriber joins after sender was active → first forwarded message from that sender includes profile

## Known considerations

1. **Vector expansion**: A member with `index_in_group = 100000` causes vector expansion to 100KB. `markSentPositions` handles this via the same expand-on-write pattern as `setRelation` in `Types/MemberRelations.hs`.

2. **Delivery filtering**: Only mark vector entries for members who were actually delivered to (those with `readyMemberConn`). The `deliver` function filters for ready connections. If `markProfilesSentToMembers` marked all cursor members, including those without connections, disconnected members would never receive the profile on reconnection.

3. **Scope**: Profile tracking applies only to `DJSGroup` scope. Support scope (`DJSMemberSupport`) delivers to moderators who already know members, so no profile tracking is needed there.
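
4. **Sender exclusion**: `getGroupMembersByCursor` filters out the sender via `singleSenderGMId_` in the WHERE clause, so a single-sender job never delivers a sender's own profile back to them. Multi-sender jobs set `singleSenderGMId_ = Nothing`, so whether senders are excluded there still needs to be checked during implementation.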
5. **Race: vector clear vs delivery**: If profile update and message delivery overlap, the delivery sees an empty vector and sends the profile. This is correct — the delivery uses the current (updated) profile, so recipients get the new profile.
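
The storage figures quoted for consideration 1 and for the bit-level vs byte-level decision follow from simple arithmetic, sketched here as a worst-case estimate. It assumes every sender's vector has been expanded to the full member count and uses decimal units (1KB = 1000 bytes); the function names are hypothetical.

```haskell
-- | Byte-level vector: one byte per member index.
byteVectorBytes :: Int -> Int
byteVectorBytes members = members

-- | Bit-level vector: one bit per member index, rounded up to whole bytes.
bitVectorBytes :: Int -> Int
bitVectorBytes members = (members + 7) `div` 8

-- | Worst-case total across all active senders, assuming every sender's
-- vector has been expanded to cover all members.
totalBytes :: (Int -> Int) -> Int -> Int -> Int
totalBytes perSender members senders = perSender members * senders

-- For 100000 members and 1000 active senders:
--   byteVectorBytes 100000                  = 100000    (100KB per sender)
--   bitVectorBytes 100000                   = 12500     (12.5KB per sender)
--   totalBytes byteVectorBytes 100000 1000  = 100000000 (100MB)
--   totalBytes bitVectorBytes 100000 1000   = 12500000  (12.5MB)
```
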