From eb4f601c8bbbdb354bd4f23d8f8e9ad372dee382 Mon Sep 17 00:00:00 2001 From: Narasimha-sc <166327228+Narasimha-sc@users.noreply.github.com> Date: Tue, 12 May 2026 09:11:26 +0000 Subject: [PATCH] core: keep whitelisted query parameters when removing link tracking (#6965) * core: keep whitelisted query parameters when removing link tracking In safe mode, "remove link tracking" stripped any query parameter whose name started with a known tracking prefix in qsSafeBlacklist, ignoring qsWhitelist. So "list" (e.g. YouTube playlist links) was dropped because "li" (LinkedIn) is a prefix of it, and github's "ref" was dropped too. Make the safe-mode filter consult the whitelist, like the other branches. * docs: plan for keeping whitelisted query parameters when removing link tracking Design doc for the safe-mode sanitizeUri change (PR #6965): why "?list=" was stripped from YouTube links, the root cause (safe mode ignoring qsWhitelist), the fix, what it does/doesn't change, and alternatives considered. --- plans/2026-05-11-link-tracking-whitelist.md | 85 +++++++++++++++++++++ src/Simplex/Chat/Markdown.hs | 2 +- tests/MarkdownTests.hs | 4 + 3 files changed, 90 insertions(+), 1 deletion(-) create mode 100644 plans/2026-05-11-link-tracking-whitelist.md diff --git a/plans/2026-05-11-link-tracking-whitelist.md b/plans/2026-05-11-link-tracking-whitelist.md new file mode 100644 index 0000000000..7bebe0ea03 --- /dev/null +++ b/plans/2026-05-11-link-tracking-whitelist.md @@ -0,0 +1,85 @@ +# "Remove link tracking" strips whitelisted query parameters (`?list=` in YouTube links, github `ref`) + +Design doc for the fix in PR #6965 (`nd/fix-list-in-link` → `master`). + +## Problem — what prompted this + +With **"remove link tracking"** enabled (Settings → Privacy & security), sending a message +with a YouTube link that has a `list` query parameter — `https://www.youtube.com/playlist?list=PL...` +or a video-in-playlist link `https://www.youtube.com/watch?v=...&list=PL...` — sent the URL +with `?list=...` removed, so the recipient got a plain (non-playlist) link instead of the +playlist. Fixing that `?list=` stripping is the immediate purpose of this change. + +iOS, Android and desktop are all affected — the URI sanitiser lives in the shared Haskell +core (`src/Simplex/Chat/Markdown.hs`). + +## Cause + +"Remove link tracking" on send uses *safe mode* of `sanitizeUri`: +`ComposeView.sanitizeMessage` → `parseSanitizeUri(_, safe = true)` → `chatParseUri 1` → +`sanitizeUri True`. + +`sanitizeUri` has three branches that pick which query parameters to keep; two of them +already consult `qsWhitelist` (the list of parameter names known *not* to be tracking — `q`, +`search`, `list`, `page`, youtube's `v`/`t`, github's `ref`, …): + +```haskell +let sanitizedQS + | safe = filter (not . isSafeBlacklisted . fst) originalQS -- ← whitelist NOT consulted + | isNamePath = case originalQS of + p@(n, _) : ps -> (if isWhitelisted n || not (isBlacklisted n) then (p :) else id) $ filter (isWhitelisted . fst) ps + [] -> [] + | otherwise = filter (isWhitelisted . fst) originalQS +... +isSafeBlacklisted p = any (`B.isPrefixOf` p) qsSafeBlacklist +qsSafeBlacklist = [ "ad", "af", ..., "li", ..., "ref", ... ] -- name *prefixes*; "li" → LinkedIn (li_fat_id, lipi, licu) +``` + +The safe-mode branch is the odd one out: it drops a parameter whenever its name *starts +with* a known tracking prefix, and never looks at `qsWhitelist`. So `list` was dropped +because `"li"` is a tracking prefix, and github's whitelisted `ref` was dropped because +`"ref"` is itself a tracking prefix — even though both are explicitly listed as non-tracking +and are kept by every other branch. + +## Fix + +Make the safe-mode branch apply the same "whitelisted *or* not blacklisted" rule the other +branches already use: + +```haskell +| safe = filter (\(n, _) -> isWhitelisted n || not (isSafeBlacklisted n)) originalQS +``` + +This *removes* a special case rather than adding one — `list` is no longer handled +differently from any other whitelisted parameter; `qsWhitelist` becomes authoritative in all +three branches. Effects relative to the previous behaviour: + +- `list` is kept everywhere (the reported `?list=` bug); +- github's `ref` is kept on `github.com` in safe mode too (it was already kept in eager + mode — it's in the whitelist for exactly that reason); +- nothing else changes: of all whitelist entries, only `list` (vs the `"li"` prefix) and + `ref` (vs the `"ref"` prefix) collide with a `qsSafeBlacklist` prefix today; +- every actual tracking parameter is still stripped — `qsWhitelist` does not contain + `li_fat_id`, `lipi`, `licu`, `utm*`, etc., and `ref` on any non-github host stays stripped. + +Regression tests added in `testSanitizeUri` (`tests/MarkdownTests.hs`): + +```haskell +it "should keep whitelisted parameters in safe mode even if they match a blacklist prefix" $ do + "https://example.com/playlist?list=abc" `sanitized` Nothing -- "list" is whitelisted, "li" is blacklisted + "https://example.com/playlist?list=abc&si=def" `sanitized` Just "https://example.com/playlist?list=abc" + "https://github.com/owner/repo?ref=main" `sanitized` Nothing -- "ref" is whitelisted for github.com +``` + +Verified: full library + test rebuild, then `cabal run simplex-chat-test -- --match /sanitizeUri/` +→ 4 examples, 0 failures (the new block plus the three pre-existing `sanitizeUri` cases). + +## Alternatives considered + +- **Special-case `list`** (`isSafeBlacklisted p = p /= "list" && …`). Smallest possible diff, + provably zero collateral, but it hard-codes one parameter name into a predicate and leaves + the structural inconsistency (safe mode ignoring the whitelist) in place — a fix by + exception rather than by rule. (This was the first version; replaced.) +- **Narrow the `"li"` blacklist entry to `"li_"`.** Fixes `list` but stops matching `lipi` + and `licu` (real LinkedIn email-link params), i.e. changes more than `list` while still + not addressing `ref` or the underlying inconsistency. diff --git a/src/Simplex/Chat/Markdown.hs b/src/Simplex/Chat/Markdown.hs index 86ba441a57..39000fde84 100644 --- a/src/Simplex/Chat/Markdown.hs +++ b/src/Simplex/Chat/Markdown.hs @@ -360,7 +360,7 @@ parseUri s = case U.parseURI U.laxURIParserOptions s of sanitizeUri :: Bool -> U.URI -> Maybe U.URI sanitizeUri safe uri@U.URI {uriAuthority, uriPath, uriQuery = U.Query originalQS} = let sanitizedQS - | safe = filter (not . isSafeBlacklisted . fst) originalQS + | safe = filter (\(n, _) -> isWhitelisted n || not (isSafeBlacklisted n)) originalQS | isNamePath = case originalQS of p@(n, _) : ps -> (if isWhitelisted n || not (isBlacklisted n) then (p :) else id) $ filter (isWhitelisted . fst) ps [] -> [] diff --git a/tests/MarkdownTests.hs b/tests/MarkdownTests.hs index 3683b536dd..54483babec 100644 --- a/tests/MarkdownTests.hs +++ b/tests/MarkdownTests.hs @@ -416,6 +416,10 @@ testSanitizeUri = describe "sanitizeUri" $ do "https://example.com/page/a123?source=abc" `safeSanitized` Nothing -- source is in unsafe blacklist "https://example.com/page/a123?name=abc" `eagerSanitized` Just "https://example.com/page/a123" "https://example.com/page/a123?name=abc" `safeSanitized` Nothing -- name is not in a whitelist + it "should keep whitelisted parameters in safe mode even if they match a blacklist prefix" $ do + "https://example.com/playlist?list=abc" `sanitized` Nothing -- "list" is whitelisted, "li" is blacklisted + "https://example.com/playlist?list=abc&si=def" `sanitized` Just "https://example.com/playlist?list=abc" + "https://github.com/owner/repo?ref=main" `sanitized` Nothing -- "ref" is whitelisted for github.com where s `eagerSanitized` res = sanitized_ False s res s `safeSanitized` res = sanitized_ True s res