Some test fixtures construct event/PDU dicts inline rather than via
make_test_event — typically because they then run them through
add_hashes_and_signatures or otherwise depend on the exact dict shape.
Add the format-required fields (depth, hashes, auth_events,
prev_events, ...) inline in those cases so they keep working under
the strict (Rust-bound) constructor.
Switch test sites that build EventBase/PDU instances over to
make_test_event / make_test_pdu_event so they pick up the defaults
the strict (Rust-bound) constructor will require.
Prepare for porting the event class into Rust, where the constructor
strictly validates that all format-required fields (depth, hashes,
origin_server_ts, auth_events, prev_events, ...) are present. Most
tests build minimal dicts that omit these fields because they only
care about the fields the test exercises. Introduce make_test_event
and make_test_pdu_event, which layer format-version-aware defaults
on top of caller-supplied fields so individual tests don't need to
spell out every required key.
Based on #19708.
This is on the path to porting the entire event class to Rust, as
`event.content` will then return the new Rust class `JsonObject`.
This PR adds a pure Rust `JsonObject` class that is a `Mapping`
representing a json-style object. It uses `serde_json::Value` as its
in-memory representation and `pythonize` for conversion when a field is
looked up on the object.
I'm not thrilled with the name, but couldn't think of a better one.
This also adds `JsonObject` handling to the JSON serialisation functions
we use, as well as to the `freeze(..)` function.
Reviewable commit-by-commit.
Fixes the symptoms of https://github.com/element-hq/synapse/issues/19315
/ https://github.com/element-hq/synapse/issues/19588 but not the
underlying reason causing the number to grow so large in the first
place.
```
ValueError: Exceeds the limit (4300 digits) for integer string conversion; use sys.set_int_max_str_digits() to increase the limit
```
Copied from the original pull request on [Famedly's Synapse
repo](https://github.com/famedly/synapse/pull/221) (with some edits):
Basing the time interval around a 5 seconds leaves a big window of
waiting especially as this window is doubled each retry, when another
worker could be making progress but can not.
Right now, the retry interval in seconds looks like `[0.2, 5, 10, 20,
40, 80, 160, 320, (continues to double)]` after which logging should
start about excessive times and (relatively quickly) end up with an
extremely large retry interval with an unrealistic expectation past the
heat death of the universe. 1 year in seconds = 31,536,000.
With this change, retry intervals in seconds should look more like:
```
[
0.2,
0.4,
0.8,
1.6,
3.2,
6.4,
12.8,
25.6,
51.2,
60, < never goes higher than this
]
```
Logging about excessive wait times will start at 10 minutes.
<details>
<summary>Previous breakdown when we were using 15 minutes</summary>
```
[
0.2,
0.4,
0.8,
1.6,
3.2,
6.4,
12.8,
25.6,
51.2,
102.4, # 1.7 minutes
204.8, # 3.41 minutes
409.6, # 6.83 minutes
819.2, # 13.65 minutes < logging about excessive times will start here, 13th iteration
900, # 15 minutes < never goes higher than this
]
```
</details>
Further suggested work in this area could be to define the cap, the
retry interval starting point and the multiplier depending on how
frequently this lock should be checked. See data below for reasons why.
Increasing the jitter range may also be a good idea
---------
Co-authored-by: Eric Eastwood <madlittlemods@gmail.com>
(cherry picked from commit 3f58bc50df)
Similar to #19706, let's port the `unsigned` field into a Rust class.
This does change things a bit in that we now define exactly what
unsigned fields that are allowed to be added to an event, and what
actually gets persisted. This should be a noop though, as we carefully
filter out what unsigned fields we allow in from federation, for example
As a side effect of this cleanup, I think this fixes handling
`unsigned.age` on events received over federation.
This is another stepping stone in porting the event class fully to Rust.
The new `Signatures` class is relatively simple, as we actually don't
interact with it that much in the code. It does *not* implement
`Mapping` or `MutableMapping` as that takes quite a lot of effort that
we don't need, even though it would be more ergonomic.
Fixes the symptoms of https://github.com/element-hq/synapse/issues/19315
/ https://github.com/element-hq/synapse/issues/19588 but not the
underlying reason causing the number to grow so large in the first
place.
```
ValueError: Exceeds the limit (4300 digits) for integer string conversion; use sys.set_int_max_str_digits() to increase the limit
```
Copied from the original pull request on [Famedly's Synapse
repo](https://github.com/famedly/synapse/pull/221) (with some edits):
Basing the time interval around a 5 seconds leaves a big window of
waiting especially as this window is doubled each retry, when another
worker could be making progress but can not.
Right now, the retry interval in seconds looks like `[0.2, 5, 10, 20,
40, 80, 160, 320, (continues to double)]` after which logging should
start about excessive times and (relatively quickly) end up with an
extremely large retry interval with an unrealistic expectation past the
heat death of the universe. 1 year in seconds = 31,536,000.
With this change, retry intervals in seconds should look more like:
```
[
0.2,
0.4,
0.8,
1.6,
3.2,
6.4,
12.8,
25.6,
51.2,
60, < never goes higher than this
]
```
Logging about excessive wait times will start at 10 minutes.
<details>
<summary>Previous breakdown when we were using 15 minutes</summary>
```
[
0.2,
0.4,
0.8,
1.6,
3.2,
6.4,
12.8,
25.6,
51.2,
102.4, # 1.7 minutes
204.8, # 3.41 minutes
409.6, # 6.83 minutes
819.2, # 13.65 minutes < logging about excessive times will start here, 13th iteration
900, # 15 minutes < never goes higher than this
]
```
</details>
Further suggested work in this area could be to define the cap, the
retry interval starting point and the multiplier depending on how
frequently this lock should be checked. See data below for reasons why.
Increasing the jitter range may also be a good idea
---------
Co-authored-by: Eric Eastwood <madlittlemods@gmail.com>
When upgrading a room to v12, we accidentally ended up mutating the
content of the old power level. Since we cache events, this meant any
future usage of the old power level event would see the wrong content
(until it dropped from the cache).
This meant that the creator of the new room would not be able to perform
admin actions on the old room. Any federation requests for the event
would fail the hash checks, since the content had been changed.
All in all, quite a nasty bug.
This fixes the bug described in #19713 (and double-checked against the
SDK integration test, which now passes with this change). A sync
response must be returned immediately if a room subscription
configuration change caused a new non-empty response (checked with `if
response` in the code) to be produced.
Fixes#19713.
Fixes#18844.
---------
Co-authored-by: Erik Johnston <erik@matrix.org>
Currently synapse returns `M_FORBIDDEN` when trying to use the account
deactivation API, if the server admin disabled displayname changes. This
is undesirable, since it prevents GDPR erasure without admin
interaction. The admin API seems to work fine though. This also only
seems to affect the deactivate API, when the erase flag is true.
Relevant endpoint:
https://spec.matrix.org/latest/client-server-api/#post_matrixclientv3accountdeactivate
This change only removes the checked for condition that the displayname
and profile avatar are allowed to be changed per the configuration
setting. If a user is deleting themselves, why is that denied?
There did not seem to be a basic test for this endpoint that checks the
`erase` usage, so that was added as well as checking the above mentioned
behavior.
Both `__getitem__` and `.user_id` were removed in #19680 to simplify the
event class. However, `EventBase` is exposed to modules who might also
make use of those methods, so let's reinstate them (but otherwise not
reinstate the usage of them in the code).
Fixes#13043
The usages of the table mostly already correctly handled if we don't
have old entries, as that was needed when we first added the table.
I arbitrarily set the prune time to 30 days. The only use for old
entries is for sync streams that haven't synced since then, and we
should very rarely see sync streams that haven't been used in 30 days.
Reviewable commit-by-commit.
---------
Co-authored-by: Olivier 'reivilibre' <oliverw@element.io>
Co-authored-by: Olivier 'reivilibre' <olivier@librepush.net>
Closes: #19688
Part of: MSC4450 whose Experimental Feature tracking issue is #19691
Add an unstable, namespaced `idp_id` query parameter to `fallback/web` \
This allows clients to specify the identity provider they'd like to log
in with for SSO when they have multiple upstream IdPs associated with
their account.
Previously, Synapse would just pick one arbitrarily. But this was
undesirable as you may want to use a different one at that point in
time. When logging in, the user is able to choose when IdP they use -
during UIA (which uses fallback auth mechanism) they should be able to
do the same.
-----
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
Co-authored-by: Andrew Morgan <andrew@amorgan.xyz>
Co-authored-by: Eric Eastwood <madlittlemods@gmail.com>
When we return events to clients we need to annotate them with the
membership of the user at the time of the event, in the `unsigned`
section. We already check the membership at the event during the
visibility checks, and so we annotate events there. However, since this
a per-user field we end up having to clone the event in question.
Instead, let's add a `FilteredEvent` class that is returned by the
visibility checks, which allows returning the membership without editing
the event. This has three benefits:
1. Avoids the clones of the event.
2. Allows us to statically check that we have filtered events before
returning them to clients.
3. We no longer edit `unsigned` data after event deserialization, this
makes it easier to port the event class to Rust.
The last benefit is why we're doing this *now*, however IMV it shouldn't
affect whether we want this change or not.
Reviewable commit-by-commit
---------
Co-authored-by: Olivier 'reivilibre' <oliverw@element.io>
Follows: #19365
Part of: MSC4354 Sticky Events (experimental feature #19409)
This PR introduces a `spam_checker_spammy` flag, analogous to
`policy_server_spammy`, as an explicit flag
that an event was decided to be spammy by a spam-checker module.
The original Sticky Events PR (#18968) just reused
`policy_server_spammy`, but it didn't sit right with me
because we (at least appear to be experimenting with features that)
allow users to opt-in to seeing
`policy_server_spammy` events (presumably for moderation purposes).
Keeping these flags separate felt best, therefore.
As for why we need this flag: soon soft-failed status won't be
permanent, at least for sticky events.
The spam checker modules currently work by making events soft-failed.
We want to prevent spammy events from getting
reconsidered/un-soft-failed, so it seems like we need
a flag to track spam-checker spamminess *separately* from soft-failed.
Should be commit-by-commit friendly, but is also small.
---------
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
This is to make it easier to port to Rust, as well as making things
conceptually simpler.
Two changes:
1. Remove the `__getitem__` interface on events
2. Remove `.user_id` as an alias of `.sender`.
The spec says `device_keys` may be omitted, but not set to `null`.
This was temporarily allowed as a workaround for misbehaving clients
(see #19023), which have since been fixed.
Fixes#19030
The Rust port of `KNOWN_ROOM_VERSIONS` (#19589) made `__contains__`
strict about key types, raising `TypeError` when called with `None`
instead of returning `False` like a Python dict would.
This broke `/sync` for rooms with a NULL `room_version` in the database.
```
File "/home/synapse/src/synapse/handlers/sync.py", line 2628, in _get_room_changes_for_initial_sync
if event.room_version_id not in KNOWN_ROOM_VERSIONS:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: argument 'key': 'NoneType' object cannot be cast as 'str'
```
Reverts element-hq/synapse#18416
Unfortunately, this causes failures on `/sendToDevice` endpoint in
normal circumstances. If a single user has, say, a hundred devices then
we easily go over the limit. This blocks message sending entirely in
encrypted rooms.
cc @MadLittleMods @MatMaul
This is a simplification so that `unsigned` only includes "simple"
values, to make it easier to port to Rust.
Reviewable commit-by-commit
Summary:
1. **Add `recheck` column to `redactions` table**
A new boolean `recheck` column (default true) is added to the
`redactions` table. This captures whether a redaction needs its sender
domain checked at read time — required for room v3+ where redactions are
accepted speculatively and later validated. When persisting a new
redaction, `recheck` is set directly from
`event.internal_metadata.need_to_check_redaction()`.
It's fine if initially we recheck all redactions, as it only results in
a little more CPU overhead (as we always pull out the redaction event
regardless).
2. **Backfill `recheck` via background update**
A background update (`redactions_recheck`) backfills the new column for
existing rows by reading `recheck_redaction` from each event's
`internal_metadata` JSON. This avoids loading full event objects by
reading `event_json` directly via a SQL JOIN.
3. **Don't fetch confirmed redaction events from the DB**
Previously, when loading events, Synapse recursively fetched all
redaction events regardless of whether they needed domain rechecking.
Now `_fetch_event_rows` reads the `recheck` column and splits redactions
into two lists:
- `unconfirmed_redactions` — need fetching and domain validation
- `confirmed_redactions` — already validated, applied directly without
fetching the event
This avoids unnecessary DB reads for the common case of
already-confirmed redactions.
4. **Move `redacted_because` population to `EventClientSerializer`**
Previously, `redacted_because` (the full redaction event object) was
stored in `event.unsigned` at DB fetch time, coupling storage-layer code
to client serialization concerns. This is removed from
`_maybe_redact_event_row` and moved into
`EventClientSerializer.serialize_event`, which fetches the redaction
event on demand. The storage layer now only sets
`unsigned["redacted_by"]` (the redaction event ID).
5. **Always use `EventClientSerializer`**
The standalone `serialize_event` function was made private
(`_serialize_event`). All external callers — `rest/client/room.py`,
`rest/admin/events.py, appservice/api.py`, and `tests` — were updated to
use `EventClientSerializer.serialize_event` / `serialize_events`,
ensuring
`redacted_because` is always populated correctly via the serializer.
6. **Batch-fetch redaction events in `serialize_events`**
`serialize_events` now collects all `redacted_by` IDs from the event
batch upfront and fetches them in a single `get_events` call, passing
the result as a `redaction_map` to each `serialize_event` call. This
reduces N individual DB round-trips to one when serializing a batch of
events that includes redacted events.
---------
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Principally so that we can share the same room version configuration
between Python and Rust.
For the most part, this is a direct port. Some special handling has had
to go into `KNOWN_ROOM_VERSIONS` so that it can be sensibly shared
between Python and Rust, since we do update it during config parsing.
---------
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Fixes: #8088
Previously we would perform OIDC discovery on startup,
which involves making HTTP requests to the identity provider(s).
If that took a long time, we would block startup.
If that failed, we would crash startup.
This commit:
- makes the loading happen in the background on startup
- makes an error in the 'preload' non-fatal (though it logs at CRITICAL
for visibility)
- adds a templated error page to show on failed redirects (for
unavailable providers), as otherwise you get a JSON response in your
navigator.
- This involves introducing 2 new exception types to mark other
exceptions and keep the error handling fine-grained.
The machinery was already there to load-on-demand the discovery config,
so when the identity provider
comes back up, the discovery is reattempted and login can succeed.
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>