Reverts element-hq/synapse#18416
Unfortunately, this causes failures on `/sendToDevice` endpoint in
normal circumstances. If a single user has, say, a hundred devices then
we easily go over the limit. This blocks message sending entirely in
encrypted rooms.
cc @MadLittleMods @MatMaul
This is a simplification so that `unsigned` only includes "simple"
values, to make it easier to port to Rust.
Reviewable commit-by-commit
Summary:
1. **Add `recheck` column to `redactions` table**
A new boolean `recheck` column (default true) is added to the
`redactions` table. This captures whether a redaction needs its sender
domain checked at read time — required for room v3+ where redactions are
accepted speculatively and later validated. When persisting a new
redaction, `recheck` is set directly from
`event.internal_metadata.need_to_check_redaction()`.
It's fine if initially we recheck all redactions, as it only results in
a little more CPU overhead (as we always pull out the redaction event
regardless).
2. **Backfill `recheck` via background update**
A background update (`redactions_recheck`) backfills the new column for
existing rows by reading `recheck_redaction` from each event's
`internal_metadata` JSON. This avoids loading full event objects by
reading `event_json` directly via a SQL JOIN.
3. **Don't fetch confirmed redaction events from the DB**
Previously, when loading events, Synapse recursively fetched all
redaction events regardless of whether they needed domain rechecking.
Now `_fetch_event_rows` reads the `recheck` column and splits redactions
into two lists:
- `unconfirmed_redactions` — need fetching and domain validation
- `confirmed_redactions` — already validated, applied directly without
fetching the event
This avoids unnecessary DB reads for the common case of
already-confirmed redactions.
4. **Move `redacted_because` population to `EventClientSerializer`**
Previously, `redacted_because` (the full redaction event object) was
stored in `event.unsigned` at DB fetch time, coupling storage-layer code
to client serialization concerns. This is removed from
`_maybe_redact_event_row` and moved into
`EventClientSerializer.serialize_event`, which fetches the redaction
event on demand. The storage layer now only sets
`unsigned["redacted_by"]` (the redaction event ID).
5. **Always use `EventClientSerializer`**
The standalone `serialize_event` function was made private
(`_serialize_event`). All external callers — `rest/client/room.py`,
`rest/admin/events.py, appservice/api.py`, and `tests` — were updated to
use `EventClientSerializer.serialize_event` / `serialize_events`,
ensuring
`redacted_because` is always populated correctly via the serializer.
6. **Batch-fetch redaction events in `serialize_events`**
`serialize_events` now collects all `redacted_by` IDs from the event
batch upfront and fetches them in a single `get_events` call, passing
the result as a `redaction_map` to each `serialize_event` call. This
reduces N individual DB round-trips to one when serializing a batch of
events that includes redacted events.
---------
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Principally so that we can share the same room version configuration
between Python and Rust.
For the most part, this is a direct port. Some special handling has had
to go into `KNOWN_ROOM_VERSIONS` so that it can be sensibly shared
between Python and Rust, since we do update it during config parsing.
---------
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Fixes: #8088
Previously we would perform OIDC discovery on startup,
which involves making HTTP requests to the identity provider(s).
If that took a long time, we would block startup.
If that failed, we would crash startup.
This commit:
- makes the loading happen in the background on startup
- makes an error in the 'preload' non-fatal (though it logs at CRITICAL
for visibility)
- adds a templated error page to show on failed redirects (for
unavailable providers), as otherwise you get a JSON response in your
navigator.
- This involves introducing 2 new exception types to mark other
exceptions and keep the error handling fine-grained.
The machinery was already there to load-on-demand the discovery config,
so when the identity provider
comes back up, the discovery is reattempted and login can succeed.
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
Fixes https://github.com/element-hq/synapse/issues/19494
MSC4284 policy servers
This:
* removes the old `/check` (recommendation) support because it's from an
older design. Policy servers should have updated to `/sign` by now. We
also remove optionality around the policy server's public key because it
was only optional to support `/check`.
* supports the stable `m.room.policy` state event and `/sign` endpoints,
falling back to unstable if required. Note the changes between unstable
and stable:
* Stable `/sign` uses errors instead of an empty signatures block to
indicate refusal.
* Stable `m.room.policy` nests the public key in an object with explicit
key algorithm (always ed25519 for now)
* does *not* introduce tests that the above fallback to unstable works.
If it breaks, we're not going to be sad about an early transition. Tests
can be added upon request, though.
* fixes a bug where the policy server was asked to sign policy server
state events (the events were correctly skipped in `is_event_allowed`,
but `ask_policy_server_to_sign_event` didn't do the same).
* fixes a bug where the original event sender's signature can be deleted
if the sending server is the same as the policy server.
* proxies Matrix-shaped errors from the policy server to the
Client-Server API as `SynapseError`s (a new capability of the stable
API).
Membership event handling (from the issue) is expected to be a different
PR due to the size of changes involved (tracked by
https://github.com/element-hq/synapse/issues/19587).
### Pull Request Checklist
<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->
* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
- Use markdown where necessary, mostly for `code blocks`.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct (run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
---------
Co-authored-by: turt2live <1190097+turt2live@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Eric Eastwood <madlittlemods@gmail.com>
Updates the error codes to match MSC2666 changes (user ID query param
validation + proper errcode for requesting rooms with self), added the
new `count` field, and stabilized the endpoint.
This moves the dev dependencies to PEP 735 dependency groups, to help us
move to standard project metadata, which will help us moving to `uv`
(#19566)
This requires poetry 2.2.0
It's pretty hard to remember the order of all of these ambiguous
numbers. I assume they're not totally labeled already to cut down on the
length when scanning with your eyes. This just adds a few hints of what
each grouping is.
Spawning from [staring at some Synapse
logs](https://github.com/element-hq/matrix-hosted/issues/10631) and
cross-referencing the Synapse source code over and over.
Companion PR:
https://github.com/element-hq/matrix-authentication-service/pull/5550
to 1) send this flag
and 2) provision users proactively when their lock status changes.
---
Currently Synapse and MAS have two independent user lock
implementations. This PR makes it so that MAS can push its lock status
to Synapse when 'provisioning' the user.
Having the lock status in Synapse is useful for removing users from the
user directory
when they are locked.
There is otherwise no authentication requirement to have it in Synapse;
the enforcement is done
by MAS at token introspection time.
---------
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
Can be reviewed commit by commit.
This sets caching headers on the /versions and /auth_metadata endpoints
to:
- allow clients to cache the response for up to 10 minutes
(`max-age=600`)
- allow proxies to cache the response for up to an hour
(`s-maxage=3600`)
- make proxies serve stale response for up to an hour (`s-maxage=3600`)
but make them refresh their response after 10 minutes
(`stale-while-revalidate=600`) so that we always have a snappy response
to client, but also have fresh responses most of the time
- only cache the response for unauthenticated requests on /versions
(`Vary: Authorization`)
I'm not too worried about the 1h TTL on the proxy side, as with the
`stale-while-revalidate` directive, one just needs to do two requests
after 10 minutes to get a fresh response from the cache.
The reason we want this, is that clients usually load this right away,
leading to a lot of traffic from people just loading the Element Web
login screen with the default config. This is currently routed to
`client_readers` on matrix.org (and ESS) which can be overwhelmed for
other reasons, leading to slow response times on those endpoints (3s+).
Overwhelmed workers shouldn't prevent people from logging in, and
shouldn't result in a long loading spinner in clients. This PR allows
caching proxies (like Cloudflare) to publicly cache the unauthenticated
response of those two endpoints and make it load quicker, reducing
server load as well.
`event_id` is a lazily-computed property on events, as it's a hash of
the event content on room version 3 and later. The reason we do this is
that it helps finding database inconsistencies by not trusting the event
ID we got from the database.
The thing is, when we clone events (to return them through /sync or
/messages for example) we don't copy the computed hash if we already
computed it, duplicating the work. This copies the internal `_event_id`
property.
This way we actually detect problems like
https://github.com/element-hq/synapse/pull/19475 as they happen instead
of being invisible until something breaks.
Sanity check that Complement is testing against your code changes
(whether it be local or from the PR in CI).
```
COMPLEMENT_DIR=../complement ./scripts-dev/complement.sh --in-repo -run TestSynapseVersion
```
Fixes: #19540Fixes: #16290 (side effect of the proposed fix)
Closes: #12804 (side effect of the proposed fix)
Introduced in: https://github.com/matrix-org/synapse/pull/8932
---
This PR is a relatively simple simplification of the profile change on
deactivation that appears to remove multiple bugs.
This PR's **primary motivating fix** is #19540: when a user is
deactivated and erased, they would be kept in the user directory. This
bug appears to have been here since #8932 (previously
https://github.com/matrix-org/synapse/pull/8932) (v1.26.0).
The root cause of this bug is that after removing the user from the user
directory, we would immediately update their displayname and avatar to
empty strings (one at a time), which re-inserts
the user into the user directory.
With this PR, we now delete the entire `profiles` row upon user erasure,
which is cleaner (from a 'your database goes back to zero after
deactivating and erasing a user' point of view) and
only needs one database operation (instead of doing displayname then
avatar).
With this PR, we also no longer send the 2 (deferred) `m.room.member`
`join` events to every room to propagate the displayname and avatar_url
changes.
This is good for two reasons:
- the user is about to get parted from those rooms anyway, so this
reduces the number of state events sent per room from 3 to 1. (More
efficient for us in the moment and leaves less litter in the room DAG.)
- it is possible for the displayname/avatar update to be sent **after**
the user parting, which seems as though it could trigger the user to be
re-joined to a public room.
(With that said, although this sounds vaguely familiar in my lossy
memory, I can't find a ticket that actually describes this bug, so this
might be fictional. Edit: #16290 seems to describe this, although the
title is misleading.)
Additionally, as a side effect of the proposed fix (deleting the
`profiles` row), this PR also now deletes custom profile fields upon
user erasure, which is a new feature/bugfix (not sure which) in its own
right.
I do not see a ticket that corresponds to this feature gap, possibly
because custom profile fields are still a niche feature without
mainstream support (to the best of my knowledge).
Tests are included for the primary bugfix and for the cleanup of custom
profile fields.
### `set_displayname` module API change
This change includes a minor _technically_-breaking change to the module
API.
The change concerns `set_displayname` which is exposed to the module API
with a `deactivation: bool = False` flag, matching the internal handler
method it wraps.
I suspect that this is a mistake caused by overly-faithfully piping
through the args from the wrapped method (this Module API was introduced
in
https://github.com/matrix-org/synapse/pull/14629/changes#diff-0b449f6f95672437cf04f0b5512572b4a6a729d2759c438b7c206ea249619885R1592).
The linked PR did the same for `by_admin` originally before it was
changed.
The `deactivation` flag's only purpose is to be piped through to other
Module API callbacks when a module has registered to be notified about
profile changes.
My claim is that it makes no sense for the Module API to have this flag
because it is not the one doing the deactivation, thus it should never
be in a position to set this to `True`.
My proposed change keeps the flag (for function signature
compatibility), but turns it into a no-op (with a `ERROR` log when it's
set to True by the module).
The Module API callback notifying of the module-caused displayname
change will therefore now always have `deactivation = False`.
*Discussed in
[`#synapse-dev:matrix.org`](https://matrix.to/#/!i5D5LLct_DYG-4hQprLzrxdbZ580U9UB6AEgFnk6rZQ/$1f8N6G_EJUI_I_LvplnVAF2UFZTw_FzgsPfB6pbcPKk?via=element.io&via=matrix.org&via=beeper.com)*
---------
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
Summary
- drop the `systemd` extra from `pyproject.toml` and the
`systemd-python` optional dependency
- this means we don't ship the journald log handler, so it clarifies the
docs how to install that in the venv
- ensure the Debian virtualenv build keeps shipping
`systemd-python>=231` in the venv, so the packaged log config can keep
using `systemd.journal.JournalHandler`
Context of this is the following:
> Today in my 'how hard would it be to move to uv' journey:
https://github.com/systemd/python-systemd/issues/167
>
> The gist of it is that uv really wants to create a universal lock
file, which means it needs to be able to resolve the package metadata,
even for packages locked for other platforms. In the case of
systemd-python, they use mesonpy as build backend, which doesn't
implement prepare_metadata_for_build_wheel, which means it needs to run
meson to be able to resolve the package metadata. And it will hard-fail
if libsystemd dev headers aren't available 😭
>
> [*message in
#synapse-dev:matrix.org*](https://matrix.to/#/!i5D5LLct_DYG-4hQprLzrxdbZ580U9UB6AEgFnk6rZQ/$OKLB3TJVXAwq43sAZFJ-_PvMMzl4P_lWmSAtlmsoMuM?via=element.io&via=matrix.org&via=beeper.com)
This is a manual lock bump, as it looks like Dependabot is currently
timing out updating dependencies. This should hopefully unlock it, as it
will have fewer dependencies to update.
Two outstanding exceptions:
- pympler upgrade adds a pywin32 deps, which is missing sdist (so CI is
complaining)
- pysaml2 for some unknown reason pinned the MAX version of pyopenssl,
which duplicates pyopenssl and cryptography, which obviously breaks
stuff
Use non-deprecated imports for collections
Other than being deprecated, these legacy imports also don't seem to be
compatible with [Ty](https://github.com/astral-sh/ty)
---------
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
When a worker gets very busy some of these loops can get large and end
up taking hundreds of ms to complete. To help keep the reactor tick
times reasonable we add a periodic yield into these loops.
These were found by doing a `py-spy` and speedscope.net (in time order)
to see where we were spending blocks of time
This fixes one of the 2 blockers to using pytest instead of Trial (which
is not formally-motivated, but sometimes seems like an interesting idea
because
pytest has seen a lot of developer experience features that Trial
hasn't. It would also removes one more coupling to the Twisted
framework.)
---
The `test_` prefix to this test helper makes it appear as a test to
pytest.
We *can* set a `__test__ = False` attribute on the test, but it felt
cleaner to just rename it (as I also thought it would be a test from
that name!).
This was previously reported as:
https://github.com/element-hq/synapse/issues/18665
---------
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>