Also refactor the "user left" handling so that it no longer listens to the "user_left_room" dispatch hook. We only want to do this once, so the profile handler is now called immediately in the instance responsible for receiving the event.
Remove the check for stream token changes and ensure we always collect profiles for users who have events in the timeline, even if they have not updated their profiles. Also fix the cache.
This table tracks which users should receive profile updates. It is intended as a write-heavy, cheap-read mechanism.
At profile update time, the homeserver determines all the users that should receive the update. At the point of /sync, we then quickly read this table and check for any relevant updates for the syncing user.
The timestamp field will be used to cull the table over time (so it doesn't grow indefinitely).
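A minimal sketch of the write-heavy / cheap-read idea described above, using SQLite. The table name, column names and helper functions are hypothetical and chosen for illustration only; the real schema may well differ (for instance, it may key reads on stream positions rather than timestamps).

```python
import sqlite3

# Hypothetical schema: one row per (recipient, profile owner) pair.
SCHEMA = """
CREATE TABLE profile_updates_sketch (
    recipient_user_id TEXT NOT NULL,
    updated_user_id TEXT NOT NULL,
    updated_ts INTEGER NOT NULL,
    PRIMARY KEY (recipient_user_id, updated_user_id)
)
"""


def record_profile_update(conn, updated_user_id, recipients, now_ms):
    """Write path: fan out one row per user that should receive the update."""
    conn.executemany(
        "INSERT OR REPLACE INTO profile_updates_sketch VALUES (?, ?, ?)",
        [(recipient, updated_user_id, now_ms) for recipient in recipients],
    )


def updates_for_sync(conn, syncing_user_id, since_ms):
    """Read path: a single lookup keyed on the syncing user."""
    rows = conn.execute(
        "SELECT updated_user_id FROM profile_updates_sketch"
        " WHERE recipient_user_id = ? AND updated_ts > ?"
        " ORDER BY updated_user_id",
        (syncing_user_id, since_ms),
    )
    return [row[0] for row in rows]


def cull_old_rows(conn, older_than_ms):
    """Use the timestamp to stop the table growing indefinitely."""
    cursor = conn.execute(
        "DELETE FROM profile_updates_sketch WHERE updated_ts < ?",
        (older_than_ms,),
    )
    return cursor.rowcount
```

The cost is paid once per profile update (one row per recipient), so that each /sync only needs a lookup on its own user ID.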
This brings it up to parity with the other 404 StoreErrors, which already
name the table.
More helpful response when debugging issues with incomplete relations.
Concretely: a user that was partly removed and then reinstated, and is
missing an entry in the 'profiles' table.
This should also contribute to a better understanding of #2807 and #2173.
Not aware of an open ticket for this.
I came across it when I accidentally broke the feature even more (as
part of another piece of work), then discovered there weren't tests for
this.
So this is overall a low-priority drive-by fix.
Requires a fix to SyTest https://github.com/matrix-org/sytest/pull/1426
(as it depended on the bug).
<ol>
<li>
Add a test for purging rooms with `delete_local_events=False` \
Parameterised by room version, this test currently succeeds
on v2 but fails on v12.
This is because the condition checking for local events relies
on the old event ID format, which has not been used since v2.
</li>
<li>
Fix delete_local_events=False for room versions above v2 \
The event ID format change means that we have to rely on `sender`
to know the origin of an event.
</li>
</ol>
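
The difference between the two checks can be sketched as follows. Both helper names are hypothetical and this is not the actual Synapse code: it only illustrates why a check based on the event ID stops working once event IDs no longer carry a server name, whereas the `sender` always does.

```python
def looks_local_by_event_id(event_id: str, server_name: str) -> bool:
    """Old-style check: only meaningful for `$opaque:server_name` event IDs."""
    return event_id.endswith(":" + server_name)


def domain_from_id(identifier: str) -> str:
    """Return the server name part of an identifier such as `@user:example.org`."""
    _, separator, domain = identifier.partition(":")
    if not separator:
        raise ValueError(f"{identifier!r} has no domain part")
    return domain


def is_local_event(sender: str, server_name: str) -> bool:
    """Sender-based check: works regardless of the event ID format."""
    return domain_from_id(sender) == server_name
```

Splitting on the first `:` keeps server names that include a port (e.g. `example.org:8448`) intact.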
---------
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
When I run `ruff` locally, either `ruff` or one of the linting scripts
adds `.ruff_cache` to the `.gitignore` file. This PR adds that line so
that running linters doesn't result in a dirty git working tree (and to
ignore ruff's cache, of course).
### Pull Request Checklist
<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->
* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
- Use markdown where necessary, mostly for `code blocks`.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct (run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
Fixes: #19844
Concretely, this changes `ResponseCache` to unset cache entries once
they resolve to a `Failure`.
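
As a rough illustration of the behaviour, here is a minimal sketch written against `asyncio` rather than Twisted. The class `ResponseCacheSketch` and its `wrap` method are assumptions made for this example and do not reproduce the real `ResponseCache` API; in particular, successful entries are kept forever here, with no timeout.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class ResponseCacheSketch:
    """Deduplicates in-flight calls per key; drops entries that fail."""

    def __init__(self) -> None:
        self._entries: Dict[str, "asyncio.Future[Any]"] = {}

    async def wrap(self, key: str, callback: Callable[[], Awaitable[Any]]) -> Any:
        entry = self._entries.get(key)
        if entry is None:
            entry = asyncio.ensure_future(callback())
            self._entries[key] = entry
            # Registered before anyone awaits, so it runs before waiters resume.
            entry.add_done_callback(lambda fut: self._on_done(key, fut))
        return await entry

    def _on_done(self, key: str, fut: "asyncio.Future[Any]") -> None:
        # Unset the entry if it was cancelled or resolved to an error, so the
        # next caller retries instead of receiving the cached failure.
        if fut.cancelled() or fut.exception() is not None:
            if self._entries.get(key) is fut:
                del self._entries[key]
```

With this shape, a failed call does not poison the key: the next `wrap` call for the same key invokes the callback again.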
---------
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
Co-authored-by: Eric Eastwood <erice@element.io>
Running `scripts-dev/lint.sh` locally revealed a failure, so here are the fixes that the linter
applied + a manual change to address an issue that couldn't be
auto-fixed.
The Complement integration-test job prints the full raw `go test -json`
stream straight to the GitHub Actions build log. That's an enormous,
unreadable wall of JSON, and the GitHub web UI renders very large logs
poorly — making the run hard to view and slow to load.
Instead, let's store the raw JSON and upload it as an artefact. We still
render failing test output as before.
We will still stream when tests finish (i.e. PASS / FAIL) so that people
can see that things are actually running. However, all other output
(such as logs) is hidden.
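
A minimal sketch of this kind of filtering, assuming the standard `go test -json` event shape (objects with `Action`, `Package` and `Test` fields). The function name and the output format are hypothetical; this is not the script used by the CI job.

```python
import json
from typing import Iterable, Iterator, TextIO

# Only these per-test results are streamed to the build log.
STREAMED_ACTIONS = {"pass", "fail"}


def summarise_go_test_json(lines: Iterable[str], raw_sink: TextIO) -> Iterator[str]:
    """Store every raw line in `raw_sink`; yield one short line per finished test."""
    for line in lines:
        raw_sink.write(line if line.endswith("\n") else line + "\n")
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # Not a JSON event; keep it in the raw log only.
        if not isinstance(event, dict):
            continue
        test = event.get("Test")
        action = event.get("Action")
        # Package-level results have no "Test" field and are skipped here.
        if test and action in STREAMED_ACTIONS:
            yield f"{action.upper()} {event.get('Package', '?')} {test}"
```

The raw sink would then be uploaded as the artefact, while only the yielded lines reach the build log.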
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
When building v1.155.0rc1, the release notification said the workflow
was successful, but it had actually failed.
---------
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
The sytest `After /purge_history users still get pushed for new
messages` is flaky. The flakiness exposes a real bug rather than a
test-timing issue.
Notification counts are stored in two places: `event_push_actions` (one
row per unread event) and `event_push_summary` (aggregate counts
populated periodically by `_rotate_notifs`, which runs on a 30-second
timer). `_purge_history_txn` deletes the purged events' rows from
`event_push_actions` but never adjusts `event_push_summary` (only the
full-room `purge_room` drops that table).
So the result depends on a race: if rotation hasn't fired, counts come
live from `event_push_actions`, the purge removes the right rows, and
the count is correct. If rotation fires before the purge (more likely
under the slower multi-postgres/workers/asyncio CI config), the events
get folded into
`event_push_summary`, the purge then deletes the underlying
`event_push_actions` rows but leaves the summary untouched, and the
count comes out inflated.
### Fix
Before deleting the rotated rows from `event_push_actions`, decrement
`event_push_summary` by the amount attributable to the events being
deleted. The decrement mirrors the counting logic in
`_rotate_notifs_before_txn`: only rows that were already rotated
(`stream_ordering <= event_push_summary_stream_ordering`) and that fall
after the summary's receipt are subtracted, so it stays correct in the
presence of read receipts and unread/highlight rows. The SQL avoids
`UPDATE ... FROM` and CTEs so it works on both SQLite and Postgres.
End-of-purge cache invalidation already covers
`get_unread_event_push_actions_by_room_for_user`.
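
The shape of the decrement can be sketched on a heavily simplified, hypothetical schema (the `*_sketch` tables below are not Synapse's real tables, and highlight/unread counts and threads are ignored). It uses a correlated subquery so that neither `UPDATE ... FROM` nor a CTE is needed.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the two notification tables.
SCHEMA = """
CREATE TABLE actions_sketch (
    room_id TEXT, user_id TEXT, stream_ordering INTEGER, notif INTEGER
);
CREATE TABLE summary_sketch (
    room_id TEXT, user_id TEXT, notif_count INTEGER,
    stream_ordering INTEGER,              -- rotated up to and including this
    last_receipt_stream_ordering INTEGER  -- NULL if there is no receipt
);
"""

DECREMENT_SQL = """
UPDATE summary_sketch SET notif_count = notif_count - (
    SELECT COUNT(*) FROM actions_sketch AS a
    WHERE a.room_id = summary_sketch.room_id
      AND a.user_id = summary_sketch.user_id
      AND a.notif = 1
      AND a.stream_ordering < ?                               -- being purged
      AND a.stream_ordering <= summary_sketch.stream_ordering -- already rotated
      AND a.stream_ordering
          > COALESCE(summary_sketch.last_receipt_stream_ordering, 0)
)
WHERE room_id = ?
"""


def purge_before(conn, room_id, purge_before_ordering):
    """Decrement the summary first, then delete the underlying rows."""
    conn.execute(DECREMENT_SQL, (purge_before_ordering, room_id))
    conn.execute(
        "DELETE FROM actions_sketch WHERE room_id = ? AND stream_ordering < ?",
        (room_id, purge_before_ordering),
    )
```

The order matters: the subquery counts rows that are about to be deleted, so the decrement has to run before the `DELETE`.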
### Tests
Adds `test_count_aggregation_after_purge`, which forces a rotation
before purging and asserts the aggregate count reflects only the
surviving events, covering read receipts and a subsequent re-rotation.
It fails (`3 != 1`) without the fix.
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
I.e. calls such as `default_config("test")` and `default_config("test", False)`
are opaque, which makes it hard to connect the dots.
Spawned from trying to figure out what the `server_name` is set to for
our `HomeserverTestCase` in order to reference it in
https://github.com/element-hq/synapse/pull/19848#discussion_r3397455309
This was all done through Claude.
## Fix flaky `test_edu_large_messages_not_splitting_one_user` (`TooLong` under `trial -jN`)
### Problem
The "Build .deb packages" CI step intermittently failed with:
```
twisted.protocols.amp.TooLong
...in
tests.rest.client.test_sendtodevice.SendToDeviceTestCase.test_edu_large_messages_not_splitting_one_user
```
The deb build runs the suite with `twisted.trial -j2`. In that mode,
worker log events are shipped to the manager process over Twisted's AMP
protocol, which encodes each value with a 2-byte length prefix — so any
single log line of **64 KiB or more** raises `TooLong`.
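
To make the size limit concrete, here is a small sketch of 2-byte length-prefixed framing. It illustrates the arithmetic only and is not Twisted's implementation; the function name is hypothetical and a plain `ValueError` stands in for `TooLong`.

```python
import struct

# Largest length that fits in a 2-byte unsigned prefix: 65 535 bytes.
MAX_VALUE_LENGTH = 0xFFFF


def encode_length_prefixed(value: bytes) -> bytes:
    """Prefix `value` with its length as a big-endian unsigned 16-bit integer."""
    if len(value) > MAX_VALUE_LENGTH:
        raise ValueError(
            f"value is {len(value)} bytes; the prefix can express at most "
            f"{MAX_VALUE_LENGTH}"
        )
    return struct.pack("!H", len(value)) + value
```

A 65 535-byte value still fits, while anything from 65 536 bytes (64 KiB) upwards cannot be framed.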
### Root cause
Not a bug in the to-device/EDU logic, and **not** debug logging enabled
by the build (it runs at the default `ERROR` level). It's a **log-level
leak between tests sharing a `-j2` worker**:
- `tests/logging/test_loggers.py::ExplicitlyConfiguredLoggerTestCase`
calls `root_logger.setLevel(logging.DEBUG)` directly and never restores
it (no `setUp`/`tearDown`/`addCleanup`).
- When that test runs before
`test_edu_large_messages_not_splitting_one_user` **in the same worker
process**, the root logger is left at `DEBUG`.
- That test deliberately builds an EDU of exactly `SOFT_MAX_EDU_SIZE -
1` (65 535) bytes. Storing it triggers `synapse/storage/database.py`'s
`[SQL values]` DEBUG log, which dumps the full query params, producing
a **65 708-byte** line that overflows AMP's cap.
It looks "flaky" purely because of `-j2` scheduling: whether the two
tests land on the same worker, and in what order.
### Fix
Three commits:
1. **Restore the root logger level in
`ExplicitlyConfiguredLoggerTestCase`** — `addCleanup` to put the level
back. Fixes the root cause.
2. **Truncate oversized log lines in the test log handler**
(`ToTwistedHandler.emit`) — caps lines at 1000 chars so no debug line
can break `trial -jN`, regardless of which query is logged (defense in
depth).
3. **Truncate values in `[SQL values]` debug logging** — caps the logged
param repr at 1000 chars (guarded by `isEnabledFor(DEBUG)` to keep the
hot path lazy). Keeps production debug logs sane too.
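
The first and third ideas can be sketched as follows. The test class and the helper `truncate_for_log` are hypothetical names for illustration, not the code from these commits; the 1000-character cap and the `[truncated]` marker are taken from the description here.

```python
import logging
import unittest

MAX_LOGGED_CHARS = 1000  # cap mentioned in the description above


def truncate_for_log(text: str, limit: int = MAX_LOGGED_CHARS) -> str:
    """Cap `text` at `limit` characters, marking it if anything was cut."""
    if len(text) <= limit:
        return text
    return text[:limit] + "... [truncated]"


class RootLevelRestoringTestCase(unittest.TestCase):
    def test_debug_level_is_scoped_to_this_test(self) -> None:
        root_logger = logging.getLogger()
        # Register the restore *before* changing the level, so the original
        # level is put back even if the rest of the test fails.
        self.addCleanup(root_logger.setLevel, root_logger.level)
        root_logger.setLevel(logging.DEBUG)
        self.assertEqual(root_logger.level, logging.DEBUG)
```

Because `addCleanup` callbacks run whether the test passes or fails, the level cannot leak into whichever test the worker runs next.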
### Testing
- The reproduction (`trial -j2 tests.logging.test_loggers <the EDU
test>`) went from **~3/8 failing** to **10/10 passing**.
- Confirmed the root level is restored after the logging tests, and that
the `[SQL values]` line is now capped (~50 KB with a `[truncated]`
marker, was 65 708).
- `ruff` + `mypy` clean; `tests.logging.test_loggers` and
`tests.rest.client.test_sendtodevice` pass (14/14).
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Eric Eastwood <erice@element.io>
When building v1.155.0rc1, flaky deb builds combined with fail-fast
behaviour meant that I had to press the retry button 5 times to get a
full set.
Without fail-fast, I suspect 1 or 2 retries would have done the job.
---------
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
Co-authored-by: Eric Eastwood <erice@element.io>
This is in prep for converting the event serialization to Rust.
This is a fairly mechanical port, except that we store the appservice ID
rather than the appservice object. This avoids us having to store a
`Py<..>` (or port the appservice object over).
This makes writes heavier when profile updates happen, but reduces the effort needed to produce an incremental sync response, since we no longer need to look up whether users share rooms.
The question of which versions we support came up because we wanted to use SQLite's JSON operators, which are currently not present in the SQLite version found in Ubuntu's oldest supported LTS.
The JSON operators were used in some of the sticky events work (related: #19452).
Our ruling was that we should support Ubuntu's oldest LTS on an equal footing with Debian oldstable, and so support the older of the two versions found in those.
That makes some sense, as it would be difficult to do otherwise without dropping support for that version of Ubuntu altogether: if we kept publishing packages intended for use with Postgres, there is a risk that an unsuspecting sysadmin would update their SQLite deployment without realising that it is no longer supported.
This was [discussed months ago
(private)](https://docs.google.com/document/d/12RZKPk3a4__JUSH9wYHODo9rRyKzsHg6BSCAcmqmbOU/edit?tab=t.0#bookmark=id.fcdvoc88dy5s)
and at [private](https://docs.google.com/document/d/12RZKPk3a4__JUSH9wYHODo9rRyKzsHg6BSCAcmqmbOU/edit?tab=t.0#bookmark=id.u48ivjge4qpt).
---------
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>