This reports the total count of users (split by appservice) which is
meant to be the monthless counterpart to the MAU metric.
Context:
> So this is largely for billing purposes and wanting to know the change
in the number of users. If a user is deactivated then we no longer want
to count them. Consumers *might* want to count appservice users, and
maybe count them based on the service (perhaps you charge more for users
under bridge X or bridge Y).
>
> *-- https://github.com/element-hq/synapse/pull/19848#discussion_r3402216234*
After looking into it, I have a couple of bones to pick with the old
wording, which I thought could be clarified for when I next come to look
at this again:
- the claim that there's a fundamental difference; I'd argue there isn't
really, it's just by convention
on some mainstream distros. So I have changed this to 'typically'
- statements that some distros fetch dependencies at build time
(probably does happen, but
traditional distros make a point of not doing this for the reasons you'd
expect).
- This was probably meant to be talking about Debian, but my observation
based on a sample size of 3 is that some crates are packaged natively,
others are vendored in the respective application's source package (like
they do for us) and sometimes they patch the bounds a bit.
There could probably be room to talk about how distros vendoring
packages is a maintenance burden on them,
but I guess it's a bit moot as we would struggle to conform to wide
enough bounds to make everyone
happy (and anyway; I expect the distros that vendor packages have the
tooling to make this easy to
update and we do keep on top of security updates and release
frequently...)
---
Spawning from discussion in
[`#element-backend-internal:matrix.org`](https://matrix.to/#/!SGNQGPGUwtcPBUotTL:matrix.org/$VttYPPUevn2S_W_rrzg2ZOXWI6aKebk2ganTgrLEWUc?via=jki.re&via=element.io&via=matrix.org)
---------
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
Follows: #19487
Part of: MSC4354 whose experimental feature tracking issue is #19409
This PR implements the Sliding Sync (MSC4186) extension described in
MSC4354, allowing sliding sync clients
to receive sticky events in a reliable way.
The logic is much the same as for oldschool sync (implementation in
#19487),
although in the sliding sync extension, the client can choose their own
limit
and must control their own pagination through an extra token in the
extension request/response bodies.
Note this does not yet send down existing sticky events in the
room when the room has been newly-joined.
This newly-discovered gap is tracked at #19662 and will be addressed for
both current sync and MSC4186 SSS soon.
---------
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
Co-authored-by: Eric Eastwood <erice@element.io>
Brings it up to parity with the other 404 StoreErrors, which already
name the table.
More helpful response when debugging issues with incomplete relations.
Concretely: a user that got partly removed and then reinstated, missing
an entry in the 'profiles' table.
This should also contribute to a better understanding of #2807 and #2173.
Not aware of an open ticket for this.
I came across it when I accidentally broke the feature even more (as
part of another piece of work),
then discovered there weren't tests for this.
So this is overall a low-priority drive-by fix.
Requires a fix to SyTest https://github.com/matrix-org/sytest/pull/1426
(as it depended on the bug).
<ol>
<li>
Add a test for purging rooms with `delete_local_events=False` \
Parameterised by room version, this test currently succeeds
on v2 but fails on v12.
This is because the condition checking for local events relies
on the old event ID format, which has not been used since v2.
</li>
<li>
Fix delete_local_events=False for room versions above v2 \
The event ID format changing means that we have to rely on `sender`
to know the origin of an event.
</li>
</ol>
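
As a rough illustration of why the ID-based check stops working, here is a minimal Python sketch. The helper names are hypothetical and this is not the code from the PR; it only assumes that v1/v2 event IDs look like `$opaque:server.name`, while later room versions use a hash-based ID that carries no server name.

```python
def domain_of(matrix_id: str) -> str:
    """Return the server name of an ID such as '@alice:example.org'."""
    # Split on the first colon only, since a server name may include a port.
    _, sep, domain = matrix_id.partition(":")
    if not sep or not domain:
        raise ValueError(f"no server name in {matrix_id!r}")
    return domain


def is_local_by_event_id(event_id: str, server_name: str) -> bool:
    """Old-style check: only meaningful for '$opaque:server.name' event IDs."""
    return event_id.endswith(":" + server_name)


def is_local_by_sender(sender: str, server_name: str) -> bool:
    """Check based on the sender's server name; independent of the ID format."""
    return domain_of(sender) == server_name
```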
---------
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
When I run `ruff` locally, either `ruff` or one of the linting scripts
adds `.ruff_cache` to the `.gitignore` file. This PR adds that line so
that running linters doesn't result in a dirty git working tree (and to
ignore ruff's cache, of course).
### Pull Request Checklist
* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
- Use markdown where necessary, mostly for `code blocks`.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct (run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
Fixes: #19844
Concretely, this changes `ResponseCache` to unset cache entries once
they resolve to a `Failure`.
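
To illustrate the idea (not the actual implementation), here is a minimal sketch using `asyncio` futures instead of Twisted `Deferred`s. `SketchResponseCache` and `demo` are hypothetical names, and unlike the real cache this sketch keeps successful results forever rather than for a timeout.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Hashable


class SketchResponseCache:
    """Deduplicates concurrent calls per key, but forgets failed results."""

    def __init__(self) -> None:
        self._entries: Dict[Hashable, "asyncio.Future[Any]"] = {}

    async def wrap(self, key: Hashable, callback: Callable[[], Awaitable[Any]]) -> Any:
        fut = self._entries.get(key)
        if fut is None:
            fut = asyncio.ensure_future(callback())
            self._entries[key] = fut
            # Registered before anyone awaits, so eviction happens before
            # the awaiting callers are resumed.
            fut.add_done_callback(lambda f: self._evict_if_failed(key, f))
        return await fut

    def _evict_if_failed(self, key: Hashable, fut: "asyncio.Future[Any]") -> None:
        failed = fut.cancelled() or fut.exception() is not None
        # Only drop the entry if it is still the one we registered.
        if failed and self._entries.get(key) is fut:
            del self._entries[key]


async def demo() -> tuple:
    cache = SketchResponseCache()
    calls = 0

    async def flaky() -> str:
        nonlocal calls
        calls += 1
        if calls == 1:
            raise RuntimeError("first attempt fails")
        return "ok"

    try:
        await cache.wrap("key", flaky)
    except RuntimeError:
        pass
    second = await cache.wrap("key", flaky)  # retried: the failure was evicted
    third = await cache.wrap("key", flaky)  # served from the cached success
    return calls, second, third
```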
---------
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
Co-authored-by: Eric Eastwood <erice@element.io>
Running `scripts-dev/lint.sh` locally revealed a failure, so here are the fixes that the linter
applied + a manual change to address an issue that couldn't be
auto-fixed.
The Complement integration-test job prints the full raw `go test -json`
stream straight to the GitHub Actions build log. That's an enormous,
unreadable wall of JSON, and the GitHub web UI renders very large logs
poorly — making the run hard to view and slow to load.
Instead, let's store the raw JSON and upload it as an artefact. We still
render failing test output as before.
We will still stream when tests finish (i.e. PASS / FAIL) so that people
can see that things are actually running. However, all other output
(such as logs) is hidden.
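
A minimal sketch of the filtering idea, in Python purely for illustration (it is not the script used in the workflow, and `summarise` is a hypothetical name). It assumes the `go test -json` event format, where each line is a JSON object with `Action`, `Package` and (for per-test events) `Test` fields.

```python
import json
from typing import Iterable, Iterator


def summarise(lines: Iterable[str]) -> Iterator[str]:
    """Yield one short line per finished test; drop all other events."""
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # Non-JSON noise (e.g. build output) is ignored in this sketch.
            continue
        if not isinstance(event, dict):
            continue
        action = event.get("Action")
        test = event.get("Test")
        # Package-level pass/fail events have no "Test" field; skip them here.
        if action in ("pass", "fail") and test:
            yield f"{action.upper()} {event.get('Package', '?')} {test}"
```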
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
When building v1.155.0rc1, the release notification said the
workflow was successful
but it had actually failed.
---------
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
The sytest `After /purge_history users still get pushed for new
messages` is flaky. The flakiness exposes a real bug rather than a
test-timing issue.
Notification counts are stored in two places: `event_push_actions` (one
row per unread event) and `event_push_summary` (aggregate counts
populated periodically by `_rotate_notifs`, which runs on a 30-second
timer). `_purge_history_txn` deletes the purged events' rows from
`event_push_actions` but never adjusts `event_push_summary` (only the
full-room `purge_room` drops that table).
So the result depends on a race: if rotation hasn't fired, counts come
live from `event_push_actions`, the purge removes the right rows, and
the count is correct. If rotation fires before the purge — more likely
under the slower
multi-postgres/workers/asyncio CI config — the events get folded into
`event_push_summary`, the purge then deletes the underlying
`event_push_actions` rows but leaves the summary untouched, and the
count comes out inflated.
### Fix
Before deleting the rotated rows from `event_push_actions`, decrement
`event_push_summary` by the amount attributable to the events being
deleted. The decrement mirrors the counting logic in
`_rotate_notifs_before_txn`: only rows that were already rotated
(`stream_ordering <= event_push_summary_stream_ordering`) and that fall
after the summary's receipt are subtracted, so it stays correct in the
presence of read receipts and unread/highlight rows. The SQL avoids
`UPDATE ... FROM` and CTEs so it works on both SQLite and Postgres.
End-of-purge cache invalidation already covers
`get_unread_event_push_actions_by_room_for_user`.
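
The shape of the decrement can be sketched against a toy schema. The `toy_*` tables, their columns and `purge_before` are all hypothetical simplifications (the real tables and purge logic have more to them); the point is only the correlated subquery, which avoids `UPDATE ... FROM` and CTEs.

```python
import sqlite3


def make_toy_db() -> sqlite3.Connection:
    """Create simplified stand-ins for the two notification tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE toy_push_actions (
            room_id TEXT, user_id TEXT, stream_ordering INTEGER
        );
        CREATE TABLE toy_push_summary (
            room_id TEXT, user_id TEXT, notif_count INTEGER,
            last_receipt_stream_ordering INTEGER
        );
        """
    )
    return conn


def purge_before(
    conn: sqlite3.Connection, room_id: str, purge_up_to: int, rotated_up_to: int
) -> None:
    """Delete action rows below `purge_up_to`, first fixing up the summary."""
    # Subtract only rows that were already rotated into the summary and that
    # come after the user's receipt, since only those were counted there.
    conn.execute(
        """
        UPDATE toy_push_summary
        SET notif_count = notif_count - (
            SELECT COUNT(*) FROM toy_push_actions AS a
            WHERE a.room_id = toy_push_summary.room_id
              AND a.user_id = toy_push_summary.user_id
              AND a.stream_ordering < ?
              AND a.stream_ordering <= ?
              AND a.stream_ordering > COALESCE(
                  toy_push_summary.last_receipt_stream_ordering, 0)
        )
        WHERE room_id = ?
        """,
        (purge_up_to, rotated_up_to, room_id),
    )
    conn.execute(
        "DELETE FROM toy_push_actions WHERE room_id = ? AND stream_ordering < ?",
        (room_id, purge_up_to),
    )
```

Doing the `UPDATE` before the `DELETE` matters: the subquery needs the action rows to still exist in order to count them.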
### Tests
Adds `test_count_aggregation_after_purge`, which forces a rotation
before purging and asserts the aggregate count reflects only the
surviving events, covering read receipts and a subsequent re-rotation.
It fails (`3 != 1`) without the fix.
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
I.e. calls such as `default_config("test")` and `default_config("test", False)`
are opaque, and it is hard to connect the dots when reading them.
Spawning from trying to figure out what the `server_name` is set as for
our `HomeserverTestCase` in order to reference it in
https://github.com/element-hq/synapse/pull/19848#discussion_r3397455309
This was all done through Claude.
## Fix flaky `test_edu_large_messages_not_splitting_one_user` (`TooLong` under `trial -jN`)
### Problem
The "Build .deb packages" CI step intermittently failed with:
```
twisted.protocols.amp.TooLong
...in
tests.rest.client.test_sendtodevice.SendToDeviceTestCase.test_edu_large_messages_not_splitting_one_user
```
The deb build runs the suite with `twisted.trial -j2`. In that mode,
worker log events are shipped to the manager process over Twisted's AMP
protocol, which encodes each value with a 2-byte length prefix — so any
single log line of **64 KiB or more** raises `TooLong`.
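
To make the size limit concrete, here is a minimal sketch of a 2-byte length-prefixed encoding. This is not Twisted's code; `encode_length_prefixed` and `ValueTooLong` are hypothetical names used only to show why 65 535 bytes is the ceiling.

```python
import struct

# Largest length that fits in a 2-byte unsigned prefix.
MAX_VALUE_LENGTH = 0xFFFF


class ValueTooLong(Exception):
    """Raised when a value cannot be described by a 2-byte length prefix."""


def encode_length_prefixed(value: bytes) -> bytes:
    """Prefix `value` with its length as a big-endian unsigned 16-bit integer."""
    if len(value) > MAX_VALUE_LENGTH:
        raise ValueTooLong(f"{len(value)} bytes exceeds {MAX_VALUE_LENGTH}")
    return struct.pack("!H", len(value)) + value
```

With this encoding a 65 535-byte value still fits, while a 65 708-byte one cannot be represented at all.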
### Root cause
Not a bug in the to-device/EDU logic, and **not** debug logging enabled
by the build (it runs at the default `ERROR` level). It's a **log-level
leak between tests sharing a `-j2` worker**:
- `tests/logging/test_loggers.py::ExplicitlyConfiguredLoggerTestCase`
calls `root_logger.setLevel(logging.DEBUG)` directly and never restores
it (no `setUp`/`tearDown`/`addCleanup`).
- When that test runs before
`test_edu_large_messages_not_splitting_one_user` **in the same worker
process**, the root logger is left at `DEBUG`.
- That test deliberately builds an EDU of exactly `SOFT_MAX_EDU_SIZE -
1` (65 535) bytes. Storing it triggers `synapse/storage/database.py`'s
`[SQL values]` DEBUG log, which dumps the full query params — producing
a **65 708-byte**
line that overflows AMP's cap.
It looks "flaky" purely because of `-j2` scheduling: whether the two
tests land on the same worker, and in what order.
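
The general pattern for avoiding this kind of leak can be sketched with the standard library's `unittest` (the actual test uses the project's own test base classes; `LevelRestoringTestCase` is a hypothetical name):

```python
import logging
import unittest


class LevelRestoringTestCase(unittest.TestCase):
    def test_sets_debug(self) -> None:
        root_logger = logging.getLogger()
        # Register the restore *before* changing the level, so that it runs
        # even if a later assertion in this test fails.
        self.addCleanup(root_logger.setLevel, root_logger.level)
        root_logger.setLevel(logging.DEBUG)
        self.assertEqual(root_logger.level, logging.DEBUG)
```

Without the `addCleanup` call, any test that runs afterwards in the same process inherits the `DEBUG` level.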
### Fix
Three commits:
1. **Restore the root logger level in
`ExplicitlyConfiguredLoggerTestCase`** — `addCleanup` to put the level
back. Fixes the root cause.
2. **Truncate oversized log lines in the test log handler**
(`ToTwistedHandler.emit`) — caps lines at 1000 chars so no debug line
can break `trial -jN`, regardless of which query is logged (defense in
depth).
3. **Truncate values in `[SQL values]` debug logging** — caps the logged
param repr at 1000 chars (guarded by `isEnabledFor(DEBUG)` to keep the
hot path lazy). Keeps production debug logs sane too.
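
A minimal sketch of the truncation described in items 2 and 3. The helper names and the exact marker text are assumptions, not the code in the commits; only the 1000-character cap and the `isEnabledFor(DEBUG)` guard come from the description above.

```python
import logging
from typing import Any, Sequence

MAX_LOGGED_CHARS = 1000
TRUNCATION_MARKER = "... [truncated]"  # assumed wording of the marker


def truncate_for_log(text: str, limit: int = MAX_LOGGED_CHARS) -> str:
    """Return `text` unchanged if short, else its first `limit` chars plus a marker."""
    if len(text) <= limit:
        return text
    return text[:limit] + TRUNCATION_MARKER


def log_sql_values(logger: logging.Logger, params: Sequence[Any]) -> None:
    """Log query parameters at DEBUG, without building the repr needlessly."""
    # The guard avoids computing a potentially huge repr() when DEBUG is off.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("[SQL values] %s", truncate_for_log(repr(params)))
```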
### Testing
- The reproduction (`trial -j2 tests.logging.test_loggers <the EDU
test>`) went from **~3/8 failing** to **10/10 passing**.
- Confirmed the root level is restored after the logging tests, and that
the `[SQL values]` line is now capped (~50 KB with a `[truncated]`
marker, was 65 708).
- `ruff` + `mypy` clean; `tests.logging.test_loggers` and
`tests.rest.client.test_sendtodevice` pass (14/14).
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Eric Eastwood <erice@element.io>
When building v1.155.0rc1, flaky deb builds combined with fail-fast
behaviour
meant that I had to press the retry button 5 times to get a full set.
Without fail-fast, I suspect 1 or 2 retries would have done the job.
---------
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
Co-authored-by: Eric Eastwood <erice@element.io>
This is in prep for converting the event serialization to Rust.
This is a fairly mechanical port, except that we store the appservice ID
rather than the appservice object. This avoids us having to store a
`Py<..>` (or port the appservice object over).
The reason for querying this support was that we wanted to use SQLite's JSON operators, which are currently not present in the SQLite version found in Ubuntu's oldest supported LTS.
The JSON operators were used in some of the sticky events work (related: #19452).
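
For reference, a small sketch of how one might check for the operators at runtime. It assumes the `->` and `->>` operators arrived in SQLite 3.38.0; `supports_json_operators` is a hypothetical helper, not something in the codebase.

```python
import sqlite3
from typing import Tuple

# Assumption: the `->` and `->>` JSON operators were introduced in SQLite 3.38.0.
JSON_OPERATORS_MIN_VERSION = (3, 38, 0)


def supports_json_operators(
    version: Tuple[int, ...] = sqlite3.sqlite_version_info,
) -> bool:
    """Return True if the given SQLite library version has `->` and `->>`."""
    return tuple(version) >= JSON_OPERATORS_MIN_VERSION
```

By default the helper inspects the SQLite library that Python is linked against, which may differ from the `sqlite3` command-line tool installed on the same machine.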
Our ruling was that we should support Ubuntu's oldest supported LTS on an equal footing with Debian oldstable, and so support the older of the two SQLite versions shipped by those.
That makes some kind of sense, as it would be difficult to do otherwise without dropping support for that version of Ubuntu altogether: if we kept publishing packages there that were only intended for use with Postgres, there would be a risk that an innocent sysadmin would update their SQLite deployment without realising that it is no longer supported.
This was [discussed months ago
(private)](https://docs.google.com/document/d/12RZKPk3a4__JUSH9wYHODo9rRyKzsHg6BSCAcmqmbOU/edit?tab=t.0#bookmark=id.fcdvoc88dy5s)
and at [private](https://docs.google.com/document/d/12RZKPk3a4__JUSH9wYHODo9rRyKzsHg6BSCAcmqmbOU/edit?tab=t.0#bookmark=id.u48ivjge4qpt).
---------
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>