synapse

mirror of https://github.com/element-hq/synapse.git synced 2026-04-03 20:56:22 +00:00

Author	SHA1	Message	Date
Andrew Ferrazzutti	fc244bb592	Use type hinting generics in standard collections (#19046 ) aka PEP 585, added in Python 3.9 - https://peps.python.org/pep-0585/ - https://docs.astral.sh/ruff/rules/non-pep585-annotation/	2025-10-22 16:48:19 -05:00
Eric Eastwood	2d07bd7fd2	Update TODO list of conflicting areas where we encounter metrics being clobbered (`ApplicationService`) (#19040 ) These errors are harmless and are a long-standing issue that is just now being logged, see https://github.com/element-hq/synapse/issues/19042 ``` 2025-10-10 15:30:00,026 - synapse.util.metrics - 330 - ERROR - notify_interested_services-0 - Metric named cache_lru_cache__matches_user_in_member_list_example.com already registered for server example.com 2025-10-10 16:30:00.167 2025-10-10 15:30:00,026 - synapse.util.metrics - 330 - ERROR - notify_interested_services-0 - Metric named cache_lru_cache_is_interested_in_room_example.com already registered for server example.com 2025-10-10 16:30:00.167 2025-10-10 15:30:00,025 - synapse.util.metrics - 330 - ERROR - notify_interested_services-0 - Metric named cache_lru_cache_is_interested_in_event_example.com already registered for server example.com 2025-10-10 16:29:15.560 2025-10-10 15:29:15,449 - synapse.util.metrics - 330 - ERROR - notify_interested_services_ephemeral-0 - Metric named cache_lru_cache__matches_user_in_member_list_example.com already registered for server example.com 2025-10-10 16:29:15.560 2025-10-10 15:29:15,449 - synapse.util.metrics - 330 - ERROR - notify_interested_services_ephemeral-0 - Metric named cache_lru_cache_is_interested_in_room_example.com already registered for server example.com ```	2025-10-13 10:15:47 -05:00
Eric Eastwood	1c093509ce	Switch task scheduler from raw logcontext manipulation (`set_current_context`) to utils (`PreserveLoggingContext`) (#18990 ) Prefer the utils over raw logcontext manipulation. Spawning from adding some logcontext debug logs in https://github.com/element-hq/synapse/pull/18966 and since we're not logging at the `set_current_context(...)` level (see reasoning there), this removes some usage of `set_current_context(...)`.	2025-10-02 10:22:25 -05:00
Devon Hudson	396de6544a	Cleanly shutdown SynapseHomeServer object (#18828 ) This PR aims to allow for a clean shutdown of the `SynapseHomeServer` object so that it can be fully deleted and cleaned up by garbage collection without shutting down the entire python process. Fix https://github.com/element-hq/synapse-small-hosts/issues/50 ### Pull Request Checklist <!-- Please read https://element-hq.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request --> * [x] Pull request is based on the develop branch * [x] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should: - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.". - Use markdown where necessary, mostly for `code blocks`. - End with either a period (.) or an exclamation mark (!). - Start with a capital letter. - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry. * [x] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters)) --------- Co-authored-by: Eric Eastwood <erice@element.io>	2025-10-01 02:42:09 +00:00
Eric Eastwood	5adb08f3c9	Remove `MockClock()` (#18992 ) Spawning from adding some logcontext debug logs in https://github.com/element-hq/synapse/pull/18966 and since we're not logging at the `set_current_context(...)` level (see reasoning there), this removes some usage of `set_current_context(...)`. Specifically, `MockClock.call_later(...)` doesn't handle logcontexts correctly. It uses the calling logcontext as the callback context (wrong, as the logcontext could finish before the callback finishes) and it doesn't reset back to the sentinel context before handing back to the reactor. It has been like this since it was [introduced 10+ years ago](`38da9884e7`). Instead of fixing the implementation, which would just be a copy of our normal `Clock`, we can just remove `MockClock`.	2025-09-30 11:27:29 -05:00
Eric Eastwood	5143f93dc9	Fix `server_name` in logging context for multiple Synapse instances in one process (#18868 ) ### Background As part of Element's plan to support a light form of vhosting (virtual host) (multiple instances of Synapse in the same Python process), we're currently diving into the details and implications of running multiple instances of Synapse in the same Python process. "Per-tenant logging" tracked internally by https://github.com/element-hq/synapse-small-hosts/issues/48 ### Prior art Previously, we exposed `server_name` by providing a static logging `MetadataFilter` that injected the values: `205d9e4fc4/synapse/config/logger.py (L216)` While this can work fine for the normal case of one Synapse instance per Python process, this configures things globally and isn't compatible when we try to start multiple Synapse instances because each subsequent tenant will overwrite the previous tenant. ### What does this PR do? We remove the `MetadataFilter` and replace it with tracking the `server_name` in the `LoggingContext` and exposing it with our existing [`LoggingContextFilter`](`205d9e4fc4/synapse/logging/context.py (L584-L622)`) that we already use to expose information about the `request`. This means that the `server_name` value follows wherever we log as expected even when we have multiple Synapse instances running in the same process. ### A note on logcontext Anywhere Synapse mistakenly uses the `sentinel` logcontext to log something, we won't know which server sent the log. We've been fixing up `sentinel` logcontext usage as tracked by https://github.com/element-hq/synapse/issues/18905 Any further `sentinel` logcontext usage we find in the future can be fixed piecemeal as normal. `d2a966f922/docs/log_contexts.md (L71-L81)` ### Testing strategy 1. 
Adjust your logging config to include `%(server_name)s` in the format ```yaml formatters: precise: format: '%(asctime)s - %(server_name)s - %(name)s - %(lineno)d - %(levelname)s - %(request)s - %(message)s' ``` 1. Start Synapse: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Make some requests (`curl http://localhost:8008/_matrix/client/versions`, etc) 1. Open the homeserver logs and notice the `server_name` in the logs as expected. `unknown_server_from_sentinel_context` is expected for the `sentinel` logcontext (things outside of Synapse).	2025-09-26 17:10:48 -05:00
Andrew Morgan	ddc7627b22	Fix performance regression related to delayed events processing (#18926 )	2025-09-23 09:47:30 +01:00
Eric Eastwood	e7d98d3429	Remove `sentinel` logcontext in `Clock` utilities (`looping_call`, `looping_call_now`, `call_later`) (#18907 ) Part of https://github.com/element-hq/synapse/issues/18905 Lints for ensuring we use `Clock.call_later` instead of `reactor.callLater`, etc are coming in https://github.com/element-hq/synapse/pull/18944 ### Testing strategy 1. Configure Synapse to log at the `DEBUG` level 1. Start Synapse: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Wait 10 seconds for the [database profiling loop](`9cc4001778/synapse/storage/database.py (L711)`) to execute 1. Notice the logcontext being used for the `Total database time` log line Before (`sentinel`): ``` 2025-09-10 16:36:58,651 - synapse.storage.TIME - 707 - DEBUG - sentinel - Total database time: 0.646% {room_forgetter_stream_pos(2): 0.131%, reap_monthly_active_users(1): 0.083%, get_device_change_last_converted_pos(1): 0.078%} ``` After (`looping_call`): ``` 2025-09-10 16:36:58,651 - synapse.storage.TIME - 707 - DEBUG - looping_call - Total database time: 0.646% {room_forgetter_stream_pos(2): 0.131%, reap_monthly_active_users(1): 0.083%, get_device_change_last_converted_pos(1): 0.078%} ```	2025-09-22 14:51:13 -05:00
Eric Eastwood	d05f44a1c6	Introduce `Clock.add_system_event_trigger(...)` to include logcontext by default (#18945 ) Introduce `Clock.add_system_event_trigger(...)` to wrap system event callback code in a logcontext, ensuring we can identify which server generated the logs. Background: > Ideally, nothing from the Synapse homeserver would be logged against the `sentinel` > logcontext as we want to know which server the logs came from. In practice, this is not > always the case yet especially outside of request handling. > > Global things outside of Synapse (e.g. Twisted reactor code) should run in the > `sentinel` logcontext. It's only when it calls into application code that a logcontext > gets activated. This means the reactor should be started in the `sentinel` logcontext, > and any time an awaitable yields control back to the reactor, it should reset the > logcontext to be the `sentinel` logcontext. This is important to avoid leaking the > current logcontext to the reactor (which would then get picked up and associated with > the next thing the reactor does). > > *-- `docs/log_contexts.md` Also adds a lint to prefer `Clock.add_system_event_trigger(...)` over `reactor.addSystemEventTrigger(...)` Part of https://github.com/element-hq/synapse/issues/18905	2025-09-22 11:47:22 -05:00
Eric Eastwood	5a9ca1e3d9	Introduce `Clock.call_when_running(...)` to include logcontext by default (#18944 ) Introduce `Clock.call_when_running(...)` to wrap startup code in a logcontext, ensuring we can identify which server generated the logs. Background: > Ideally, nothing from the Synapse homeserver would be logged against the `sentinel` > logcontext as we want to know which server the logs came from. In practice, this is not > always the case yet especially outside of request handling. > > Global things outside of Synapse (e.g. Twisted reactor code) should run in the > `sentinel` logcontext. It's only when it calls into application code that a logcontext > gets activated. This means the reactor should be started in the `sentinel` logcontext, > and any time an awaitable yields control back to the reactor, it should reset the > logcontext to be the `sentinel` logcontext. This is important to avoid leaking the > current logcontext to the reactor (which would then get picked up and associated with > the next thing the reactor does). > > *-- `docs/log_contexts.md` Also adds a lint to prefer `Clock.call_when_running(...)` over `reactor.callWhenRunning(...)` Part of https://github.com/element-hq/synapse/issues/18905	2025-09-22 10:27:59 -05:00
Andrew Morgan	b596faa4ec	Cache `_get_e2e_cross_signing_signatures_for_devices` (#18899 )	2025-09-18 12:06:08 +01:00
Eric Eastwood	84d64251dc	Remove `sentinel` logcontext where we log in `setup`, `start` and exit (#18870 ) Remove `sentinel` logcontext where we log in `setup`, `start`, and exit. Instead of having one giant PR that removes all places we use `sentinel` logcontext, I've decided to tackle this more piecemeal. This PR covers the code paths that run if you just start up Synapse and exit it with no requests or activity going on in between. Part of https://github.com/element-hq/synapse/issues/18905 (Remove `sentinel` logcontext where we log in Synapse) Prerequisite for https://github.com/element-hq/synapse/pull/18868. Logging with the `sentinel` logcontext means we won't know which server the log came from. ### Why `9cc4001778/docs/log_contexts.md (L71-L81)` (docs updated in https://github.com/element-hq/synapse/pull/18900) ### Testing strategy 1. Run Synapse normally and with `daemonize: true`: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Execute some requests 1. Shut down the server 1. Look for any bad log entries in your homeserver logs: - `Expected logging context sentinel but found main` - `Expected logging context main was lost` - `Expected previous context` - `utime went backwards!`/`stime went backwards!` - `Called stop on logcontext POST-0 without recording a start rusage` 1. Look for any logs coming from the `sentinel` context With these changes, you should only see the following logs (not from Synapse) using the `sentinel` context if you start up Synapse and exit: `homeserver.log` ``` 2025-09-10 14:45:39,924 - asyncio - 64 - DEBUG - sentinel - Using selector: EpollSelector 2025-09-10 14:45:40,562 - twisted - 281 - INFO - sentinel - Received SIGINT, shutting down. 
2025-09-10 14:45:40,562 - twisted - 281 - INFO - sentinel - (TCP Port 9322 Closed) 2025-09-10 14:45:40,563 - twisted - 281 - INFO - sentinel - (TCP Port 8008 Closed) 2025-09-10 14:45:40,563 - twisted - 281 - INFO - sentinel - (TCP Port 9093 Closed) 2025-09-10 14:45:40,564 - twisted - 281 - INFO - sentinel - Main loop terminated. ```	2025-09-16 17:15:08 -05:00
reivilibre	ada3a3b2b3	Add experimental support for MSC4308: Thread Subscriptions extension to Sliding Sync when MSC4306 and MSC4186 are enabled. (#18695 ) Closes: #18436 Implements: https://github.com/matrix-org/matrix-spec-proposals/pull/4308 Follows: #18674 Adds an extension to Sliding Sync and a companion endpoint needed for backpaginating missed thread subscription changes, as described in MSC4308 --------- Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org> Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>	2025-09-11 14:45:04 +01:00
Eric Eastwood	bff4a11b3f	Re-introduce: Fix `LaterGauge` metrics to collect from all servers (#18791 ) Re-introduce: https://github.com/element-hq/synapse/pull/18751 that was reverted in https://github.com/element-hq/synapse/pull/18789 (explains why the PR was reverted in the first place). - Adds a `cleanup` pattern that cleans up metrics from each homeserver in the tests. Previously, the list of hooks built up until our CI machines couldn't operate properly, see https://github.com/element-hq/synapse/pull/18789 - Fix long-standing issue with `synapse_background_update_status` metrics only tracking the last database listed in the config (see https://github.com/element-hq/synapse/pull/18791#discussion_r2261706749)	2025-09-02 12:14:27 -05:00
Eric Eastwood	ff03a51cb0	Revert "Fix `LaterGauge` metrics to collect from all servers (#18751 )" (#18789 ) This PR reverts https://github.com/element-hq/synapse/pull/18751 ### Why revert? @reivilibre [found](https://matrix.to/#/!vcyiEtMVHIhWXcJAfl:sw1v.org/$u9OEmMxaFYUzWHhCk1A_r50Y0aGrtKEhepF7WxWJkUA?via=matrix.org&via=node.marinchik.ink&via=element.io) that our CI was failing in bizarre ways (thanks for stepping up to dive into this 🙇). Examples: - `twisted.internet.error.ProcessTerminated: A process has ended with a probable error condition: process ended by signal 9.` - `twisted.internet.error.ProcessTerminated: A process has ended with a probable error condition: process ended by signal 15.` <details> <summary>More detailed part of the log</summary> https://github.com/element-hq/synapse/actions/runs/16758038107/job/47500520633#step:9:6809 ``` tests.util.test_wheel_timer.WheelTimerTestCase.test_single_insert_fetch =============================================================================== Error: Traceback (most recent call last): File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/trial/_dist/disttrial.py", line 371, in task await worker.run(case, result) File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/trial/_dist/worker.py", line 305, in run return await self.callRemote(workercommands.Run, testCase=testCaseId) # type: ignore[no-any-return] File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/internet/defer.py", line 1187, in __iter__ yield self File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/internet/defer.py", line 1092, in _runCallbacks current.result = callback( # type: ignore[misc] File 
"/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/protocols/amp.py", line 1968, in _massageError error.trap(RemoteAmpError) File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/python/failure.py", line 431, in trap self.raiseException() File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/python/failure.py", line 455, in raiseException raise self.value.with_traceback(self.tb) twisted.internet.error.ProcessTerminated: A process has ended with a probable error condition: process ended by signal 9. tests.util.test_macaroons.MacaroonGeneratorTestCase.test_guest_access_token ------------------------------------------------------------------------------- Ran 4325 tests in 669.321s FAILED (skips=159, errors=62, successes=4108) while calling from thread Traceback (most recent call last): File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/internet/base.py", line 1064, in runUntilCurrent f(a, kw) File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/internet/base.py", line 790, in stop raise error.ReactorNotRunning("Can't stop reactor that isn't running.") twisted.internet.error.ReactorNotRunning: Can't stop reactor that isn't running. 
joining disttrial worker #0 failed Traceback (most recent call last): File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/internet/defer.py", line 1853, in _inlineCallbacks result = context.run( File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/python/failure.py", line 467, in throwExceptionIntoGenerator return g.throw(self.value.with_traceback(self.tb)) File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/trial/_dist/worker.py", line 406, in exit await endDeferred File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.9/lib/python3.9/site-packages/twisted/internet/defer.py", line 1187, in __iter__ yield self twisted.internet.error.ProcessTerminated: A process has ended with a probable error condition: process ended by signal 15. ``` </details> With more debugging (thanks @devonh for also stepping in as maintainer), we were finding that the CI was consistently failing at `test_exposed_to_prometheus` which was a bit of smoke because of all of the [metrics changes](https://github.com/element-hq/synapse/issues/18592) that were merged recently. Locally, although I wasn't able to reproduce the bizarre errors, I could easily see increased memory usage (~20GB vs ~2GB) and the `test_exposed_to_prometheus` test taking a while to complete when running a full test run (`SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests`). <img width="1485" height="78" alt="Lots of memory usage" src="https://github.com/user-attachments/assets/811e2a96-75e5-4a3c-966c-00dc0512cea9" /> After updating `test_exposed_to_prometheus` to dump the `latest_metrics_response = generate_latest(REGISTRY)`, I could see that it's a massive 3.2GB response. Inspecting the contents, we can see 4.1M (4,137,123) entries for just `synapse_background_update_status{server_name="test"} 3.0` which is a `LaterGauge`. 
I don't think we have 4.1M test cases so it's also unclear why we end up with so many samples but it does make sense that we do see a lot of duplicates because each `HomeserverTestCase` will create a homeserver for each test case that will `LaterGauge.register_hook(...)` (part of the https://github.com/element-hq/synapse/pull/18751 changes). `tests/storage/databases/main/test_metrics.py` ```python latest_metrics_response = generate_latest(REGISTRY) with open("/tmp/synapse-test-metrics", "wb") as f: f.write(latest_metrics_response) ``` After reverting the https://github.com/element-hq/synapse/pull/18751 changes, running the full test suite locally doesn't result in memory spikes and seems to run normally. ### Dev notes Discussion in the [`#synapse-dev:matrix.org`](https://matrix.to/#/!vcyiEtMVHIhWXcJAfl:sw1v.org/$vkMATs04yqZggVVd6Noop5nU8M2DVoTkrAWshw7u1-w?via=matrix.org&via=node.marinchik.ink&via=element.io) room. ### Pull Request Checklist <!-- Please read https://element-hq.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request --> [x] Pull request is based on the develop branch * [ ] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should: - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.". - Use markdown where necessary, mostly for `code blocks`. - End with either a period (.) or an exclamation mark (!). - Start with a capital letter. - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry. 
* [ ] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))	2025-08-06 22:14:40 +00:00
reivilibre	8306cee06a	Update implementation of MSC4306: Thread Subscriptions to include automatic subscription conflict prevention as introduced in later drafts. (#18756 ) Follows: #18674 Implements new drafts of MSC4306 --------- Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org> Co-authored-by: Eric Eastwood <erice@element.io>	2025-08-05 18:22:53 +00:00
Eric Eastwood	076db0ab49	Fix `LaterGauge` metrics to collect from all servers (#18751 ) Fix `LaterGauge` metrics to collect from all servers Follow-up to https://github.com/element-hq/synapse/pull/18714 Previously, our `LaterGauge` metrics did include the `server_name` label as expected but we were only seeing the last server being reported in some cases. Any `LaterGauge` that we were creating multiple times was only reporting the last instance. This PR updates all `LaterGauge` to be created once and then we use `LaterGauge.register_hook(...)` to add in the metric callback as before. This works now because we store a list of callbacks instead of just one. I noticed this problem thanks to some [tests in the Synapse Pro for Small Hosts](https://github.com/element-hq/synapse-small-hosts/pull/173) repo that sanity check all metrics to ensure that we can see each metric includes data from multiple servers. ### Testing strategy 1. This is only noticeable when you run multiple Synapse instances in the same process. 1. 
TODO (see test that was added) ### Dev notes Previous non-global `LaterGauge`: ``` synapse_federation_send_queue_xxx synapse_federation_transaction_queue_pending_destinations synapse_federation_transaction_queue_pending_pdus synapse_federation_transaction_queue_pending_edus synapse_handlers_presence_user_to_current_state_size synapse_handlers_presence_wheel_timer_size synapse_notifier_listeners synapse_notifier_rooms synapse_notifier_users synapse_replication_tcp_resource_total_connections synapse_replication_tcp_command_queue synapse_background_update_status synapse_federation_known_servers synapse_scheduler_running_tasks ``` ### Pull Request Checklist <!-- Please read https://element-hq.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request --> * [x] Pull request is based on the develop branch * [x] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should: - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.". - Use markdown where necessary, mostly for `code blocks`. - End with either a period (.) or an exclamation mark (!). - Start with a capital letter. - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry. * [x] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))	2025-08-05 15:28:55 +00:00
Erik Johnston	20615115fb	Make `.sleep(..)` return a coroutine (#18772 ) This helps ensure that mypy can catch places where we don't await on it, like in #18763. --------- Co-authored-by: Eric Eastwood <erice@element.io>	2025-08-05 09:30:52 +01:00
Eric Eastwood	e16fbdcdcc	Update metrics linting to be able to handle custom metrics (#18733 ) Part of https://github.com/element-hq/synapse/issues/18592	2025-08-01 15:34:11 -05:00
Eric Eastwood	e43a1cec84	Fix cache metrics to collect from all servers (#18748 ) Follow-up to https://github.com/element-hq/synapse/pull/18604 Previously, our cache metrics did include the `server_name` label as expected but we were only seeing the last server being reported. This was caused because we would `CACHE_METRIC_REGISTRY.register_hook(metric_name, metric.collect)` where the `metric_name` only took into account the cache name so it would be overwritten every time we spawn a new server. This PR updates the register logic to include the `server_name` so we have a hook for every cache on every server as expected. I noticed this problem thanks to some [tests in the Synapse Pro for Small Hosts](https://github.com/element-hq/synapse-small-hosts/pull/173) repo that sanity check all metrics to ensure that we can see each metric includes data from multiple servers.	2025-08-01 12:29:58 -05:00
Eric Eastwood	d4af2970f3	Refactor `Histogram` metrics to be homeserver-scoped (#18724 ) Bulk refactor `Histogram` metrics to be homeserver-scoped. We also add lints to make sure that new `Histogram` metrics don't sneak in without using the `server_name` label (`SERVER_NAME_LABEL`). Part of https://github.com/element-hq/synapse/issues/18592 ### Testing strategy 1. Add the `metrics` listener in your `homeserver.yaml` ```yaml listeners: # This is just showing how to configure metrics either way # # `http` `metrics` resource - port: 9322 type: http bind_addresses: ['127.0.0.1'] resources: - names: [metrics] compress: false # `metrics` listener - port: 9323 type: metrics bind_addresses: ['127.0.0.1'] ``` 1. Start the homeserver: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Fetch `http://localhost:9322/_synapse/metrics` and/or `http://localhost:9323/metrics` 1. Observe response includes the TODO metrics with the `server_name` label ### Todo - [x] Wait for https://github.com/element-hq/synapse/pull/18656 to merge ### Dev notes ``` LoggingDatabaseConnection make_conn make_pool make_fake_db_pool ``` ### Pull Request Checklist <!-- Please read https://element-hq.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request --> * [x] Pull request is based on the develop branch * [x] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should: - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.". - Use markdown where necessary, mostly for `code blocks`. - End with either a period (.) or an exclamation mark (!). - Start with a capital letter. - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry. 
* [x] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))	2025-07-29 15:35:38 -05:00
Eric Eastwood	3d683350e9	Refactor `LaterGauge` metrics to be homeserver-scoped (#18714 ) Part of https://github.com/element-hq/synapse/issues/18592	2025-07-29 13:49:41 -05:00
Eric Eastwood	f13a136396	Refactor `Gauge` metrics to be homeserver-scoped (#18725 ) Bulk refactor `Gauge` metrics to be homeserver-scoped. We also add lints to make sure that new `Gauge` metrics don't sneak in without using the `server_name` label (`SERVER_NAME_LABEL`). Part of https://github.com/element-hq/synapse/issues/18592 ### Testing strategy 1. Add the `metrics` listener in your `homeserver.yaml` ```yaml listeners: # This is just showing how to configure metrics either way # # `http` `metrics` resource - port: 9322 type: http bind_addresses: ['127.0.0.1'] resources: - names: [metrics] compress: false # `metrics` listener - port: 9323 type: metrics bind_addresses: ['127.0.0.1'] ``` 1. Start the homeserver: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Fetch `http://localhost:9322/_synapse/metrics` and/or `http://localhost:9323/metrics` 1. Observe response includes the TODO metrics with the `server_name` label ### Pull Request Checklist <!-- Please read https://element-hq.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request --> * [x] Pull request is based on the develop branch * [x] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should: - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.". - Use markdown where necessary, mostly for `code blocks`. - End with either a period (.) or an exclamation mark (!). - Start with a capital letter. - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry. 
* [x] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))	2025-07-29 10:37:59 -05:00
Eric Eastwood	2c236be058	Refactor `Counter` metrics to be homeserver-scoped (#18656 ) Bulk refactor `Counter` metrics to be homeserver-scoped. We also add lints to make sure that new `Counter` metrics don't sneak in without using the `server_name` label (`SERVER_NAME_LABEL`). All of the "Fill in" commits are just bulk refactor. Part of https://github.com/element-hq/synapse/issues/18592 ### Testing strategy 1. Add the `metrics` listener in your `homeserver.yaml` ```yaml listeners: # This is just showing how to configure metrics either way # # `http` `metrics` resource - port: 9322 type: http bind_addresses: ['127.0.0.1'] resources: - names: [metrics] compress: false # `metrics` listener - port: 9323 type: metrics bind_addresses: ['127.0.0.1'] ``` 1. Start the homeserver: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Fetch `http://localhost:9322/_synapse/metrics` and/or `http://localhost:9323/metrics` 1. Observe response includes the `synapse_user_registrations_total`, `synapse_http_server_response_count_total`, etc metrics with the `server_name` label	2025-07-25 14:58:47 -05:00
Eric Eastwood	b7e7f537f1	Refactor background process metrics to be homeserver-scoped (#18670 ) Part of https://github.com/element-hq/synapse/issues/18592 Separated out of https://github.com/element-hq/synapse/pull/18656 because it's a bigger, unique piece of the refactor ### Testing strategy 1. Add the `metrics` listener in your `homeserver.yaml` ```yaml listeners: # This is just showing how to configure metrics either way # # `http` `metrics` resource - port: 9322 type: http bind_addresses: ['127.0.0.1'] resources: - names: [metrics] compress: false # `metrics` listener - port: 9323 type: metrics bind_addresses: ['127.0.0.1'] ``` 1. Start the homeserver: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Fetch `http://localhost:9322/_synapse/metrics` and/or `http://localhost:9323/metrics` 1. Observe response includes the background process metrics (`synapse_background_process_start_count`, `synapse_background_process_db_txn_count_total`, etc) with the `server_name` label	2025-07-23 13:28:17 -05:00
Eric Eastwood	88785dbaeb	Refactor cache metrics to be homeserver-scoped (#18604 ) (add `server_name` label to cache metrics). Part of https://github.com/element-hq/synapse/issues/18592	2025-07-16 16:04:57 -05:00
Eric Eastwood	fc10a5ee29	Refactor `Measure` block metrics to be homeserver-scoped (v2) (#18601 ) Refactor `Measure` block metrics to be homeserver-scoped (add `server_name` label to block metrics). Part of https://github.com/element-hq/synapse/issues/18592 ### Testing strategy #### See behavior of previous `metrics` listener 1. Add the `metrics` listener in your `homeserver.yaml` ```yaml listeners: - port: 9323 type: metrics bind_addresses: ['127.0.0.1'] ``` 1. Start the homeserver: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Fetch `http://localhost:9323/metrics` 1. Observe response includes the block metrics (`synapse_util_metrics_block_count`, `synapse_util_metrics_block_in_flight`, etc) #### See behavior of the `http` `metrics` resource 1. Add the `metrics` resource to a new or existing `http` listeners in your `homeserver.yaml` ```yaml listeners: - port: 9322 type: http bind_addresses: ['127.0.0.1'] resources: - names: [metrics] compress: false ``` 1. Start the homeserver: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Fetch `http://localhost:9322/_synapse/metrics` (it's just a `GET` request so you can even do in the browser) 1. Observe response includes the block metrics (`synapse_util_metrics_block_count`, `synapse_util_metrics_block_in_flight`, etc)	2025-07-15 15:55:23 -05:00
dependabot[bot]	78ce4dc26f	Bump mypy from 1.13.0 to 1.16.1 (#18653 )	2025-07-15 14:42:54 +01:00
Andrew Morgan	947216abc0	Update `latest_deps` workflow to migrate `poetry --no-dev` -> `--without dev` (#18617 )	2025-07-11 12:34:37 +01:00
Johannes Marbach	b9b8775db7	Add plain-text handling for rich-text topics as per MSC3765 (#18195 ) This implements https://github.com/matrix-org/matrix-spec-proposals/pull/3765 which is already merged and, therefore, can use stable identifiers. For `/publicRooms` and `/hierarchy`, the topic is read from the eponymous field of the `current_state_events` table. Rather than introduce further columns in this table, I changed the insertion / update logic to write the plain-text topic from the rich topic into the existing field. This will not take effect for existing rooms unless their topic is changed. However, existing rooms shouldn't have rich topics to begin with. Similarly, for server-side search, I changed the insertion logic of the `event_search` table to prefer the value from the rich topic. Again, existing events shouldn't have rich topics and, therefore, don't need to be migrated in the table. Spec doc: https://spec.matrix.org/v1.15/client-server-api/#mroomtopic Part of supporting Matrix v1.15: https://spec.matrix.org/v1.15/client-server-api/#mroomtopic Signed-off-by: Johannes Marbach <n0-0ne+github@mailbox.org> Co-authored-by: Eric Eastwood <erice@element.io>	2025-07-09 14:13:54 -05:00
Andrew Morgan	291880012f	Stop sending or processing the `origin` field in PDUs (#18418 ) Co-authored-by: Quentin Gliech <quenting@element.io> Co-authored-by: Eric Eastwood <erice@element.io>	2025-07-01 12:04:23 +01:00
Erik Johnston	33e0c25279	Clean up old `device_federation_inbox` rows (#18546 ) Fixes https://github.com/element-hq/synapse/issues/17370	2025-06-18 11:58:31 +00:00
Andrew Morgan	3b94e40cc8	Fix typo of Math.pow, `^` -> `**` (#18543 )	2025-06-13 11:36:21 +00:00
Erik Johnston	1709957395	Fix bug where sliding sync ignored `room_id_to_include` option (#18535 ) This was correctly handled for the "fallback" case where the background updates hadn't finished --------- Co-authored-by: Eric Eastwood <erice@element.io>	2025-06-13 11:29:23 +01:00
Quentin Gliech	0de7aa9953	Enable `flake8-logging` and `flake8-logging-format` rules in Ruff and fix related issues throughout the codebase (#18542 ) This can be reviewed commit by commit. This enables the `flake8-logging` and `flake8-logging-format` rules in Ruff, as well as logging exception stack traces in a few places where it makes sense - https://docs.astral.sh/ruff/rules/#flake8-logging-log - https://docs.astral.sh/ruff/rules/#flake8-logging-format-g ### Linting to avoid pre-formatting log messages See [`adamchainz/flake8-logging` -> LOG011 avoid pre-formatting log messages](`152db2f167/README.rst (log011-avoid-pre-formatting-log-messages)`) Practically, this means prefer placeholders (`%s`) over f-strings for logging. This is because placeholders are passed as args to loggers, so they can do special handling of them. For example, Sentry will record the args separately in their logging integration: `c15b390dfe/sentry_sdk/integrations/logging.py (L280-L284)` One theoretical small perf benefit is that log levels that aren't enabled won't get formatted, so it doesn't unnecessarily create formatted strings	2025-06-13 09:44:18 +02:00
dependabot[bot]	9d43bec326	Bump ruff from 0.7.3 to 0.11.10 (#18451 ) Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Andrew Morgan <andrew@amorgan.xyz> Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>	2025-05-20 15:23:30 +01:00
David Baker	9f9eb56333	Return specific error code when email / phone not supported (#17578 ) Implements https://github.com/matrix-org/matrix-spec-proposals/pull/4178 If this would need tests, could you give some idea of what tests would be needed and how best to add them? ### Pull Request Checklist <!-- Please read https://element-hq.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request --> * [ ] Pull request is based on the develop branch * [ ] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should: - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.". - Use markdown where necessary, mostly for `code blocks`. - End with either a period (.) or an exclamation mark (!). - Start with a capital letter. - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry. * [ ] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))	2025-05-05 11:08:50 +02:00
reivilibre	19b0e23c3d	Fix the token introspection cache logging access tokens when MAS integration is in use. (#18335 ) The `ResponseCache` logs keys by default. Let's not do that for access tokens. --------- Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>	2025-04-15 15:58:30 +01:00
V02460	068e22b4b7	Cleanup Python 3.8 leftovers (#17967 ) Some small cleanups after Python3.8 became EOL. - Move some type imports from `typing_extensions` to `typing` - Remove the `abi3-py38` feature from pyo3 ### Pull Request Checklist <!-- Please read https://element-hq.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request --> * [x] Pull request is based on the develop branch * [x] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should: - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.". - Use markdown where necessary, mostly for `code blocks`. - End with either a period (.) or an exclamation mark (!). - Start with a capital letter. - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry. * [x] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters)) --------- Co-authored-by: Quentin Gliech <quenting@element.io>	2025-02-10 16:53:24 +00:00
Patrick Cloke	ca290d325c	Implement MSC4133 to support custom profile fields. (#17488 ) Implementation of [MSC4133](https://github.com/matrix-org/matrix-spec-proposals/pull/4133) to support custom profile fields. It is behind an experimental flag and includes tests. ### Pull Request Checklist <!-- Please read https://element-hq.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request --> * [x] Pull request is based on the develop branch * [x] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should: - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.". - Use markdown where necessary, mostly for `code blocks`. - End with either a period (.) or an exclamation mark (!). - Start with a capital letter. - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry. * [x] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters)) --------- Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>	2025-01-21 11:11:04 +00:00
Eric Eastwood	aab3672037	Bust `_membership_stream_cache` cache when current state changes (#17732 ) This is particularly a problem in a state reset scenario where the membership might change without a corresponding event. This PR targets a scenario where a state reset happens which causes room membership to change. Previously, the cache would just hold onto stale data; now we properly bust the cache in this scenario. We have a few tests for these scenarios which you can see are now fixed because we can remove the `FIXME` where we were previously manually busting the cache in the test itself. This is a general Synapse thing so by its nature it helps out Sliding Sync. Fix https://github.com/element-hq/synapse/issues/17368 Prerequisite for https://github.com/element-hq/synapse/issues/17929 --- Match when we are busting `_curr_state_delta_stream_cache`	2025-01-08 10:11:09 -06:00
Colin Watson	d69c00b5a1	Stop using twisted.internet.defer.returnValue (#18020 ) `defer.returnValue` was only needed in Python 2; in Python 3, a simple `return` is fine. `twisted.internet.defer.returnValue` is deprecated as of Twisted 24.7.0. Most uses of `returnValue` in synapse were removed a while back; this cleans up some remaining bits.	2024-12-20 10:57:59 +00:00
Andrew Morgan	09f377fa52	Wording improvements for the `TaskScheduler` (#17992 ) As I found the current docstrings a bit unclear while trying to wrap my head around this class.	2024-12-18 11:42:34 +00:00
Richard van der Hoff	d80cd57c54	Fix new scheduled tasks jumping the queue (#17962 ) Currently, when a new scheduled task is added and its scheduled time has already passed, we set it to ACTIVE. This is problematic, because it means it will jump the queue ahead of all other SCHEDULED tasks; furthermore, if the Synapse process gets restarted, it will jump ahead of any ACTIVE tasks which have been started but are taking a while to run. Instead, we leave it set to SCHEDULED, but kick off a call to `_launch_scheduled_tasks`, which will decide if we actually have capacity to start a new task, and start the newly-added task if so.	2024-11-28 18:06:19 +00:00
Alexander Udovichenko	211c31dbd7	Fix WheelTimer implementation that can expired timeout early (#17850 ) When entries are inserted at the end of the timer queue, an unnecessary entry (with a duplicated key) is also inserted. This can cause some timeouts to expire early and consume extra memory.	2024-11-05 12:08:17 -06:00
Erik Johnston	83513b75f7	Speed up sliding sync by computing extensions in parallel (#17884 ) The main change here is to add a helper function `gather_optional_coroutines`, which works in a similar way as `yieldable_gather_results` but takes a set of coroutines rather than a function	2024-10-30 10:51:04 +00:00
Erik Johnston	d427403c67	Fix check for outdated Rust library (#17861 ) This failed when installed with poetry, so let's properly try and detect what's going on.	2024-10-29 17:06:15 +00:00
Erik Johnston	81e0f57800	Fix perf when streams don't change often (#17767 ) There is a bug with the `StreamChangeCache` where it would incorrectly return that all entities had changed if asked for entities changed since the earliest stream position. Note that for streams we use the inequalities: `$min_stream_id < stream_id <= $max_stream_id`, i.e. when we ask the stream change cache for all things that have changed since `$stream_id` we don't care for events that happened at `$stream_id`. Specifically: `_earliest_known_stream_pos` is the position at which we know that we'll have entries for all changes since that point, so we can use the cache for any stream ID equal to `_earliest_known_stream_pos`. `_earliest_known_stream_pos` is set in three places: - On startup we set it either to: - the current maximum stream ID, with no prefilled values; or - the minimum of the latest N values we pulled from the DB - When we evict items from the bottom, we set it to the stream ID of the evicted items. This was changed in https://github.com/matrix-org/synapse/pull/14435, but I think we were overly conservative there. --------- Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>	2024-09-30 13:52:33 +01:00
Quentin Gliech	7d52ce7d4b	Format files with Ruff (#17643 ) I thought ruff check would also format, but it doesn't. This runs ruff format in CI and dev scripts. The first commit is just a run of `ruff format .` in the root directory.	2024-09-02 12:39:04 +01:00
Erik Johnston	a51daffba5	Reduce concurrent thread usage in media (#17567 ) Follow on from #17558 Basically, we want to reduce the number of threads we want to use at a time, i.e. reduce the number of threads that are paused/blocked. We do this by returning from the thread when the consumer pauses the producer, rather than pausing in the thread. --------- Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>	2024-08-14 12:41:53 +01:00

917 Commits