mirror of
https://github.com/element-hq/synapse.git
synced 2026-04-05 09:45:44 +00:00
Lower the database statement_timeout to 10m (#19604)
Lower the database `statement_timeout` to 10m Part of https://github.com/element-hq/backend-internal/issues/223 The `statement_timeout` was first introduced in https://github.com/matrix-org/synapse/pull/15853 as a great sanity check starting point (any timeout is better than no timeout). This idea is spawning from [discussions on 2026-03-16](https://docs.google.com/document/d/12RZKPk3a4__JUSH9wYHODo9rRyKzsHg6BSCAcmqmbOU/edit?tab=t.0#bookmark=id.x9y4kx82lvaj) about `matrix.org` performance and specifically room event search which frequently causes long-running transactions that hold up vacuums and causes things to go wibbly. It was brought up again today in another `matrix.org` performance [discussion on 2026-03-25](https://docs.google.com/document/d/15h4901gAzGMAol2n1b71OAcesVMNhgQbPNb55xouluw/edit?tab=t.0#bookmark=id.b53a918ntk0j) and was surprised nothing came of this yet. ### Some things may break We expect that when/if things break, to add specific overrides where it matters. For reference, we already have a couple of overrides in the codebase: - When using [`create_index_in_background`](40d35a95e2/synapse/storage/background_updates.py (L802-L804)) - When [purging rooms](40d35a95e2/synapse/storage/databases/main/purge_events.py (L166-L169)) ### Going further Ideally, we could go even further. The `statement_timeout` for any database query coming from people's HTTP requests should be 60s or less (or lower, no one wants to wait that long anyway). For now we'll take this iterative step down to 10m. Another point of reference is that for example on `matrix.org`, we even time out the whole HTTP request after 180s. For reference, we also discussed propagating request cancellation to the database query (cancelling statements when the user goes away) which is a good additional thing we could do. @sandhose looked into this before and it is only a recent libpq / Postgres / psycopg3 feature, not available in psycopg2 (which we currently use). As an example of how weird cancellation can get, `psql` will spawn a new connection to `SELECT pg_cancel_backend(<pid>);` when you `Ctrl+C`, see https://neon.com/blog/ctrl-c-in-psql-gives-me-the-heebie-jeebies ### Pull Request Checklist <!-- Please read https://element-hq.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request --> * [x] Pull request is based on the develop branch * [x] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should: - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.". - Use markdown where necessary, mostly for `code blocks`. - End with either a period (.) or an exclamation mark (!). - Start with a capital letter. - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry. * [x] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
This commit is contained in:
1
changelog.d/19604.misc
Normal file
1
changelog.d/19604.misc
Normal file
@@ -0,0 +1 @@
|
||||
Lower the Postgres database `statement_timeout` to 10m (previously 1h).
|
||||
@@ -31,6 +31,7 @@ from synapse.storage.engines._base import (
|
||||
IsolationLevel,
|
||||
)
|
||||
from synapse.storage.types import Cursor
|
||||
from synapse.util.duration import Duration
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from synapse.storage.database import LoggingDatabaseConnection
|
||||
@@ -54,14 +55,15 @@ class PostgresEngine(
|
||||
|
||||
psycopg2.extensions.register_adapter(bytes, _disable_bytes_adapter)
|
||||
self.synchronous_commit: bool = database_config.get("synchronous_commit", True)
|
||||
# Set the statement timeout to 1 hour by default.
|
||||
# Any query taking more than 1 hour should probably be considered a bug;
|
||||
# Set the statement timeout to 10 minutes by default.
|
||||
#
|
||||
# Any query taking more than 10 minutes should probably be considered a bug;
|
||||
# most of the time this is a sign that work needs to be split up or that
|
||||
# some degenerate query plan has been created and the client has probably
|
||||
# timed out/walked off anyway.
|
||||
# This is in milliseconds.
|
||||
self.statement_timeout: int | None = database_config.get(
|
||||
"statement_timeout", 60 * 60 * 1000
|
||||
"statement_timeout", Duration(minutes=10).as_millis()
|
||||
)
|
||||
self._version: int | None = None # unknown as yet
|
||||
|
||||
|
||||
Reference in New Issue
Block a user