mirror of
https://github.com/element-hq/synapse.git
synced 2026-06-22 06:33:09 +00:00
ff034c1f62
This was all done through claude. ## Fix flaky `test_edu_large_messages_not_splitting_one_user` (`TooLong` under `trial -jN`) ### Problem The "Build .deb packages" CI step intermittently failed with: ``` twisted.protocols.amp.TooLong ...in tests.rest.client.test_sendtodevice.SendToDeviceTestCase.test_edu_large_messages_not_splitting_one_user ``` The deb build runs the suite with `twisted.trial -j2`. In that mode, worker log events are shipped to the manager process over Twisted's AMP protocol, which encodes each value with a 2-byte length prefix — so any single log line of **64 KiB or more** raises `TooLong`. ### Root cause Not a bug in the to-device/EDU logic, and **not** debug logging enabled by the build (it runs at the default `ERROR` level). It's a **log-level leak between tests sharing a `-j2` worker**: - `tests/logging/test_loggers.py::ExplicitlyConfiguredLoggerTestCase` calls `root_logger.setLevel(logging.DEBUG)` directly and never restores it (no `setUp`/`tearDown`/`addCleanup`). - When that test runs before `test_edu_large_messages_not_splitting_one_user` **in the same worker process**, the root logger is left at `DEBUG`. - That test deliberately builds an EDU of exactly `SOFT_MAX_EDU_SIZE - 1` (65 535) bytes. Storing it triggers `synapse/storage/database.py`'s `[SQL values]` DEBUG log, which dumps the full query params — producing a **65 708-byte** line that overflows AMP's cap. It looks "flaky" purely because of `-j2` scheduling: whether the two tests land on the same worker, and in what order. ### Fix Three commits: 1. **Restore the root logger level in `ExplicitlyConfiguredLoggerTestCase`** — `addCleanup` to put the level back. Fixes the root cause. 2. **Truncate oversized log lines in the test log handler** (`ToTwistedHandler.emit`) — caps lines at 1000 chars so no debug line can break `trial -jN`, regardless of which query is logged (defense in depth). 3. **Truncate values in `[SQL values]` debug logging** — caps the logged param repr at 1000 chars (guarded by `isEnabledFor(DEBUG)` to keep the hot path lazy). Keeps production debug logs sane too. ### Testing - The reproduction (`trial -j2 tests.logging.test_loggers <the EDU test>`) went from **~3/8 failing** to **10/10 passing**. - Confirmed the root level is restored after the logging tests, and that the `[SQL values]` line is now capped (~50 KB with a `[truncated]` marker, was 65 708). - `ruff` + `mypy` clean; `tests.logging.test_loggers` and `tests.rest.client.test_sendtodevice` pass (14/14). --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Eric Eastwood <erice@element.io>
88 lines
3.2 KiB
Python
88 lines
3.2 KiB
Python
#
|
|
# This file is licensed under the Affero General Public License (AGPL) version 3.
|
|
#
|
|
# Copyright (C) 2023 New Vector, Ltd
|
|
#
|
|
# This program is free software: you can redistribute it and/or modify
|
|
# it under the terms of the GNU Affero General Public License as
|
|
# published by the Free Software Foundation, either version 3 of the
|
|
# License, or (at your option) any later version.
|
|
#
|
|
# See the GNU Affero General Public License for more details:
|
|
# <https://www.gnu.org/licenses/agpl-3.0.html>.
|
|
#
|
|
# Originally licensed under the Apache License, Version 2.0:
|
|
# <http://www.apache.org/licenses/LICENSE-2.0>.
|
|
#
|
|
# [This file includes modifications made by New Vector Limited]
|
|
#
|
|
#
|
|
import logging
|
|
import os
|
|
|
|
import twisted.logger
|
|
|
|
from synapse.logging.context import LoggingContextFilter
|
|
from synapse.synapse_rust import reset_logging_config
|
|
|
|
# The maximum length of a single log line. Extremely long log lines aren't
|
|
# useful, and keeping them short also stays well under the limit that would
|
|
# otherwise break parallel test runs: under `trial -jN` worker log events are
|
|
# shipped to the manager process over trial's AMP, which raises `TooLong` for
|
|
# any value of 64KiB or more (e.g. a DEBUG dump of a large SQL query's values)
|
|
# (c.f. https://github.com/twisted/twisted/issues/12482).
|
|
_MAX_LOG_LINE_LENGTH = 60_000
|
|
|
|
|
|
class ToTwistedHandler(logging.Handler):
|
|
"""logging handler which sends the logs to the twisted log"""
|
|
|
|
tx_log = twisted.logger.Logger()
|
|
|
|
def emit(self, record: logging.LogRecord) -> None:
|
|
log_entry = self.format(record)
|
|
if len(log_entry) > _MAX_LOG_LINE_LENGTH:
|
|
log_entry = (
|
|
log_entry[:_MAX_LOG_LINE_LENGTH]
|
|
+ "... [truncated overly long log line]"
|
|
)
|
|
log_level = record.levelname.lower().replace("warning", "warn")
|
|
self.tx_log.emit(
|
|
twisted.logger.LogLevel.levelWithName(log_level), "{entry}", entry=log_entry
|
|
)
|
|
|
|
|
|
def setup_logging() -> None:
|
|
"""Configure the python logging appropriately for the tests.
|
|
|
|
(Logs will end up in _trial_temp.)
|
|
"""
|
|
root_logger = logging.getLogger()
|
|
|
|
# We exclude `%(asctime)s` from this format because the Twisted logger adds its own
|
|
# timestamp
|
|
log_format = "%(name)s - %(lineno)d - %(levelname)s - %(request)s - %(message)s"
|
|
|
|
handler = ToTwistedHandler()
|
|
formatter = logging.Formatter(log_format)
|
|
handler.setFormatter(formatter)
|
|
handler.addFilter(LoggingContextFilter())
|
|
root_logger.addHandler(handler)
|
|
|
|
log_level = os.environ.get("SYNAPSE_TEST_LOG_LEVEL", "ERROR")
|
|
root_logger.setLevel(log_level)
|
|
|
|
# In order to not add noise by default (since we only log ERROR messages for trial
|
|
# tests as configured above), we only enable this for developers for looking for
|
|
# more INFO or DEBUG.
|
|
if root_logger.isEnabledFor(logging.INFO):
|
|
# Log when events are (maybe unexpectedly) filtered out of responses in tests. It's
|
|
# just nice to be able to look at the CI log and figure out why an event isn't being
|
|
# returned.
|
|
logging.getLogger("synapse.visibility.filtered_event_debug").setLevel(
|
|
logging.DEBUG
|
|
)
|
|
|
|
# Blow away the pyo3-log cache so that it reloads the configuration.
|
|
reset_logging_config()
|