Ports the event class to Rust.
The main difference here are:
1. There is now a single event class
2. We now validate a lot more at event construction time than we
previously did (we basically checked nothing before). This required some
changes to the tests, including
https://github.com/matrix-org/sytest/pull/1423
Reviewable commit-by-commit.
### Overview of Event Rust structure
The format of the event struct in Rust is quite different than that in
Python.
The top-level looks like:
```rust
pub struct Event {
/// The parsed event JSON.
fields: FormattedEvent,
/// The event ID. For format v1 this is read directly from the JSON;
/// for v2+ it is computed from the canonical-JSON hash at
/// construction time and cached here.
event_id: Arc<str>,
/// Synapse-internal per-event state that lives outside the federated
/// JSON (e.g. outlier flag, soft-failure, stream positions).
#[pyo3(get)]
internal_metadata: EventInternalMetadata,
/// The room version this event was parsed for.
#[pyo3(get)]
room_version: &'static RoomVersion,
/// `None` for accepted events; otherwise a short reason set by auth
/// when the event was rejected.
rejected_reason: Option<Box<str>>,
}
```
which includes the actual parsed event in `FormattedEvent`, plus the
rest of the event metadata.
```rust
pub struct FormattedEvent<E = Arc<EventFormatEnum>> {
#[serde(default)]
pub signatures: Signatures,
#[serde(default)]
pub unsigned: Unsigned,
#[serde(flatten)]
pub specific_fields: E,
#[serde(flatten)]
pub common_fields: Arc<EventCommonFields>,
}
```
The struct is further split into the common fields, format specific
fields, plus the signatures and unsigned. We split out the signature and
unsigned fields as they are mutable, so when we clone the event we can
still share the common and specific fields and only copy signature and
unsigned.
The `specific_fields` are the fields that depend on the format version.
They can either be a specific format (e.g. `E = EventFormatV1`) or a
type-erased enum `EventFormatEnum` that is across all room versions:
```rust
pub enum EventFormatEnum {
V1(EventFormatV1),
V2V3(EventFormatV2V3),
V4(EventFormatV4),
VMSC4242(EventFormatVMSC4242),
}
```
For example:
```rust
/// Shared flat-list encoding of `auth_events` and `prev_events`, reused
/// by every format from v2/v3 onwards.
#[derive(Serialize, Deserialize)]
pub struct SimpleAuthPrevEvents {
pub auth_events: Vec<String>,
pub prev_events: Vec<String>,
}
/// Version-specific fields for room versions 3-10.
#[derive(Serialize, Deserialize)]
pub struct EventFormatV2V3 {
pub room_id: Box<str>,
#[serde(flatten)]
pub auth_prev_events: SimpleAuthPrevEvents,
}
```
### Dev notes
As discussed in
[`#element-backend-internal:matrix.org`](https://matrix.to/#/!SGNQGPGUwtcPBUotTL:matrix.org/$3gTjDO440GbAz57cXcCawwiyFLiD0crrarvS1uhzKOY?via=jki.re&via=element.io&via=matrix.org)
---------
Co-authored-by: Eric Eastwood <erice@element.io>
This is a simplification so that `unsigned` only includes "simple"
values, to make it easier to port to Rust.
Reviewable commit-by-commit
Summary:
1. **Add `recheck` column to `redactions` table**
A new boolean `recheck` column (default true) is added to the
`redactions` table. This captures whether a redaction needs its sender
domain checked at read time — required for room v3+ where redactions are
accepted speculatively and later validated. When persisting a new
redaction, `recheck` is set directly from
`event.internal_metadata.need_to_check_redaction()`.
It's fine if initially we recheck all redactions, as it only results in
a little more CPU overhead (as we always pull out the redaction event
regardless).
2. **Backfill `recheck` via background update**
A background update (`redactions_recheck`) backfills the new column for
existing rows by reading `recheck_redaction` from each event's
`internal_metadata` JSON. This avoids loading full event objects by
reading `event_json` directly via a SQL JOIN.
3. **Don't fetch confirmed redaction events from the DB**
Previously, when loading events, Synapse recursively fetched all
redaction events regardless of whether they needed domain rechecking.
Now `_fetch_event_rows` reads the `recheck` column and splits redactions
into two lists:
- `unconfirmed_redactions` — need fetching and domain validation
- `confirmed_redactions` — already validated, applied directly without
fetching the event
This avoids unnecessary DB reads for the common case of
already-confirmed redactions.
4. **Move `redacted_because` population to `EventClientSerializer`**
Previously, `redacted_because` (the full redaction event object) was
stored in `event.unsigned` at DB fetch time, coupling storage-layer code
to client serialization concerns. This is removed from
`_maybe_redact_event_row` and moved into
`EventClientSerializer.serialize_event`, which fetches the redaction
event on demand. The storage layer now only sets
`unsigned["redacted_by"]` (the redaction event ID).
5. **Always use `EventClientSerializer`**
The standalone `serialize_event` function was made private
(`_serialize_event`). All external callers — `rest/client/room.py`,
`rest/admin/events.py, appservice/api.py`, and `tests` — were updated to
use `EventClientSerializer.serialize_event` / `serialize_events`,
ensuring
`redacted_because` is always populated correctly via the serializer.
6. **Batch-fetch redaction events in `serialize_events`**
`serialize_events` now collects all `redacted_by` IDs from the event
batch upfront and fetches them in a single `get_events` call, passing
the result as a `redaction_map` to each `serialize_event` call. This
reduces N individual DB round-trips to one when serializing a batch of
events that includes redacted events.
---------
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Introduce `Clock.call_when_running(...)` to wrap startup code in a
logcontext, ensuring we can identify which server generated the logs.
Background:
> Ideally, nothing from the Synapse homeserver would be logged against the `sentinel`
> logcontext as we want to know which server the logs came from. In practice, this is not
> always the case yet especially outside of request handling.
>
> Global things outside of Synapse (e.g. Twisted reactor code) should run in the
> `sentinel` logcontext. It's only when it calls into application code that a logcontext
> gets activated. This means the reactor should be started in the `sentinel` logcontext,
> and any time an awaitable yields control back to the reactor, it should reset the
> logcontext to be the `sentinel` logcontext. This is important to avoid leaking the
> current logcontext to the reactor (which would then get picked up and associated with
> the next thing the reactor does).
>
> *-- `docs/log_contexts.md`
Also adds a lint to prefer `Clock.call_when_running(...)` over
`reactor.callWhenRunning(...)`
Part of https://github.com/element-hq/synapse/issues/18905
We don't necessarily have `instance_name` for old events (before we
support multiple event persisters). We treat those as if the
`instance_name` was "master".
---------
Co-authored-by: Eric Eastwood <eric.eastwood@beta.gouv.fr>
Use fully-qualified `PersistedEventPosition` (`instance_name` and `stream_ordering`) when returning `RoomsForUser` to facilitate proper comparisons and `RoomStreamToken` generation.
Spawning from https://github.com/element-hq/synapse/pull/17187 where we want to utilize this change
During the migration the automated script to update the copyright
headers accidentally got rid of some of the existing copyright lines.
Reinstate them.