Fixes: #8088
Previously we would perform OIDC discovery on startup,
which involves making HTTP requests to the identity provider(s).
If that took a long time, we would block startup.
If that failed, we would crash startup.
This commit:
- makes the loading happen in the background on startup
- makes an error in the 'preload' non-fatal (though it logs at CRITICAL
for visibility)
- adds a templated error page to show on failed redirects (for
unavailable providers), as otherwise you get a JSON response in your
navigator.
- This involves introducing 2 new exception types to mark other
exceptions and keep the error handling fine-grained.
The machinery was already there to load-on-demand the discovery config,
so when the identity provider
comes back up, the discovery is reattempted and login can succeed.
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
For reference, this PR used to include this whole `shared_config` block in the diff.
But https://github.com/element-hq/synapse/pull/19324 was merged first which introduced parts of it already.
Here is what this code used to look like: 566670c363/docker/configure_workers_and_start.py (L865-L868)
---
Original context for why it was changed this way:
https://github.com/matrix-org/synapse/pull/14921#discussion_r1126257933
Previously, this code made me question two things:
1. Do we actually use `worker_config["shared_extra_conf"]` in the
templates?
- At first glance, I couldn't see why we're updating `shared_extra_conf`
here. It's not used in the `worker.yaml.j2` template so all of this
seemed a bit pointless.
- Turns out, updating `shared_extra_conf` itself is pointless and it's
being done as a convenient place to mix the objects to get things right
in `shared_config` (confusing).
1. Does it actually do anything?
- Because `shared_config` starts out as an empty object, my first glance
made me think we we're just updating with an empty object and then just
re-assigning. But because we're in a loop, we actually accumulate the
`shared_extra_conf` from each worker.
I'm not sure whether I'm capturing my confusion well enough here but
basically, this made me spend time trying to figure out what/why we're
doing things this way and we can use a more clear pattern to accomplish
the same thing.
---
This change is spawning from looking at the
`docker/configure_workers_and_start.py` script in order to add a metrics
listener ([upcoming
PR](https://github.com/element-hq/synapse/pull/19324)).
Spawning from wanting to [run a load
test](https://github.com/element-hq/synapse-rust-apps/pull/397) against
the Complement Docker image of Synapse and see metrics from the
homeserver.
### Why not just provide your own homeserver config?
Probably possible but it gets tricky when you try to use the workers
variant of the Docker image (`docker/Dockerfile-workers`). The way to
workaround it would probably be to `yq` edit everything in a script and
change `/data/homeserver.yaml` and `/conf/workers/*.yaml` to add the
`metrics` listener. And then modify `/conf/workers/shared.yaml` to add
`enable_metrics: true`. Doesn't spark much joy.
Best reviewed commit by commit.
With the new dedicated MAS API
(https://github.com/element-hq/synapse/pull/18520), it's possible that
deactivation starts off the main process, which was not possible because
of a few calls.
I basically looked at everything that the deactivation handler was
doing, reviewed whether it could run on workers or not, and find a
workaround when possible
---------
Co-authored-by: Eric Eastwood <erice@element.io>
The main goal of this PR is to handle device list changes onto multiple
writers, off the main process, so that we can have logins happening
whilst Synapse is rolling-restarting.
This is quite an intrusive change, so I would advise to review this
commit by commit; I tried to keep the history as clean as possible.
There are a few things to consider:
- the `device_list_key` in stream tokens becomes a
`MultiWriterStreamToken`, which has a few implications in sync and on
the storage layer
- we had a split between `DeviceHandler` and `DeviceWorkerHandler` for
master vs. worker process. I've kept this split, but making it rather
writer vs. non-writer worker, using method overrides for doing
replication calls when needed
- there are a few operations that need to happen on a single worker at a
time. Instead of using cross-worker locks, for now I made them run on
the first writer on the list
---------
Co-authored-by: Eric Eastwood <erice@element.io>
Spawning from using this code elsewhere and not knowing why it's there.
Based on this article and @reivilibre's experience mentioning
`PYTHONUNBUFFERED=1`,
> #### programming languages where the default “print” statement buffers
>
> Also, here are a few programming language where the default print
statement will buffer output when writing to a pipe, and some ways to
disable buffering if you want:
>
> - Python (disable with `python -u`, or `PYTHONUNBUFFERED=1`, or
`sys.stdout.reconfigure(line_buffering=False)`, or `print(x,
flush=True)`)
>
> _--
https://jvns.ca/blog/2024/11/29/why-pipes-get-stuck-buffering/#programming-languages-where-the-default-print-statement-buffers_
Adds new environment variables that can be used with the Docker image
(`SYNAPSE_HTTP_PROXY`/`SYNAPSE_HTTPS_PROXY`/`SYNAPSE_NO_PROXY`)
Useful for things like the [Secure Border
Gateway](https://element.io/server-suite/secure-border-gateways)
### Why is this necessary?
You can already configure the `HTTP_PROXY`/`HTTPS_PROXY` environment
variables to proxy outbound requests but setting this globally in the
Docker image affects all processes which isn't always desirable or
workable in the case where the proxy is running in the Docker image
itself (because the Debian packages will fail to download because the
proxy isn't up and running yet) . Adding Synapse specific environment
variables
(`SYNAPSE_HTTP_PROXY`/`SYNAPSE_HTTPS_PROXY`/`SYNAPSE_NO_PROXY`) makes
things much more targetable.
During the migration the automated script to update the copyright
headers accidentally got rid of some of the existing copyright lines.
Reinstate them.
Pulled out of #16803 since the drive-by cleanup was maybe not as
drive-by as I had hoped.
<!--
Fixes: # <!-- -->
<!--
Supersedes: # <!-- -->
<!--
Follows: # <!-- -->
<!--
Part of: # <!-- -->
Base: `develop` <!-- git-stack-base-branch:develop -->
<!--
This pull request is commit-by-commit review friendly. <!-- -->
<!--
This pull request is intended for commit-by-commit review. <!-- -->
Original commit schedule, with full messages:
<ol>
<li>
Add a --generate-only option
</li>
</ol>
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Unix socket support for `federation` and `client` Listeners has existed now for a little while(since [1.81.0](https://github.com/matrix-org/synapse/pull/15353)), but there was one last hold out before it could be complete: HTTP Replication communication. This should finish it up. The Listeners would have always worked, but would have had no way to be talked to/at.
---------
Co-authored-by: Eric Eastwood <madlittlemods@gmail.com>
Co-authored-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Co-authored-by: Eric Eastwood <erice@element.io>
See https://github.com/matrix-org/synapse/pull/14095#discussion_r990335492
This is useful because when see that a relevant event is an `outlier` or `soft-failed`, then that's a good unexpected indicator explaining why it's not showing up. `filter_events_for_client` is used in `/sync`, `/messages`, `/context` which are all common end-to-end assertion touch points (also notifications, relations).
* Add master to the instance_map as part of Complement, have ReplicationEndpoint look at instance_map for master.
* Fix typo in drive by.
* Remove unnecessary worker_replication_* bits from unit tests and add master to instance_map(hopefully in the right place)
* Several updates:
1. Switch from master to main for naming the main process in the instance_map. Add useful constants for easier adjustment of names in the future.
2. Add backwards compatibility for worker_replication_* to allow time to transition to new style. Make sure to prioritize declaring main directly on the instance_map.
3. Clean up old comments/commented out code.
4. Adjust unit tests to match with new code.
5. Adjust Complement setup infrastructure to only add main to the instance_map if workers are used and remove now unused options from the worker.yaml template.
* Initial Docs upload
* Changelog
* Missed some commented out code that can go now
* Remove TODO comment that no longer holds true.
* Fix links in docs
* More docs
* Remove debug logging
* Apply suggestions from code review
Co-authored-by: reivilibre <olivier@librepush.net>
* Apply suggestions from code review
Co-authored-by: reivilibre <olivier@librepush.net>
* Update version to latest, include completeish before/after examples in upgrade notes.
* Fix up and docs too
---------
Co-authored-by: reivilibre <olivier@librepush.net>
Stabilize the threads API (MSC3856) by supporting (only) the v1
path for the endpoint.
This also marks the API as safe for workers since it is a read-only
API.