mirror of
https://github.com/element-hq/synapse.git
synced 2026-04-05 11:56:07 +00:00
- Update `synapse_xxx` (server-level) metrics to use
`server_name="$server_name",` instead of `instance="$instance"`
- Add `synapse_server_name_info` metric to map Synapse `server_name`s to
the `instance`s they're hosted on.
- For process level metrics, update to use `xxx * on (instance, job,
index) group_left(server_name)
synapse_server_name_info{server_name="$server_name"}`
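
To make the process-level join above more concrete, here is a minimal Python sketch that mimics what `* on (instance, job, index) group_left(server_name)` does for a many-to-one match. It is only an illustration of the matching semantics, not how Prometheus implements them. The host names and sample values are made up, and it assumes `synapse_server_name_info` always has the value `1` (the usual convention for `*_info` metrics).

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]
Sample = Tuple[Labels, float]


def group_left_join(
    left: List[Sample],
    info: List[Sample],
    on: Tuple[str, ...] = ("instance", "job", "index"),
    copy: Tuple[str, ...] = ("server_name",),
) -> List[Sample]:
    """Mimic `left * on(<on>) group_left(<copy>) info` for a many-to-one match."""
    # Index the right-hand (info) series by join key. Each key must be unique,
    # otherwise the match would be many-to-many.
    by_key: Dict[Tuple[str, ...], Sample] = {}
    for labels, value in info:
        key = tuple(labels.get(name, "") for name in on)
        if key in by_key:
            raise ValueError(f"duplicate right-hand series for {key}")
        by_key[key] = (labels, value)

    joined: List[Sample] = []
    for labels, value in left:
        key = tuple(labels.get(name, "") for name in on)
        match = by_key.get(key)
        if match is None:
            continue  # no matching info series: the sample is dropped
        info_labels, info_value = match
        out = dict(labels)
        for name in copy:
            if name in info_labels:
                out[name] = info_labels[name]  # copy `server_name` across
        joined.append((out, value * info_value))
    return joined


# Hypothetical process-level samples from two hosts.
process_cpu = [
    ({"instance": "host1:9000", "job": "master", "index": "1"}, 12.5),
    ({"instance": "host2:9000", "job": "master", "index": "1"}, 7.0),
]
# Info series, already filtered by {server_name="example.com"}.
server_name_info = [
    (
        {
            "instance": "host1:9000",
            "job": "master",
            "index": "1",
            "server_name": "example.com",
        },
        1.0,
    ),
]

result = group_left_join(process_cpu, server_name_info)
print(result)
```

Only the `host1:9000` sample survives, and it gains the `server_name` label, which is what lets the dashboard filter process-level metrics by `$server_name`.
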
All of the changes here are backwards compatible with whatever people
were doing before with their Prometheus/Grafana dashboards.
Previously, the recommendation was to use the `instance` label to group
everything under the same server (803e4b4d88/docs/metrics-howto.md (L93-L147))
But the `instance` label has a special meaning, and we're abusing it
by using it that way:
> `instance`: The `<host>:<port>` part of the target's URL that was
scraped.
>
> *--
https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series*
Since https://github.com/element-hq/synapse/issues/18592 (Synapse
`v1.139.0`), we now have the `server_name` label to use instead.
---
Additionally, the assumption that a single process is serving a single
server is no longer true with [Synapse Pro for small
hosts](https://docs.element.io/latest/element-server-suite-pro/synapse-pro-for-small-hosts/overview/).
Part of https://github.com/element-hq/synapse-small-hosts/issues/106
### Motivating use case
Although this change also benefits [Synapse Pro for small
hosts](https://docs.element.io/latest/element-server-suite-pro/synapse-pro-for-small-hosts/overview/)
(https://github.com/element-hq/synapse-small-hosts/issues/106), this
actually stems from adding Prometheus metrics to our workerized
Docker image (https://github.com/element-hq/synapse/pull/19324,
https://github.com/element-hq/synapse/pull/19336) with a more correct
label setup (without `instance`) and wanting the dashboard to be better.
### Testing strategy
1. Make sure your firewall allows the Docker containers to communicate
with the host (`host.docker.internal`) so they can access exposed ports of
other Docker containers. We want to allow Prometheus to access the
Synapse container and Grafana to access the Prometheus container.
- `sudo ufw allow in on docker0 comment "Allow traffic from the default
Docker network to the host machine (host.docker.internal)"`
- `sudo ufw allow in on br-+ comment "(from Matrix Complement testing)
Allow traffic from custom Docker networks to the host machine
(host.docker.internal)"`
- [Complement firewall
docs](ee6acd9154/README.md (potential-conflict-with-firewall-software))
1. Build the Docker image for Synapse: `docker build -t
matrixdotorg/synapse -f docker/Dockerfile .`
([docs](7a24fafbc3/docker/README-testing.md (building-and-running-the-images-manually)))
1. Generate config for Synapse:
```
docker run -it --rm \
--mount type=volume,src=synapse-data,dst=/data \
-e SYNAPSE_SERVER_NAME=my.docker.synapse.server \
-e SYNAPSE_REPORT_STATS=yes \
-e SYNAPSE_ENABLE_METRICS=1 \
matrixdotorg/synapse:latest generate
```
1. Start Synapse:
```
docker run -d --name synapse \
--mount type=volume,src=synapse-data,dst=/data \
-p 8008:8008 \
-p 19090:19090 \
matrixdotorg/synapse:latest
```
1. You should be able to see metrics from Synapse at
http://localhost:19090/_synapse/metrics
1. Create a Prometheus config (`prometheus.yml`)
```yaml
global:
  scrape_interval: 15s
  scrape_timeout: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: prometheus
    scrape_interval: 15s
    metrics_path: /_synapse/metrics
    scheme: http
    static_configs:
      - targets:
          # This should point to the Synapse metrics listener (we're using
          # `host.docker.internal` because this is from within the Prometheus
          # container)
          - host.docker.internal:19090
```
1. Start Prometheus (update the volume bind mount to the config you just
saved somewhere):
```
docker run \
--detach \
--name=prometheus \
--add-host host.docker.internal:host-gateway \
-p 9090:9090 \
-v ~/Documents/code/random/prometheus-config/prometheus.yml:/etc/prometheus/prometheus.yml \
prom/prometheus
```
1. Make sure you're seeing some data in Prometheus. On
http://localhost:9090/query, search for `synapse_build_info`
1. Start [Grafana](https://hub.docker.com/r/grafana/grafana)
```
docker run -d --name=grafana --add-host host.docker.internal:host-gateway -p 3000:3000 grafana/grafana
```
1. Visit the Grafana dashboard, http://localhost:3000/ (Credentials:
`admin`/`admin`)
1. **Connections** -> **Data Sources** -> **Add data source** ->
**Prometheus**
- Prometheus server URL: `http://host.docker.internal:9090`
1. Import the Synapse dashboard: `contrib/grafana/synapse.json`
To test workers, you can use the testing strategy from
https://github.com/element-hq/synapse/pull/19336 (assumes both changes
from this PR and the other PR are combined)
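
As a complement to the manual checks above, the following Python sketch parses a metrics response (Prometheus text exposition format) and reports which `synapse_` metrics lack a `server_name` label. It is a rough check under stated assumptions: the parser is deliberately simplified (escape sequences in label values are left as-is), and the sample payload, including `synapse_example_total` and the label values, is made up for illustration.

```python
import re
from typing import Dict, Iterator, Set, Tuple

# `name{label="value",...} value` or `name value`
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def iter_samples(text: str) -> Iterator[Tuple[str, Dict[str, str], str]]:
    """Yield (metric name, labels, raw value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        yield name, labels, value


def synapse_metrics_missing_server_name(text: str) -> Set[str]:
    """Names of `synapse_` metrics with a sample lacking `server_name`."""
    return {
        name
        for name, labels, _ in iter_samples(text)
        if name.startswith("synapse_") and "server_name" not in labels
    }


# Made-up payload; a real one would come from
# http://localhost:19090/_synapse/metrics
SAMPLE = """\
# HELP synapse_build_info Build information
# TYPE synapse_build_info gauge
synapse_build_info{server_name="my.docker.synapse.server",version="0.0.0"} 1.0
synapse_example_total{method="GET"} 3.0
process_cpu_seconds_total 12.5
"""

missing = synapse_metrics_missing_server_name(SAMPLE)
print(sorted(missing))
```

Process-level metrics such as `process_cpu_seconds_total` are intentionally ignored here, since they are expected to pick up `server_name` through the `synapse_server_name_info` join rather than carry the label themselves.
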
144 lines
4.2 KiB
Markdown
# How to monitor Synapse metrics using Prometheus

1. Install Prometheus:

   Follow instructions at
   <http://prometheus.io/docs/introduction/install/>

1. Enable Synapse metrics:

   In `homeserver.yaml`, make sure `enable_metrics` is
   set to `True`.

1. Enable the `/_synapse/metrics` Synapse endpoint that Prometheus uses to
   collect data:

   There are two methods of enabling the metrics endpoint in Synapse.

   The first serves the metrics as a part of the usual web server and
   can be enabled by adding the `metrics` resource to the existing
   listener, as in this example:

   ```yaml
   listeners:
     - port: 8008
       tls: false
       type: http
       x_forwarded: true
       bind_addresses: ['::1', '127.0.0.1']

       resources:
         # added "metrics" in this line
         - names: [client, federation, metrics]
           compress: false
   ```

   This provides a simple way of adding metrics to your Synapse
   installation, and serves under `/_synapse/metrics`. If you do not
   wish your metrics to be publicly exposed, you will need to either
   filter them out at your load balancer, or use the second method.

   The second method runs the metrics server on a different port, in a
   different thread to Synapse. This makes it more resilient when heavy
   load would otherwise prevent metrics from being retrieved, and makes
   it easier to expose the metrics to internal networks only. The served
   metrics are available over HTTP only, and will be available at
   `/_synapse/metrics`.

   Add a new listener to `homeserver.yaml` as in this example:

   ```yaml
   listeners:
     - port: 8008
       tls: false
       type: http
       x_forwarded: true
       bind_addresses: ['::1', '127.0.0.1']

       resources:
         - names: [client, federation]
           compress: false

     # beginning of the new metrics listener
     - port: 9000
       type: metrics
       bind_addresses: ['::1', '127.0.0.1']
   ```

1. Restart Synapse.

1. Add a Prometheus target for Synapse.

   It needs to set the `metrics_path` to a non-default value (under
   `scrape_configs`):

   ```yaml
     - job_name: "synapse"
       scrape_interval: 15s
       metrics_path: "/_synapse/metrics"
       static_configs:
         - targets: ["my.server.here:port"]
   ```

   where `my.server.here` is the IP address of Synapse, and `port` is
   the listener port configured with the `metrics` resource.

   If your Prometheus is older than 1.5.2, you will need to replace
   `static_configs` in the above with `target_groups`.

1. Restart Prometheus.

1. Consider using the [Grafana dashboard](https://github.com/element-hq/synapse/tree/master/contrib/grafana/)
   and required [recording rules](https://github.com/element-hq/synapse/tree/master/contrib/prometheus/).

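
Before pointing Prometheus at Synapse, it can help to confirm that the metrics endpoint configured above actually responds. The following is a minimal sketch using only the Python standard library. The helper names (`metrics_url`, `metric_names`, `fetch_metric_names`) are made up for this example, and the host and port are assumptions that must match your own listener.

```python
from typing import Set
from urllib.request import urlopen


def metrics_url(host: str, port: int, path: str = "/_synapse/metrics") -> str:
    """Build the URL of the metrics endpoint for a given listener."""
    # Bracket IPv6 literals such as ::1 so the URL stays valid.
    if ":" in host and not host.startswith("["):
        host = f"[{host}]"
    return f"http://{host}:{port}{path}"


def metric_names(body: str) -> Set[str]:
    """Extract metric names from a text-format metrics response."""
    names = set()
    for line in body.splitlines():
        if line and not line.startswith("#"):
            # The metric name ends at the first "{" or space.
            names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def fetch_metric_names(url: str, timeout: float = 5.0) -> Set[str]:
    """Fetch the endpoint and return the metric names it exposes."""
    with urlopen(url, timeout=timeout) as response:
        return metric_names(response.read().decode("utf-8"))
```

For the second method shown above (metrics listener on port 9000), calling `fetch_metric_names(metrics_url("127.0.0.1", 9000))` from the Synapse host should return a non-empty set if the listener is up; a connection error suggests the listener is not enabled or is bound to a different address.
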
## Monitoring workers

To monitor a Synapse installation using [workers](workers.md),
every worker needs to be monitored independently, in addition to
the main homeserver process. This is because workers don't send
their metrics to the main homeserver process, but expose them
directly (if they are configured to do so).

To allow collecting metrics from a worker, you need to add a
`metrics` listener to its configuration, by adding the following
under `worker_listeners`:

```yaml
- type: metrics
  bind_address: ''
  port: 9101
```

The `bind_address` and `port` parameters should be set so that
the resulting listener can be reached by Prometheus, and they
don't clash with an existing worker.
With this example, the worker's metrics would then be available
on `http://127.0.0.1:9101`.

Example Prometheus target for Synapse with workers:

```yaml
  - job_name: "synapse"
    scrape_interval: 15s
    metrics_path: "/_synapse/metrics"
    static_configs:
      - targets: ["my.server.here:port"]
        labels:
          job: "master"
          index: 1
      - targets: ["my.workerserver.here:port"]
        labels:
          job: "generic_worker"
          index: 1
      - targets: ["my.workerserver.here:port"]
        labels:
          job: "generic_worker"
          index: 2
      - targets: ["my.workerserver.here:port"]
        labels:
          job: "media_repository"
          index: 1
```

Labels (`job`, `index`) can be defined as anything.
The labels are used to group graphs in Grafana.
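
With many workers, writing the target list by hand gets repetitive. As a rough sketch, the scrape config above could be generated from a list of workers. The `Worker` type, the `render_scrape_config` helper, and the hosts and ports below are all hypothetical, and the code assumes none of the values contain quote characters.

```python
from typing import Iterable, List, NamedTuple


class Worker(NamedTuple):
    target: str  # "<host>:<port>" of the process's metrics listener
    job: str     # value for the `job` label
    index: int   # value for the `index` label


def render_scrape_config(workers: Iterable[Worker], job_name: str = "synapse") -> str:
    """Render a `scrape_configs` entry with one static target per worker."""
    lines: List[str] = [
        f'  - job_name: "{job_name}"',
        "    scrape_interval: 15s",
        '    metrics_path: "/_synapse/metrics"',
        "    static_configs:",
    ]
    for worker in workers:
        lines += [
            f'      - targets: ["{worker.target}"]',
            "        labels:",
            f'          job: "{worker.job}"',
            f"          index: {worker.index}",
        ]
    return "\n".join(lines) + "\n"


# Hypothetical deployment: one main process and two generic workers.
workers = [
    Worker("my.server.here:9000", "master", 1),
    Worker("my.workerserver.here:9101", "generic_worker", 1),
    Worker("my.workerserver.here:9102", "generic_worker", 2),
]
config = render_scrape_config(workers)
print(config)
```

The output is meant to be pasted under `scrape_configs` in `prometheus.yml`; keeping each (`job`, `index`) pair unique keeps every process distinguishable in the dashboard.
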