14 Commits

Author SHA1 Message Date
Eric Eastwood
58f59ffbcb Refactor Grafana dashboard to use server_name label (#19337)
- Update `synapse_xxx` (server-level) metrics to use
`server_name="$server_name",` instead of `instance="$instance"`
- Add `synapse_server_name_info` metric to map Synapse `server_name`s to
the `instance`s they're hosted on.
- For process level metrics, update to use `xxx * on (instance, job,
index) group_left(server_name)
synapse_server_name_info{server_name="$server_name"}`

All of the changes here are backwards compatible with whatever people
were doing before with their Prometheus/Grafana dashboards.

Previously, the recommendation was to use the `instance` label to group
everything under the same server (803e4b4d88/docs/metrics-howto.md (L93-L147))

But the `instance` label actually has a special meaning and we're
actually abusing it by using it that way:

> `instance`: The `<host>:<port>` part of the target's URL that was
scraped.
>
> *--
https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series*

Since https://github.com/element-hq/synapse/issues/18592 (Synapse
`v1.139.0`), we now have the `server_name` label to use instead.


---

Additionally, the assumption that a single process is serving a single
server is no longer true with [Synapse Pro for small
hosts](https://docs.element.io/latest/element-server-suite-pro/synapse-pro-for-small-hosts/overview/).

Part of https://github.com/element-hq/synapse-small-hosts/issues/106

### Motivating use case

Although this change also benefits [Synapse Pro for small
hosts](https://docs.element.io/latest/element-server-suite-pro/synapse-pro-for-small-hosts/overview/)
(https://github.com/element-hq/synapse-small-hosts/issues/106), this is
actually spawning from adding Prometheus metrics to our workerized
Docker image (https://github.com/element-hq/synapse/pull/19324,
https://github.com/element-hq/synapse/pull/19336) with a more correct
label setup (without `instance`) and wanting the dashboard to be better.



### Testing strategy

1. Make sure your firewall allows the Docker containers to communicate
to the host (`host.docker.internal`) so they can access exposed ports of
other Docker containers. We want to allow Synapse to access the
Prometheus container and Grafana to access to the Prometheus container.
- `sudo ufw allow in on docker0 comment "Allow traffic from the default
Docker network to the host machine (host.docker.internal)"`
- `sudo ufw allow in on br-+ comment "(from Matrix Complement testing)
Allow traffic from custom Docker networks to the host machine
(host.docker.internal)"`
- [Complement firewall
docs](ee6acd9154/README.md (potential-conflict-with-firewall-software))
1. Build the Docker image for Synapse: `docker build -t
matrixdotorg/synapse -f docker/Dockerfile .`
([docs](7a24fafbc3/docker/README-testing.md (building-and-running-the-images-manually)))
 1. Generate config for Synapse:
    ```
    docker run -it --rm \
        --mount type=volume,src=synapse-data,dst=/data \
        -e SYNAPSE_SERVER_NAME=my.docker.synapse.server \
        -e SYNAPSE_REPORT_STATS=yes \
        -e SYNAPSE_ENABLE_METRICS=1 \
        matrixdotorg/synapse:latest generate
    ```
 1. Start Synapse:
     ```
    docker run -d --name synapse \
        --mount type=volume,src=synapse-data,dst=/data \
        -p 8008:8008 \
        -p 19090:19090 \
        matrixdotorg/synapse:latest
    ```
1. You should be able to see metrics from Synapse at
http://localhost:19090/_synapse/metrics
 1. Create a Prometheus config (`prometheus.yml`)
    ```yaml
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
      evaluation_interval: 15s
    
    scrape_configs:
      - job_name: prometheus
        scrape_interval: 15s
        metrics_path: /_synapse/metrics
        scheme: http
        static_configs:
          - targets:
# This should point to the Synapse metrics listener (we're using
`host.docker.internal` because this is from within the Prometheus
container)
              - host.docker.internal:19090
    ```
1. Start Prometheus (update the volume bind mount to the config you just
saved somewhere):
    ```
    docker run \
        --detach \
        --name=prometheus \
        --add-host host.docker.internal:host-gateway \
        -p 9090:9090 \
-v
~/Documents/code/random/prometheus-config/prometheus.yml:/etc/prometheus/prometheus.yml
\
        prom/prometheus
    ```
1. Make sure you're seeing some data in Prometheus. On
http://localhost:9090/query, search for `synapse_build_info`
 1. Start [Grafana](https://hub.docker.com/r/grafana/grafana)
    ```
docker run -d --name=grafana --add-host
host.docker.internal:host-gateway -p 3000:3000 grafana/grafana
    ```
1. Visit the Grafana dashboard, http://localhost:3000/ (Credentials:
`admin`/`admin`)
1. **Connections** -> **Data Sources** -> **Add data source** ->
**Prometheus**
     - Prometheus server URL: `http://host.docker.internal:9090`
 1. Import the Synapse dashboard: `contrib/grafana/synapse.json`

To test workers, you can use the testing strategy from
https://github.com/element-hq/synapse/pull/19336 (assumes both changes
from this PR and the other PR are combined)
2026-01-14 17:57:42 -06:00
Eric Eastwood
ace2614fad Remove docs on dead legacy metric names (#19341)
These metrics were [removed completely from the
codebase](444bc56cda/docs/changelogs/CHANGES-2022.md (synapse-1730-2022-12-06))
in Synapse v1.73.0 (2022-12-06). 3-years is plenty enough time


The deprecation/removal is still in our [upgrade
notes](444bc56cda/docs/upgrade.md (deprecation-of-legacy-prometheus-metric-names))
which points to a durable versioned link with the info still available:
https://element-hq.github.io/synapse/v1.69/metrics-howto.html#renaming-of-metrics--deprecation-of-old-names-in-12
2026-01-08 10:03:15 -06:00
Erik Johnston
930dc9e2d3 Fix typo 2023-12-13 16:37:10 +00:00
Erik Johnston
8613f7693e More renaming 2023-12-13 15:41:11 +00:00
David Robertson
78909f5028 Include monthly active user metrics in the list of legacy metrics names (#14360) 2022-11-04 10:45:01 +00:00
Dirk Klimpel
c0e868e423 Improve the listener example on the metrics documentation (#14078)
Signed-off-by: Dirk Klimpel <dirk@klimpel.org>
2022-10-07 14:54:01 +01:00
reivilibre
b381701f8c Announce that legacy metric names are deprecated, will be turned off by default in Synapse v1.71.0 and removed altogether in Synapse v1.73.0. (#14024) 2022-10-03 17:16:15 +00:00
Eric Eastwood
36b184b782 Update docs to make enabling metrics more clear (#13678)
It was really easy to miss the `enable_metrics: True` step with the previous language.
2022-09-02 11:29:51 -05:00
Dirk Klimpel
0adc2882c1 Fix broken links in documentation (#10180)
* Fix broken links in documentation

* newsfile
2021-06-16 13:15:52 +01:00
Dirk Klimpel
14f81a6d24 Improve documentation how to configure prometheus for workers (#8822) 2020-11-26 10:42:55 +00:00
Michael Kaye
f49c2093b5 Cross-link documentation to the prometheus recording rules. (#8667) 2020-10-27 15:29:50 -04:00
Erik Johnston
aaf9ce72a0 Fix typo in metrics docs (#7966) 2020-07-28 10:03:18 +01:00
Brendan Abolivier
036fab5d8a Document monitoring workers (#7357)
It doesn't seem to be documented anywhere and means that you suddenly start losing metrics without any obvious reason when you go from monolith to workers (e.g. #7312).
2020-04-27 21:36:47 +02:00
dstipp
379d2a8c39 (#5849) Convert rst to markdown (#6040)
Converting some of the rst documentation to markdown.  Attempted to
preserve whitespace and line breaks to minimize cosmetic change.
2019-09-17 12:55:29 +01:00