Commit Graph

73 Commits

Author SHA1 Message Date
Raja Subramanian 9d5c351d36 Fix prom units for forwarding latency/jitter. (#4045) 2025-11-02 14:38:25 +05:30
Raja Subramanian e183657cff Add prom histogram for forwarding latency and jitter. (#4044)
* Add prom histogram for forwarding latency and jitter.

Using short term stats for histogram.

An example setting is
1s - short term
1m - long term

Using the 1s (short term) data for histogram. In that 1 second, all
packet forwarding latencies are averaged for latency and std. dev. of
the collection is used as jitter.

* try different staticcheck
2025-11-01 23:25:03 +05:30
Raja Subramanian ca0d5ee972 Count request/response packets on both client and server side. (#4001)
Currently, the signal requests are counted on media side and signal
responses are counted on controller side. This does not provide the
granularity to check how many response messages each media node is
sending.

Seeing some cases where track subscriptions are slow under load. This
would be good to see if the media node is doing a lot of signal response
messages.
2025-10-14 16:58:36 +05:30
Raja Subramanian 10103449c5 Add country label to edge prom stats. (#3816)
* Add country label to edge prom stats.

* data channel country stats

* test

* pub/sub time country
2025-07-24 13:23:05 +05:30
Raja Subramanian fc867c5b8e Webhook prom stats (#3697) 2025-06-04 14:31:28 -07:00
Raja Subramanian 1c8307c72c Use cgroup for memstats. (#3573)
* Use cgroup for memstats.

* deps
2025-04-05 11:54:36 +05:30
Raja Subramanian 3238ab8d77 Calculate rates for memory used and total. (#3570)
Calculating rate for total does seem odd, but keeping it consitent/lined
up with used memory calculation.
2025-04-02 10:23:38 +05:30
Raja Subramanian 8cc17f8f8b Rework node stats a bit. (#3555)
* Rework node stats a bit.

Related protocol PR - https://github.com/livekit/protocol/pull/1023

- Make a config for node stats measurements. Wanted to put the config in
  `routing` package, but a circular dependency forced me to put in
   config.go
- Make rate calculations explicit, i. e. requested via config.
  Previously, it had some odd checks to decide when to calculate rate
  and it would have been calculating over different windows.
- Report signal/data channel bytes every 5 seconds to stats collection
  module. Previously, it was doing it every 30 seconds and that meant
  some windows could have had a large spike
  NOTE: Still need to think about this for load calculations as a large
  number of participants leaving could flush in a small window and that
  could report a large spike in bytes/packets. Maybe need to ignore
  signal bytes for load calculation?

* deps

* use default node stats config if given config is nil

* split out node stats into a struct for re-use

* update config
2025-03-27 12:42:19 +05:30
Paul Wells 3167266495 add datapacket stream metrics (#3450)
* add datapacket stream metrics

* normalize mime type
2025-02-19 22:28:10 -08:00
Raja Subramanian 3b0077f2fe Log connection quality changes. (#3311)
Also remove the connection quality drop prom as it is unused and also
adds state/complexity.
2025-01-07 10:58:31 +05:30
Raja Subramanian 86383b2271 De-centralize some configs to where they are used. (#3162)
* De-centralize some configs to where they are used.

And make default variables.

Renaming a bit, but these are all internal config and have not been
added to documented config.

* Keep documented config as is.

* test

* typo
2024-11-08 12:47:30 +05:30
cnderrauber cf59267631 Add counter for pub&sub time metrics (#3084)
* Add counter for pub&sub time metrics

The pub&sub shows large value in migration related case like
muted/disabled migration, the subscription time depends on
the time when publisher unmute the track(sending rtp packet
after migration), add a counter to distinguish since we
can't control the time in such cases and the first subscription
attemps also is more meaningful than those cases.

* Add info log for high publish delay
2024-10-11 12:07:24 +08:00
cnderrauber 978db00034 Add sdk, participant_kind to pub sub metrics (#3023)
* exclude go client from track publication metric

* add sdk,participant_kind lables

* fix test
2024-09-19 10:42:47 +08:00
Raja Subramanian 787b8450e9 Record out-of-packet count/rate in prom. (#2980)
* Record out-of-packet count/rate in prom.

Adding a field to AnalyticsStream to make this easier to report.
Let me know if adding to AnalyticsStream is not ok.

Will set up a protocol PR if it is okay.

* deps
2024-09-07 00:19:54 +05:30
cnderrauber 947e8f5909 Speed up track publication (#2952)
* speed up track publication

Add metrics for track publication and subscription

Return EnabledCodecs in JoinResponse so client can
choose codec without server side codec fallback

Cache remote webrtc track without AddTrackRequest to
let client send publisher offer before AddTrackRequest response

* go mod

* clean code
2024-08-23 18:38:32 +08:00
Lukas Herman 8a229fda9d add participant session duration metric (#2801) 2024-06-17 17:52:08 -04:00
cnderrauber 7ed1284b96 report average forward metrics (#2737)
* report average forward metrics

* unused parameter
2024-05-28 17:03:18 +08:00
cnderrauber 2288e402ac register forward metrics (#2735) 2024-05-27 15:47:01 +08:00
Paul Wells 38470f378b add message bytes metric (#2731) 2024-05-26 14:01:13 -07:00
cnderrauber e6aa36fdd6 Add forward stats (#2725)
* Add forward metrics

* ignore packets was not forwarded

* rename
2024-05-24 17:43:28 +08:00
Mathew Kamkar 10c8582a6b get cpu stats from cgroup, remove env (#2636)
* get cpu stats from cgroup, remove env

* undo rand seed removal

* tests
2024-04-08 21:15:17 -07:00
Paul Wells e5b8e25064 use shared psrpc utils (#2506)
* use shared psrpc utils

* fix

* deps
2024-02-24 00:38:49 -08:00
Mathew Kamkar 7508560fde larger buckets for jitter prometheus histogram (#2468) 2024-02-09 12:09:51 -08:00
Paul Wells c726cbf2ba increase max session start time bin size (#2380) 2024-01-12 03:49:23 -08:00
Paul Wells 2fe2a9c9f2 add session start time metric (#2377) 2024-01-11 23:23:51 -08:00
Paul Wells f4a984d446 preallocate prometheus packet counters (#1942) 2023-08-08 01:06:14 -07:00
David Zhao 981fb7cac7 Adding license notices (#1913)
* Adding license notices

* remove from config
2023-07-27 16:43:19 -07:00
David Zhao 7e5a7ae79f Fixed windows build (#1768) 2023-06-04 00:17:25 -07:00
David Zhao 956735ae05 Fix node stats updates on Windows (#1748)
Because we aren't able to get CPU count/load info on Windows, they are
stubbed out to return placeholders. This restores compatibility to run
on Windows.
2023-05-29 10:53:08 -07:00
Raja Subramanian a085afc6ee Send quality stats to prometheus. (#1708) 2023-05-12 09:44:03 +05:30
David Colburn ab6c994db4 update protocol/psrpc (#1643)
* update protocol/psrpc

* metadata references
2023-04-21 12:43:20 -07:00
Paul Wells 6636e37664 add prometheus psrpc metrics observer (#1571)
* add prometheus psrpc metrics observer

* record rpc error counts

* update psrpc

* update protocol
2023-04-05 03:50:43 -07:00
Dan McFaul 1848a21eda add configurable environment value (#1421)
* add configurable prometheus env label

* Update pkg/config/config.go

Co-authored-by: Mathew Kamkar <578302+matkam@users.noreply.github.com>

* Update cmd/server/main.go

Co-authored-by: Mathew Kamkar <578302+matkam@users.noreply.github.com>

* Update config-sample.yaml

Co-authored-by: Mathew Kamkar <578302+matkam@users.noreply.github.com>

* set config.Environment value to dev when in dev mode

* be more precise for config-sample

---------

Co-authored-by: Mathew Kamkar <578302+matkam@users.noreply.github.com>
2023-02-15 14:41:44 -07:00
Mathew Kamkar 937256d89e don't error when get tc stats fails (#1386) 2023-02-10 10:05:45 -08:00
Paul Wells 52fd0a641b adjust jitter histogram buckets (#1347)
* adjust jitter histogram buckets

* typo
2023-01-29 22:45:32 -08:00
David Zhao 2fa46e2df4 Retry initial connection attempt should it fail (#1335)
Sometimes the initial selected node could fail. In that case, we'll give it a few more attempts to locate a media node for the session instead of failing it after the first try.
2023-01-25 22:59:57 -08:00
David Zhao cd6b8b80b9 feat: SubscriptionManager to consolidate subscription handling (#1317)
Added a new manager to handle all subscription needs. Implemented using reconciler pattern. The goals are:

improve subscription resilience by separating desired state and current state
reduce complexity of synchronous processing
better detect failures with the ability to trigger full reconnect
2023-01-24 23:06:16 -08:00
Dan McFaul 9e3ca1e989 adding rtc_init stat (#1316)
* adding rtc_initiated stat

* clean up signal and rtc init/connected

* update naming and break out stats update funcs

* update protocol dependency
2023-01-23 12:49:15 -07:00
Paul Wells 1ef7c46fd7 publish stream stats to prometheus (#1313)
* add prometheus stats for rtt/jitter/packet loss

* add track source to metrics

* better packet loss bins

* add track type to metrics

* remove source from AnalyticsStat

* regenerate telemetry service fake

* compute loss from per stream packet count
2023-01-19 19:37:15 -08:00
Benjamin Pracht edc39da0b1 Add TwirpRequestStatusReporter twirp server hook to count requests (#1309) 2023-01-18 11:53:20 -08:00
Dan McFaul 4d6f0cd0f7 Stats collect v2 (#1291)
* initial commit

* add correct label

* clean up

* more cleanup on adding stats

* cleanup

* move things to pub and sub monitors, ensure stats are correctly updated

* fix merge conflict

* Fix panic on MacOS (#1296)

* fixing last feedback

Co-authored-by: Raja Subramanian <raja.gobi@tutanota.com>
2023-01-11 14:49:50 -07:00
Raja Subramanian 1db218a5b1 Fix panic on MacOS (#1296) 2023-01-11 10:08:56 +05:30
Mathew Kamkar 7c970da974 add memory used and total to node stats (#1293)
* add memory used and total to node stats

* raja review: consistency

* update protocol
2023-01-10 12:32:04 -08:00
Mathew Kamkar caae389717 node type prometheus metric labels (#1197) 2022-11-29 20:36:35 -08:00
Raja Subramanian 1e8cc0dc76 Consolidate getMemoryStats (#1122)
* Consolidate getMemoryStats

* Avoid divide-by-0
2022-10-26 09:16:39 +05:30
Raja Subramanian 96a058b503 Populate memory load in node stats. (#1121) 2022-10-25 21:31:23 +05:30
cnderrauber c401ca58af turn packet and bytes stats used for telemetry and load control (#969)
* stats for turn

* add connections stats

* stats for standalone turn server only

* wire update
2022-08-31 11:00:27 +08:00
Mathew Kamkar 767d660809 Use LocalNode ID in Prometheus metrics (#959) 2022-08-25 22:16:20 -07:00
Mathew Kamkar e0676132d4 Packet stats from TC (#832)
* system level packet stats from tc

* drop percent

* test fix

* formatting

* formatting/wording

* prometheus metrics

* update livekit protocol go module
2022-07-15 10:41:40 -07:00
David Zhao b316698409 Release with GoReleaser. Allow start without key configuration (#788) 2022-06-26 12:27:43 -07:00