44 Commits

Author SHA1 Message Date
Paul Wells
c6e6c0215f add debug metric for tracking references (#4134) 2025-12-07 11:39:21 -08:00
Raja Subramanian
7f10e18bac Record join/publish/subscribe cancellations. (#4102)
To get better picture of success/failure rate.
2025-11-25 14:06:02 +05:30
Raja Subramanian
ca0d5ee972 Count request/response packets on both client and server side. (#4001)
Currently, the signal requests are counted on media side and signal
responses are counted on controller side. This does not provide the
granularity to check how many response messages each media node is
sending.

Seeing some cases where track subscriptions are slow under load. This
would be good to see if the media node is doing a lot of signal response
messages.
2025-10-14 16:58:36 +05:30
Raja Subramanian
fc867c5b8e Webhook prom stats (#3697) 2025-06-04 14:31:28 -07:00
Raja Subramanian
1c8307c72c Use cgroup for memstats. (#3573)
* Use cgroup for memstats.

* deps
2025-04-05 11:54:36 +05:30
Raja Subramanian
3238ab8d77 Calculate rates for memory used and total. (#3570)
Calculating rate for total does seem odd, but keeping it consitent/lined
up with used memory calculation.
2025-04-02 10:23:38 +05:30
Raja Subramanian
8cc17f8f8b Rework node stats a bit. (#3555)
* Rework node stats a bit.

Related protocol PR - https://github.com/livekit/protocol/pull/1023

- Make a config for node stats measurements. Wanted to put the config in
  `routing` package, but a circular dependency forced me to put in
   config.go
- Make rate calculations explicit, i. e. requested via config.
  Previously, it had some odd checks to decide when to calculate rate
  and it would have been calculating over different windows.
- Report signal/data channel bytes every 5 seconds to stats collection
  module. Previously, it was doing it every 30 seconds and that meant
  some windows could have had a large spike
  NOTE: Still need to think about this for load calculations as a large
  number of participants leaving could flush in a small window and that
  could report a large spike in bytes/packets. Maybe need to ignore
  signal bytes for load calculation?

* deps

* use default node stats config if given config is nil

* split out node stats into a struct for re-use

* update config
2025-03-27 12:42:19 +05:30
Paul Wells
3167266495 add datapacket stream metrics (#3450)
* add datapacket stream metrics

* normalize mime type
2025-02-19 22:28:10 -08:00
Raja Subramanian
86383b2271 De-centralize some configs to where they are used. (#3162)
* De-centralize some configs to where they are used.

And make default variables.

Renaming a bit, but these are all internal config and have not been
added to documented config.

* Keep documented config as is.

* test

* typo
2024-11-08 12:47:30 +05:30
Paul Wells
38470f378b add message bytes metric (#2731) 2024-05-26 14:01:13 -07:00
cnderrauber
e6aa36fdd6 Add forward stats (#2725)
* Add forward metrics

* ignore packets was not forwarded

* rename
2024-05-24 17:43:28 +08:00
Mathew Kamkar
10c8582a6b get cpu stats from cgroup, remove env (#2636)
* get cpu stats from cgroup, remove env

* undo rand seed removal

* tests
2024-04-08 21:15:17 -07:00
Paul Wells
e5b8e25064 use shared psrpc utils (#2506)
* use shared psrpc utils

* fix

* deps
2024-02-24 00:38:49 -08:00
David Zhao
981fb7cac7 Adding license notices (#1913)
* Adding license notices

* remove from config
2023-07-27 16:43:19 -07:00
David Zhao
956735ae05 Fix node stats updates on Windows (#1748)
Because we aren't able to get CPU count/load info on Windows, they are
stubbed out to return placeholders. This restores compatibility to run
on Windows.
2023-05-29 10:53:08 -07:00
Raja Subramanian
a085afc6ee Send quality stats to prometheus. (#1708) 2023-05-12 09:44:03 +05:30
Paul Wells
6636e37664 add prometheus psrpc metrics observer (#1571)
* add prometheus psrpc metrics observer

* record rpc error counts

* update psrpc

* update protocol
2023-04-05 03:50:43 -07:00
Dan McFaul
1848a21eda add configurable environment value (#1421)
* add configurable prometheus env label

* Update pkg/config/config.go

Co-authored-by: Mathew Kamkar <578302+matkam@users.noreply.github.com>

* Update cmd/server/main.go

Co-authored-by: Mathew Kamkar <578302+matkam@users.noreply.github.com>

* Update config-sample.yaml

Co-authored-by: Mathew Kamkar <578302+matkam@users.noreply.github.com>

* set config.Environment value to dev when in dev mode

* be more precise for config-sample

---------

Co-authored-by: Mathew Kamkar <578302+matkam@users.noreply.github.com>
2023-02-15 14:41:44 -07:00
Mathew Kamkar
937256d89e don't error when get tc stats fails (#1386) 2023-02-10 10:05:45 -08:00
David Zhao
cd6b8b80b9 feat: SubscriptionManager to consolidate subscription handling (#1317)
Added a new manager to handle all subscription needs. Implemented using reconciler pattern. The goals are:

improve subscription resilience by separating desired state and current state
reduce complexity of synchronous processing
better detect failures with the ability to trigger full reconnect
2023-01-24 23:06:16 -08:00
Dan McFaul
9e3ca1e989 adding rtc_init stat (#1316)
* adding rtc_initiated stat

* clean up signal and rtc init/connected

* update naming and break out stats update funcs

* update protocol dependency
2023-01-23 12:49:15 -07:00
Benjamin Pracht
edc39da0b1 Add TwirpRequestStatusReporter twirp server hook to count requests (#1309) 2023-01-18 11:53:20 -08:00
Dan McFaul
4d6f0cd0f7 Stats collect v2 (#1291)
* initial commit

* add correct label

* clean up

* more cleanup on adding stats

* cleanup

* move things to pub and sub monitors, ensure stats are correctly updated

* fix merge conflict

* Fix panic on MacOS (#1296)

* fixing last feedback

Co-authored-by: Raja Subramanian <raja.gobi@tutanota.com>
2023-01-11 14:49:50 -07:00
Raja Subramanian
1db218a5b1 Fix panic on MacOS (#1296) 2023-01-11 10:08:56 +05:30
Mathew Kamkar
7c970da974 add memory used and total to node stats (#1293)
* add memory used and total to node stats

* raja review: consistency

* update protocol
2023-01-10 12:32:04 -08:00
Mathew Kamkar
caae389717 node type prometheus metric labels (#1197) 2022-11-29 20:36:35 -08:00
Raja Subramanian
1e8cc0dc76 Consolidate getMemoryStats (#1122)
* Consolidate getMemoryStats

* Avoid divide-by-0
2022-10-26 09:16:39 +05:30
Raja Subramanian
96a058b503 Populate memory load in node stats. (#1121) 2022-10-25 21:31:23 +05:30
Mathew Kamkar
767d660809 Use LocalNode ID in Prometheus metrics (#959) 2022-08-25 22:16:20 -07:00
Mathew Kamkar
e0676132d4 Packet stats from TC (#832)
* system level packet stats from tc

* drop percent

* test fix

* formatting

* formatting/wording

* prometheus metrics

* update livekit protocol go module
2022-07-15 10:41:40 -07:00
Raja Subramanian
f19815754c Do not re-compute average on real time metric change (#743) 2022-05-31 10:33:17 +05:30
Raja Subramanian
508aa471a9 Track participant join total + rate in node stats (#741)
* Track participant join total + rate in node stats

* update protocol
2022-05-30 15:58:30 +05:30
David Zhao
7eb3362d0a Keep track of retransmissions in NodeStats (#677) 2022-05-10 15:25:24 -07:00
David Zhao
3c53b843c5 Fixes bps and pps average computation. (#639)
Exclude NACK count from being a trigger to refresh stats.
Since NACKs are updated instantaneously without having to wait for
Telemetry updates that occurs every 10s, having even a single NACK
could cause us to compute averages prematurely.
2022-04-20 19:17:02 -07:00
David Zhao
431069af95 Rename StatsUpdateFrequency -> StatsUpdateInterval 2022-04-19 22:22:58 -07:00
David Zhao
282e2aed49 Increase frequency of status updates and longer availability threshold (#628)
* Increase frequency of status updates and longer avail. threshold.

* better fix.

* fix room close test failure due to slow peer connection Close

* Perform avg computation more frequently if data has changed
2022-04-19 22:18:00 -07:00
Raja Subramanian
a19ca69f5f Prevent stats update if the deltas are empty (#619)
* Prevent stats update if the deltas are empty

* increase force interval

* static check

* Change max delay to 30 seconds
2022-04-18 22:51:34 +05:30
Mathew Kamkar
cac6d22a72 store cpu load in node stats (#524)
* store cpu load in node stats

* num cpus uint32

* cpu load selector test

* dep update
2022-03-16 14:51:22 -07:00
Raja Subramanian
2706dc130f Replace sync/atomic usage with uber/atomic (#471) 2022-02-28 09:57:17 +05:30
David Colburn
3d132730f9 replace entire nodeStats object (#393) 2022-01-31 17:09:36 -07:00
David Colburn
faa870de3d Move callbacks out of messageRouter (#269)
* move callbacks out of messageRouter

* OCD

* more OCD

* fix forwarder test

* even more OCD

* maximum OCD

* package name collision, copy lock by value
2021-12-17 13:19:23 -08:00
Mathew Kamkar
bd42a39117 Include node ID with Prometheus metrics (#251)
* include node id in prometheus metrics

* static prom init and nodeID

* update protocol dep
2021-12-10 15:49:14 -08:00
David Zhao
2d93ccd668 Updated protocol from protocol/proto -> protocol/livekit (#242)
* Updated protocol from protocol/proto -> protocol/livekit

* separate MediaTrack from PublishedTrack
2021-12-08 13:58:38 -08:00
David Colburn
289ebd32ff Telemetry refactor (#172)
* telemetry refactor

* fix imports

* update protocol
2021-11-08 20:00:34 -06:00