Commit Graph

121 Commits

Author SHA1 Message Date
David Zhao
5ff72a99b9 Report publish & subscribe RTPStats as Telemetry events (#1506) 2023-03-10 10:28:54 -08:00
shishirng
8856ce6422 Bump up interval for sending telemetry stats to 30 seconds (#1430)
Signed-off-by: shishir gowda <shishir@livekit.io>
2023-02-16 15:53:58 -05:00
Dan McFaul
1848a21eda add configurable environment value (#1421)
* add configurable prometheus env label

* Update pkg/config/config.go

Co-authored-by: Mathew Kamkar <578302+matkam@users.noreply.github.com>

* Update cmd/server/main.go

Co-authored-by: Mathew Kamkar <578302+matkam@users.noreply.github.com>

* Update config-sample.yaml

Co-authored-by: Mathew Kamkar <578302+matkam@users.noreply.github.com>

* set config.Environment value to dev when in dev mode

* be more precise for config-sample

---------

Co-authored-by: Mathew Kamkar <578302+matkam@users.noreply.github.com>
2023-02-15 14:41:44 -07:00
Mathew Kamkar
937256d89e don't error when get tc stats fails (#1386) 2023-02-10 10:05:45 -08:00
David Zhao
2851a8ac98 Improved robustness of subscription stack (#1382)
UpdateSubscription had a shortcoming where when it couldn't find the
participant, it ignored the request.

This PR further removes the reliance of current publisher state from
subscribers.
- SubscribeToTrack only takes in a trackID
- Introduced RoomTrackManager to maintain all published tracks to a room
- Added TrackUnpublished event to clearly indicate when a track has been removed
- SubscribeRequested event no longer include information about the publisher
2023-02-06 18:08:26 -08:00
cnderrauber
8b6dab780c Add reconnect reason and signal rtt calculation (#1381)
* Add connect reason and signal rtt calculate

* Update protocol

* solve comment
2023-02-06 11:12:25 +08:00
David Zhao
b023c531c2 Fix incorrect unsubscribed track telemetry (#1350)
- only log unsubscribe on close if Track was actually bound
- update subscribed counters even if a failure had been logged before
2023-01-30 10:16:21 -08:00
Paul Wells
52fd0a641b adjust jitter histogram buckets (#1347)
* adjust jitter histogram buckets

* typo
2023-01-29 22:45:32 -08:00
David Zhao
2fa46e2df4 Retry initial connection attempt should it fail (#1335)
Sometimes the initial selected node could fail. In that case, we'll give it a few more attempts to locate a media node for the session instead of failing it after the first try.
2023-01-25 22:59:57 -08:00
David Zhao
cd6b8b80b9 feat: SubscriptionManager to consolidate subscription handling (#1317)
Added a new manager to handle all subscription needs. Implemented using reconciler pattern. The goals are:

improve subscription resilience by separating desired state and current state
reduce complexity of synchronous processing
better detect failures with the ability to trigger full reconnect
2023-01-24 23:06:16 -08:00
Dan McFaul
9e3ca1e989 adding rtc_init stat (#1316)
* adding rtc_initiated stat

* clean up signal and rtc init/connected

* update naming and break out stats update funcs

* update protocol dependency
2023-01-23 12:49:15 -07:00
Paul Wells
1ef7c46fd7 publish stream stats to prometheus (#1313)
* add prometheus stats for rtt/jitter/packet loss

* add track source to metrics

* better packet loss bins

* add track type to metrics

* remove source from AnalyticsStat

* regenerate telemetry service fake

* compute loss from per stream packet count
2023-01-19 19:37:15 -08:00
Benjamin Pracht
edc39da0b1 Add TwirpRequestStatusReporter twirp server hook to count requests (#1309) 2023-01-18 11:53:20 -08:00
David Zhao
732309a8c1 Added track success & muted events (#1308)
Related to livekit/protocol#273

This PR adds:
- ParticipantResumed - for when ICE restart or migration had occurred
- TrackPublishRequested - when we initiate a publication
- TrackSubscribeRequested - when we initiate a subscription
- TrackMuted - publisher muted track
- TrackUnmuted - publisher unmuted track
- TrackPublish/TrackSubcribe events will indicate when those actions have been successful, to differentiate.
2023-01-15 15:40:20 -08:00
Dan McFaul
4d6f0cd0f7 Stats collect v2 (#1291)
* initial commit

* add correct label

* clean up

* more cleanup on adding stats

* cleanup

* move things to pub and sub monitors, ensure stats are correctly updated

* fix merge conflict

* Fix panic on MacOS (#1296)

* fixing last feedback

Co-authored-by: Raja Subramanian <raja.gobi@tutanota.com>
2023-01-11 14:49:50 -07:00
Raja Subramanian
1db218a5b1 Fix panic on MacOS (#1296) 2023-01-11 10:08:56 +05:30
Mathew Kamkar
7c970da974 add memory used and total to node stats (#1293)
* add memory used and total to node stats

* raja review: consistency

* update protocol
2023-01-10 12:32:04 -08:00
shishirng
dceb624b83 Send participant object in telemetry on Active event (#1282)
Signed-off-by: shishir gowda <shishir@livekit.io>
2023-01-04 08:55:49 -05:00
David Zhao
c1d7dbd4fc Tweaks to prometheus participant counter (#1240)
* Tweaks to prometheus participant counter

Ensure that we don't miss adding a count in migration scenarios

* avoid nil ICEConfig
2022-12-19 14:30:14 -08:00
David Zhao
120335da00 Allow skipping of sending ParticipantJoined analytics event (#1236)
In certain scenarios such as migration, we do not want a duplicate event
to be sent when the participant is reconnecting. The Prometheus metric
should still be updated though.
2022-12-18 22:09:20 -08:00
David Zhao
33902a9f2a Do not send ParticipantLeft webhook event unless connected successfully. (#1234)
Fixes #1130
2022-12-18 17:37:55 -08:00
cnderrauber
3c907ed460 Add stats for data channel and signal (#1198)
* Add stats for data channel and signal

* Solve comment
2022-11-30 14:53:19 +08:00
Mathew Kamkar
caae389717 node type prometheus metric labels (#1197) 2022-11-29 20:36:35 -08:00
Raja Subramanian
1e8cc0dc76 Consolidate getMemoryStats (#1122)
* Consolidate getMemoryStats

* Avoid divide-by-0
2022-10-26 09:16:39 +05:30
Raja Subramanian
96a058b503 Populate memory load in node stats. (#1121) 2022-10-25 21:31:23 +05:30
shishirng
8dc5a899a9 Create stats worker for participant on Active if not exists (#1059)
on migration, participants don't send JOIN event, so create it in
PARTICIPANT_ACTIVE event
2022-09-29 17:26:43 -04:00
David Zhao
3da908302a Do not warn when notifier isn't configured (#1043)
By default there are no webhook URLs to notify, so a notifier isn't created.
2022-09-26 13:30:27 -07:00
David Colburn
803046b882 Auto egress (#1011)
* auto egress

* fix room service test

* reuse StartTrackEgress

* add timestamp

* update prefixed filename explicitly

* update protocol

* clean up telemetry

* fix telemetry tests

* separate room internal storage

* auto participant egress

* remove custom template url

* fix internal key

* use map for stats workers

* remove sync.Map

* remove participant composite
2022-09-21 12:04:19 -07:00
cnderrauber
c401ca58af turn packet and bytes stats used for telemetry and load control (#969)
* stats for turn

* add connections stats

* stats for standalone turn server only

* wire update
2022-08-31 11:00:27 +08:00
Mathew Kamkar
767d660809 Use LocalNode ID in Prometheus metrics (#959) 2022-08-25 22:16:20 -07:00
shishirng
79cf614783 Send egressInfo in telemetry event (#941)
Signed-off-by: shishir gowda <shishir@livekit.io>

Signed-off-by: shishir gowda <shishir@livekit.io>
2022-08-23 08:18:12 -04:00
David Zhao
b8bda3f14b Separate calls to Telemetry vs Prometheus room lifecycle (#935)
* Separate calls to Telemetry vs Prometheus room lifecycle

* remove unused import
2022-08-20 20:22:16 -07:00
shishirng
a3e8304b56 send participant info/identity during track_published event (#846)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-07-21 17:34:52 -04:00
Raja Subramanian
29039b4e76 Use a go routine to clean up stats workers. (#836)
* Use a go routine to clean up stats workers.

It is possible that certain events (like TrackUnpublished) can
happen after the participant is closed. For webhooks pertaining
to those events, need details like room name/id. So,reap stats
workers a little while after the participant left event happens.

* handle data race report

* log analytics worker reap

* debug log
2022-07-18 11:47:43 +05:30
Mathew Kamkar
e0676132d4 Packet stats from TC (#832)
* system level packet stats from tc

* drop percent

* test fix

* formatting

* formatting/wording

* prometheus metrics

* update livekit protocol go module
2022-07-15 10:41:40 -07:00
David Colburn
fbbcbe77df Remove recording (#811)
* remove recorder service

* update protocol
2022-07-05 18:39:32 -07:00
David Zhao
b316698409 Release with GoReleaser. Allow start without key configuration (#788) 2022-06-26 12:27:43 -07:00
Raja Subramanian
4ed9b5f90e Revert "Using shadow pattern for stats workers (#742)" (#744)
This reverts commit 2b561d2bad.
2022-05-31 11:06:44 +05:30
Raja Subramanian
f19815754c Do not re-compute average on real time metric change (#743) 2022-05-31 10:33:17 +05:30
Raja Subramanian
2b561d2bad Using shadow pattern for stats workers (#742) 2022-05-31 10:32:54 +05:30
Raja Subramanian
508aa471a9 Track participant join total + rate in node stats (#741)
* Track participant join total + rate in node stats

* update protocol
2022-05-30 15:58:30 +05:30
cnderrauber
f958fbcc1c simulcast codecs support (#720)
simulcast codecs support 

Co-authored-by: David Zhao <dz@livekit.io>
2022-05-27 19:55:50 +08:00
Raja Subramanian
33032f6c4b Fix some test races and other things found with go test -race (#711) 2022-05-24 10:16:36 +05:30
Raja Subramanian
8ef53037eb Lock stats worker maps (#704) 2022-05-21 10:36:49 +05:30
David Zhao
79296d0939 Fixed concurrent modification to map (#702)
Synchronizes access to stats worker maps. Previously it was accessed
from both OpsQueue goroutine and run() worker
2022-05-20 13:45:13 -07:00
Raja Subramanian
012337c96a Fix sense of tranmission label (#692) 2022-05-18 12:52:05 +05:30
David Zhao
7eb3362d0a Keep track of retransmissions in NodeStats (#677) 2022-05-10 15:25:24 -07:00
David Zhao
bd7e3beda4 Improve frequency of stats update (#673)
* Improve frequency of stats update

Prometheus stats are updated as the data becomes available, instead of
aggregated along with telemetry batches. Node availability decisions can
now react much faster to these stats.

* use the same intervals for connection quality updates
2022-05-09 08:55:06 -07:00
Raja Subramanian
081b97142f Variable collision killed stats workers (#670) 2022-05-06 23:42:40 +05:30
Raja Subramanian
c6f895db15 Prevent concurrent access of stats worker map (#666) 2022-05-04 23:00:20 +05:30