UpdateSubscription had a shortcoming where when it couldn't find the
participant, it ignored the request.
This PR further removes the reliance of current publisher state from
subscribers.
- SubscribeToTrack only takes in a trackID
- Introduced RoomTrackManager to maintain all published tracks to a room
- Added TrackUnpublished event to clearly indicate when a track has been removed
- SubscribeRequested event no longer include information about the publisher
Sometimes the initial selected node could fail. In that case, we'll give it a few more attempts to locate a media node for the session instead of failing it after the first try.
Added a new manager to handle all subscription needs. Implemented using reconciler pattern. The goals are:
improve subscription resilience by separating desired state and current state
reduce complexity of synchronous processing
better detect failures with the ability to trigger full reconnect
* add prometheus stats for rtt/jitter/packet loss
* add track source to metrics
* better packet loss bins
* add track type to metrics
* remove source from AnalyticsStat
* regenerate telemetry service fake
* compute loss from per stream packet count
Related to livekit/protocol#273
This PR adds:
- ParticipantResumed - for when ICE restart or migration had occurred
- TrackPublishRequested - when we initiate a publication
- TrackSubscribeRequested - when we initiate a subscription
- TrackMuted - publisher muted track
- TrackUnmuted - publisher unmuted track
- TrackPublish/TrackSubcribe events will indicate when those actions have been successful, to differentiate.
* initial commit
* add correct label
* clean up
* more cleanup on adding stats
* cleanup
* move things to pub and sub monitors, ensure stats are correctly updated
* fix merge conflict
* Fix panic on MacOS (#1296)
* fixing last feedback
Co-authored-by: Raja Subramanian <raja.gobi@tutanota.com>
In certain scenarios such as migration, we do not want a duplicate event
to be sent when the participant is reconnecting. The Prometheus metric
should still be updated though.
* Use a go routine to clean up stats workers.
It is possible that certain events (like TrackUnpublished) can
happen after the participant is closed. For webhooks pertaining
to those events, need details like room name/id. So,reap stats
workers a little while after the participant left event happens.
* handle data race report
* log analytics worker reap
* debug log
* Improve frequency of stats update
Prometheus stats are updated as the data becomes available, instead of
aggregated along with telemetry batches. Node availability decisions can
now react much faster to these stats.
* use the same intervals for connection quality updates