UpdateSubscription had a shortcoming where when it couldn't find the
participant, it ignored the request.
This PR further removes the reliance of current publisher state from
subscribers.
- SubscribeToTrack only takes in a trackID
- Introduced RoomTrackManager to maintain all published tracks to a room
- Added TrackUnpublished event to clearly indicate when a track has been removed
- SubscribeRequested event no longer include information about the publisher
Added a new manager to handle all subscription needs. Implemented using reconciler pattern. The goals are:
improve subscription resilience by separating desired state and current state
reduce complexity of synchronous processing
better detect failures with the ability to trigger full reconnect
* add prometheus stats for rtt/jitter/packet loss
* add track source to metrics
* better packet loss bins
* add track type to metrics
* remove source from AnalyticsStat
* regenerate telemetry service fake
* compute loss from per stream packet count
Related to livekit/protocol#273
This PR adds:
- ParticipantResumed - for when ICE restart or migration had occurred
- TrackPublishRequested - when we initiate a publication
- TrackSubscribeRequested - when we initiate a subscription
- TrackMuted - publisher muted track
- TrackUnmuted - publisher unmuted track
- TrackPublish/TrackSubcribe events will indicate when those actions have been successful, to differentiate.
In certain scenarios such as migration, we do not want a duplicate event
to be sent when the participant is reconnecting. The Prometheus metric
should still be updated though.
* Use a go routine to clean up stats workers.
It is possible that certain events (like TrackUnpublished) can
happen after the participant is closed. For webhooks pertaining
to those events, need details like room name/id. So,reap stats
workers a little while after the participant left event happens.
* handle data race report
* log analytics worker reap
* debug log
* Increase frequency of status updates and longer avail. threshold.
* better fix.
* fix room close test failure due to slow peer connection Close
* Perform avg computation more frequently if data has changed
* Use delta stats throughout and avoid calculating deltas in telemetry
* Fix a few things after testing
* Remove debug
* Fix tests
* delete instead of setting to nil
* Point to the latest protocol
* Telemetry and webhook improvements.
* avoid blocking on telemetry channel - increase channel size and drop when full
* send ParticipantJoined webhook when fully joined (i.e. on ParticipantActive)
* send TrackPublished & TrackUnpublished webhooks
* increase number of parallel webhook workers to 50
* update protocol
* Store client meta on participant join
capture region, time_to_connect, ip, node
Signed-off-by: shishir gowda <shishir@livekit.io>
* Update proto dep
Signed-off-by: shishir gowda <shishir@livekit.io>
* Telemetry capture published track updates
Signed-off-by: shishir gowda <shishir@livekit.io>
* Updated OnVideoLayerUpdate to take slice of layers
Signed-off-by: shishir gowda <shishir@livekit.io>
* Update proto dep
Signed-off-by: shishir gowda <shishir@livekit.io>
It allows all actions/events to run in the same go routine.
Therefore no synchronization primitives are needed inside
telemetry service implementation.
* Make TelemetryService testable
Timer is extracted for better testability of telemetryservice.
Divided TelemetryService to internal part that:
- contains business logic
- does not contain timer
- and therefore testable
and external part that:
- does not contain business logic
- contains timer to send analytics every 10 seconds for all participants.
- does not need tests
* Add Test_AnalyticsSentWhenParticipantLeaves
* Fix test