* Rework receiver restart.
- Protect against concurrent restarts
- Clean up and consolidate code around restarts
- Use `RestartStream` of buffer rather than creating new buffers.
* fix test
This is not all of it as it is not possible (or at least I do not know
of a way) to get all suggestions for a repo/project. Did this via loop
searching mainly and taking the modernize suggestions.
* Add support for RTP stream restart.
When an unhandled packet is encountered, try a restart sequence.
Restart happens when 5 packets with contiguous sequence numbers and same
or increasing time stamps are received. Note that this does not work for
B-frame type of scenarios, but that is true for receive path handling
even before this. As WebRTC does not use B-frames, it is fine. But,
needs to be looked at again if B-frames are necessary.
It is controlled by a config that is disabled by default.
* clean up
* debug log
* Adjust stream allocator ping interval based on state.
In steady state, does a 15 second ping.
While deficient, to be able to react to probes faster, it pings at 100ms
interval.
* clean up
* log ops queue not able to wake up
* Use Seque in ops queue.
Standardizing some uses
- Change OpsQueue to use Deque so that it can grow/shrink as necessary and
need not worry about channel getting full and dropping events.
- Change StreamAllocator and TelemetryService to use OpsQueue so that
they also need not worry about channel size and overflows.
* Address feedback
* delete obvious comment
* clean up
* Log cleanup pass
Demoted a bunch of logs to DEBUG, consolidated logs.
* use context logger and fix context var usage
* moved common error types, fixed tests
* Integrate logger components
Dividing into the following components
* pub - publisher
* pub.sfu
* sub - subscriber
* transport
* transport.pion
* transport.cc
* api
* webhook
* update go modules
* send min/median connection score along with avg
* guard against divide by zero for avg score calculation
* update median calculation
Signed-off-by: shishir gowda <shishir@livekit.io>
Sometimes the initial selected node could fail. In that case, we'll give it a few more attempts to locate a media node for the session instead of failing it after the first try.
Added a new manager to handle all subscription needs. Implemented using reconciler pattern. The goals are:
improve subscription resilience by separating desired state and current state
reduce complexity of synchronous processing
better detect failures with the ability to trigger full reconnect
Related to livekit/protocol#273
This PR adds:
- ParticipantResumed - for when ICE restart or migration had occurred
- TrackPublishRequested - when we initiate a publication
- TrackSubscribeRequested - when we initiate a subscription
- TrackMuted - publisher muted track
- TrackUnmuted - publisher unmuted track
- TrackPublish/TrackSubcribe events will indicate when those actions have been successful, to differentiate.
* Fix handling of non-monotonic timestamps
Timed version is inspired by Hybrid Clock. We used to have a mixed behavior
by using time.Time:
* during local comparisons, it does increment monotonically
* when deserializing remote timestamps, we lose that attribute
So it's possible for two requests to be sent in the same microsecond, and
for the latter one to be dropped.
To fix that behavior, I'm switching it to keeping timestamps to consolidate
that behavior, and accepting multiple updates in the same ms by incrementing ticks.
Also using @paulwe's idea of a version generator.
* WIP commit
* WIP commit
* Remove debug
* Revert to reduce diff
* Fix tests
* Determine spatial layer from track info quality if non-simulcast
* Adjust for invalid layer on no rid, previously that function was returning 0 for no rid case
* Fall back to top level width/height if there are no layers
* Use duration from RTPDeltaInfo
* Use rtcscore-go to calculate audio/video score
Signed-off-by: shishir gowda <shishir@livekit.io>
* Get max expected layer and find max actual layer from stream
Signed-off-by: shishir gowda <shishir@livekit.io>
* Cleanup unused methods
Signed-off-by: shishir gowda <shishir@livekit.io>
* Cleanup code - address review comments
Signed-off-by: shishir gowda <shishir@livekit.io>
* get expected layer info instead of just quality
Signed-off-by: shishir gowda <shishir@livekit.io>
* Move SpatialLayerForQuality to utils/helpers
method is required in rtc,sfu and connectionstats pkg
Moved to utils/helpers.go to remove cyclic deps
Signed-off-by: shishir gowda <shishir@livekit.io>
* update tests
Signed-off-by: shishir gowda <shishir@livekit.io>
* Pick stream stats with max layer
Signed-off-by: shishir gowda <shishir@livekit.io>
* Update rtcscore-go pkg to make rtt/jitter optional
when passing 0, rtcscore-go was setting default values
Signed-off-by: shishir gowda <shishir@livekit.io>
* update score to rating
Signed-off-by: shishir gowda <shishir@livekit.io>
* Update rtcscore-go pkg to use simulcast layer info for score
Signed-off-by: shishir gowda <shishir@livekit.io>
* Update score ratings to reflect rtcscore range
Signed-off-by: shishir gowda <shishir@livekit.io>
* update test params for new rtcscore
Signed-off-by: shishir gowda <shishir@livekit.io>
* Delay sending scores to connections only till full data is available
first interval can have partial data leading to lower scores
Signed-off-by: shishir gowda <shishir@livekit.io>
* Check for inf values in quality params
Signed-off-by: shishir gowda <shishir@livekit.io>
* Clean up initial score calculation. Default to 5
Signed-off-by: shishir gowda <shishir@livekit.io>
Co-authored-by: David Zhao <dz@livekit.io>
* Do not post close callback in ops queue if not started.
Ops queue is started in `Bind()`. If `Close()` is called
when bind did not happen (because the underlying peer connection
closed before bind), the close callback does not run.
Check if ops queue is running before posting close callback
into the queue.
Not pretty, but covers this case. Need to think about it more.
* correct check
* Use delta stats throughout and avoid calculating deltas in telemetry
* Fix a few things after testing
* Remove debug
* Fix tests
* delete instead of setting to nil
* Point to the latest protocol
* Introducing OpsQueue
Creating a PR to get feedback on standardizing on this concept.
Can be used for callbacks.
Already a couple of places use this construct. Wondering if we
should standardize on this across the board.
Just changing one place to use the new struct. Another place
that I know of which uses this pattern is the telemetry package.
* atomic flag -> bool