1. When re-allocating for a track in DEFICIENT state, try to use
available headroom to accommodate change before trying to steal
bits from other tracks.
2. If the changing track gives back bits (because of muting or
moving to a lower layer subscription), use the returned bits
to try and boost deficient track(s).
* WIP commit
* WIP commit
* WIP commit
* Some clean up
- Removed a chatty debug log
- some spelling, punctuation correction in comments
- missed an `Abs` in check, add it.
* Close subscriptions promptly
Two things:
-----------
1. Because the desired is not changed, the notifiers are not notified
that the subscription is not observing any more. So, that holds
a refernce to the subscription manager.
Address the above by setting `setDesired` to false on all subscriptions
when subscription manager closes. That will remove observer from the
notifiers.
2. When subscription manager is closed, the down track close
is invoked which flows back (with onClose callback of downtrack) to
subscription manager "handleSubscribedTrackClose". That callback
handler sets the subscribed track to nil for that subscription.
A couple of scenarios here
a. Without the above change, desired could have been true and it would
have looked that the track needs to try subscription again because
`needsSubscribe == true` (desired == true && subscribedTrack == nil)
b. Even with the change above, there is a new condition of
`desired == false && subscribedTrack == nil` and there was no handler
for that condition in the reconciler.
Address this by adding a `needsCleanup` function and delete subscription
from the map. Note that the reconciler may not be running to execute
this action as subscription manager would have closed the `closeCh`, but
doing the code in the interest of proper clean up.
* clean up
* Delete down track from receiver in close always.
I think with the parallel close in goroutines, it so happens that
peer connection can get closed first and unbind the track.
The delete down track and RTCP reader close was inside if `bound` block.
So, they were not running leaving a dangling down track in the receiver.
* fix tests
* fix test
* Pacer interface to send packets
* notify outside lock
* use select
* use pass through pacer
* add error to OnSent
* Remove log which could get noisy
* Starting TWCC work (#1727)
* add packet time
* WIP commit
* WIP commit
* WIP commit
* minor comments
* Some measurements (#1736)
* WIP commit
* some notes
* WIP commit
* variable name change and do not post to closed channel
* unlock
* clean up
* comment
* Hooking up some more bits for TWCC (#1752)
* wake under lock
* Pacer in down stream path.
Splitting out only the pacer from a feature branch to
introduce the concept of pacer.
Currently, there should be no difference in functionality
as a pass through pacer is used.
Another implementation exists which is just put it in a queue and send
it from one goroutine.
A potential implementation to try would be data paced by bandwidth
estimate. That could include priority queues and such.
But, the main goal here is to introduce notion of pacer in the down
stream path and prepare for more congestion control possibilities down
the line.
* Don't need peak detector
* remove throttling of write IO errors
When migrating muted track, need to set potential codecs.
For audio, there may not be `simulcast_codecs` in `AddTrack`.
Hence when migrating a muted track, the potential codecs are not set.
That results in no receivers in relay up track (because all this
could happen before the audio track is unmuted).
So, look at MimeType in TrackInfo (this will be set in OnTrack) and
use that as potential codec.
* Do not process events after participant close.
Avoid processing transport events after participant/transport close.
It causes error logs which are not really errors, but distracting noise.
* correct comment
* Full reconnect on publication mismatch on resume.
It is possible that publications mismatch on resume. An example sequence
- Client sends `AddTrack` for `trackA`
- Server never receives it due to signalling connection breakage.
- Client could do a resume (reconnect=1) noticing signalling connection
breakage.
- Client's view thinks that `trackA` is known to server, but server does
not know about it.
- A subsequence offer containing `trackA` triggers `trackInfo not
available before track publish` and the track does not get published.
Detect the case of missing track and issue a full reconnect.
* UpdateSubscriptions from sync state a la cloud
* add missing shouldReconnect
* Close participant on full reconnect.
A full reconnect == irrecoverable error. Participant cannot continue.
So, close the participant when issuing a full reconnect.
That should prevent subscription manager reconcile till the participant
is finally closed down when participant is stale.
* format
A subscription in subscription manager could live till the source
track goes away even though the participant with that subscription
is long gone due to closure on source track removal. Handle it by using
trackID to look up on source track removal.
Also, logging SDPs when a negotiation failure happens to check
if there are any mismatches.
Till now only video was using simulated publish when migrating on mute.
But, with `pauseUpstream() + replaceTrack(null)`, it is possible that
client does not send any data when muted.
I do not think there is a problem to do this (even when cleint is
actually using mute which sends silence frames).
* Perform unsubscribe in parallel to avoid blocking
When unsubscribing from tracks, we flush a blank frame in order to prepare
the transceivers for re-use. This process is blocking for ~200ms. If
the unsubscribes are performed serially, it would prevent other subscribe
operation from continuing.
This PR parallelizes that operation, and ensures subsequent subscribe
operations could reuse the existing transceivers.
* also perform in parallel when uptrack close
* fix a few log fields
* Avoid reconnect loop for unsupported downtrack
If the client subscribes to a track which codec is unsupported by the
client, sfu will trigger negotiation failed and issue a full reconnect
after received client answer. If the client try to subscribe that track
then it will got full reconnect again. That will cause a infinite
reconnect loop until the client don't subscribe that track. This PR
will unsubscribe the error track for the client and send a
SubscriptionResponse that contain the reason to indicates the track's
codec is not supported to avoid the reconnect loop.
* Return max spatial layer from selectors.
With differing requirements of SVC and allowing overshoot in Simulcast,
selectors are best placed to indicate what is the max spatial layer when
they indicate a switch to max spatial layer.
* fix test
* prevent race
It is possible that publisher paces the media.
So, RTCP sender report from publisher could be ahead of
what is being fowarded by a good amount (have seen up to 2 seconds
ahead). Using the forwarded time stamp for RTCP sender report
in the down stream leads to jumps back and forth in the down track
RTCP sender report.
So, look at the publisher's RTCP sender report to check for it being
ahead and use the publisher rate as a guide.
Active TCP was added in pion/ice v2.3.4. This is causing a couple of issues for us.
Active TCP does not make sense for an SFU. Clients are expected to be behind NAT and we should not be dialing them. Instead, LiveKit exposes a TCP port so clients could dial in
Active TCP is causing all iOS clients to become disconnected immediately. This is impacting all version of libwebrtc-based iOS clients (tested from M104 to M111)
* RTCP sender reports every three seconds.
Ideally, we should be sending this based on data rate.
But, increasing frequency a little as a lost sender report
means the client may not have sender report for 10 seconds
and that could affect sync. We do receiver reports once a second.
Thought of setting this to that level too, but not making a big change
from existing rate.
Also, simplifying the RTCP send loop. Don't need to hold and
do the processing after collecting all reports.
* consistent use of GetSubscribedTracks
* Experimental flag to try time stamp adjustment to control drift.
There is a config to enable this.
Using a PID controller to try and keep the sample rate at expected
value. Need to be seen if this works well. Adjustment are limited
to 25 ms max at a time to ensure there are no large jumps.
And it is applied when doing RTCP sender report which happens
once in 5 seconds currently for both audio and video tracks.
A nice introduction to PID controllers - https://alphaville.github.io/qub/pid-101/#/
Implementation borrowed from - https://github.com/pms67/PID
A few things TODO
1. PID controller tuning is a process. Have picked values from test from
that implementation above. May not be the best. Need to try.
2. Can potentially run this more often. Rather than running it only when
running RTCP sender report (which is once in 5 seconds now), can
potentially run it every second and limit the amount of change to
something like 10 ms max.
* remove unused variable
* debug log a bit more
* Keep track of expected RTP time stamp and control drift.
- Use monotonic clock in RTCP Sender Report and packet times
- Keep the time stamp close to expected time stamp on layer/SSRC
switches
* clean up
* fix test compile
* more test compile failures
* anticipatory clean up
* further clean up
* add received sender report logging
With subscription manager, there is no need to tell a publisher
about a subscriber going away. Before subscription manager,
the up track manager of a participant (i. e. the publisher side)
was holding a list of pending subscriptions for its published tracks
and that had to be cleaned up if one of the subscriber goes away.
That is not the case any more.
Also set publisherID early so that subscription permission update has
the right publisherID. In fact, saw an empty ID in the logs and saw
that we still have the disallowed subscription handling which is not
necessary any more.
* hopefully more stable tests
* do eventual checks as some callbacks happen in go routines.
Needs a bit more work to ensure that some conditions do not happen.
But, with goroutines, the amount of wait is always tricky.``