It is possible that migration could trigger without migrating out node
knowing about it. So, when a migration started notification comes in,
set up migration timer if not already set.
* Support XR request/response for rtt calculation
* Update pkg/sfu/downtrack.go
Co-authored-by: David Zhao <dz@livekit.io>
---------
Co-authored-by: David Zhao <dz@livekit.io>
* Do not synthesise DISCONNECT on session change.
v12 clients can handle session change based on identity.
* change for testf
* Squelch participant update if close reason is DUPLICATE_IDENTITY.
* fix test
* comment
* Clean up participant close reason a bit
* fix test
* test
* Use a participant worker queue in room.
Removes selectively needing to call things in goroutine from
participant.
Also, a bit of drive-by clean up.
* spelling
* prevent race
* don't need to remove in goroutine as it is already running in the worker
* worker will get cleaned up in state change callback
* create participant worker only if not created already
* ref count participant worker
* maintain participant list
* clean up oldState
* Augment LeaveRequest with alternate regions to connect.
* update protocol and issue resume action on close if expected to resume
* use current protocol in tests
* address feedback
* Add a simulation scenario to disconnect signal channel on resume
- Requesting that scenario add that participant to a map with a timeout
of 5 seconds.
- If a resume (reconnect = 1) happens before the timeout, the signalling
channel is closed immediately on resume.
- There is a clean up worker which will remove entries from the map when
they timout.
- The participant is also removed from the map if the disconnect on
resume is invoked once.
* simulate disconnect signal on resume no messages
* comment
* comment
* Close all retries
* update deps
* abort resume only if simulation applied
* Revert SIP change
PacketsLost may not provide useful if repairs are discounting the loss.
So, out-of-order packets are an indication of loss and maybe subsequent
repair. Note that out-of-order could be just out-of-order by a short
amount of time, but a lot of that happening is not good either.
So, out-of-order could provide a decent view of link quality.
* Consolidate TrackInfo.
TrackInfo was spread across a bit. Consolidating it.
* TODO comments
* test
* update TrackInfo on SSRC change
* further consolidation
* log mimes only
* update receivers on SSRC set
* clone proto on return
* feedback: break loop on mime match
* prevent data race
* Participant traffic load.
Capturing information about participant traffic
- Upstream/Downstream
- Audio/Video/Data
- Packets/Bytes
This captures a notion of how much traffic load a participant is
generating.
Can be used to make allocation decisions.
* Clean up
* SIP patches
* reporter goroutine
* unlock
* move traffic stats from protocol
* check type
* Add option to issue full reconnect on data channel error.
There are situations where send data packet fails because of "stream
closed". It is unclear when that happens. Seems to be after an
ICERestart after ICE failed and connection type switching to TURN
from ICE.
Once the failure happens, it is not recoverable. Potentially, it is
recoverable, but unclear where the problem lies. Attempts to reproduce
looking at the pattern of failures has been unsuccesful.
In the mean time, adding an option to issue full reconnect
when send data packet fails.
* typo
* Add control of playout delay
Add config to enable playout delay. The delay will be limited by
[min,max] in the config option and calculated by upstream & downstream
RTT.
* check protocol version to enable playout delay
* Move config to room, limit playout-delay update interval, solve comments
* Remove adaptive playout-delay
* Remove unused config
* Ability to use trailer with server injected frames
A 32-byte trailer generated per room.
Trailer appended when track encryption is enabled.
* E2EE trailer for server injected packets.
- Generate a 32-byte per room trailer. Too reasons for longer length
o Laziness: utils generates a 32 byte string.
o Longer length random string reduces chances of colliding with real data.
- Trailer sent in JoinResponse
- Trailer added to server injected frames (not to padding only packets)
* generate
* add a length check
* pass trailer in as an argument
* Pacer interface to send packets
* notify outside lock
* use select
* use pass through pacer
* add error to OnSent
* Remove log which could get noisy
* Starting TWCC work (#1727)
* add packet time
* WIP commit
* WIP commit
* WIP commit
* minor comments
* Some measurements (#1736)
* WIP commit
* some notes
* WIP commit
* variable name change and do not post to closed channel
* unlock
* clean up
* comment
* Hooking up some more bits for TWCC (#1752)
* wake under lock
* Pacer in down stream path.
Splitting out only the pacer from a feature branch to
introduce the concept of pacer.
Currently, there should be no difference in functionality
as a pass through pacer is used.
Another implementation exists which is just put it in a queue and send
it from one goroutine.
A potential implementation to try would be data paced by bandwidth
estimate. That could include priority queues and such.
But, the main goal here is to introduce notion of pacer in the down
stream path and prepare for more congestion control possibilities down
the line.
* Don't need peak detector
* remove throttling of write IO errors
* Close participant on full reconnect.
A full reconnect == irrecoverable error. Participant cannot continue.
So, close the participant when issuing a full reconnect.
That should prevent subscription manager reconcile till the participant
is finally closed down when participant is stale.
* format
* Avoid reconnect loop for unsupported downtrack
If the client subscribes to a track which codec is unsupported by the
client, sfu will trigger negotiation failed and issue a full reconnect
after received client answer. If the client try to subscribe that track
then it will got full reconnect again. That will cause a infinite
reconnect loop until the client don't subscribe that track. This PR
will unsubscribe the error track for the client and send a
SubscriptionResponse that contain the reason to indicates the track's
codec is not supported to avoid the reconnect loop.
* Experimental flag to try time stamp adjustment to control drift.
There is a config to enable this.
Using a PID controller to try and keep the sample rate at expected
value. Need to be seen if this works well. Adjustment are limited
to 25 ms max at a time to ensure there are no large jumps.
And it is applied when doing RTCP sender report which happens
once in 5 seconds currently for both audio and video tracks.
A nice introduction to PID controllers - https://alphaville.github.io/qub/pid-101/#/
Implementation borrowed from - https://github.com/pms67/PID
A few things TODO
1. PID controller tuning is a process. Have picked values from test from
that implementation above. May not be the best. Need to try.
2. Can potentially run this more often. Rather than running it only when
running RTCP sender report (which is once in 5 seconds now), can
potentially run it every second and limit the amount of change to
something like 10 ms max.
* remove unused variable
* debug log a bit more
With subscription manager, there is no need to tell a publisher
about a subscriber going away. Before subscription manager,
the up track manager of a participant (i. e. the publisher side)
was holding a list of pending subscriptions for its published tracks
and that had to be cleaned up if one of the subscriber goes away.
That is not the case any more.
Also set publisherID early so that subscription permission update has
the right publisherID. In fact, saw an empty ID in the logs and saw
that we still have the disallowed subscription handling which is not
necessary any more.
* Support simualting subscriber bandwidth.
When non-zero, a full allocation is triggered.
Also, probes are stopped.
When set to zero, normal probing mechanism should catch up.
Adding `allowPause` override which can be a connection option.
* fix log
* allowPause in participant params