* Add control of playout delay
Add config to enable playout delay. The delay is calculated from the
upstream and downstream RTT and clamped to the [min, max] range set in
the config option.
* check protocol version to enable playout delay
* Move config to room, limit playout-delay update interval, solve comments
* Remove adaptive playout-delay
* Remove unused config
* Ability to use trailer with server injected frames
A 32-byte trailer generated per room.
Trailer appended when track encryption is enabled.
* E2EE trailer for server injected packets.
- Generate a 32-byte per room trailer. Two reasons for the longer length
o Laziness: utils generates a 32 byte string.
o Longer length random string reduces chances of colliding with real data.
- Trailer sent in JoinResponse
- Trailer added to server injected frames (not to padding only packets)
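
The trailer handling above could be sketched roughly as follows. All names here (`newTrailer`, `appendTrailer`, `hasTrailer`) are hypothetical, and random bytes from `crypto/rand` stand in for the 32-byte string that the commit says comes from utils.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"errors"
)

const trailerLen = 32 // length stated in the commit message

// newTrailer returns a random per-room trailer (hypothetical helper).
func newTrailer() ([]byte, error) {
	t := make([]byte, trailerLen)
	if _, err := rand.Read(t); err != nil {
		return nil, err
	}
	return t, nil
}

// appendTrailer returns payload followed by trailer for a server injected
// frame. An empty payload stands in for a padding only packet and is
// returned unchanged.
func appendTrailer(payload, trailer []byte) ([]byte, error) {
	if len(trailer) != trailerLen {
		return nil, errors.New("unexpected trailer length")
	}
	if len(payload) == 0 {
		return payload, nil
	}
	out := make([]byte, 0, len(payload)+len(trailer))
	out = append(out, payload...)
	return append(out, trailer...), nil
}

// hasTrailer reports whether frame ends with the room trailer, which is how
// a receiver could recognize a server injected frame.
func hasTrailer(frame, trailer []byte) bool {
	return len(trailer) > 0 && bytes.HasSuffix(frame, trailer)
}
```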
* generate
* add a length check
* pass trailer in as an argument
* Pacer interface to send packets
* notify outside lock
* use select
* use pass through pacer
* add error to OnSent
* Remove log which could get noisy
* Starting TWCC work (#1727)
* add packet time
* WIP commit
* WIP commit
* WIP commit
* minor comments
* Some measurements (#1736)
* WIP commit
* some notes
* WIP commit
* variable name change and do not post to closed channel
* unlock
* clean up
* comment
* Hooking up some more bits for TWCC (#1752)
* wake under lock
* Pacer in down stream path.
Splitting out only the pacer from a feature branch to
introduce the concept of pacer.
Currently, there should be no difference in functionality
as a pass through pacer is used.
Another implementation exists which just puts packets in a queue and sends
them from one goroutine.
A potential implementation to try would be data paced by bandwidth
estimate. That could include priority queues and such.
But, the main goal here is to introduce notion of pacer in the down
stream path and prepare for more congestion control possibilities down
the line.
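
A rough sketch of the two pacer flavours mentioned above, a pass through pacer and a queue drained by one goroutine. The `Packet` type, its fields, and all method names are assumptions for illustration and not the actual interface in the code base.

```go
package main

import "sync"

// Packet is a hypothetical stand-in for whatever the pacer carries.
type Packet struct {
	Payload []byte
	Write   func(payload []byte) error // performs the actual send
	OnSent  func(err error)            // optional, receives the write error
}

// Pacer decides when a packet is written to the network.
type Pacer interface {
	Enqueue(p Packet)
	Stop()
}

func send(p Packet) {
	err := p.Write(p.Payload)
	if p.OnSent != nil {
		p.OnSent(err)
	}
}

// PassThrough sends immediately, so behaviour matches having no pacer.
type PassThrough struct{}

func (PassThrough) Enqueue(p Packet) { send(p) }
func (PassThrough) Stop()            {}

// QueuePacer buffers packets and sends them in order from one goroutine.
type QueuePacer struct {
	ch chan Packet
	wg sync.WaitGroup
}

func NewQueuePacer(depth int) *QueuePacer {
	q := &QueuePacer{ch: make(chan Packet, depth)}
	q.wg.Add(1)
	go func() {
		defer q.wg.Done()
		for p := range q.ch {
			send(p)
		}
	}()
	return q
}

func (q *QueuePacer) Enqueue(p Packet) { q.ch <- p }

// Stop drains the queue and waits for the sender goroutine to exit.
// Enqueue must not be called after Stop.
func (q *QueuePacer) Stop() {
	close(q.ch)
	q.wg.Wait()
}
```

A bandwidth paced variant would presumably implement the same interface and delay the call to `send` according to its estimate.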
* Don't need peak detector
* remove throttling of write IO errors
* Close participant on full reconnect.
A full reconnect == irrecoverable error. Participant cannot continue.
So, close the participant when issuing a full reconnect.
That should prevent the subscription manager from reconciling until the
participant is finally closed down as stale.
* format
* Avoid reconnect loop for unsupported downtrack
If the client subscribes to a track whose codec is unsupported by the
client, the SFU triggers a negotiation failure and issues a full reconnect
after receiving the client answer. If the client tries to subscribe to that
track again, it gets a full reconnect again. That causes an infinite
reconnect loop until the client stops subscribing to that track. This PR
unsubscribes the failing track for the client and sends a
SubscriptionResponse containing a reason that indicates the track's
codec is not supported, which avoids the reconnect loop.
* Experimental flag to try time stamp adjustment to control drift.
There is a config to enable this.
Using a PID controller to try and keep the sample rate at the expected
value. It remains to be seen if this works well. Adjustments are limited
to 25 ms max at a time to ensure there are no large jumps.
And it is applied when doing RTCP sender report which happens
once in 5 seconds currently for both audio and video tracks.
A nice introduction to PID controllers - https://alphaville.github.io/qub/pid-101/#/
Implementation borrowed from - https://github.com/pms67/PID
A few things TODO
1. PID controller tuning is a process. Have picked values from the test in
the implementation above. They may not be the best. Need to try others.
2. Can potentially run this more often. Rather than running it only when
running RTCP sender report (which is once in 5 seconds now), can
potentially run it every second and limit the amount of change to
something like 10 ms max.
* remove unused variable
* debug log a bit more
With subscription manager, there is no need to tell a publisher
about a subscriber going away. Before subscription manager,
the up track manager of a participant (i. e. the publisher side)
was holding a list of pending subscriptions for its published tracks
and that had to be cleaned up if one of the subscribers goes away.
That is not the case any more.
Also set publisherID early so that subscription permission update has
the right publisherID. In fact, saw an empty ID in the logs and saw
that we still have the disallowed subscription handling which is not
necessary any more.
* Support simulating subscriber bandwidth.
When non-zero, a full allocation is triggered.
Also, probes are stopped.
When set to zero, normal probing mechanism should catch up.
Adding `allowPause` override which can be a connection option.
* fix log
* allowPause in participant params
* Make connection quality not too optimistic.
With score normalization, the quality indicator showed good
under conditions which should normally have shown some degradation.
So, a few things in this PR
- Do not normalize scores
- Pick the weakest link as the representative score (moving away from
averaging)
- For down track direction, when reporting delta stats, take the number
of packets actually sent. If there are holes in the feed (upstream
packet loss), down tracks should not be penalised for that loss.
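
The weakest link selection and the sent-packet based loss could look roughly like this. The types, the quality levels, and both function names are assumptions for illustration only.

```go
package main

// Quality is a hypothetical coarse quality level, ordered worst to best.
type Quality int

const (
	Poor Quality = iota
	Good
	Excellent
)

// TrackScore is a hypothetical per-track result.
type TrackScore struct {
	Quality Quality
	Score   float64
}

// weakest returns the track with the lowest quality instead of averaging.
// When qualities match, the lower score wins. ok is false for no tracks.
func weakest(scores []TrackScore) (worst TrackScore, ok bool) {
	for i, s := range scores {
		if i == 0 || s.Quality < worst.Quality ||
			(s.Quality == worst.Quality && s.Score < worst.Score) {
			worst = s
		}
	}
	return worst, len(scores) > 0
}

// downTrackLoss computes the loss fraction against packets actually sent,
// so holes caused by upstream loss do not count against the down track.
func downTrackLoss(packetsSent, packetsLost uint32) float64 {
	if packetsSent == 0 {
		return 0
	}
	return float64(packetsLost) / float64(packetsSent)
}
```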
State of things in connection quality feature
- Audio uses rtcscore-go (with a change to accommodate RED codec). This
follows the E-model.
- Camera uses rtcscore-go. No change here. NOTE: The rtcscore here is
purely based on bits per pixel per frame (bpf). This has the following
existing issues (no change, these were already there)
o Does not take packet loss, jitter, rtt into account
o Expected frame rate is not available. So, measured frame rate is
used as expected frame rate also. If expected frame rate were available,
the score could be reduced for lower frame rates.
- Screen share tracks: No change. This uses the very old simple loss
based thresholding for scoring. As the bit rate varies a lot based on
content and rtcscore video algorithm used for camera relies on
bits per pixel per frame, this could produce a very low value
(large width/height encoded in a small number of bits because of static content)
and hence a low score. So, the old loss based thresholding is used.
* clean up
* update rtcscore pointer
* fix tests
* log lines reformat
* WIP commit
* WIP commit
* update mute of receiver
* WIP commit
* WIP commit
* start adding tests
* take min score if quality matches
* start adding bytes based scoring
* clean up
* more clean up
* Use Fuse
* log quality drop
* clean up debug log
* - Use number of windows for wait to make things simpler
- track no layer expected case
- always update transition
- always call updateScore
When we unsubscribe from a speaker, SendSpeakerUpdates will drop updates
from that speaker. This has the side effect of dropping the "clearing"
message that we are sending as well.
Due to the order of events in MediaTrackReceiver and friends, SubscribedTrack
will be closed before the track is removed from RoomTrackManager.
Because of this, when a track is unpublished, it's possible to be subscribed
to the track as it's closing.
By introducing a closing state, we'd prevent accidental subscription to
closing tracks.
Not a good design. There is not an easy way to filter messages
before they hit the media node. Without that, there is not a lot
of advantage.
And there are sequences that are not handled correctly in this
deleted implementation.
So, deleting code to prevent use.
UpdateSubscription had a shortcoming where when it couldn't find the
participant, it ignored the request.
This PR further removes the reliance of current publisher state from
subscribers.
- SubscribeToTrack only takes in a trackID
- Introduced RoomTrackManager to maintain all published tracks to a room
- Added TrackUnpublished event to clearly indicate when a track has been removed
- SubscribeRequested event no longer includes information about the publisher
* Add Timer to detect dtls failure quickly
* Fix pc state check in timeout after ice
* More strict conditions to switch candidate type
* log for signal interrupt
* typo
Added a new manager to handle all subscription needs. Implemented using reconciler pattern. The goals are:
- improve subscription resilience by separating desired state and current state
- reduce complexity of synchronous processing
- better detect failures with the ability to trigger full reconnect
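
A bare-bones sketch of the reconciler idea: callers only record the desired state, and a separate pass moves the current state towards it and reports failures. `Manager`, its methods, and `ErrUnsupportedCodec` are hypothetical names, and the sketch is not safe for concurrent use.

```go
package main

import (
	"errors"
	"sort"
)

// ErrUnsupportedCodec is a hypothetical failure a subscribe can return.
var ErrUnsupportedCodec = errors.New("unsupported codec")

// Manager keeps desired and current subscription state separately.
type Manager struct {
	desired     map[string]bool
	current     map[string]bool
	subscribe   func(trackID string) error
	unsubscribe func(trackID string) error
}

func NewManager(sub, unsub func(trackID string) error) *Manager {
	return &Manager{
		desired:     map[string]bool{},
		current:     map[string]bool{},
		subscribe:   sub,
		unsubscribe: unsub,
	}
}

// SetDesired only records intent; no subscription work happens here.
func (m *Manager) SetDesired(trackID string, want bool) {
	if want {
		m.desired[trackID] = true
		return
	}
	delete(m.desired, trackID)
}

// Reconcile drives current state towards desired state and returns the
// sorted track IDs that failed. Failed tracks are retried on the next
// pass; a caller could escalate repeated failures to a full reconnect.
func (m *Manager) Reconcile() []string {
	var failed []string
	for id := range m.desired {
		if m.current[id] {
			continue
		}
		if err := m.subscribe(id); err != nil {
			failed = append(failed, id)
			continue
		}
		m.current[id] = true
	}
	for id := range m.current {
		if m.desired[id] {
			continue
		}
		if err := m.unsubscribe(id); err != nil {
			failed = append(failed, id)
			continue
		}
		delete(m.current, id)
	}
	sort.Strings(failed)
	return failed
}
```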
* Initial commit of signal deduper.
The idea is to protect against signal storms from misbehaving clients.
Design:
- SignalDeduper interface with one method to handle a SignalRequest and
return if dupe or not.
- Signal specific deduper. Could have made a single de-duper which could
handle all signal message types, but making it per type so that the
code is cleaner.
- Some module (like the router) can instantiate whatever signal types
it wants to de-dupe. When a signal message is received, that module
can run the signal message through the list of de-dupers and
potentially drop the message if any of the de-dupers declare that the
message is a dupe. Making it a list makes things a little bit
inefficient, but keeps things cleaner. Hopefully, not many de-dupers
will be needed so that the inefficiency is not pronounced.
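
The design above might be sketched like this. `SignalRequest` here is a simplified hypothetical struct rather than the real protobuf message, and treating "same body as the previous request from the same participant" as a dupe is an assumption of the sketch.

```go
package main

// SignalRequest is a simplified stand-in for the real signal message.
type SignalRequest struct {
	Kind          string // e.g. "subscription", hypothetical
	ParticipantID string
	Body          string
}

// SignalDeduper has one method that reports whether a request is a dupe.
type SignalDeduper interface {
	Dedupe(req SignalRequest) bool
}

// kindDeduper handles exactly one signal type and ignores all others.
type kindDeduper struct {
	kind string
	last map[string]string // participant ID -> last body seen
}

func newKindDeduper(kind string) *kindDeduper {
	return &kindDeduper{kind: kind, last: map[string]string{}}
}

func (d *kindDeduper) Dedupe(req SignalRequest) bool {
	if req.Kind != d.kind {
		return false
	}
	if prev, ok := d.last[req.ParticipantID]; ok && prev == req.Body {
		return true
	}
	d.last[req.ParticipantID] = req.Body
	return false
}

// ParticipantClosed drops state held for a departed participant.
func (d *kindDeduper) ParticipantClosed(participantID string) {
	delete(d.last, participantID)
}

// isDupe runs a request through every deduper; any positive drops it.
func isDupe(dedupers []SignalDeduper, req SignalRequest) bool {
	for _, d := range dedupers {
		if d.Dedupe(req) {
			return true
		}
	}
	return false
}
```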
* re-arrange comments
* helper function
* add ParticipantClosed
* Add option to issue full reconnect on a publication error.
Leaving the publication error timeout at 30 seconds as there
are some publications taking long. Also, there are cases
where the peer connection fails after 30 seconds. The peer
connection failure happens after publication error is detected.
But, 30 seconds is a good amount of time for publication to establish.
* prevent recursive lock
* Fix rtcp lost for downtrack caused by using the incorrect buffer factory
Since the buffer factory change (#1173), every participant has its own
buffer factory, so the publisher's buffer factory can't be used to create
a DownTrack
* clean code
There were some failures with missing media. The only difference I could
see between the working and non-working cases is when media forwarding
starts. So, delay media forwarding till peer connection is connected.
Also, add a subscribe op only if a subscribe/unsubscribe queuing is
successful. There was a recent change to not queue a subscribe when
the participant is closed/disconnected. This got the subscribe op
counter out of whack.