1. When re-allocating for a track in DEFICIENT state, try to use
available headroom to accommodate change before trying to steal
bits from other tracks.
2. If the changing track gives back bits (because of muting or
moving to a lower layer subscription), use the returned bits
to try and boost deficient track(s).
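A minimal sketch of the ordering in item 1, assuming bitrates are plain integers in bps. `coverDeficit` and its parameters are hypothetical names for illustration, not the allocator's actual API: headroom is consumed first, and bits are taken from other tracks only for whatever is still missing.

```go
package main

// minInt64 returns the smaller of two values.
func minInt64(a, b int64) int64 {
	if a < b {
		return a
	}
	return b
}

// coverDeficit (hypothetical) tries to satisfy `need` bps for a deficient
// track. It uses available headroom first and steals from `donors` (bps
// that other tracks could give up) only for the remainder.
// `donors` is updated in place with what each donor has left.
func coverDeficit(need, headroom int64, donors []int64) (fromHeadroom, stolen int64, ok bool) {
	if need <= 0 {
		return 0, 0, true
	}
	if headroom < 0 {
		headroom = 0
	}

	// Step 1: use headroom before touching any other track.
	fromHeadroom = minInt64(need, headroom)
	remaining := need - fromHeadroom

	// Step 2: steal only what headroom could not cover.
	for i := range donors {
		if remaining == 0 {
			break
		}
		take := minInt64(remaining, donors[i])
		donors[i] -= take
		stolen += take
		remaining -= take
	}
	return fromHeadroom, stolen, remaining == 0
}
```

If `ok` comes back false, the change could not be accommodated in full even after stealing.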
* Plug a couple of holes in stream transitions.
1. Missed negative sign meant stealing bits from other tracks was not
working.
2. When a track change (mute, unmute, subscription change) cannot be
allocated, explicitly pause so that stream state update happens.
Refactor stream state update a bit to make it a bit cleaner.
* correct comment
* Add debug log for RTCP sender report.
Temporary to collect more data. Hitting scenarios under congestion
where the sender report gets off sync. Need some data to pore through
and understand and implement changes.
* Debugw
* Check for request layer lock only in the goroutine
* check before sending PLI
* max layer notifier worker
* test cleanup
* clean up
* do notification in the callback
* WIP commit
* WIP commit
* WIP commit
* Some clean up
- Removed a chatty debug log
- some spelling, punctuation correction in comments
- missed an `Abs` in check, add it.
* Mark active when switching to parked layer.
Parked layer lock is not a switch. It is just a restart at the same
layer.
* make explicit bool for switching
* Add ability to roll back video layer selection.
Not currently useful, but it is possible to do things like not
applying a layer switch if the switch point time stamp is too far back.
Add ability to roll back a layer switch. If a packet was selected
for forwarding, but a subsequent error or decision drops the packet,
the layer switch is rolled back if that was the switching packet.
In current code, the paths where a packet can be dropped after selection
do not happen at switch points. So, it was okay to apply the selection
unconditionally. But, adding the call to rollback in all paths where a
packet is dropped after selection anyway, for consistent code flow.
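A rough sketch of the rollback idea, assuming a selector that tracks only one spatial layer. `layerSelector`, `Select` and `Rollback` are hypothetical names for illustration, not the actual selector API. Rollback only undoes something if the last selected packet was the one that caused the switch.

```go
package main

// layerSelector (hypothetical) tracks the currently forwarded layer.
type layerSelector struct {
	current  int32
	previous int32
	switched bool // true only if the last Select changed the layer
}

// Select reports whether a packet on packetLayer should be forwarded.
// A switch to target is applied only at a switch point (e.g. a key frame).
func (s *layerSelector) Select(packetLayer, target int32, isSwitchPoint bool) bool {
	s.switched = false
	if packetLayer == s.current {
		return true
	}
	if packetLayer == target && isSwitchPoint {
		s.previous = s.current
		s.current = packetLayer
		s.switched = true
		return true
	}
	return false
}

// Rollback undoes the layer switch if the last selected packet was the
// switching packet. It is a no-op otherwise, so it is safe to call in
// every path that drops a packet after selection.
func (s *layerSelector) Rollback() {
	if s.switched {
		s.current = s.previous
		s.switched = false
	}
}
```

A caller would invoke `Rollback()` wherever a selected packet ends up being dropped, which keeps the drop paths uniform even where no switch can occur.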
* separate switch for temporal layer
* Push track quality to poor on a bandwidth constrained pause.
* add tests
* scale distance by divisor
* fix test distance to desired
* wait longer for subscription manager to reconcile
* Prevent anachronous sample reading.
Not so pretty way of solving this. Please let me know if you have
thoughts.
Passing in time makes testing easier. But, that also leads to
time reversal problems. Example scenario:
1. Connection stats worker gets a time and initiates quality
calculation.
2. A layer transition is recorded after that.
3. By the time scorer is called to calculate score with time from Step
1, there is time reversal, which results in an anachronous sample.
One option is to use a scorer lock in connection stats module and wrap
all calls to scorer in that lock, but that does not prevent the passed
in time stamps themselves getting out of order. Also, standalone use
of scorer in some other context will be problematic.
Doing the hybrid thing of taking current time in scorer if passed in
time is zero so that scorer lock domain controls it.
* use zero time everywhere in normal flow
* make APIs with and without time passed in as Paul suggested
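A minimal sketch of the hybrid approach described above, assuming a scorer that only needs to keep samples in time order. `scorer`, `Update` and `UpdateAt` are hypothetical names, not the actual scorer API. A zero time means "take the time inside the scorer lock", so ordering in the normal flow is decided by the lock domain, while tests can still pass explicit times.

```go
package main

import (
	"sync"
	"time"
)

// scorer (hypothetical) accepts samples only in non-decreasing time order.
type scorer struct {
	lock         sync.Mutex
	lastSampleAt time.Time
}

// UpdateAt records a sample at the given time and reports whether it
// was accepted. A zero `at` is replaced by the current time, read while
// holding the lock, so concurrent callers cannot produce a reversal.
func (s *scorer) UpdateAt(at time.Time) bool {
	s.lock.Lock()
	defer s.lock.Unlock()

	if at.IsZero() {
		at = time.Now()
	}
	if at.Before(s.lastSampleAt) {
		// Anachronous sample: older than one already recorded.
		return false
	}
	s.lastSampleAt = at
	return true
}

// Update is the variant without a time argument, for the normal flow.
func (s *scorer) Update() bool {
	return s.UpdateAt(time.Time{})
}
```

Explicit times passed by callers can still arrive out of order; the sketch rejects those rather than trying to reorder them.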
* Delete down track from receiver in close always.
I think with the parallel close in goroutines, it so happens that
peer connection can get closed first and unbind the track.
The delete down track and RTCP reader close was inside if `bound` block.
So, they were not running, leaving a dangling down track in the receiver.
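A small sketch of the shape of this fix, with hypothetical types (`receiver`, `downTrack`) standing in for the real ones. Work that needs a bound track stays inside the `bound` check, while removal from the receiver happens unconditionally.

```go
package main

// receiver (hypothetical) keeps the set of attached down tracks by ID.
type receiver struct {
	downTracks map[string]struct{}
}

// DeleteDownTrack removes a down track; deleting a missing ID is a no-op.
func (r *receiver) DeleteDownTrack(id string) {
	delete(r.downTracks, id)
}

// downTrack (hypothetical) is one subscriber's view of a published track.
type downTrack struct {
	id       string
	bound    bool
	flushed  bool
	receiver *receiver
}

// Close may run after the peer connection has already unbound the track.
func (d *downTrack) Close() {
	if d.bound {
		// Only meaningful while still bound, e.g. flushing a last frame.
		d.flushed = true
		d.bound = false
	}
	// Always detach from the receiver, whether or not the track was
	// still bound, so no dangling down track is left behind.
	d.receiver.DeleteDownTrack(d.id)
}
```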
* fix tests
* fix test
* Make congestion controller probe config
* Wait for enough estimate samples
* fixes
* format
* limit number of times a packet is ACKed
* ramp up probe duration
* go format
* correct comment
* restore default
* add float64 type to generated CLI
* Pacer interface to send packets
* notify outside lock
* use select
* use pass through pacer
* add error to OnSent
* Remove log which could get noisy
* Starting TWCC work (#1727)
* add packet time
* WIP commit
* WIP commit
* WIP commit
* minor comments
* Some measurements (#1736)
* WIP commit
* some notes
* WIP commit
* variable name change and do not post to closed channel
* unlock
* clean up
* comment
* Hooking up some more bits for TWCC (#1752)
* wake under lock
* Pacer in down stream path.
Splitting out only the pacer from a feature branch to
introduce the concept of pacer.
Currently, there should be no difference in functionality
as a pass through pacer is used.
Another implementation exists which just puts packets in a queue and
sends them from one goroutine.
A potential implementation to try would be data paced by bandwidth
estimate. That could include priority queues and such.
But, the main goal here is to introduce notion of pacer in the down
stream path and prepare for more congestion control possibilities down
the line.
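To illustrate the two variants mentioned above, here is a sketch with hypothetical types. `Packet`, `Pacer`, `PassThroughPacer` and `QueuePacer` are illustrative names and shapes, not the real interface. The pass-through pacer sends on the caller's goroutine, and the queue pacer sends from a single goroutine.

```go
package main

// Packet (hypothetical) bundles a payload with how to write it and an
// optional completion callback that receives the write result.
type Packet struct {
	Payload []byte
	Write   func(payload []byte) (int, error)
	OnSent  func(n int, err error)
}

// Pacer (hypothetical) decides when an enqueued packet is sent.
type Pacer interface {
	Enqueue(p Packet)
	Stop()
}

// send writes the packet and reports the outcome, including any error.
func send(p Packet) {
	n, err := p.Write(p.Payload)
	if p.OnSent != nil {
		p.OnSent(n, err)
	}
}

// PassThroughPacer sends immediately on the caller's goroutine.
type PassThroughPacer struct{}

func (PassThroughPacer) Enqueue(p Packet) { send(p) }
func (PassThroughPacer) Stop()            {}

// QueuePacer queues packets and sends them in order from one goroutine.
type QueuePacer struct {
	queue chan Packet
	done  chan struct{}
}

func NewQueuePacer(size int) *QueuePacer {
	q := &QueuePacer{
		queue: make(chan Packet, size),
		done:  make(chan struct{}),
	}
	go func() {
		defer close(q.done)
		for p := range q.queue {
			send(p)
		}
	}()
	return q
}

// Enqueue blocks if the queue is full. It must not be called after Stop.
func (q *QueuePacer) Enqueue(p Packet) { q.queue <- p }

// Stop drains the queue and waits for the sender goroutine to exit.
func (q *QueuePacer) Stop() {
	close(q.queue)
	<-q.done
}
```

A bandwidth-paced variant could implement the same interface and delay `send` according to the estimate, which is why the interface is kept this small in the sketch.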
* Don't need peak detector
* remove throttling of write IO errors
- Increase max interval between probes to 2 minutes.
- Use a minimum probe rate of 200 kbps. This is to ensure that
the probe rate is decent and can produce a stronger signal.
* Don't update dependency info if unordered packet received
* Trace all active svc chains for downtrack
* Try to keep lower decode target decodable
* remove comments
* Test case
* clean code
* solve comments
* Simplify sliding window collapse.
Keep the same value collapsing simple.
Add it to sliding window as long as same value is received for longer
than collapse threshold.
But, add a prune with three conditions to process the sliding window
to ensure only valid samples are kept.
* flip the order of validity window and same value pruning
* increase collapse threshold to 0.5 seconds during non-probe
1. Probe end time needs to include the probe cluster running time also.
2. Apply collapse window only within the sliding window. This is to
prevent cases of some old data declaring congestion. For example,
an estimate could have fallen 15 seconds ago and there might have
been a bunch of estimates at that fallen value. And the whole
sliding window could have that value at some point. But, a further
drop may trigger congestion detection. But, that might be acting too
fast, i.e. on one instance of value fall. Change it so that we
detect if there is a fall within the sliding window and apply
collapse based on that.
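A simplified sketch of the window behaviour described above, assuming millisecond timestamps and integer estimates. `slidingWindow`, its fields and `HasFall` are hypothetical, and only validity pruning and same-value collapsing are modelled here. A fall is detected only between samples that are still inside the window, so an old drop cannot declare congestion by itself.

```go
package main

// sample (hypothetical) is one bandwidth estimate with its arrival time.
type sample struct {
	value int64
	atMs  int64
}

// slidingWindow (hypothetical) keeps recent estimates only.
type slidingWindow struct {
	validityMs int64 // samples older than this are pruned
	collapseMs int64 // repeats of the same value inside this are collapsed
	samples    []sample
}

// Add prunes samples that fell out of the validity window, then adds
// the new value unless it repeats the last value within collapseMs.
func (w *slidingWindow) Add(value, nowMs int64) {
	// Validity pruning first, then same-value collapsing.
	kept := w.samples[:0]
	for _, s := range w.samples {
		if nowMs-s.atMs <= w.validityMs {
			kept = append(kept, s)
		}
	}
	w.samples = kept

	if n := len(w.samples); n > 0 {
		last := w.samples[n-1]
		if last.value == value && nowMs-last.atMs < w.collapseMs {
			return
		}
	}
	w.samples = append(w.samples, sample{value: value, atMs: nowMs})
}

// HasFall reports whether the estimate dropped between two consecutive
// samples that are both still inside the window.
func (w *slidingWindow) HasFall() bool {
	for i := 1; i < len(w.samples); i++ {
		if w.samples[i].value < w.samples[i-1].value {
			return true
		}
	}
	return false
}
```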
If ref is coming in slow (due to pacing), it is possible that
expected is ahead. Pulling next too far towards expected causes
warps in a subsequent report. Keep switches closer to ref.
On a state change, it was possible an aborted probe was pending
finalize. When probe controller is reset, the probe channel
observer was not reset. Create a new non-probe channel observer
on state change to get a fresh start.
Also limit probe finalize wait to 10 seconds max. It is possible
that the estimate is very low and we have sent a bunch of probes.
Calculating wait based on that could lead to finalize waiting for
a long time (could be minutes).
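A sketch of the cap, under the assumption that the wait is derived from how long the probe bytes would take to drain at the current estimate. That formula and the name `probeFinalizeWait` are illustrative guesses; only the 10 second maximum comes from the description above.

```go
package main

import "time"

// maxProbeFinalizeWait is the upper bound on waiting for finalize.
const maxProbeFinalizeWait = 10 * time.Second

// probeFinalizeWait (hypothetical) estimates how long to wait before
// finalizing a probe: the time needed to drain probeBytes at
// estimateBps, capped so a very low estimate cannot stall for minutes.
func probeFinalizeWait(probeBytes, estimateBps int64) time.Duration {
	if probeBytes <= 0 {
		return 0
	}
	if estimateBps <= 0 {
		return maxProbeFinalizeWait
	}
	seconds := float64(probeBytes*8) / float64(estimateBps)
	wait := time.Duration(seconds * float64(time.Second))
	if wait > maxProbeFinalizeWait {
		return maxProbeFinalizeWait
	}
	return wait
}
```

For example, 10 MB of probes against a 10 kbps estimate would otherwise imply a wait of over two hours; with the cap it is 10 seconds.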
* Simplify probe done handling.
Seeing a case where the channel observer is not re-created after
an aborted probe. Simplifying probe done (no callbacks, making it
synchronous).
* log more
* Use receiver report stats for loss/rtt/jitter.
Reversing a bit of https://github.com/livekit/livekit/pull/1664.
That PR did two snapshots (one based on what SFU is sending
and one based on combination of what SFU is sending reconciled with
stats reported from client via RTCP Receiver Report). That PR
reported SFU only view to analytics. But, that view does not have
information about loss seen by client in the downstream.
Also, that does not have RTT/jitter information. The rationale behind
using SFU only view is that SFU should report what it sends irrespective
of whether the client is receiving or not. But, that view did not have proper
loss/RTT/jitter.
So, switch back to reporting SFU + receiver report reconciled view.
The down side is that when receiver reports are not received,
packets sent/bytes sent will not be reported to analytics.
An option is to report SFU only view if there are no receiver reports.
But, it becomes complex because of the offset. Receiver report would
acknowledge certain range whereas SFU only view could be different
because of propagation delay. To simplify, just using the reconciled
view to report to analytics. Using the available view will require
a bunch more work to produce accurate data.
(NOTE: all this started due to a bug where RTCP was not restarted on
a track resume which killed receiver reports and we went on this path
to distinguish between publisher stopping vs RTCP receiver report not
happening)
One optimisation to make here concerns the check to see if publisher
is sending data. Using a full DeltaInfo for that is overkill. Can do
a lighter weight check for that later.
* return available streams
* fix test
* Perform unsubscribe in parallel to avoid blocking
When unsubscribing from tracks, we flush a blank frame in order to prepare
the transceivers for re-use. This process is blocking for ~200ms. If
the unsubscribes are performed serially, it would prevent other subscribe
operations from continuing.
This PR parallelizes that operation, and ensures subsequent subscribe
operations could reuse the existing transceivers.
* also perform in parallel when uptrack close
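A minimal sketch of the parallel unsubscribe, assuming each unsubscribe is an independent blocking call. `unsubscribeAll` and its callback are hypothetical, not the actual subscription manager API. Waiting for all of them takes roughly one flush duration instead of one per track, and returning only after all are done is one way to ensure the transceivers are ready for reuse.

```go
package main

import "sync"

// unsubscribeAll (hypothetical) runs the blocking unsubscribe for every
// track in its own goroutine and returns once all have finished.
func unsubscribeAll(trackIDs []string, unsubscribe func(trackID string)) {
	var wg sync.WaitGroup
	for _, id := range trackIDs {
		wg.Add(1)
		// Pass id as an argument so each goroutine gets its own copy.
		go func(trackID string) {
			defer wg.Done()
			unsubscribe(trackID) // e.g. flush a blank frame (~200ms)
		}(id)
	}
	wg.Wait()
}
```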
* fix a few log fields