Commit Graph

472 Commits

Author SHA1 Message Date
Raja Subramanian 84994b39ab Make the samples string more readable. (#1810) 2023-06-21 11:35:38 +05:30
Raja Subramanian 27051e9999 It is possible that pipe is closed before blank frame send, do not warn (#1807) 2023-06-20 11:58:01 +05:30
Raja Subramanian 2383234f6e Simplify sliding window collapse. (#1802)
* Simplify sliding window collapse.

Keep the same value collapsing simple.
Add it to sliding window as long as same value is received for longer
than collapse threshold.
But, add a prune with three conditions to process the siliding window
to ensure only valid samples are kept.

* flip the order of validity window and same value pruning

* increase collapse threshold to 0.5 seconds during non-probe
2023-06-17 18:56:38 +05:30
Raja Subramanian 395f403132 Small stream allocator tweaks. (#1800)
1. Probe end time needs to include the probe cluster running time also.
2. Apply collapse window only within the sliding window. This is to
   prevent cases of some old data declaring congestion. For example,
   an estimate could have fallen 15 seconds ago and there might have
   been a bunch of estimates at that fallen value. And the whole
   sliding window could have that value at some point. But, a further
   drop may trigger congestion detection. But, that might be acting too
   fast, i. e. on one instance of value fall. Change it so that we
   detect if there is a fall within the sliding window and apply
   collapse based on that.
2023-06-17 12:35:29 +05:30
Raja Subramanian 908b7a9bb1 Promote some migration logs to Infow (#1798) 2023-06-16 19:00:17 +05:30
Raja Subramanian 6946d0a3a1 Do not mute forwarder when paused to bandwidth congestion. (#1796)
* Do not mute forwarder when paused to bandwidth congestion.

Detailed notes in code.

* remove word
2023-06-16 12:08:01 +05:30
Raja Subramanian afa7733748 Promote switch logs to Infow. (#1790) 2023-06-12 17:30:56 +05:30
Raja Subramanian 9809b8bc3a Use nack queue params. (#1789)
* Use nack queue params.

* fix test
2023-06-12 13:01:02 +05:30
cnderrauber c91889edfd Add dependency descriptor stream tracker for svc codecs (#1788)
* Add dependency descriptor stream tracker for svc codecs

* Solve comments
2023-06-12 15:07:47 +08:00
Raja Subramanian 3d696ac39f Keep next timestamp on switch closer to ref. (#1784)
If ref is coming in slow (due to pacing), it is possible that
expected is ahead. Pulling next too far towards expected causes
warps in a subsequent report. Keep switches closer to ref.
2023-06-10 11:38:46 +05:30
Raja Subramanian 4805dec1f0 Create channel observer on probe reset. (#1783)
On a state change, it was possible an aborted probe was pending
finalize. When probe controller is reset, the probe channel
observer was not reset. Create a new non-probe channel observer
on state change to get a fresh start.

Also limit probe finalize wait to 10 seconds max. It is possible
that the estimate is very low and we have sent a bunch of probes.
Calculating wait based on that could lead to finalize waiting for
a long time (could be minutes).
2023-06-10 10:54:55 +05:30
Raja Subramanian 0e7bdeabcb Simplify probe done handling. (#1782)
* Simplify probe done handling.

Seeing a case where the channel abserver is not re-created after
an aborted probe. Simplifying probe done (no callbacks, making it
synchronous).

* log more
2023-06-10 02:07:28 +05:30
Raja Subramanian 72ed5b19f7 Use receiver report stats for loss/rtt/jitter. (#1781)
* Use receiver report stats for loss/rtt/jitter.

Reversing a bit of https://github.com/livekit/livekit/pull/1664.
That PR did two snapshots (one based on what SFU is sending
and one based on combination of what SFU is sending reconciled with
stats reported from client via RTCP Receiver Report). That PR
reported SFU only view to analytics. But, that view does not have
information about loss seen by client in the downstream.
Also, that does not have RTT/jitter information. The rationale behind
using SFU only view is that SFU should report what it sends irrespective
of client is receiving or not. But, that view did not have proper
loss/RTT/jitter.

So, switch back to reporting SFU + receiver report reconciled view.
The down side is that when receiver reports are not receiver,
packets sent/bytes sent will not be reported to analytics.

An option is to report SFU only view if there are no receiver reports.
But, it becomes complex because of the offset. Receiver report would
acknowledge certain range whereas SFU only view could be different
because of propagation delay. To simplify, just using the reconciled
view to report to analytics. Using the available view will require
a bunch more work to produce accurate data.
(NOTE: all this started due to a bug where RTCP was not restarted on
a track resume which killed receiver reports and we went on this path
to distinguish between publisher stopping vs RTCP receiver report not
happening)

One optimisation to here here concerns the check to see if publisher is sending data.
Using a full DeltaInfo for that is an overkill. Can do a lighter weight
for that later.

* return available streams

* fix test
2023-06-09 23:31:25 +05:30
Raja Subramanian f518f5d743 Log head SN when packet cannot be fetched (#1780) 2023-06-09 12:13:06 +05:30
Raja Subramanian 22813cd2be Recreate channel observer irrespective of probe success/fail. (#1778) 2023-06-08 01:40:07 +05:30
Raja Subramanian b591140d66 Ignore receiver report till initialized (#1773) 2023-06-06 21:43:49 +05:30
Raja Subramanian 7ed3af193a No proof that this helps (#1772) 2023-06-06 11:28:13 +05:30
Raja Subramanian 076d8cad73 Promote switch log to Infow (#1771) 2023-06-06 11:20:57 +05:30
Raja Subramanian f5c5d4e079 Wait for a more stable measurement of sample rate. (#1764) 2023-06-03 14:26:26 +05:30
Raja Subramanian c2ae34151c Enable some debug logs to debug freeze (#1761)
* Enable some debug logs to debug freeze

* log receiver sender report also
2023-06-02 16:31:19 +05:30
David Zhao b5c8fe5294 Perform unsubscribe in parallel to avoid blocking (#1760)
* Perform unsubscribe in parallel to avoid blocking

When unsubscribing from tracks, we flush a blank frame in order to prepare
the transceivers for re-use. This process is blocking for ~200ms. If
the unsubscribes are performed serially, it would prevent other subscribe
operation from continuing.

This PR parallelizes that operation, and ensures subsequent subscribe
operations could reuse the existing transceivers.

* also perform in parallel when uptrack close

* fix a few log fields
2023-06-02 00:13:18 -07:00
cnderrauber c1842cb54f Avoid reconnect loop for unsupported downtrack (#1754)
* Avoid reconnect loop for unsupported downtrack

If the client subscribes to a track which codec is unsupported by the
client, sfu will trigger negotiation failed and issue a full reconnect
after received client answer. If the client try to subscribe that track
then it will got full reconnect again. That will cause a infinite
reconnect loop until the client don't subscribe that track. This PR
will unsubscribe the error track for the client and send a
SubscriptionResponse that contain the reason to indicates the track's
codec is not supported to avoid the reconnect loop.
2023-05-31 11:41:22 +08:00
Raja Subramanian 13d599d2d9 Comment out noisy log. (#1757) 2023-05-31 06:35:25 +05:30
Raja Subramanian fdfd830394 Split probe controller from StreamAllocator. (#1751)
* Split probe controller from StreamAllocator.

With TWCC, there is a need to check for probe status
in a separate goroutine. So, probe specific stuff need
locking. Split out the probe controller to make that cleaner.

* remove defer
2023-05-29 14:41:44 +05:30
Raja Subramanian ea57e4f2c1 Ignore receiver reports that have a sequence number before first packet. (#1745) 2023-05-28 10:05:35 +05:30
Raja Subramanian 9dd2ebc960 Change too many packets log to error to get back trace. (#1744) 2023-05-27 12:19:30 +05:30
Raja Subramanian 1c920812d3 Return max spatial layer from selectors. (#1743)
* Return max spatial layer from selectors.

With differing requirements of SVC and allowing overshoot in Simulcast,
selectors are best placed to indicate what is the max spatial layer when
they indicate a switch to max spatial layer.

* fix test

* prevent race
2023-05-26 12:49:31 +05:30
cnderrauber fc8375f150 Fix dynacast for svc codec (#1742) 2023-05-26 14:34:35 +08:00
Raja Subramanian 0354626bfc Adjust sender report time stamp for slow publishers. (#1740)
It is possible that publisher paces the media.
So, RTCP sender report from publisher could be ahead of
what is being fowarded by a good amount (have seen up to 2 seconds
ahead). Using the forwarded time stamp for RTCP sender report
in the down stream leads to jumps back and forth in the down track
RTCP sender report.

So, look at the publisher's RTCP sender report to check for it being
ahead and use the publisher rate as a guide.
2023-05-25 21:55:54 +05:30
Raja Subramanian 11c5737e04 Filter another expected error. (#1738)
Actually, was not filtering the not last sender report error before.
Previous PR did that. This PR restores the old no last sender report
filter. Both are filterable errors.
2023-05-24 12:41:47 +05:30
Raja Subramanian 07252b7ce3 Filter not last SR error (#1737) 2023-05-24 12:32:12 +05:30
Raja Subramanian d9e682a0d2 Fix unwrap (#1729)
* Fix unwrap

An out-or-order packet wrapping back after a wrap around had already happened
 was not using proper cycle ounter to calculate unerapped value.

* update mediatransportutil
2023-05-22 18:46:56 +05:30
Raja Subramanian 0bb89575eb Fix min TS before first sender report (#1724) 2023-05-19 12:43:19 +05:30
Raja Subramanian 1d3faefc5e More scoring tweaks (#1719)
1. Completely removing RTT and jitter from score calculation.
   Need to do more work there.
   a. Jitter is slow moving (RFC 3550 formula is designed that way).
      But, we still get high values at times. Ideally, that should
      penalise the score, but due to jitter buffer, effect may not be
      too bad.
   b. Need to smooth RTT. It is based on receiver report and if one
      sample causes a high number, score could be penalised
      (this was being used in down track direction only). One option
      is to smooth it like the jitter formula above and try using it.
      But, for now, disabling that also.

2. When receiving lesser number of packets (for example DTX), reduce the
   weight of packet loss with a quadratic relationship to packet loss
   ratio. Previously using a square root and it was potentially
   weighting it too high. For example, if only 5 packets were received
   due to DTX instead of 50, we were still giving 30% weight
   (sqrt(0.1)). Now, it gets 1% weight. So, if one of those 5 packets
   were lost (20% packet loss ratio), it still does not get much weight
   as the number of packets is low.,

3. Slightly slower decrease in score (in EWMA)

4. When using RED, increase packet loss weight thresholds to be able to
   take more loss before penalizing score.
2023-05-18 20:16:43 +05:30
Raja Subramanian 9395f0b1fb More time stamp dance. (#1712)
Two things
- Somehow the publisher RTCP sender report time stamp goes back some
  times. Log it differently. Also, use signed type for logging so
  that negative is easy to see.
- On down track, because of silence frame injection on mute, the RTCP
  sender report time stamp might be ahead of timestamp we will use
  on unmute. If so, ensure that next timestamp is also not before
  what was sent in RTCP sender report.
2023-05-16 21:48:10 +05:30
Raja Subramanian 61102533ae Monitor and log RTP time stmap drifts (#1710)
The PID controller seems to be working well. But, it is unclear where
it can be applied as some of the data shows significant jumps
(either caused by BT devices or possibly noise cancellation/cpu
constraint) and although PID controller is slowly pulling things
to expected sample rate, it could be a bit slow.
Unfortunately, cannot munge too much in a middle box.
However leaving the controller in there as it is doing its job
for cases where things slip slowly.

Changing things to log significant jumps (more than 200 ms away
from expected) at Infow level.

Also, recording drift and sample rate in RTP stats proto and string
representation.
2023-05-13 18:41:09 +05:30
Raja Subramanian b61fad339f Handle time stamp increment across mute. (#1705)
* Handle time stamp increment across mute.

Two cases handled
1. Starting on mute could inject blank frame/padding packets.
These time stamps are randomly generated. So, when the publisher
unmutes, the time stamp was jumping ahead by only 1. Make it so
that they jump ahead by elapsed time since starting the blank frames/
padding packets.
2. When generating blank frames at the end of a down track, if
the track was muted at that time, the blank frame time stamps
could have been off (i. e. would have been pointing to time
after the last forwarded frame). Here also use current time
to adjust time stamp. Maybe, this could help in some cases where
we are seeing unflushed video buffer?

* remove unnecessary check

* address feedback and also maintain first synthesized time stamp
2023-05-10 18:31:49 +05:30
Raja Subramanian 4419cd56b8 Switch to rate since first time. (#1704)
With short term measurements, the adjustment itself was causing
some oscillations and drift tend to settle at some small value
and oscillated around it due to push/pull affecting small window
measurement.
2023-05-10 11:01:51 +05:30
Raja Subramanian 678cd06241 Infow -> Debugw (#1703) 2023-05-10 10:26:36 +05:30
Raja Subramanian f543e3f8d0 Send left over RTCP packets. (#1699) 2023-05-09 18:46:30 +05:30
Raja Subramanian cf2a078579 Apply time stamp adjustment only at the start of a frame. (#1698)
It was possible that the adjustment applied in the middle
of a frame resulting in the same frame having multiple time stamps.
That would have caused video to pause/jump.

Apply the offset only at the start of the frame so that all
packets of a frame get the same offset.
2023-05-09 12:39:11 +05:30
Raja Subramanian 0e582ec82a fix the negative sign scope (#1696) 2023-05-09 00:13:01 +05:30
Raja Subramanian 153f02091c Use measurement in window instead of since start. (#1695)
This captues chnages within a measurement window.
2023-05-08 19:51:23 +05:30
Raja Subramanian ddcb8342ef Fix Dervivative equation wrong brackets (#1693) 2023-05-07 18:36:26 +05:30
Raja Subramanian 3fb93135f5 Experimental flag to try time stamp adjustment to control drift. (#1687)
* Experimental flag to try time stamp adjustment to control drift.

There is a config to enable this.

Using a PID controller to try and keep the sample rate at expected
value. Need to be seen if this works well. Adjustment are limited
to 25 ms max at a time to ensure there are no large jumps.
And it is applied when doing RTCP sender report which happens
once in 5 seconds currently for both audio and video tracks.

A nice introduction to PID controllers - https://alphaville.github.io/qub/pid-101/#/
Implementation borrowed from - https://github.com/pms67/PID

A few things TODO
1. PID controller tuning is a process. Have picked values from test from
   that implementation above. May not be the best. Need to try.
2. Can potentially run this more often. Rather than running it only when
   running RTCP sender report (which is once in 5 seconds now), can
   potentially run it every second and limit the amount of change to
   something like 10 ms max.

* remove unused variable

* debug log a bit more
2023-05-06 11:52:57 +05:30
Raja Subramanian 25d6fd751f Cleaning up smoothed OWD calculation for sender report. (#1684)
* Keep track of expected RTP time stamp and control drift.

- Use monotonic clock in RTCP Sender Report and packet times
- Keep the time stamp close to expected time stamp on layer/SSRC
  switches

* clean up

* fix test compile

* more test compile failures

* anticipatory clean up

* further clean up

* add received sender report logging
2023-05-05 13:14:12 +05:30
Raja Subramanian 28a8a808f2 Do not add empty video layers in stats. (#1685) 2023-05-05 08:59:08 +05:30
Raja Subramanian 15078eb9f4 Keep track of expected RTP time stamp and control drift. (#1681)
* Keep track of expected RTP time stamp and control drift.

- Use monotonic clock in RTCP Sender Report and packet times
- Keep the time stamp close to expected time stamp on layer/SSRC
  switches

* clean up

* fix test compile

* more test compile failures
2023-05-04 13:00:57 +05:30
Raja Subramanian 00217c7af1 Logging delta of receiver report (#1676) 2023-05-02 22:43:37 +05:30
Raja Subramanian 3070e976c3 Log received sender report of audio for debugging (#1673)
* Log received sender report of audio for debugging

* log OWD also

* add some more bits
2023-05-02 00:22:33 +05:30