* Fix unwrap
An out-of-order packet arriving after a wrap-around had already happened
was not using the proper cycle counter to calculate the unwrapped value.
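The fix can be illustrated with a minimal sequence number unwrapper
(a sketch with hypothetical names, not the actual livekit code): an
out-of-order packet arriving after a wrap-around must be unwrapped with
the cycle it belongs to, not blindly with the current cycle counter.

```go
package main

import "fmt"

// unwrapper extends 16-bit RTP sequence numbers to 64 bits,
// tracking wrap-around cycles.
type unwrapper struct {
	started bool
	highest int64 // highest extended sequence number seen
}

// unwrap returns the extended sequence number for sn.
func (u *unwrapper) unwrap(sn uint16) int64 {
	if !u.started {
		u.started = true
		u.highest = int64(sn)
		return u.highest
	}

	cycles := u.highest & ^int64(0xFFFF) // cycle of the highest seen
	ext := cycles | int64(sn)

	// pick the cycle (previous, current or next) closest to highest
	if diff := ext - u.highest; diff > 0x8000 {
		// out-of-order packet from before the wrap: previous cycle
		ext -= 0x10000
	} else if diff < -0x8000 {
		// wrapped forward: next cycle
		ext += 0x10000
	}

	if ext > u.highest {
		u.highest = ext
	}
	return ext
}

func main() {
	u := &unwrapper{}
	fmt.Println(u.unwrap(65534)) // 65534
	fmt.Println(u.unwrap(2))     // 65538: wrapped forward
	fmt.Println(u.unwrap(65535)) // 65535: out-of-order, previous cycle
}
```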
* update mediatransportutil
Active TCP was added in pion/ice v2.3.4. This is causing a couple of issues for us.
- Active TCP does not make sense for an SFU. Clients are expected to be behind NAT and we should not be dialing them. Instead, LiveKit exposes a TCP port so clients can dial in.
- Active TCP is causing all iOS clients to become disconnected immediately. This is impacting all versions of libwebrtc-based iOS clients (tested from M104 to M111).
* send min/median connection score along with avg
* guard against divide by zero for avg score calculation
* update median calculation
Signed-off-by: shishir gowda <shishir@livekit.io>
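The min/median/avg aggregation with the divide-by-zero guard can be
sketched as follows (hypothetical helper names, not the actual livekit
API):

```go
package main

import (
	"fmt"
	"sort"
)

// aggregateScores returns min, median and average of connection
// scores, guarding against division by zero when no samples exist.
func aggregateScores(scores []float64) (min, median, avg float64) {
	if len(scores) == 0 {
		return 0, 0, 0 // guard: no samples, avoid divide by zero
	}

	sorted := append([]float64(nil), scores...)
	sort.Float64s(sorted)

	min = sorted[0]

	// median: middle element, or mean of the two middle elements
	mid := len(sorted) / 2
	if len(sorted)%2 == 1 {
		median = sorted[mid]
	} else {
		median = (sorted[mid-1] + sorted[mid]) / 2
	}

	sum := 0.0
	for _, s := range sorted {
		sum += s
	}
	avg = sum / float64(len(sorted))
	return
}

func main() {
	fmt.Println(aggregateScores([]float64{4.5, 3.0, 5.0, 4.0}))
	fmt.Println(aggregateScores(nil)) // 0 0 0, no divide by zero
}
```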
1. Completely removing RTT and jitter from score calculation.
Need to do more work there.
a. Jitter is slow moving (RFC 3550 formula is designed that way).
But, we still get high values at times. Ideally, that should
penalise the score, but due to jitter buffer, effect may not be
too bad.
b. Need to smooth RTT. It is based on receiver report and if one
sample causes a high number, score could be penalised
(this was being used in down track direction only). One option
is to smooth it like the jitter formula above and try using it.
But, for now, disabling that also.
2. When receiving fewer packets (for example, with DTX), reduce the
weight of packet loss with a quadratic relationship to the packet
receive ratio. Previously, a square root was used and it was
potentially weighting it too high. For example, if only 5 packets were
received due to DTX instead of 50, we were still giving ~30% weight
(sqrt(0.1)). Now, it gets 1% weight. So, if one of those 5 packets
is lost (20% packet loss ratio), it still does not get much weight
as the number of packets is low.
3. Slightly slower decrease in score (in EWMA)
4. When using RED, increase packet loss weight thresholds to be able to
take more loss before penalizing score.
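Point 2 above can be illustrated with a small sketch (function names
are hypothetical): the loss penalty weight scales with the fraction of
expected packets actually received, quadratically instead of the
previous square root.

```go
package main

import (
	"fmt"
	"math"
)

// lossWeight scales the packet loss penalty by how many packets were
// actually received relative to the nominal rate (DTX sends far fewer
// packets than the nominal rate).
func lossWeight(received, expected float64) float64 {
	if expected <= 0 {
		return 0 // guard against divide by zero
	}
	ratio := math.Min(received/expected, 1.0)
	return ratio * ratio // quadratic: de-weights sparse streams sharply
}

// lossWeightSqrt is the previous behaviour, for comparison.
func lossWeightSqrt(received, expected float64) float64 {
	if expected <= 0 {
		return 0
	}
	return math.Sqrt(math.Min(received/expected, 1.0))
}

func main() {
	// DTX: 5 packets received where 50 were nominally expected
	fmt.Printf("%.2f\n", lossWeightSqrt(5, 50)) // before: ~30% weight
	fmt.Printf("%.2f\n", lossWeight(5, 50))     // now: 1% weight
}
```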
Two things
- Somehow the publisher RTCP sender report time stamp sometimes goes
backwards. Log it differently. Also, use a signed type for logging so
that negative values are easy to see.
- On down track, because of silence frame injection on mute, the RTCP
sender report time stamp might be ahead of the time stamp we will use
on unmute. If so, ensure that the next time stamp is also not before
what was sent in the RTCP sender report.
* Make signal close async.
Left notes about async close in code.
Also reducing retry config timeout
- Timeout to 7.5 seconds (making it 1/4th of current config)
- max retry to 4 seconds
- so, it can do 4 tries now in 7.5 seconds (with retries ending at 0.5
seconds, 1.5 seconds, 3.5 seconds, 7.5 seconds). The change of max to
4 seconds is not really needed, but it lined up with 7.5. So, made the
change.
* update comments a bit
The PID controller seems to be working well. But, it is unclear where
it can be applied as some of the data shows significant jumps
(either caused by BT devices or possibly noise cancellation/cpu
constraint) and although PID controller is slowly pulling things
to expected sample rate, it could be a bit slow.
Unfortunately, cannot munge too much in a middle box.
However, leaving the controller in there as it is doing its job
for cases where things slip slowly.
Changing things to log significant jumps (more than 200 ms away
from expected) at Infow level.
Also, recording drift and sample rate in RTP stats proto and string
representation.
* Handle time stamp increment across mute.
Two cases handled
1. Starting on mute could inject blank frame/padding packets.
Their time stamps are randomly generated. So, when the publisher
unmuted, the time stamp was jumping ahead by only 1. Make it so
that they jump ahead by the elapsed time since starting the blank
frames/padding packets.
2. When generating blank frames at the end of a down track, if
the track was muted at that time, the blank frame time stamps
could have been off (i.e. would have been pointing to a time
after the last forwarded frame). Here also, use the current time
to adjust the time stamp. Maybe this could help in some cases where
we are seeing an unflushed video buffer?
* remove unnecessary check
* address feedback and also maintain first synthesized time stamp
With short term measurements, the adjustment itself was causing
some oscillations: the drift tended to settle at some small value
and oscillate around it due to push/pull affecting the small window
measurement.
It was also possible for the adjustment to be applied in the middle
of a frame, resulting in the same frame having multiple time stamps.
That would have caused video to pause/jump.
Apply the offset only at the start of the frame so that all
packets of a frame get the same offset.
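Applying the offset only at frame boundaries can be sketched like
this (names are hypothetical): a pending offset is picked up only when
the input time stamp changes, i.e. at the start of a new frame, so all
packets of one frame share the same offset.

```go
package main

import "fmt"

// tsAdjuster applies a pending time stamp offset only at frame
// boundaries, so all packets of one frame share the same offset.
type tsAdjuster struct {
	offset        int64 // offset currently applied
	pendingOffset int64 // offset to apply at the next frame start
	lastInputTS   uint32
	started       bool
}

func (a *tsAdjuster) adjust(inputTS uint32) uint32 {
	// a new input time stamp marks the start of a new frame
	if !a.started || inputTS != a.lastInputTS {
		a.started = true
		a.offset = a.pendingOffset
		a.lastInputTS = inputTS
	}
	return uint32(int64(inputTS) + a.offset)
}

func main() {
	a := &tsAdjuster{}
	fmt.Println(a.adjust(3000)) // 3000: first packet of a frame
	a.pendingOffset = 90        // adjustment arrives mid-frame
	fmt.Println(a.adjust(3000)) // 3000: same frame keeps old offset
	fmt.Println(a.adjust(6000)) // 6090: new frame picks up the offset
}
```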
* RTCP sender reports every three seconds.
Ideally, we should be sending this based on data rate.
But, increasing the frequency a little, as a lost sender report
means the client may not have a sender report for 10 seconds
and that could affect sync. We do receiver reports once a second.
Thought of setting this to that level too, but not making a big change
from the existing rate.
Also, simplifying the RTCP send loop. Don't need to hold and
do the processing after collecting all reports.
* consistent use of GetSubscribedTracks
* Experimental flag to try time stamp adjustment to control drift.
There is a config to enable this.
Using a PID controller to try and keep the sample rate at the expected
value. It remains to be seen if this works well. Adjustments are limited
to 25 ms max at a time to ensure there are no large jumps.
And it is applied when doing the RTCP sender report, which happens
once in 5 seconds currently for both audio and video tracks.
A nice introduction to PID controllers - https://alphaville.github.io/qub/pid-101/#/
Implementation borrowed from - https://github.com/pms67/PID
A few things TODO
1. PID controller tuning is a process. Have picked values from tests of
the implementation above. May not be the best. Need to experiment.
2. Can potentially run this more often. Rather than running it only when
running RTCP sender report (which is once in 5 seconds now), can
potentially run it every second and limit the amount of change to
something like 10 ms max.
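A minimal clamped PID step, in the spirit of the linked pms67/PID
implementation; the gains below are illustrative placeholders, not the
tuned production values, and the clamp mirrors the 25 ms limit
described above.

```go
package main

import "fmt"

// pid is a minimal proportional-integral-derivative controller with a
// clamped output.
type pid struct {
	kp, ki, kd     float64
	outMin, outMax float64
	integral       float64
	prevErr        float64
}

// update computes the control output for the current error, sampled
// every dt seconds, and clamps it to [outMin, outMax].
func (p *pid) update(err, dt float64) float64 {
	p.integral += err * dt
	deriv := (err - p.prevErr) / dt
	p.prevErr = err

	out := p.kp*err + p.ki*p.integral + p.kd*deriv
	if out > p.outMax {
		out = p.outMax
	} else if out < p.outMin {
		out = p.outMin
	}
	return out
}

func main() {
	// limit adjustments to +/- 25 ms per step, as described above
	c := &pid{kp: 0.5, ki: 0.1, kd: 0.0, outMin: -0.025, outMax: 0.025}
	fmt.Println(c.update(0.010, 5.0)) // small drift: modest correction
	fmt.Println(c.update(0.500, 5.0)) // large drift: clamped to 0.025
}
```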
* remove unused variable
* debug log a bit more
* Keep track of expected RTP time stamp and control drift.
- Use monotonic clock in RTCP Sender Report and packet times
- Keep the time stamp close to expected time stamp on layer/SSRC
switches
* clean up
* fix test compile
* more test compile failures
* anticipatory clean up
* further clean up
* add received sender report logging