With RTX, some clients use very old packets for probing. Check for
duplicates before logging a warning about an old packet/negative
sequence number jump.
Also, double the history so that duplicate tracking is better. Adds
about 1/2 KB per RTP stream.
That is the main change. Renamed the variable to `isExpectedToResume`
everywhere for consistency.
Planning to use the callback value in relays to determine if the down
track should be closed or switched to a different up track.
Remove the return when encountering an invalid packet.
Also, log more sparsely.
Return proper errors from util so that we can selectively drop packets
based on error type; for example, SSRC mismatches are an acceptable kind
of error.
* Validate RTP packets.
Check version, payload type (if available) and SSRC (if available)
and drop bad packets. And let repair mechanisms take effect for those
packets.
* address data race reported by test
* fix an unlock and test packets
* Log first time adjustment total.
Seeing cases where the first time is 400ms+ before start time.
It is possible it is getting that much adjustment, but it would be good
to see how much total adjustment happens.
* log propagation delay
* Better lock for sender report TS offset.
It is possible that a resume has happened and a new timestamp offset was
calculated. But, a sender report from the publisher can arrive with a
timestamp prior to the timestamp which was used for the offset
calculation. Using that sender report in the forwarding path causes jumps.
Example:
- Track is forwarding, let us say tsOffset = `a`.
- Unmute/layer switch - one of those events happens, and a new tsOffset
  will be calculated; let us say that offset is `b` and it is based on an
  incoming timestamp of `c`.
- A sender report from publisher could arrive with timestamp = `d`.
o If `d` >= `c`, the offset `b` is correct and can be applied.
o But, it is possible that `d` < `c`, in that case, offset `a` should
be used and not `b`.
To address this, keep track of incoming extended timestamp at switch
point and accept incoming sender reports which have a timestamp >=
switch point timestamp.
* clean up
* log more details on invalid layer
Seeing some unexplained large jumps on remotes across relay. Unclear if
there was a jump on the origin side at some point. Reducing the
threshold for large jumps so that we can catch unexpected jumps more
often.
Previously, the bit rate interval config was checked first. That would
have returned `!ok` for invalid layers. A recent change to prevent
duplicate tracker addition rearranged the code, and the tracker array
was accessed out-of-bounds.
Unclear why an invalid layer is passed in. Need to investigate that.
There are cases where the RTP timestamp does not increment across
mute/unmute. This seems to happen fairly consistently with React Native
clients.
Something like the following happens:
- Track is progressing.
- Mute at `t = x`; assume the RTP timestamp at that point is `y` and the
  RTP clock rate is `z`.
- Through the mute, more RTCP sender reports come in from the publisher,
  and the RTP timestamps in those reports progress at the expected rate
  of `z` RTP clock ticks per second.
- Forwarding path uses those sender reports from publisher to build the
sender report for subscribers.
- Unmute happens at `t = x + a` seconds.
- Ideally, packets coming in after that should have a timestamp of `y +
  (a * z)`, but they tend to have something only a little bit more than `y`.
- RTCP sender reports also have a timestamp that goes backwards. The SFU
  ignores these.
- Meanwhile, the forwarding path has adjusted to the new RTP timestamp
  base and has calculated a TS offset (from publisher -> subscriber).
  Effectively, that offset comes out close to `(a * z)`, i.e., a jump
  corresponding to the mute interval.
- When it is time to send an RTCP sender report to the subscriber, the
  old sender report from the publisher is used (as the intervening ones
  from the publisher were rejected because their timestamps were moving
  backwards). The problem is that the old report is used with the new
  offset, so it looks like the timestamp jumped ahead by `a` seconds.
Address it by storing the timestamp offset at the time of receiving the
publisher side sender report, and using that while sending the
subscriber side sender report. There are some very rare edge cases where
this can get mismatched, but they should be uncommon. Hopefully, this
should prevent unnecessary jumps in the timestamp of RTCP sender reports
to subscribers.
Was hitting the edge case mentioned in the (now deleted in this PR)
comments. It is fine to reset and let it declare available again.
Available layer handler will ignore repeats.
When relaying, buffers are stopped and restarted. On a restart, the
buffer adds a tracker. But, the old tracker is not destroyed till the
end. So, the old tracker and the new tracker for the same layer stomp on
each other and declare the layer unavailable (the old tracker is not
getting any packets).
Fix by not creating a new tracker if one already exists.
With the move to forwarding the NTP timestamp as is, we get a lot more
of this error logged, as the remote bases it off of the previous report
while local (i.e., server-side) bases it off of a more recent report.
Anyhow, this code has been around for a long time and there is nothing
new to learn from these errors. Just log at Debugw in case we can learn
something from it in specific projects or environments where Debugw is
okay.