Commit Graph

55 Commits

Author SHA1 Message Date
Raja Subramanian faa66d1138 Make sender report pass through an option. (#2861)
Enabled by default.

Also, tweak the long term propagation delay a bit. The first propagation
delay itself was too high and the long term initialized with a high
value. Prevent that and also ensure large negtaives do not have an
effect by using a lower bound of 0. Lower bound of 0 is okay as the main
purpose is to track sustained high positive values.
2024-07-15 11:27:31 +05:30
Raja Subramanian 27f6794e77 Check sender report against media path. (#2843)
Seeing cases (mostly across relay) of large first packet time adjustment
getting ignored. From data, it looks like the first packet is extremely
delayed (some times of the order of minutes) which does not make sense.

Adding some checks against media path, i. e. compare RTP timestamp from
sender report against expected RTP timestamp based on media path
arrivals and log deviations more than 5 seconds.

Another puzzling case. Trying to understand more.

Also, refactoring SetRtcpSenderReportData() function as it was getting
unwieldy.
2024-07-09 09:20:27 +05:30
Raja Subramanian acbd4ea104 Handle cases of long mute/rollover of time stamp. (#2842)
* Handle cases of long mute/rollover of time stamp.

There are cases where the track is muted for long enough for timestamp
roll over to happen. There are no packets in that window (typically
there should be black frames (for video) or silence (for audio)). But,
maybe the pause based implementation of mute is causing this.

Anyhow, use time since last packet to gauge how much roll over should
have happened and use that to update time stamp. There will be really
edge cases where this could also fail (for e. g. packet time is affected
by propagation delay, so it could theoretically happen that mute/unmute
+ packet reception could happen exactly around that rollover point and
  miscalculate, but should be rare).

As this happen per packet on receive side, changing time to `UnixNano()`
to make it more efficient to check this.

* spelling

* tests

* test util

* tests
2024-07-08 11:07:20 +05:30
Raja Subramanian b4134edf40 Log rtp stats state on large jumps. (#2829)
Forgot to include in receiver.
2024-07-01 11:51:05 +05:30
Raja Subramanian 57980fcc36 fix logging ignored key (#2826) 2024-06-28 10:34:41 +05:30
Raja Subramanian fa490dd510 Log rtp stats more consistently. (#2816)
* Log rtp stats more consistently.

Thank you Paul for the logging tip.
Also update deps.

* remove duplicate logging field

* nil check
2024-06-25 14:55:42 +05:30
Raja Subramanian 6bb48dd6f1 Do not log duplicate on large negative on send side (#2815) 2024-06-24 12:51:35 +05:30
Raja Subramanian 091eab556d Update mediatransportutil (#2812) 2024-06-22 11:32:28 +05:30
Raja Subramanian d4e50b633f Do not log warns on duplicate. (#2807)
With RTX, some clients use very old packets for probing. Check for
duplicate before logging warning about old packet/negative sequence
number jump.

Also, double the history so that duplicate tracking is better. Adds
about 1/2 KB per RTP stream.
2024-06-20 10:52:12 +05:30
Raja Subramanian a31f59b689 Log first time adjustment total. (#2776)
* Log first time adjustment total.

Seeing cases where the first time is 400ms+ before start time.
Possible it is getting that much adjustment, but would be good to see
how much total adjustment happens.

* log propagation delay
2024-06-09 23:07:01 +05:30
Raja Subramanian 73852d0a13 Reduce large sequence number jump threshold for logging. (#2770)
Seeing some unexplained large jumps on remotes across relay. Unclear if
there was a jump on origin side at some point. Reducing threshold for
large jump so that we can catch unexpected jumps more.
2024-06-07 12:36:02 +05:30
Raja Subramanian 7d035deef8 Clean up logging fields a bit (#2767) 2024-06-06 23:03:21 +05:30
Raja Subramanian 9781d30611 Do not propagate RTCP if report is not processed. (#2739) 2024-05-28 19:29:54 +05:30
Raja Subramanian 8be2005e0f More detailed logging to understand old packets. (#2730) 2024-05-25 18:34:55 +05:30
Raja Subramanian 96cb829b84 Log more info when adjusting start timestamp. (#2722)
Seeing some large time stamp jump in relay down track once in a while.
Logging more details on time stamp switch to learn more.
2024-05-23 13:03:26 +05:30
Raja Subramanian a2a8810734 log the correct new propagation delay (#2694) 2024-04-30 08:29:59 +05:30
Raja Subramanian c8b289daa5 (Attempted) Simplify time stamp calculation on switches. (#2688)
* Simplify time stamp calculation on switches.

Trying to simplify time stamp calculation on restarts.
The additional checks take effect rarely and it not worth the extra
complication.

Also, doing the reference time stamp in extended range.
The challenge with that is when publisher migrates the extended
timestamp could change post migration (i. e. post migration would not
know about rollovers). To address that, maintain an offset that is
updated on resync.

* WIP

* Revert to resume threshold

* typo

* clean up
2024-04-28 12:13:52 +05:30
Raja Subramanian 2ad0efc28f Handle large jumps in RTCP sender report timestamp. (#2674)
* Handle large jumps in RTCP sender report timestamp.

Seeing cases of RTCP Sender Report spaced apart by more than half the
RTP Timestamp range. Maybe a case of laptop going to sleep and waking
up. Handle it using time diff from last report and calculating expected
timestamp.

* try go 1.22
2024-04-22 23:04:56 +05:30
Raja Subramanian af0b0c4734 Connection quality LOST only if RTCP is also not available. (#2670)
* Connection quality LOST only if RTCP is also not available.

It is possible that sender stops all layers of video due to some
constraint (CPU or bandwidth). Packet reception going dry due to
that should not trigger `LOST` quality.

Add last received RTCP time also to distinguish the case
of real `LOST` and sender stopping traffic.

Some bits to watch for
- With audio, RTCP reports could be more than 5 seconds apart (5 seconds
  is the default interval for connection quality scorer), but audio
  senders usually send silence packets even when there is no input.
  So audio completely stopping can be considered `LOST`.
- With video, have to observe if all clients continue to send RTCP even
  if all layers are stopped.
- RTCP bandwidth is not supposed to exceed the primary stream bandwidth.
  libwebrtc calculates that and spaces out RTCP reports accordingly.
  That is the reason why audio reports are that far apart. If a video
  stream is encoded at a very low bit rate, it could also be sending
  RTCP rarely. So, there is the case of LOST being indistinguishable
  from sender stopping all layers. But, this should be a rare case.

* typo
2024-04-21 23:35:24 +05:30
Raja Subramanian 21fbda3470 Silence some noisy debug logs (#2643) 2024-04-11 10:58:19 +05:30
Raja Subramanian 860702e9dc Prevent large spikes in propagation delay (#2615)
* Prevent large spikes in propagation delay

A few tweaks
- Large spike in propagation delay due to congested channel results in
  long term estimate getting high value. Ignore outliers in long term
  estimate.
- Introduce a new field for adjusted arrival time as adjusting the
  arrival time in place meant it got applied again across the relay and
  that caused different propagation delay on remote nodes.
- Reset path change counters as long as there is any sample that is not
  higher than the multiple of long term. There was a case of
  o Sample with high value that triggered path change start.
  o Then some samples with high enough delta, but did not meet the
    criteria for increasing counter further.
  o Some time later, another sample met the threshold and that triggered
    a path change re-init.

* do not adapt to large delta
2024-04-02 14:21:20 +05:30
Raja Subramanian 4c9e59dc25 Small tweaks to propagation delay adaptation. (#2607) 2024-03-30 21:53:18 +05:30
Raja Subramanian b5de646073 Remove redundant check. (#2605)
* Remove redundant check.

That check is already at the ouside check.

* print string

* space
2024-03-30 00:31:26 +05:30
Raja Subramanian 0480f99a83 Tweak adaptation to increase in propagation delay. (#2598)
* Tweak adaptation to increase in propagation delay.

A couple of issues
- RTCP Sender Reports rate will vary based on underying track bitrate.
  (at least in theory, not all entities will do it though, for example
  SFU does standard rate of one per three seconds irrespective of track
  bit rate). So, adapt the long term estimate of propagation delay delta
  based on spacing of reports.
- Re-init of propagation delay to adapt to path change was taking the
  last value before the switch. But, that one value could have been an
  outlier and accepting it is not great. So, adapt spike time
  propagation delay in a smoother fashion to ensure that all values
  during spike contribute to the final value.

* clean up
2024-03-26 17:33:24 +05:30
Raja Subramanian 7945c01dbe Reset sharp increase if received delta is small. (#2592) 2024-03-21 10:25:45 +05:30
Raja Subramanian 03ada9ba76 Proper RTCP report past mute. (#2588)
- When audio is muted, server injects silence frames which moves the
  time stamp forward and adjusts offset. That cannot be used against
  publisher side sender report. Use a pinned version.
- Ignore small changes to propagation delay even while checking for
  sharp increase. That is spamming a lot for small changes, i.e.
  existing delta is 100 micro seconds or so and the new one is 300 micro
  seconds. Also rename to `longTerm` from `smoothed` as it is a slow
  varying long term estimate of propagation delay delta. And slow down
  that adaptation more.
2024-03-19 11:59:24 +05:30
Raja Subramanian 4e96ad2e5b Missed clean up in the last PR (#2576)
* Missed clean up in the last PR

* Infow -> Debugw
2024-03-13 23:01:19 +05:30
Raja Subramanian 3e43f75143 Forward publisher sender report. (#2572)
* Forward publisher sender report.

Publisher side RTCP sernfer report is rebased to SFU time base
and used to send sender rerport to subscriber.

Will wait to merge till previous versions are out as this will require a
bunch of testing.

* - Add rebased report drift
- update protocol dep
- fix path change check, it has to check against delta of propagation
  delay and not propagation delay as the two side clocks could be way
  off.
2024-03-13 14:31:39 +05:30
Raja Subramanian 50c48ff29d Ignore out-of-order receiver side sender reports. (#2567) 2024-03-11 11:30:01 +05:30
Raja Subramanian bdbc9dcbc7 Use start time stamp to calculate down stream sender report. (#2564)
* Use start time stamp to calculate down stream sender report.

With first packet time adjustment, using the first time stamp is more
accurate.

This still suffers if the up stream clock rate changes (happens in cases
like noise suppression which is not well understood). Will be looking at
pass through of sender report from publisher to subscriber.

* similar log strings

* avoid early sender reports

* log messages

* Reduce first packet adjustment threshold to 15 seconds
2024-03-10 23:18:54 +05:30
Raja Subramanian a08b058abc Structured logging for sender report data. (#2563) 2024-03-10 01:29:37 +05:30
Raja Subramanian 0618cb39df Logging time and rtp diff for easier debugging (#2548) 2024-03-05 20:31:44 +05:30
Raja Subramanian d0128b19cd Reset sender reports before measuring clock skew. (#2437) 2024-02-02 21:52:43 +05:30
cnderrauber e1cc9d6b3c Fix log marshal error (#2295) 2023-12-06 00:08:48 +08:00
Raja Subramanian c62382c76e Clean up restart a bit. (#2247) 2023-11-17 00:40:00 +05:30
Raja Subramanian 12a9d74acb Do not restart on receiver side. (#2224)
* Do not restart on receiver side.

Restart with wrap back causes issues in the forwarding path
as the subscriber assumes the extended type from receiver side does
not restart.

Restart was an attempt to include as many packets as possible, but
in practice is not super useful. So, taking it out. Can clean up
a bit more stuff, but want to run this first and check for any oddities.

* fix test
2023-11-06 10:41:56 +05:30
Raja Subramanian fa01297d96 Slight sequencer tweaks. (#2184)
The buffer is not for padding packets. So, calculate
adjusted sequence numbers before comparing against size.

Also, it is possible that invalidated slot is accessed
due to not being able to exclude padding range. This was
causing time stamp reset to 0. Will remove the error log
after this goes out and the condition does not show up
for a few days.
2023-10-25 23:12:14 +05:30
Raja Subramanian f622fc2490 Sample clock skew down by an order of magnitude (#2173) 2023-10-23 16:58:02 +05:30
Raja Subramanian 3e9450c774 Log more details in warns. (#2166)
Logging more details in warns so that we do not have to enable Infow
for some logs later.
2023-10-21 11:02:34 +05:30
Raja Subramanian b591c56aa3 Logging reduction. (#2165)
Move some to Debugw and add sampling for a few.
2023-10-21 10:26:30 +05:30
Raja Subramanian 0d7477178e More fine grained filtering NACKs after a key frame. (#2159)
* More fine grained filtering NACKs after a key frame.

There are applications with periodic key frame.
So, a packet lost before a key frame will not be retransmitted.
But, decoder could wait (jitter buffer, play out time) and cause
a stutter.

Idea behind disabling NACKs after key frame was another knob to
throttle retransmission bit rate. But, with spaced out retransmissions
and max retransmissions per sequence number, there are throttles.
This would provide more throttling, but affects some applications.
So, disabling filtering NACKs after a key frame.

Introducing another flag to disallow layers. This would still be quite
useful, i. e. under congestion the stream allocator would move the
target lower. But, because of congestion, higher layer would have lost
a bunch of packets. Client would NACK those. Retransmitting those higher
layer packets would congest the channel more. The new flag (default
enabled) would disallow higher layers retransmission. This was happening
before this change also, just splitting out the flag for more control.

* split flag
2023-10-20 00:44:39 +05:30
Raja Subramanian e461e9cd79 Log skew in clock rate. (#2158)
* Log skew in clock rate.

Remember seeing sender report time stamp moving backward
across mute with replaceTrack(null). Not able to reproduce
it in JS sample app, but have seen it elsewhere.

Logging to understand it better. Wondering if the sender report
should be reset on time stamp moving backward or if we should drop
backwards moving reports.

* set threshold at 20%
2023-10-19 13:58:50 +05:30
Raja Subramanian f97242c8ba Use 32-bit time stamp to get reference time stamp on a switch. (#2153)
* Use 32-bit time stamp to get reference time stamp on a switch.

With relay and dyncast and migration, it is possible that different
layers of a simulcast get out of sync in terms of extended type,
i. e. layer 0 could keep running and its timestamp could have
wrapped around and bumped the extended timestamp. But, another layer
could start and stop.

One possible solution is sending the extended timestamp across relay.

But, that breaks down during migration if publisher has started afresh.
Subscriber could still be using extended range.

So, use 32-bit timestamp to infer reference timestamp and patch it with
expected extended time stamp to derive the extended reference.

* use calculated value

* make it test friendly
2023-10-18 21:48:41 +05:30
Raja Subramanian 3e4cd3a161 Accept more range for first packet time adjustment. (#2150) 2023-10-17 23:52:14 +05:30
Raja Subramanian 70b60101f4 do the proper large negative check (#2139) 2023-10-10 11:01:49 +05:30
Raja Subramanian 31b042ddce Log larga negative gap. (#2138)
Seeing a large positive gap which I am not able to explain.
Wondering if at some other time, a large negative is happening
and the large positive is just a correction.
2023-10-09 14:32:10 +05:30
Raja Subramanian ebe1470e46 Simplify (#2137) 2023-10-07 13:15:51 +05:30
Raja Subramanian 9fc276481a Increase accuracy of delay since last sender report. (#2136)
It is in 1 / 65536 seconds units. That is about 0.015 ms.
So, use microseconds to increase accuracy.
2023-10-07 12:25:47 +05:30
Raja Subramanian c710ab901e Log stream start at Infow (#2125)
Seeing some sequence number adjustment where timestamp is 0.
Want to check the stream start.
2023-10-05 00:12:41 +05:30
Raja Subramanian 180ad541fc Mark packet not handled if restart is rejected (#2115)
* Mark packet not handled if restart is rejected

* log both sn and ts on restart/rollback
2023-09-30 10:24:13 +05:30