Commit Graph

359 Commits

Author SHA1 Message Date
Raja Subramanian
27f6794e77 Check sender report against media path. (#2843)
Seeing cases (mostly across relay) of large first packet time adjustment
getting ignored. From data, it looks like the first packet is extremely
delayed (sometimes on the order of minutes), which does not make sense.

Adding some checks against media path, i.e. compare RTP timestamp from
sender report against expected RTP timestamp based on media path
arrivals and log deviations more than 5 seconds.

Another puzzling case. Trying to understand more.

Also, refactoring SetRtcpSenderReportData() function as it was getting
unwieldy.
2024-07-09 09:20:27 +05:30
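The media path check described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function names, the use of the time since the last media packet, and the clock rate parameter are all assumptions. Only the 5 second threshold comes from the commit message.

```go
package main

import "time"

// maxDeviation mirrors the 5 second logging threshold from the commit message.
const maxDeviation = 5 * time.Second

// expectedRTPTimestamp (hypothetical helper) extrapolates the RTP timestamp
// from the last media packet, given the time elapsed since that packet arrived
// and the codec clock rate in Hz.
func expectedRTPTimestamp(lastTS uint32, sinceLast time.Duration, clockRate uint32) uint32 {
	ticks := int64(sinceLast) * int64(clockRate) / int64(time.Second)
	return lastTS + uint32(ticks) // unsigned arithmetic wraps like RTP timestamps
}

// srDeviation returns the signed difference between the RTP timestamp carried
// in a sender report and the expected one, converted to a duration.
func srDeviation(srTS, expectedTS, clockRate uint32) time.Duration {
	diff := int32(srTS - expectedTS) // signed wrap-around difference
	return time.Duration(int64(diff) * int64(time.Second) / int64(clockRate))
}

// shouldLogDeviation reports whether the deviation exceeds the threshold
// in either direction.
func shouldLogDeviation(d time.Duration) bool {
	return d > maxDeviation || d < -maxDeviation
}
```

With a 90 kHz clock, a sender report that is 540000 ticks ahead of the expected timestamp deviates by 6 seconds and would be logged.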
Raja Subramanian
acbd4ea104 Handle cases of long mute/rollover of time stamp. (#2842)
* Handle cases of long mute/rollover of time stamp.

There are cases where the track is muted for long enough for timestamp
roll over to happen. There are no packets in that window (typically
there should be black frames (for video) or silence (for audio)). But,
maybe the pause based implementation of mute is causing this.

Anyhow, use time since last packet to gauge how much roll over should
have happened and use that to update the time stamp. There are rare
edge cases where this could also fail (e.g. packet time is affected
by propagation delay, so mute/unmute plus packet reception could
theoretically land exactly around the rollover point and cause a
miscalculation).

As this happens per packet on the receive side, changing time to `UnixNano()`
to make this check more efficient.

* spelling

* tests

* test util

* tests
2024-07-08 11:07:20 +05:30
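One way to picture the time-based rollover handling is the sketch below. It is an assumption-laden illustration rather than the actual implementation: `extendTimestamp` and its parameters are hypothetical names. Arrival times are plain `UnixNano()` values, in the spirit of the commit message.

```go
package main

// tsCycle is the size of the 32-bit RTP timestamp space.
const tsCycle = uint64(1) << 32

// extendTimestamp (hypothetical helper) maps a 32-bit RTP timestamp onto the
// extended timeline. It first estimates where the timeline should be from the
// wall-clock time since the last packet, then picks the cycle in which ts
// lies closest to that estimate.
func extendTimestamp(lastExt uint64, lastArrivalNs, nowNs int64, ts uint32, clockRate uint32) uint64 {
	var elapsedTicks uint64
	if nowNs > lastArrivalNs {
		elapsedTicks = uint64(nowNs-lastArrivalNs) * uint64(clockRate) / 1_000_000_000
	}
	expected := lastExt + elapsedTicks

	// Start with ts placed in the same cycle as the expected value.
	candidate := (expected &^ (tsCycle - 1)) | uint64(ts)
	switch {
	case candidate+tsCycle/2 < expected:
		candidate += tsCycle // ts belongs to the next cycle
	case candidate > expected+tsCycle/2 && candidate >= tsCycle:
		candidate -= tsCycle // ts belongs to the previous cycle
	}
	return candidate
}
```

For example, at 48 kHz a gap of 100000 seconds spans more than one full timestamp cycle. A comparison based only on the previous timestamp would miss the rollover, whereas the time-based estimate places the new packet in the next cycle.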
Raja Subramanian
39c59d913d Do not warn on padding (#2839) 2024-07-07 12:30:54 +05:30
Raja Subramanian
bfb7db2d91 RTP packet validity check. (#2833)
Adding some checks before packet is forwarded to check for anomalies.
Will remove after a round of debug.
2024-07-04 12:42:25 +05:30
Raja Subramanian
b4134edf40 Log rtp stats state on large jumps. (#2829)
Forgot to include in receiver.
2024-07-01 11:51:05 +05:30
Raja Subramanian
57980fcc36 fix logging ignored key (#2826) 2024-06-28 10:34:41 +05:30
Raja Subramanian
fa490dd510 Log rtp stats more consistently. (#2816)
* Log rtp stats more consistently.

Thank you Paul for the logging tip.
Also update deps.

* remove duplicate logging field

* nil check
2024-06-25 14:55:42 +05:30
Raja Subramanian
6bb48dd6f1 Do not log duplicate on large negative on send side (#2815) 2024-06-24 12:51:35 +05:30
Raja Subramanian
cdb5f3ed68 Log more around unexpected cases (#2813)
- too many padding packets
- also fix case of snapshot not getting any packets
2024-06-23 00:33:56 +05:30
Raja Subramanian
091eab556d Update mediatransportutil (#2812) 2024-06-22 11:32:28 +05:30
Raja Subramanian
d4e50b633f Do not log warns on duplicate. (#2807)
With RTX, some clients use very old packets for probing. Check for
duplicate before logging warning about old packet/negative sequence
number jump.

Also, double the history so that duplicate tracking is better. Adds
about 1/2 KB per RTP stream.
2024-06-20 10:52:12 +05:30
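The duplicate check before warning can be illustrated with a bitmap over recent sequence numbers. This is only a sketch: `dupTracker`, `observe` and the history size of 8192 packets are hypothetical and not taken from the commit.

```go
package main

// historySize is the number of recent sequence numbers tracked (assumed value).
const historySize = 8192

// dupTracker remembers which extended sequence numbers were seen recently,
// using one bit per sequence number.
type dupTracker struct {
	bits    [historySize / 64]uint64
	highest uint64 // highest extended sequence number seen so far
	started bool
}

// slot returns the word index and bit mask for an extended sequence number.
func (d *dupTracker) slot(esn uint64) (int, uint64) {
	i := esn % historySize
	return int(i / 64), uint64(1) << (i % 64)
}

// observe records esn and reports whether it is a duplicate within the
// history window, or too old to tell.
func (d *dupTracker) observe(esn uint64) (duplicate, tooOld bool) {
	w, m := d.slot(esn)
	switch {
	case !d.started:
		d.started = true
		d.highest = esn
	case esn > d.highest:
		if esn-d.highest >= historySize {
			d.bits = [historySize / 64]uint64{} // jumped past the whole window
		} else {
			// Clear slots of skipped sequence numbers so stale bits are not reused.
			for s := d.highest + 1; s < esn; s++ {
				sw, sm := d.slot(s)
				d.bits[sw] &^= sm
			}
		}
		d.highest = esn
	case d.highest-esn >= historySize:
		return false, true
	case d.bits[w]&m != 0:
		return true, false
	}
	d.bits[w] |= m
	return false, false
}
```

A caller would warn about an old packet or a negative sequence number jump only when `observe` reports neither a duplicate nor an out-of-order first arrival, so RTX probes that reuse very old packets stay quiet.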
Raja Subramanian
5d969ba35b remove some debug (#2797) 2024-06-17 12:57:04 +05:30
Raja Subramanian
ea60368100 Do not error out on invalid packet. (#2789)
Remove the return when encountering invalid packet.
Also, log more sparsely.
Proper error returns from util so that we can selectively drop packets
based on error type (for example, SSRC mismatches are acceptable).
2024-06-14 11:10:57 +05:30
Raja Subramanian
129ba62d61 Validate RTP packets. (#2778)
* Validate RTP packets.

Check version, payload type (if available) and SSRC (if available)
and drop bad packets. And let repair mechanisms take effect for those
packets.

* address data race reported by test

* fix an unlock and test packets
2024-06-10 15:43:59 +05:30
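A minimal sketch of such a validity check on the fixed RTP header (RFC 3550 layout) is shown below. The function name, the sentinel errors, and the convention that a negative expected value means "not available" are assumptions for illustration, not the project's API.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// Sentinel errors (hypothetical names) so callers can tell failures apart.
var (
	errTooShort    = errors.New("rtp: packet shorter than fixed header")
	errBadVersion  = errors.New("rtp: version is not 2")
	errPayloadType = errors.New("rtp: unexpected payload type")
	errSSRC        = errors.New("rtp: unexpected SSRC")
)

// validateRTP checks version, payload type and SSRC of a raw RTP packet.
// A negative expectedPT or expectedSSRC means the value is not known yet
// and the corresponding check is skipped.
func validateRTP(pkt []byte, expectedPT int, expectedSSRC int64) error {
	if len(pkt) < 12 {
		return errTooShort
	}
	if pkt[0]>>6 != 2 { // version is the top two bits of the first byte
		return errBadVersion
	}
	if pt := int(pkt[1] & 0x7F); expectedPT >= 0 && pt != expectedPT {
		return errPayloadType
	}
	if ssrc := binary.BigEndian.Uint32(pkt[8:12]); expectedSSRC >= 0 && int64(ssrc) != expectedSSRC {
		return errSSRC
	}
	return nil
}
```

Returning distinct errors leaves the drop decision to the caller, which can treat some mismatches as harmless and drop only the rest.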
Raja Subramanian
a31f59b689 Log first time adjustment total. (#2776)
* Log first time adjustment total.

Seeing cases where the first time is 400ms+ before start time.
Possible it is getting that much adjustment, but would be good to see
how much total adjustment happens.

* log propagation delay
2024-06-09 23:07:01 +05:30
Raja Subramanian
38d213ed10 Do not compare payload type before bind (#2775) 2024-06-09 01:03:38 +05:30
Raja Subramanian
b58db82254 Log invalid RTP packet (#2774) 2024-06-08 10:36:05 +05:30
Raja Subramanian
73852d0a13 Reduce large sequence number jump threshold for logging. (#2770)
Seeing some unexplained large jumps on remotes across relay. Unclear if
there was a jump on origin side at some point. Reducing threshold for
large jump so that we can catch unexpected jumps more.
2024-06-07 12:36:02 +05:30
Raja Subramanian
7d035deef8 Clean up logging fields a bit (#2767) 2024-06-06 23:03:21 +05:30
cnderrauber
908baeb942 initialize bucket size by publish bitrates (#2763) 2024-06-06 14:31:20 +08:00
Raja Subramanian
03bb468472 Log range map for debugging. (#2754)
* Log range map for debugging.

* log details on errors

* log details
2024-06-04 08:00:26 +05:30
Raja Subramanian
447793d077 Move RTT errors to Debugw. (#2742)
With the move to forwarding NTP timestamp as is, we get a bunch more of
this error logged as the remote is basing it off of previous report and
local (i.e. server-side) bases it off of a more recent report.

Anyhow, this code has been around for a long time and there is nothing
new to learn from those errors. Just log it at Debugw in case we can
learn something from it for specific projects or environments where
Debugw is okay.
2024-05-29 11:26:30 +05:30
Raja Subramanian
9781d30611 Do not propagate RTCP if report is not processed. (#2739) 2024-05-28 19:29:54 +05:30
Raja Subramanian
8be2005e0f More detailed logging to understand old packets. (#2730) 2024-05-25 18:34:55 +05:30
Raja Subramanian
96cb829b84 Log more info when adjusting start timestamp. (#2722)
Seeing some large time stamp jump in relay down track once in a while.
Logging more details on time stamp switch to learn more.
2024-05-23 13:03:26 +05:30
Raja Subramanian
ef6f205fcc Pass through timestamp in abs capture time (#2715) 2024-05-15 11:41:37 +05:30
Raja Subramanian
91520a36e0 Add a flag to pass through timestamp. (#2714)
* Add a flag to pass through timestamp.

Can make it a config later if needed.

* log both adjusted and non-adjusted
2024-05-13 15:11:28 +05:30
Raja Subramanian
66a3a8e028 cond broadcast always. (#2699)
With Read and ReadExtended waiting (they are two different goroutines),
use Broadcast always. In theory, they both should not be waiting at the
same time, but just being safe.
2024-05-10 12:32:28 +05:30
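The reason for preferring `Broadcast` over `Signal` when more than one goroutine may wait on the same condition can be sketched as follows. The `packetQueue` type is a hypothetical stand-in for the buffer; only `sync.Cond` and its methods are real API.

```go
package main

import "sync"

// packetQueue is a toy buffer with blocking readers (illustrative only).
type packetQueue struct {
	mu     sync.Mutex
	cond   *sync.Cond
	items  []int
	closed bool
}

func newPacketQueue() *packetQueue {
	q := &packetQueue{}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// push adds an item and wakes every waiter. With Signal only one waiter
// would wake up, and if it was not the one able to make progress, the
// other could stay blocked.
func (q *packetQueue) push(v int) {
	q.mu.Lock()
	q.items = append(q.items, v)
	q.mu.Unlock()
	q.cond.Broadcast()
}

// shutdown marks the queue closed and releases all waiters.
func (q *packetQueue) shutdown() {
	q.mu.Lock()
	q.closed = true
	q.mu.Unlock()
	q.cond.Broadcast()
}

// pop blocks until an item is available or the queue is closed.
func (q *packetQueue) pop() (int, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	for len(q.items) == 0 && !q.closed {
		q.cond.Wait() // re-check the condition after every wake-up
	}
	if len(q.items) == 0 {
		return 0, false
	}
	v := q.items[0]
	q.items = q.items[1:]
	return v, true
}
```

Because every waiter re-checks its condition in a loop, the extra wake-ups from `Broadcast` are harmless, which matches the "just being safe" reasoning in the commit message.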
Raja Subramanian
674550ea16 Option to disable traffic load tracking. (#2698) 2024-05-02 19:09:12 +05:30
Raja Subramanian
a2a8810734 log the correct new propagation delay (#2694) 2024-04-30 08:29:59 +05:30
Raja Subramanian
c8b289daa5 (Attempted) Simplify time stamp calculation on switches. (#2688)
* Simplify time stamp calculation on switches.

Trying to simplify time stamp calculation on restarts.
The additional checks take effect rarely and are not worth the extra
complication.

Also, doing the reference time stamp in extended range.
The challenge with that is when publisher migrates the extended
timestamp could change post migration (i.e. post migration would not
know about rollovers). To address that, maintain an offset that is
updated on resync.

* WIP

* Revert to resume threshold

* typo

* clean up
2024-04-28 12:13:52 +05:30
Paul Wells
18b3b7b421 use readCond in buffer read (#2691) 2024-04-27 04:04:01 -07:00
Raja Subramanian
2ad0efc28f Handle large jumps in RTCP sender report timestamp. (#2674)
* Handle large jumps in RTCP sender report timestamp.

Seeing cases of RTCP Sender Report spaced apart by more than half the
RTP Timestamp range. Maybe a case of laptop going to sleep and waking
up. Handle it using time diff from last report and calculating expected
timestamp.

* try go 1.22
2024-04-22 23:04:56 +05:30
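To get a feel for when sender reports are "more than half the RTP timestamp range" apart, and how a time diff can stand in for the ambiguous timestamp comparison, consider the rough sketch below. All function names are hypothetical and the code is not taken from the commit.

```go
package main

import "time"

// halfRange returns how long an RTP timestamp running at clockRate Hz takes
// to advance by half of its 32-bit range.
func halfRange(clockRate uint32) time.Duration {
	return time.Duration((int64(1) << 31) * int64(time.Second) / int64(clockRate))
}

// needsTimeBasedUnwrap reports whether two reports are so far apart that a
// plain 32-bit difference can no longer tell direction or cycle count.
func needsTimeBasedUnwrap(sincePrev time.Duration, clockRate uint32) bool {
	return sincePrev >= halfRange(clockRate)
}

// expectedSRTimestamp extrapolates the extended RTP timestamp of a report
// from the previous report and the time between the two.
func expectedSRTimestamp(prevExt uint64, sincePrev time.Duration, clockRate uint32) uint64 {
	if sincePrev < 0 {
		return prevExt
	}
	return prevExt + uint64(sincePrev)*uint64(clockRate)/uint64(time.Second)
}
```

At 90 kHz, half the range corresponds to roughly 6.6 hours, which is consistent with the laptop sleep scenario mentioned in the commit message.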
Raja Subramanian
af0b0c4734 Connection quality LOST only if RTCP is also not available. (#2670)
* Connection quality LOST only if RTCP is also not available.

It is possible that sender stops all layers of video due to some
constraint (CPU or bandwidth). Packet reception going dry due to
that should not trigger `LOST` quality.

Add last received RTCP time also to distinguish the case
of real `LOST` and sender stopping traffic.

Some bits to watch for
- With audio, RTCP reports could be more than 5 seconds apart (5 seconds
  is the default interval for connection quality scorer), but audio
  senders usually send silence packets even when there is no input.
  So audio completely stopping can be considered `LOST`.
- With video, have to observe if all clients continue to send RTCP even
  if all layers are stopped.
- RTCP bandwidth is not supposed to exceed the primary stream bandwidth.
  libwebrtc calculates that and spaces out RTCP reports accordingly.
  That is the reason why audio reports are that far apart. If a video
  stream is encoded at a very low bit rate, it could also be sending
  RTCP rarely. So, there is the case of LOST being indistinguishable
  from sender stopping all layers. But, this should be a rare case.

* typo
2024-04-21 23:35:24 +05:30
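The rule described above boils down to requiring both signals to be stale before declaring `LOST`. A minimal sketch, with hypothetical names and only the 5 second scorer interval taken from the commit message:

```go
package main

import "time"

// scoreInterval is the default connection quality scorer interval
// mentioned in the commit message.
const scoreInterval = 5 * time.Second

// isLost (hypothetical helper) reports LOST only when neither media packets
// nor RTCP have been received within the scoring interval. A sender that
// paused all layers but still sends RTCP is therefore not considered lost.
func isLost(sinceLastPacket, sinceLastRTCP time.Duration) bool {
	return sinceLastPacket >= scoreInterval && sinceLastRTCP >= scoreInterval
}
```

As the notes in the commit point out, streams whose RTCP reports are spaced further apart than the interval remain indistinguishable from a real loss under this rule.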
cnderrauber
8eb86f1077 Don't log dd invalid template index (#2664)
If the first packet of a keyframe, which carries the template structure,
is lost, then subsequent packets that rely on it will report an invalid
template error, which is expected.
2024-04-19 16:48:36 +08:00
Raja Subramanian
04d193e0b2 Update mediatransportutil. (#2652)
Also, use adjusted time of sender report for drift logging.
2024-04-16 10:10:06 +05:30
Raja Subramanian
d55948f761 Add PropagationDelay API to sender report data (#2646) 2024-04-11 20:00:13 +05:30
Raja Subramanian
ad1f508680 Add support for "abs-capture-time" extension. (#2640)
* Add support for "abs-capture-time" extension.

Currently, it is just passed through from publisher -> subscriber side.

TODO: Need to store in sequencer and restore for retransmission.

* abs-capture-time in retransmissions

* clean up

* fix test

* more test fixes

* more test fixes

* more test fixes

* log only when size is non-zero

* log on both sides for debugging

* add marshal/unmarshal

* normalize abs capture time to SFU clock

* comment out adding abs-capture-time from registered extensions
2024-04-11 15:25:10 +05:30
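The abs-capture-time extension carries a 64-bit NTP timestamp (32 bits of seconds since 1900, 32 bits of fraction). Normalizing it to another clock could look roughly like the sketch below. The offset parameter and function names are assumptions, since the commit does not say how the SFU derives the clock offset.

```go
package main

// ntpEpochOffset is the number of seconds between the NTP epoch (1900-01-01)
// and the Unix epoch (1970-01-01).
const ntpEpochOffset = 2208988800

// ntpToUnixNano converts a 64-bit NTP timestamp (UQ32.32) to Unix nanoseconds.
// It assumes NTP era 0 and a time at or after the Unix epoch.
func ntpToUnixNano(ntp uint64) int64 {
	secs := int64(ntp>>32) - ntpEpochOffset
	frac := int64(ntp & 0xFFFFFFFF)
	return secs*1_000_000_000 + (frac*1_000_000_000)>>32
}

// toSFUClock (hypothetical) shifts a sender-side capture time onto the SFU
// clock using an externally estimated offset in nanoseconds.
func toSFUClock(captureNTP uint64, senderToSFUOffsetNs int64) int64 {
	return ntpToUnixNano(captureNTP) + senderToSFUOffsetNs
}
```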
Raja Subramanian
21fbda3470 Silence some noisy debug logs (#2643) 2024-04-11 10:58:19 +05:30
Raja Subramanian
ddece1fbb0 Use arrival time in cached packets. (#2633) 2024-04-08 11:29:55 +05:30
Raja Subramanian
8852d71a8a Disable audio loss proxying. (#2629)
* Disable audio loss proxying.

Added a config which is off by default.
With audio NACKs, that is the preferred repair mechanism.
With RED, repair is built in via packet redundancy to recover from
isolated losses.
So, proxying is not required. But, leaving it in there with a config
that is disabled by default.

* fix test
2024-04-06 11:28:04 +05:30
Raja Subramanian
e93611eafa Log sender reports. (#2625) 2024-04-05 18:21:38 +05:30
Raja Subramanian
63b1fba082 Add start/end time to AnalyticsStream. (#2618)
* Add start/end time to AnalyticsStream.

* fix test
2024-04-03 12:23:18 +05:30
Raja Subramanian
860702e9dc Prevent large spikes in propagation delay (#2615)
* Prevent large spikes in propagation delay

A few tweaks
- Large spike in propagation delay due to congested channel results in
  long term estimate getting high value. Ignore outliers in long term
  estimate.
- Introduce a new field for adjusted arrival time as adjusting the
  arrival time in place meant it got applied again across the relay and
  that caused different propagation delay on remote nodes.
- Reset path change counters as long as there is any sample that is not
  higher than the multiple of long term. There was a case of
  o Sample with high value that triggered path change start.
  o Then some samples with high enough delta, but did not meet the
    criteria for increasing counter further.
  o Some time later, another sample met the threshold and that triggered
    a path change re-init.

* do not adapt to large delta
2024-04-02 14:21:20 +05:30
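Ignoring outliers in a long term estimate can be sketched with a simple exponentially weighted average that skips samples far above the current estimate. The smoothing factor, the outlier multiple and all names below are illustrative assumptions, not values from the commit.

```go
package main

const (
	smoothing     = 0.25 // weight of a new sample (assumed)
	outlierFactor = 3.0  // samples above this multiple of the estimate are skipped (assumed)
)

// delayEstimator keeps a slow-moving estimate of a delay value.
type delayEstimator struct {
	longTerm    float64
	initialized bool
}

// update folds a sample into the long term estimate and reports whether the
// sample was accepted. Large spikes, for example from a congested channel,
// are rejected so that they do not inflate the estimate.
func (e *delayEstimator) update(sample float64) bool {
	if !e.initialized {
		e.longTerm = sample
		e.initialized = true
		return true
	}
	if sample > outlierFactor*e.longTerm {
		return false
	}
	e.longTerm += smoothing * (sample - e.longTerm)
	return true
}
```

Path change detection would sit on top of this, counting consecutive rejected samples and resetting the counter whenever a sample is accepted, along the lines of the third bullet in the commit message.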
Raja Subramanian
4c9e59dc25 Small tweaks to propagation delay adaptation. (#2607) 2024-03-30 21:53:18 +05:30
Raja Subramanian
b5de646073 Remove redundant check. (#2605)
* Remove redundant check.

That check is already done in the outside check.

* print string

* space
2024-03-30 00:31:26 +05:30
cnderrauber
0a35e59ebd Replace sleep with sync.Cond to reduce jitter (#2603) 2024-03-29 17:24:31 +08:00
Raja Subramanian
0480f99a83 Tweak adaptation to increase in propagation delay. (#2598)
* Tweak adaptation to increase in propagation delay.

A couple of issues
- RTCP Sender Reports rate will vary based on underlying track bitrate.
  (at least in theory, not all entities will do it though, for example
  SFU does standard rate of one per three seconds irrespective of track
  bit rate). So, adapt the long term estimate of propagation delay delta
  based on spacing of reports.
- Re-init of propagation delay to adapt to path change was taking the
  last value before the switch. But, that one value could have been an
  outlier and accepting it is not great. So, adapt spike time
  propagation delay in a smoother fashion to ensure that all values
  during spike contribute to the final value.

* clean up
2024-03-26 17:33:24 +05:30
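One common way to make a smoothed estimate depend on the spacing of samples is to derive the weight from the elapsed time, so that reports arriving far apart move the estimate more than reports arriving in quick succession. The sketch below uses that generic technique. It is not necessarily what the commit does, and the time constant and names are assumptions.

```go
package main

import (
	"math"
	"time"
)

// timeConstant controls how quickly the estimate adapts (assumed value).
const timeConstant = 30 * time.Second

// weightFor returns the weight of a new sample given the spacing since the
// previous report: 0 for no spacing, approaching 1 for very large spacing.
func weightFor(spacing time.Duration) float64 {
	if spacing <= 0 {
		return 0
	}
	return 1 - math.Exp(-spacing.Seconds()/timeConstant.Seconds())
}

// adapt moves the long term estimate toward the sample by a spacing
// dependent weight.
func adapt(longTerm, sample float64, spacing time.Duration) float64 {
	return longTerm + weightFor(spacing)*(sample-longTerm)
}
```

With this form, a sender that reports once per second and one that reports once every three seconds converge at a similar rate in wall-clock terms.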
Raja Subramanian
7945c01dbe Reset sharp increase if received delta is small. (#2592) 2024-03-21 10:25:45 +05:30
Raja Subramanian
03ada9ba76 Proper RTCP report past mute. (#2588)
- When audio is muted, server injects silence frames which moves the
  time stamp forward and adjusts offset. That cannot be used against
  publisher side sender report. Use a pinned version.
- Ignore small changes to propagation delay even while checking for
  sharp increase. That is spamming a lot for small changes, i.e.
  existing delta is 100 micro seconds or so and the new one is 300 micro
  seconds. Also rename to `longTerm` from `smoothed` as it is a slow
  varying long term estimate of propagation delay delta. And slow down
  that adaptation more.
2024-03-19 11:59:24 +05:30