280 Commits

Author SHA1 Message Date
Raja Subramanian
1f335dd564 Convert to formatter string for lazy evaluation. (#2298) 2023-12-06 18:11:05 +05:30
Raja Subramanian
83efa9258e Bump up protocol for connection quality LOST. (#2297)
Also log trackID/trackInfo in layer mapping.
2023-12-06 16:59:05 +05:30
cnderrauber
e1cc9d6b3c Fix log marshal error (#2295) 2023-12-06 00:08:48 +08:00
David Zhao
3fe124c87f Log cleanup pass (#2285)
* Log cleanup pass

Demoted a bunch of logs to DEBUG, consolidated logs.

* use context logger and fix context var usage

* moved common error types, fixed tests
2023-12-02 15:07:31 -08:00
Raja Subramanian
2299a493de Throttle DD parse logs (#2281) 2023-12-01 12:42:12 +05:30
Raja Subramanian
2ee5aa7c98 Add optional supervisor disable. (#2277)
* Add optional supervisor disable.

Used `DisableSupervisor` so that default can be enabled and
it can be disabled explicity. But, open to defaulting to disable
(i. e. change param to `EnableSupervisor`).

* Move nil check to call site
2023-11-30 13:04:31 +05:30
cnderrauber
0f1c1ec224 clean dd log (#2275)
* clean dd log

* Implemented Raja's feedback

---------

Co-authored-by: David Zhao <dz@livekit.io>
2023-11-29 12:12:29 -08:00
Raja Subramanian
bfc4f19c74 Guard against bad quality in trackInfo (#2271) 2023-11-28 22:28:30 +05:30
Raja Subramanian
53542b09a0 Participant traffic load. (#2262)
* Participant traffic load.

Capturing information about participant traffic
- Upstream/Downstream
- Audio/Video/Data
- Packets/Bytes

This captures a notion of how much traffic load a participant is
generating.

Can be used to make allocation decisions.

* Clean up

* SIP patches

* reporter goroutine

* unlock

* move traffic stats from protocol

* check type
2023-11-26 23:05:00 +05:30
Raja Subramanian
8c3ec742e6 Use now for end time (#2248)
* Use now for end time

* less arithmetic
2023-11-17 12:00:47 +05:30
Raja Subramanian
c62382c76e Clean up restart a bit. (#2247) 2023-11-17 00:40:00 +05:30
Raja Subramanian
440f00bcac Declare audio inactive if stale. (#2229)
* Declare audio inactive if stale.

Stale samples were used to declare audio active.
Maintain last update time and declare inactive if samples are stale.

* correct comment

* spelling

* check level in test
2023-11-08 11:13:39 +05:30
Raja Subramanian
12a9d74acb Do not restart on receiver side. (#2224)
* Do not restart on receiver side.

Restart with wrap back causes issues in the forwarding path
as the subscriber assumes the extended type from receiver side does
not restart.

Restart was an attempt to include as many packets as possible, but
in practice is not super useful. So, taking it out. Can clean up
a bit more stuff, but want to run this first and check for any oddities.

* fix test
2023-11-06 10:41:56 +05:30
cnderrauber
f247b68ed6 Make sure dd selector uses correct keyframe to select packets (#2218)
* Make sure dd selector uses correct keyframe to select packets

* Fix test case

* remove unsed field
2023-11-03 17:49:02 +08:00
Raja Subramanian
0bdfdb0c49 Squelching DD reader error. (#2215)
Squelching Structure is nil error as it can happen on packets
received before a key frame is received.
2023-11-02 11:10:28 +05:30
Raja Subramanian
45346d7c76 Clean up condition that is not happening (#2207) 2023-11-01 15:17:09 +05:30
Raja Subramanian
c93a88bd9b Log starts on metadata cache overflow. (#2206) 2023-11-01 10:55:07 +05:30
cnderrauber
1f0ba21854 Fix svc: Drop frame is earlier than current keyframe (#2196)
* Fix svc: Drop frame is earlier than current keyframe

* Log detail of dependencydescriptor
2023-10-27 13:57:03 +08:00
Raja Subramanian
ce8f64176a Log correct time difference (#2192) 2023-10-26 15:34:49 +05:30
Raja Subramanian
490b9f4f4c No sync when starting from nothing (#2191)
When starting from scratch (like mute -> unmute), it is possible
that the check sync does not detect a broken chain. That results
in PLIs not being sent and the video frozen till a gratuitous key
frame arrives.

Unclear why there are not PLIs from client side. That is something else to
dig into.
2023-10-26 13:39:11 +05:30
Raja Subramanian
047a4ac870 Apply repair to the newest cached report (#2186) 2023-10-26 03:43:52 +05:30
Raja Subramanian
fa01297d96 Slight sequencer tweaks. (#2184)
The buffer is not for padding packets. So, calculate
adjusted sequence numbers before comparing against size.

Also, it is possible that invalidated slot is accessed
due to not being able to exclude padding range. This was
causing time stamp reset to 0. Will remove the error log
after this goes out and the condition does not show up
for a few days.
2023-10-25 23:12:14 +05:30
Raja Subramanian
f4a3618000 Log error on 0 time stamp. (#2174)
Need backtrace for source of it.
Also, do not reset start if 0, that is incorrect.
2023-10-23 23:00:03 +05:30
Raja Subramanian
f622fc2490 Sample clock skew down by an order of magnitude (#2173) 2023-10-23 16:58:02 +05:30
Raja Subramanian
3e9450c774 Log more details in warns. (#2166)
Logging more details in warns so that we do not have to enable Infow
for some logs later.
2023-10-21 11:02:34 +05:30
Raja Subramanian
b591c56aa3 Logging reduction. (#2165)
Move some to Debugw and add sampling for a few.
2023-10-21 10:26:30 +05:30
Raja Subramanian
0407eb4833 Log audio packets in forwarding path. (#2162)
Seeing a time stamp jump that I am not able to explain.
Basically, it looks like the time stamp doubles at some
point. There is no code which doubles the timestamp.
Can understand an erroneous roll over/wrap around, but
doubling is very strange.

So, logging only audio packets. Will disable as soon
as I have some smaples from canary.
2023-10-21 01:37:30 +05:30
Raja Subramanian
5bf2e5fd4a Log clock deviations in sender report. (#2161)
Seeing some unexplained jumps in sender report time stamp
in canary. Wonder if the calculated clock rate is way off
during some interval. Logging clock deviations to understand
better.
2023-10-20 23:06:34 +05:30
Raja Subramanian
43a0ca57b5 Clear flags in packet metadata cache before setting them. (#2160)
Not sure if this could have resulted in bad FPS calculation,
but could have contributed to it.
2023-10-20 12:13:29 +05:30
Raja Subramanian
0d7477178e More fine grained filtering NACKs after a key frame. (#2159)
* More fine grained filtering NACKs after a key frame.

There are applications with periodic key frame.
So, a packet lost before a key frame will not be retransmitted.
But, decoder could wait (jitter buffer, play out time) and cause
a stutter.

Idea behind disabling NACKs after key frame was another knob to
throttle retransmission bit rate. But, with spaced out retransmissions
and max retransmissions per sequence number, there are throttles.
This would provide more throttling, but affects some applications.
So, disabling filtering NACKs after a key frame.

Introducing another flag to disallow layers. This would still be quite
useful, i. e. under congestion the stream allocator would move the
target lower. But, because of congestion, higher layer would have lost
a bunch of packets. Client would NACK those. Retransmitting those higher
layer packets would congest the channel more. The new flag (default
enabled) would disallow higher layers retransmission. This was happening
before this change also, just splitting out the flag for more control.

* split flag
2023-10-20 00:44:39 +05:30
Raja Subramanian
e461e9cd79 Log skew in clock rate. (#2158)
* Log skew in clock rate.

Remember seeing sender report time stamp moving backward
across mute with replaceTrack(null). Not able to reproduce
it in JS sample app, but have seen it elsewhere.

Logging to understand it better. Wondering if the sender report
should be reset on time stamp moving backward or if we should drop
backwards moving reports.

* set threshold at 20%
2023-10-19 13:58:50 +05:30
Raja Subramanian
f653efcf10 Do not update highest time on padding packet. (#2157)
* Error log of padding updating highest time to get backtrace.

* Do not update highest time on padding packet.

Padding packets use time stamp of last packet sent.
Padding packets could be sent when probing much after last packet
was sent. Updating highest time on that screws up sender report
calculations. We have ways of making sure sender reports do not
get too out-of-whack, but it logs during that repair.
That repair should be unnecessary unless the source is behaving weird
(things like publisher sending all packets at the same time, publisher
sample rate is incorrect, etc.)
2023-10-19 12:01:48 +05:30
Raja Subramanian
7c830ea5b9 Log highest time update on padding packet. (#2154)
* Log highest time update on padding packet.

Seeing a strange case of what looks like highest time getting
updated on a padding packet. Can't see how it happens in code.
So, logging to check. Will be removing log after checking.

* log sequence number also
2023-10-19 00:53:50 +05:30
Raja Subramanian
f97242c8ba Use 32-bit time stamp to get reference time stamp on a switch. (#2153)
* Use 32-bit time stamp to get reference time stamp on a switch.

With relay and dyncast and migration, it is possible that different
layers of a simulcast get out of sync in terms of extended type,
i. e. layer 0 could keep running and its timestamp could have
wrapped around and bumped the extended timestamp. But, another layer
could start and stop.

One possible solution is sending the extended timestamp across relay.

But, that breaks down during migration if publisher has started afresh.
Subscriber could still be using extended range.

So, use 32-bit timestamp to infer reference timestamp and patch it with
expected extended time stamp to derive the extended reference.

* use calculated value

* make it test friendly
2023-10-18 21:48:41 +05:30
Raja Subramanian
3e4cd3a161 Accept more range for first packet time adjustment. (#2150) 2023-10-17 23:52:14 +05:30
Raja Subramanian
70b60101f4 do the proper large negative check (#2139) 2023-10-10 11:01:49 +05:30
Raja Subramanian
31b042ddce Log larga negative gap. (#2138)
Seeing a large positive gap which I am not able to explain.
Wondering if at some other time, a large negative is happening
and the large positive is just a correction.
2023-10-09 14:32:10 +05:30
Raja Subramanian
ebe1470e46 Simplify (#2137) 2023-10-07 13:15:51 +05:30
Raja Subramanian
9fc276481a Increase accuracy of delay since last sender report. (#2136)
It is in 1 / 65536 seconds units. That is about 0.015 ms.
So, use microseconds to increase accuracy.
2023-10-07 12:25:47 +05:30
Raja Subramanian
4ea284fae0 Log potential sequence number de-sync in receiver reports. (#2128)
* Log potential sequence number de-sync in receicer reports.

Seeing some cases of a roll over being missed. That ends up
as largish range to search in an interval and reports missing packets
in the packet metadata cache.

Logging some details.

* just log in one place
2023-10-05 13:10:13 +05:30
Raja Subramanian
5aa093f65d Log time when there are too many packets. (#2127)
Ideally, can remove the nil return when there are too many packets
as we have more information with extended sequence numbers, but
logging duration first to understand what is happening better.
2023-10-05 12:02:55 +05:30
Raja Subramanian
6c49d1a160 Logging a few bits at Infow (#2126)
Seeing sequencer errors with egress (related to dummy start).
So, logging a few bits at Infow to understand them better.
2023-10-05 11:16:31 +05:30
Raja Subramanian
c710ab901e Log stream start at Infow (#2125)
Seeing some sequence number adjustment where timestamp is 0.
Want to check the stream start.
2023-10-05 00:12:41 +05:30
Raja Subramanian
cb0a48c12d Handle RED extended sequence number. (#2123)
When converting from RED -> Opus, if there is a loss, SFU recovers
that loss if it can using a subsequent redundant packet. That path
was not setting the extended sequence number properly.

Also, ensuring use of monotonic clock for first packet time adjustment
also.
2023-10-03 19:38:55 +05:30
Raja Subramanian
96ccf696d3 Cap expected packets to padding diff. (#2122)
* Cap expected packets to padding diff.

On the receiver, no longer using packet metadata cache to calculate
interval stats. An optimisation to get rid of packet metadata cache
on receiver side.

Because of that, padding packets in an interval could be more than
expected packets. As padding packets is just a counter, out-of-order
padding packets will make the diff look larger than expected packets
in a window. Cap the expected to 0.

NOTE: This makes it so that the count is not accurate in a window,
but that is okay occasionally. It will affect reported stats and quality
calculations, but it should be rare. For example, if 30 packets were
received in a window and 60 out-of-order padding packets were received,
it would reported as 0 packets were received. One option is to not
increment padding packets when they are out-of-order, but that will mess
up overall stats. Will make that change if we see this happen a lot.

* log unexpected padding packets
2023-10-03 12:36:19 +05:30
Raja Subramanian
e6e3e2a729 sligtly easier readability (#2121) 2023-10-02 22:47:40 +05:30
Raja Subramanian
989d621c97 A small change to move out of order check before RTT calc. (#2117) 2023-10-02 11:51:46 +05:30
Raja Subramanian
180ad541fc Mark packet not handled if restart is rejected (#2115)
* Mark packet not handled if restart is rejected

* log both sn and ts on restart/rollback
2023-09-30 10:24:13 +05:30
Raja Subramanian
ee3a7c01bc Log resync at Infow. (#2114)
* Log resync at Infow.

Seeing potentially large sequence number jumps on a resync.
And it seems to happen on a lot of subscribe/unsubscribe.
Logging at Infow to understand better.

Probably need to find a way to avoid resync. But, logging for now to
check if I can catch one.

* Remove resync and log large sequence number jumps
2023-09-30 08:42:46 +05:30
Pingos
4f9467040e Bind() function fails when mime == "audio/red" (#2104) 2023-09-26 13:36:54 +08:00