* Introduce `DISCONNECTED` connection quality.
Currently, this state happens when any up stream track does not
send any packets in an analysis window when it is expected to send
packets.
This can be used by participants to know the quality of a potentially
disconnected participant. Previously, it took 20 - 30 seconds for
the stale timeout to kick in and disconnect the limbo participant which
triggered a participant update through which other participants knew
about it.
Previously, `POOR` quality was also overloaded to denote that the
up stream is not sending any packets. With this change, that is a
separate indicator, i. e. `DISCONNECTED`.
* clean up
* Update deps
* spelling
* Participant traffic load.
Capturing information about participant traffic
- Upstream/Downstream
- Audio/Video/Data
- Packets/Bytes
This captures a notion of how much traffic load a participant is
generating.
Can be used to make allocation decisions.
* Clean up
* SIP patches
* reporter goroutine
* unlock
* move traffic stats from protocol
* check type
* Reduce logging
1. Do not print rtp stats if nil. Means that some subscribed tracks may
not have any logs (very short subscriptions which end before any
packet is sent).
2. Log ICE candidates only at the end, not when ICE connects. That logs
the selected ICE candidate pair.
3. Log ICE candidates only if not empty.
* Update some deps
* Declare audio inactive if stale.
Stale samples were used to declare audio active.
Maintain last update time and declare inactive if samples are stale.
* correct comment
* spelling
* check level in test
* Do not restart on receiver side.
Restart with wrap back causes issues in the forwarding path
as the subscriber assumes the extended type from receiver side does
not restart.
Restart was an attempt to include as many packets as possible, but
in practice is not super useful. So, taking it out. Can clean up
a bit more stuff, but want to run this first and check for any oddities.
* fix test
* Prevent out-of-bounds access.
Don't know which codec causes a spatial layer three access.
Returning nil and also logging so that we know the trackID of offending
track.
* spelling
* Do server PLI when sync is required.
A few changes
- Run key frame requester goroutine always. Runs every 200 ms which is
not bad.
- Post a key frame request when server knows it needs one, like after an
allocation. This ensures that the initial request is not delayed.
- Periodic check will ensure PLI for cases like all frame chains of a
dependency descriptor being broken.
* simplify
A few things
- Log PLI requests from client.
- Pass in marker to RTP munger as SVC can insert marker.
- Adjusting first packet time should be aware of SVC as there is single
stream in SVC
When starting from scratch (like mute -> unmute), it is possible
that the check sync does not detect a broken chain. That results
in PLIs not being sent and the video frozen till a gratuitous key
frame arrives.
Unclear why there are not PLIs from client side. That is something else to
dig into.
SVC has only one stream and when calculating reference time stamp,
irrespective of reference layer, reference time stamp will be the
same as the given time stamp as there is only one stream and no offset.
TODO: Need better all around SVC handling.
The buffer is not for padding packets. So, calculate
adjusted sequence numbers before comparing against size.
Also, it is possible that invalidated slot is accessed
due to not being able to exclude padding range. This was
causing time stamp reset to 0. Will remove the error log
after this goes out and the condition does not show up
for a few days.
* Fix deadlock
My previous PR to wrap layer notifier post in bind lock was
problematic as `onBinding` callback happens within that lock
and that onBinding callback can call set max layer which will
post to channel. Use a separate mutex.
* RUnlock
* Do not post to closed channels.
Perils of atomics. Hard to imagine, but I guess it could happen.
The postMaxLayerNotifier checked for closed and down track was not
closed. But, between that check and posting to channel (which is
a very small window), the down track could have been closed and
the channel (maxLayerNotiferCh) is closed.
Protect that channel post + close with the bind lock.
* reduce the change
* Check for closed inside lock
The probing + munging has not been set up to drop packets that follow
a gap. Dropping such a packet leads to padding packet sequence numbers
overlapping with regular packets.
This change does two things though.
- The not relevant packet will still not be sent over the wire. That could
create holes in the sequence number leading to NACKs
- Would the hole cause decode issues? Unclear as making this condition is hard.
Simulating it is not showing issues, but that may not be producing the bad
sequence if any.
Will look at the ability to drop a packet after a gap later.
Seeing a time stamp jump that I am not able to explain.
Basically, it looks like the time stamp doubles at some
point. There is no code which doubles the timestamp.
Can understand an erroneous roll over/wrap around, but
doubling is very strange.
So, logging only audio packets. Will disable as soon
as I have some smaples from canary.
Seeing some unexplained jumps in sender report time stamp
in canary. Wonder if the calculated clock rate is way off
during some interval. Logging clock deviations to understand
better.
* More fine grained filtering NACKs after a key frame.
There are applications with periodic key frame.
So, a packet lost before a key frame will not be retransmitted.
But, decoder could wait (jitter buffer, play out time) and cause
a stutter.
Idea behind disabling NACKs after key frame was another knob to
throttle retransmission bit rate. But, with spaced out retransmissions
and max retransmissions per sequence number, there are throttles.
This would provide more throttling, but affects some applications.
So, disabling filtering NACKs after a key frame.
Introducing another flag to disallow layers. This would still be quite
useful, i. e. under congestion the stream allocator would move the
target lower. But, because of congestion, higher layer would have lost
a bunch of packets. Client would NACK those. Retransmitting those higher
layer packets would congest the channel more. The new flag (default
enabled) would disallow higher layers retransmission. This was happening
before this change also, just splitting out the flag for more control.
* split flag
* Log skew in clock rate.
Remember seeing sender report time stamp moving backward
across mute with replaceTrack(null). Not able to reproduce
it in JS sample app, but have seen it elsewhere.
Logging to understand it better. Wondering if the sender report
should be reset on time stamp moving backward or if we should drop
backwards moving reports.
* set threshold at 20%
* Error log of padding updating highest time to get backtrace.
* Do not update highest time on padding packet.
Padding packets use time stamp of last packet sent.
Padding packets could be sent when probing much after last packet
was sent. Updating highest time on that screws up sender report
calculations. We have ways of making sure sender reports do not
get too out-of-whack, but it logs during that repair.
That repair should be unnecessary unless the source is behaving weird
(things like publisher sending all packets at the same time, publisher
sample rate is incorrect, etc.)
* Log highest time update on padding packet.
Seeing a strange case of what looks like highest time getting
updated on a padding packet. Can't see how it happens in code.
So, logging to check. Will be removing log after checking.
* log sequence number also
* Use 32-bit time stamp to get reference time stamp on a switch.
With relay and dyncast and migration, it is possible that different
layers of a simulcast get out of sync in terms of extended type,
i. e. layer 0 could keep running and its timestamp could have
wrapped around and bumped the extended timestamp. But, another layer
could start and stop.
One possible solution is sending the extended timestamp across relay.
But, that breaks down during migration if publisher has started afresh.
Subscriber could still be using extended range.
So, use 32-bit timestamp to infer reference timestamp and patch it with
expected extended time stamp to derive the extended reference.
* use calculated value
* make it test friendly
Seeing a large positive gap which I am not able to explain.
Wondering if at some other time, a large negative is happening
and the large positive is just a correction.