The buffer is not for padding packets. So, calculate
adjusted sequence numbers before comparing against size.
Also, it is possible that an invalidated slot is accessed
because the padding range could not be excluded. That was
causing the time stamp to reset to 0. Will remove the error log
after this goes out and the condition does not show up
for a few days.
Seeing a time stamp jump that I am not able to explain.
Basically, it looks like the time stamp doubles at some
point. There is no code which doubles the timestamp.
An erroneous roll over/wrap around would be understandable,
but doubling is very strange.
So, logging only audio packets. Will disable as soon
as I have some samples from canary.
Seeing some unexplained jumps in sender report time stamp
in canary. Wonder if the calculated clock rate is way off
during some interval. Logging clock deviations to understand
better.
* More fine-grained filtering of NACKs after a key frame.
There are applications with periodic key frames.
So, a packet lost before a key frame will not be retransmitted.
But, the decoder could wait (jitter buffer, play out time) and cause
a stutter.
The idea behind disabling NACKs after a key frame was to have another
knob to throttle retransmission bit rate. But, with spaced out
retransmissions and max retransmissions per sequence number, there are
already throttles. This would provide more throttling, but affects
some applications. So, disabling the filtering of NACKs after a key frame.
Introducing another flag to disallow layers. This would still be quite
useful, i.e. under congestion the stream allocator would move the
target lower. But, because of congestion, the higher layer would have lost
a bunch of packets. The client would NACK those, and retransmitting those
higher layer packets would congest the channel more. The new flag (default
enabled) disallows higher layer retransmissions. This was happening
before this change also, just splitting out the flag for more control.
* split flag
* Log skew in clock rate.
Remember seeing sender report time stamp moving backward
across mute with replaceTrack(null). Not able to reproduce
it in JS sample app, but have seen it elsewhere.
Logging to understand it better. Wondering if the sender report
should be reset on time stamp moving backward or if we should drop
backwards moving reports.
* set threshold at 20%
* Error log of padding updating highest time to get backtrace.
* Do not update highest time on padding packet.
Padding packets use time stamp of last packet sent.
Padding packets could be sent when probing much after last packet
was sent. Updating highest time on that screws up sender report
calculations. We have ways of making sure sender reports do not
get too out-of-whack, but it logs during that repair.
That repair should be unnecessary unless the source is behaving weirdly
(things like the publisher sending all packets at the same time, an
incorrect publisher sample rate, etc.)
* Log highest time update on padding packet.
Seeing a strange case of what looks like highest time getting
updated on a padding packet. Can't see how it happens in code.
So, logging to check. Will be removing log after checking.
* log sequence number also
* Use 32-bit time stamp to get reference time stamp on a switch.
With relay, dyncast and migration, it is possible that different
layers of a simulcast get out of sync in terms of extended type,
i.e. layer 0 could keep running and its timestamp could have
wrapped around and bumped the extended timestamp, while another layer
could start and stop.
One possible solution is sending the extended timestamp across the relay.
But, that breaks down during migration if the publisher has started afresh
while the subscriber is still using the extended range.
So, use the 32-bit timestamp to infer the reference timestamp and patch it
with the expected extended time stamp to derive the extended reference.
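A sketch of that derivation, assuming 64-bit extended timestamps carry the 32-bit wire value in the low word (`extendTS` and its wrap handling are illustrative, not the actual code):

```go
package main

import "fmt"

// extendTS patches a 32-bit RTP timestamp from the wire into the extended
// (64-bit) space, using the expected extended timestamp to pick the cycle.
// If the naive patch lands more than half the 32-bit range away from the
// expected value, a wrap boundary was crossed and the cycle is adjusted.
func extendTS(ts uint32, expectedExt uint64) uint64 {
	ext := (expectedExt &^ uint64(0xFFFFFFFF)) | uint64(ts)
	switch {
	case ext > expectedExt && ext-expectedExt > 1<<31 && ext >= 1<<32:
		ext -= 1 << 32 // wire value is from just before a wrap
	case ext < expectedExt && expectedExt-ext > 1<<31:
		ext += 1 << 32 // wire value is from just after a wrap
	}
	return ext
}

func main() {
	// expected is just past a wrap; the wire timestamp is from just before it
	fmt.Printf("%#x\n", extendTS(0xFFFFFFF0, 0x100000010))
}
```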
* use calculated value
* make it test friendly
Seeing a large positive gap which I am not able to explain.
Wondering if at some other time a large negative jump is happening
and the large positive is just a correction.
* Log potential sequence number de-sync in receiver reports.
Seeing some cases of a roll over being missed. That ends up
as a largish range to search in an interval and reports missing packets
in the packet metadata cache.
Logging some details.
* just log in one place
Ideally, can remove the nil return when there are too many packets
as we have more information with extended sequence numbers, but
logging duration first to understand what is happening better.
When converting from RED -> Opus, if there is a loss, the SFU recovers
that loss if it can using a subsequent redundant packet. That path
was not setting the extended sequence number properly.
Also, ensuring use of the monotonic clock for first packet time
adjustment.
* Cap expected packets to padding diff.
On the receiver, no longer using packet metadata cache to calculate
interval stats. An optimisation to get rid of packet metadata cache
on receiver side.
Because of that, padding packets in an interval could exceed
expected packets. As the padding packet count is just a counter,
out-of-order padding packets will make the diff look larger than
expected packets in a window. Cap the expected at 0.
NOTE: This makes the count inaccurate in a window,
but that is okay occasionally. It will affect reported stats and quality
calculations, but it should be rare. For example, if 30 packets were
received in a window and 60 out-of-order padding packets were received,
it would be reported as 0 packets received. One option is to not
increment the padding count when they are out-of-order, but that would
mess up overall stats. Will make that change if we see this happen a lot.
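The capping itself is trivial; a sketch with illustrative names, using the 30/60 example from the note:

```go
package main

import "fmt"

// intervalExpected returns the number of media packets expected in a
// window. expected comes from sequence number progression; padding is a
// running counter, so out-of-order padding in a window can exceed
// expected. Rather than letting the difference go negative, cap it at 0
// and accept an occasionally inaccurate window.
func intervalExpected(expected, padding uint32) uint32 {
	if padding > expected {
		return 0
	}
	return expected - padding
}

func main() {
	// example from above: 30 packets in the window, 60 out-of-order padding
	fmt.Println(intervalExpected(30, 60)) // reported as 0 packets
	fmt.Println(intervalExpected(100, 40))
}
```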
* log unexpected padding packets
* Log resync at Infow.
Seeing potentially large sequence number jumps on a resync.
And it seems to happen on a lot of subscribe/unsubscribe.
Logging at Infow to understand better.
Probably need to find a way to avoid resync. But, logging for now to
check if I can catch one.
* Remove resync and log large sequence number jumps
* Use bit map.
Also, duplicate packet detection is important for dropping padding
only packets at the publisher side itself. In the last PR, it was
mentioned that it is only for stats.
* clean up
* Update deps
* Reduce packet meta data cache - part 1
Packet meta data cache takes a good amount of space.
That cache is 8K entries deep and each entry is 8 bytes.
So, that takes 64KB per RTP stream.
It is mostly needed for down stream to line up with receiver reports.
So, removing cache from up stream (RTPStatsReceiver) as part 1.
Will look at optimising the down stream in part 2.
* Remove caching from RTPStatsReceiver
* clean up a bit more
* maintain history and fix test
* Throttle packet errors/warns a bit.
In very bad network conditions, it is possible that packets
arrive out-of-order or show choppy behaviour.
Use some counters and temper the logs.
* slight change in comment
Streaming could start after the 16-bit sequence number has rolled over.
So, have to add that base back to what is received in the receiver report.
Otherwise, it looks like no packets were received in the window,
leading to poor quality.
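A sketch of adding the base back, assuming the receiver report's extended highest sequence number is `cycles<<16 | seq` per RFC 3550, and that the sender knows its own wrap count at subscription start (names are illustrative):

```go
package main

import "fmt"

// adjustRRExtHighest converts the extended highest sequence number reported
// by a receiver into the sender's extended space. The receiver only counts
// wraps it has seen itself; if the stream had already wrapped baseCycles
// times before the subscription started, those wraps must be added back,
// or windowed loss math sees no packets received at all.
func adjustRRExtHighest(rrExtHighest uint32, baseCycles uint64) uint64 {
	return (baseCycles << 16) + uint64(rrExtHighest)
}

func main() {
	// receiver saw 1 wrap and seq 5; sender had wrapped 3 times before start
	fmt.Printf("%#x\n", adjustRRExtHighest(0x00010005, 3))
}
```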
* Split RTPStats into receiver and sender.
For the receiver, short types are the input and the extended type needs
to be calculated. For the sender (subscriber), it can operate purely in
the extended type.
This makes the subscriber side a little simpler and should make it more
efficient, as it can do simple comparisons in extended type space.
There was also an issue with the subscriber using the shorter type and
calculating the extended type. When the subscriber starts after the
publisher has already rolled over in sequence number OR timestamp,
subsequent publisher side sender reports used to adjust subscriber
time stamps were out of whack. Using the extended type on the subscriber
side does not face that.
* fix test
* extended types from sequencer
* log
* Fix time stamp adjustment when starting with dummy packets.
- Populated extended values in ExtPacket on dummy packet.
- Have to pass reference time stamp offset to first packet time
adjustment.
* display participant version info
* Sequencer small optimisations
1. Use a range map to exclude padding-only packets. Should take less
   space as we are not using a slice holding a pointer to actual data
   per packet.
2. Avoid `time.Now()` when adding each packet. Just use the arrival time
   as it should be close enough. `time.Now()` was showing up in the
   profile.
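A minimal sketch of the range map idea (all names hypothetical; the actual implementation differs): runs of dropped padding-only packets compress to two integers instead of one slice entry per packet.

```go
package main

import "fmt"

// seqRange is an inclusive run of dropped padding-only sequence numbers.
type seqRange struct{ start, end uint64 }

// rangeMap stores dropped runs in order instead of one cache slot per
// packet, so long padding bursts cost two integers rather than N pointers.
type rangeMap struct{ ranges []seqRange }

// exclude records a dropped run; assumes runs are added in increasing
// order and merges adjacent ones.
func (m *rangeMap) exclude(start, end uint64) {
	if n := len(m.ranges); n > 0 && m.ranges[n-1].end+1 == start {
		m.ranges[n-1].end = end
		return
	}
	m.ranges = append(m.ranges, seqRange{start, end})
}

// offset returns how many excluded sequence numbers precede seq, i.e. how
// much to subtract when renumbering an outgoing packet.
func (m *rangeMap) offset(seq uint64) uint64 {
	var off uint64
	for _, r := range m.ranges {
		if seq <= r.end {
			break
		}
		off += r.end - r.start + 1
	}
	return off
}

func main() {
	m := &rangeMap{}
	m.exclude(40, 40)
	m.exclude(41, 41) // merges into 40-41
	m.exclude(50, 52)
	fmt.Println(m.offset(42), m.offset(53)) // 2 and 5 excluded below
}
```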
* remove debug
* correct comment
Profiling showed updating jitter going through the snapshot maps.
With the reduction to one, there should only be one snapshot
and hopefully that gains some cycles back.
* Do NACKs and reports always.
With padding packet drops, it is possible that a lot of packets
go by without RTCP RR.
Do NACKs and RTCP RR always.
* remove local variable
Seeing the case of a stream starting with padding packets
on migration. As the publisher in that case is always sending packets,
it is possible that the new node gets padding packets at the start.
Processing them in the buffer leads to trying to drop those padding
packets and adding an exclusion range. That fails because the extended
sequence number is not available for unprocessed packets.
It is okay to drop them as they will be dropped anyway.
But, they are useful for bandwidth estimation. So, headers are processed
even if the packet is RTPStats unprocessed.
* Handle duplicate padding packet in the up stream.
The following sequence would have produced incorrect results:
- Sequence number 39 - regular packet - offset = 0
- Sequence number 40 - padding only - drop - offset = 1
- Sequence number 40 - padding only duplicate - was not dropped (this is
  the bug) - apply offset - sequence number becomes 39 and clashes with
  the previous packet
- Sequence number 41 - regular packet - apply offset - goes through as 40.
- Sequence number 40 again - does not get dropped - will pass through as 39.
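A sketch of the duplicate handling, with illustrative names; it assumes in-order arrival for the offset bookkeeping, where the real code uses exclusion ranges:

```go
package main

import "fmt"

// translator renumbers forwarded packets after padding-only drops.
type translator struct {
	dropped map[uint64]bool // extended seq numbers of dropped padding packets
	offset  uint64          // how many packets have been dropped so far
}

// translate returns the adjusted sequence number, or ok=false if the
// packet must be dropped. A duplicate of an already-dropped padding packet
// must be dropped again, not renumbered, or it collides with an earlier
// forwarded packet (the bug described above).
func (t *translator) translate(extSeq uint64, paddingOnly bool) (adjusted uint64, ok bool) {
	if t.dropped[extSeq] {
		return 0, false // duplicate of a dropped packet
	}
	if paddingOnly {
		t.dropped[extSeq] = true
		t.offset++
		return 0, false
	}
	return extSeq - t.offset, true
}

func main() {
	t := &translator{dropped: map[uint64]bool{}}
	s1, _ := t.translate(39, false) // regular -> 39
	t.translate(40, true)           // padding -> dropped, offset = 1
	_, ok := t.translate(40, true)  // duplicate -> dropped again
	s2, _ := t.translate(41, false) // regular -> 40
	fmt.Println(s1, ok, s2)
}
```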
* fix duplicate dropping
* fix tests
* accept repeat last value as padding injection could cause that
* use exclusion ranges
* more UT and more specific errors