* Some logging changes.
Trying to chase a case of large sequence number gap on subscriber side
where packets are sent after a long time.
* return values instead of logging
Packets not being forwarded were getting included in forwarding stats
calculation and skewing the measurement towards a smaller number.
The latency measurement does not include the batch IO of packets on
send. With a 2ms batching, that will add an average latency of 1ms.
Seeing cases of `ConnectionTimeout` and `ResponseTimeout`.
So, logging destination identity in RPC request and also logging ACK and
response. Will pare back logs/log level of these messages after gettnig
some data.
Also a small change I noticed and had sitting in my local tree to set
the previous RTP marker on a padding packet.
- When audio is muted, server injects silence frames which moves the
time stamp forward and adjusts offset. That cannot be used against
publisher side sender report. Use a pinned version.
- Ignore small changes to propagation delay even while checking for
sharp increase. That is spamming a lot for small changes, i.e.
existing delta is 100 micro seconds or so and the new one is 300 micro
seconds. Also rename to `longTerm` from `smoothed` as it is a slow
varying long term estimate of propagation delay delta. And slow down
that adaptation more.
* Forward publisher sender report.
Publisher side RTCP sernfer report is rebased to SFU time base
and used to send sender rerport to subscriber.
Will wait to merge till previous versions are out as this will require a
bunch of testing.
* - Add rebased report drift
- update protocol dep
- fix path change check, it has to check against delta of propagation
delay and not propagation delay as the two side clocks could be way
off.
A few things
- Log PLI requests from client.
- Pass in marker to RTP munger as SVC can insert marker.
- Adjusting first packet time should be aware of SVC as there is single
stream in SVC
Seeing a time stamp jump that I am not able to explain.
Basically, it looks like the time stamp doubles at some
point. There is no code which doubles the timestamp.
Can understand an erroneous roll over/wrap around, but
doubling is very strange.
So, logging only audio packets. Will disable as soon
as I have some smaples from canary.
* Error log of padding updating highest time to get backtrace.
* Do not update highest time on padding packet.
Padding packets use time stamp of last packet sent.
Padding packets could be sent when probing much after last packet
was sent. Updating highest time on that screws up sender report
calculations. We have ways of making sure sender reports do not
get too out-of-whack, but it logs during that repair.
That repair should be unnecessary unless the source is behaving weird
(things like publisher sending all packets at the same time, publisher
sample rate is incorrect, etc.)
* Use 32-bit time stamp to get reference time stamp on a switch.
With relay and dyncast and migration, it is possible that different
layers of a simulcast get out of sync in terms of extended type,
i. e. layer 0 could keep running and its timestamp could have
wrapped around and bumped the extended timestamp. But, another layer
could start and stop.
One possible solution is sending the extended timestamp across relay.
But, that breaks down during migration if publisher has started afresh.
Subscriber could still be using extended range.
So, use 32-bit timestamp to infer reference timestamp and patch it with
expected extended time stamp to derive the extended reference.
* use calculated value
* make it test friendly
* Prevent old packets resolution.
With range map, we are just looking up ranges and not exactly
which packets were missing. This caused the case of old packets
being resolved after layer switch.
For example,
- Packet 10 is layer switch, range map gets reset
- Packet 11, 12, 13 are forwarded
- Packet 9 comes, it should ideally be dropped as pre-layer switch old
packet. But, when looking up range map, it gets an offset and hence
gets re-mapped to something before layer switch. This was probably
okay as decoders would have had a key frame at the switch point and
moved ahead, but incorrect technically.
Fix is to reset the start point in the range map to the switch point
and not 0. So, when packet 9 comes, range map will return "key too old"
error and that packet will be dropped as missing from cache.
* fix tests
* Sequencer small optimisations
1. Use range map to exclude padding only packets. Should take lesser
space as we are not using slice to hold pointer to actual data.
2. Avoid `time.Now()` when adding each packet. Just use the arrival time
as it should be close enough. `time.Now()` was showing up in
profile.
* remove debug
* correct comment
* Handle duplicate padding packet in the up stream.
The following sequence would have produce incorrect results
- Sequence number 39 - regular packet - offset = 0
- Sequence number 40 - padding only - drop - offset = 1
- Sequence number 40 - padding only duplicate - was not dropped (this is
the bug) - apply offet - sequence number becomes 39 and clashes with
previous packet
- Sequence number 41 - regular packet - apply offset - goes through as 40.
- Sequence number 40 again - does not get dropped - will pass through as 39.
* fix duplicate dropping
* fix tests
* accept repeat last value as padding injection could cause that
* use exclusion ranges
* more UT and more specific errors
* Handle time stamp increment across mute.
Two cases handled
1. Starting on mute could inject blank frame/padding packets.
These time stamps are randomly generated. So, when the publisher
unmutes, the time stamp was jumping ahead by only 1. Make it so
that they jump ahead by elapsed time since starting the blank frames/
padding packets.
2. When generating blank frames at the end of a down track, if
the track was muted at that time, the blank frame time stamps
could have been off (i. e. would have been pointing to time
after the last forwarded frame). Here also use current time
to adjust time stamp. Maybe, this could help in some cases where
we are seeing unflushed video buffer?
* remove unnecessary check
* address feedback and also maintain first synthesized time stamp
It was possible that the adjustment applied in the middle
of a frame resulting in the same frame having multiple time stamps.
That would have caused video to pause/jump.
Apply the offset only at the start of the frame so that all
packets of a frame get the same offset.
* Experimental flag to try time stamp adjustment to control drift.
There is a config to enable this.
Using a PID controller to try and keep the sample rate at expected
value. Need to be seen if this works well. Adjustment are limited
to 25 ms max at a time to ensure there are no large jumps.
And it is applied when doing RTCP sender report which happens
once in 5 seconds currently for both audio and video tracks.
A nice introduction to PID controllers - https://alphaville.github.io/qub/pid-101/#/
Implementation borrowed from - https://github.com/pms67/PID
A few things TODO
1. PID controller tuning is a process. Have picked values from test from
that implementation above. May not be the best. Need to try.
2. Can potentially run this more often. Rather than running it only when
running RTCP sender report (which is once in 5 seconds now), can
potentially run it every second and limit the amount of change to
something like 10 ms max.
* remove unused variable
* debug log a bit more
* Ensure sequence number continuity
When using Go SDK (livekit-cli or egress) as a client,
SFU sends blank frames when audio track is muted to ensure that
Pion OnTrack fires on GoSDK side. That resulted in a huge sequence
number/time stamp jump when the real stream started.
Ensure continuity by creating random sequence number/time stamp when
starting with a blank frames. And when sequence number/time stamp is
initialized using SetLastSnTs, continue sequence if it was already
initialized.
* remove debug
* Notes on wht to do
- Should targetLayers be altered while doing opportunistic locking
- Should targetLayers be altered in any other path than stream allocator path?
- Lock to layer as long as it is <= opportunistic layer
- When not congested, opportunistic can be highest
- When congested, opportunistic could be nil or lowest if paused is not allowed
- When muting, can we hold on to current layers (or keep it as previous) and
restore on unmute.
- Store current/target in forwarder state and restore on seeding
- Watch for looking for targetLayers, etc. when looking to insert padding
packets. There may be an assumption about restarting on key frame and hence
okay to insert padding when target layers are invalid. This may not be true
any more when doing opportunistic forwarding.
- Can we distinguish between publisher mute or dynacast (i. e. publisher side
stopping) vs subscriber mute and do something useful? Publisher side mute
could mean continuity in sequence numbers on a restart (might be able to
catch it with opportunistic forwarding). But, there is the challenge of
unmute from publisher via signalling channel vs media. If media is arriving,
should subscribers do opportunistic forwarding before publisher mute state
update happens?
- Maybe introduce a mode where forwarding continues to a frame end (of course
with a time limit just in case the end of frame packet is lost) and then
insert silence/padding packets?
- Ensure that audio blank frame insertion does not suffer from frame boundary
issues.
* pub/sub mute separate + more notes on things to check
* WIP commit, more notes
* WIP commit
* WIP commit
* WIP commit
* WIP commit
* WIP commit
* WIP commit
* clean up
* slightly better comments
* Do not stop on unmute
* do not inject blank frames when pub muted
* do not forward on audio publisher mute
* Cache RTPStats and seed on re-use
When a cached down track is re-used, RTPStats was not cached.
This caused sender reports getting out-of-sync with the remote side.
Cache RTPStats and seed it on re-use.
* staticcheck
* WIP commit
* Clean up
* spelling mistake
* Run subscribed track onBind in a go routine
* Address comments and more safety net
* Cache and restore forwarder state on resume
* conflicts
* mage generate
* Inject silence opus frames on mute.
If not, under certain circumstances, residual noise
seems to trigger comfort noise generation at the decoder
which is not all that comfortable.
* inject silence only when muting
* Add comment
* augment comment
* Delete blank line
* Adjust TS offset on blank frames
* Remove debug
* Do not modify `lastTS` as it affects timestamp on next switch.
* Trying more stuff for DTX
* - Use a go routine to send blank frames
- Use duration instead of number of frames and calculate number of
frames
* augment comment
* Remove debug
* Return a chan from writeBlankFrameRTP
* Use a long duration for mute
* add comment
* Incorporate suggestion from Jie
* Remove `Head` field from `ExtPacket` structure.
Although we do not intend to, but if packets get out-of-order
in the forwarding path (maybe reading in multiple goroutines
or using some worker pool to distribute packets), the `Head`
indicator could lead to wrong behaviour. It is possible that
at the receiver, the order is
- Seq Num N, Head = true
- N + 1, Head = true
If the forwarding path sees `N + 1` first, the Head flag
when it sees `N` packet is incorrect and will lead to incorrect
behaviour.
The alternative check is very simple. So, remove `Head` flag.
* Remove unused field
* Use seq num offsets cache instead of missing seq num map.
Map operations can be costly. Use a fixed size array to store offsets.
4096 sequence numbers should be more than 16 seconds for 720p video
which should be plenty to look up offset of out-of-order packets.
Packets out-of-order more than that will probably be useless anyway.
* Move offset cache population to only when we are forwarding
* add some debug logs
* Remove debug