* Prevent old packets resolution.
With range map, we are just looking up ranges and not exactly
which packets were missing. This caused the case of old packets
being resolved after layer switch.
For example,
- Packet 10 is layer switch, range map gets reset
- Packet 11, 12, 13 are forwarded
- Packet 9 comes, it should ideally be dropped as pre-layer switch old
packet. But, when looking up range map, it gets an offset and hence
gets re-mapped to something before layer switch. This was probably
okay as decoders would have had a key frame at the switch point and
moved ahead, but incorrect technically.
Fix is to reset the start point in the range map to the switch point
and not 0. So, when packet 9 comes, range map will return "key too old"
error and that packet will be dropped as missing from cache.
* fix tests
* Sequencer small optimisations
1. Use range map to exclude padding only packets. Should take lesser
space as we are not using slice to hold pointer to actual data.
2. Avoid `time.Now()` when adding each packet. Just use the arrival time
as it should be close enough. `time.Now()` was showing up in
profile.
* remove debug
* correct comment
* Handle duplicate padding packet in the up stream.
The following sequence would have produce incorrect results
- Sequence number 39 - regular packet - offset = 0
- Sequence number 40 - padding only - drop - offset = 1
- Sequence number 40 - padding only duplicate - was not dropped (this is
the bug) - apply offet - sequence number becomes 39 and clashes with
previous packet
- Sequence number 41 - regular packet - apply offset - goes through as 40.
- Sequence number 40 again - does not get dropped - will pass through as 39.
* fix duplicate dropping
* fix tests
* accept repeat last value as padding injection could cause that
* use exclusion ranges
* more UT and more specific errors
* Remove parked layer feature.
Not worth the added complexity.
Several reasons
- Not seeing black frames on pub mute always.
- If they are there, it can consume more than 30kbps if the parked layer
is high res. That is wasted bandwidth downstream when pub is muted.
- On resume, client some time sends PLI and that triggers a key frame
request.
But, leaving the separate `PubMuted` flag in forwarder in case we can
use it for better handling.
* need the request spatial
* Plug a couple of holes in stream transitions.
1. Missed negative sign meant stealing bits from other tracks was not
working.
2. When a track change (mute, unmute, subscription change) cannot be
allocated, explicitly pause so that stream state update happens.
Refactor stream state update a bit to make it a bit cleaner.
* correct comment
* Check for request layer lock only in the goroutine
* check before sending PLI
* max layer notifier worker
* test cleanup
* clean up
* do notification in the callback
* Push track quality to poor on a bandwidth constrained pause.
* add tests
* scale distance by divisor
* fix test distance to desired
* wait longer for subscription manager to reconcile
* Return max spatial layer from selectors.
With differing requirements of SVC and allowing overshoot in Simulcast,
selectors are best placed to indicate what is the max spatial layer when
they indicate a switch to max spatial layer.
* fix test
* prevent race
* Handle time stamp increment across mute.
Two cases handled
1. Starting on mute could inject blank frame/padding packets.
These time stamps are randomly generated. So, when the publisher
unmutes, the time stamp was jumping ahead by only 1. Make it so
that they jump ahead by elapsed time since starting the blank frames/
padding packets.
2. When generating blank frames at the end of a down track, if
the track was muted at that time, the blank frame time stamps
could have been off (i. e. would have been pointing to time
after the last forwarded frame). Here also use current time
to adjust time stamp. Maybe, this could help in some cases where
we are seeing unflushed video buffer?
* remove unnecessary check
* address feedback and also maintain first synthesized time stamp
* Keep track of expected RTP time stamp and control drift.
- Use monotonic clock in RTCP Sender Report and packet times
- Keep the time stamp close to expected time stamp on layer/SSRC
switches
* clean up
* fix test compile
* more test compile failures
When current became unavailable, it was possible for
target to be set to opportunistic. Because of that,
the downgrade did not happen and PLI layer lock was
requested continuously.
* Do not let request layer overshoot available.
After a layer stopped on publisher side, an optimal allocation side
while initially adjusted to not request the stopped layer, a subsequent
allocation went back to the higher layer although it was stopped.
Prevent that.
* simplify
There are cases where the layer bit rate configuration is such that
the expected bitrate difference is very high. For example,
setting up layer 2 (f) layer for 1.7 Mbps and layer 1 (h) for 180 kbps.
With bitrate based quality, a layer drop results in going to `POOR`
quality rating. With layer based, it will drop one level only.
Also, cleaning up the distance to desired calculation a bit.
* Expected vs actual Layer based connection quality.
With VBR streams (like screen share), bit rate is not a good indicator
of whether desired layer (spatial/temporal) is achieved due to high
variance.
Using expected vs actual layer (i. e. distance to desired) can capture
any short fall and include it in quality scoring.
This PR uses distance to desired, i. e. how many steps it would take to
go from actual spatial/temporal -> desired spatial/temporal and that
distance is propotionally used (currently it is just linear) to decrease
score.
* wire up layer transitions for screen share tracks
* Make connection quality not too optimistic.
With score normalization, the quality indicator showed good
under conditions which should have normally showed some badness.
So, a few things in this PR
- Do not normalize scores
- Pick the weakest link as the representative score (moving away from
averaging)
- For down track direction, when reporting delta stats, take the number
of packets sent actually. If there are holes in the feed (upstream
packet loss), down tracks should not be penalised for that loss.
State of things in connection quality feature
- Audio uses rtcscore-go (with a change to accommodate RED codec). This
follows the E-model.
- Camera uses rtcscore-go. No change here. NOTE: THe rtscore here is
purely based on bits per pixel per frame (bpf). This has the following
existing issues (no change, these were already there)
o Does not take packet loss, jitter, rtt into account
o Expected frame rate is not available. So, measured frame rate is
used as expected frame rate also. If expected frame rate were available,
the score could be reduced for lower frame rates.
- Screen share tracks: No change. This uses the very old simple loss
based thresholding for scoring. As the bit rate varies a lot based on
content and rtcscore video algorithm used for camera relies on
bits per pixel per frame, this could produce a very low value
(large width/height encoded in a small number of bits because of static content)
and hence a low score. So, the old loss based thresholding is used.
* clean up
* update rtcscore pointer
* fix tests
* log lines reformat
* WIP commit
* WIP commit
* update mute of receiver
* WIP commit
* WIP commit
* start adding tests
* take min score if quality matches
* start adding bytes based scoring
* clean up
* more clean up
* Use Fuse
* log quality drop
* Periodically report bitrate to down track.
For connection quality based on bitrate for down tracks,
the measured rate should be used. That is to ensure that
down track quality measurement does not get affected by
publisher side changes negatively (or positively).
Report the optimal bit rate to connection quality scorer
every second so that scorer has a continuously updating
picture of the stream and can compare the actual bit rate
against expected optimal bitrate more reliably.
Doing it at time like allocation, the bitrate may not be
accurate (or may not even be available). So, a periodic update
is necessary.
* add transition at allocation times
* clean up debug log
* - Use number of windows for wait to make things simpler
- track no layer expected case
- always update transition
- always call updateScore
In the following order, got the wrong layer
- Max layer is 0, max published is 0, request layer is 0
- Current locks to 0.
- Max changes to 1. Nothing changes as 1 is not published yet.
- Max published changes to 1.
- As curernt layer is valid, available and locked to request layer, it
was kept. But, it should have checked if the request layer changed
and updated accordingly.
* Do not overshoot when layer is locked.
One more challenging case. When current layer is already locked,
should not set up for overshoot.
* set target to current
Missed this in the last commit. Sorry.
The case of
- locking to a layer
- that layer stops
- re-allocation
This should trigger a key frame request to the max available layer.
So, have to set the request layer to max available.
Addressing edge case where a layer stopped before bitrate could be
measured. Purely bit rate based change deduction missed this as
the before and after did not have bit rates.
Use available layers to look for changes, especially currently
forwarding layer going away.
Also, simplifying bits. Only in the optimal allocation path,
these things are required. When congested, bitrate is always needed.
So, for optimal path, just look at available layer changes and adjust.
Don't need to look for bitrate based layer changes. Clean up that code.
* WIP commit
* Send stream state paused only when it is paused due to bandwidth limitation.
When stream is resumed after a stream is paused, an active update is
sent. Note that this means if there are intervening events like
mute/unmute between pause and resume, resume will be sent.
* WIP commit
* fix compile
* WIP commit
* fixing tests
* clean up exempted layers
* clean up unused stuff
* correct comment
* Don't need ops queue as order is not important now
* static check
* kick off allocation when callbacks are set up, calling from receiver means callbacks may not be set up
* WIP commit
* Send stream state paused only when it is paused due to bandwidth limitation.
When stream is resumed after a stream is paused, an active update is
sent. Note that this means if there are intervening events like
mute/unmute between pause and resume, resume will be sent.
* Notes on wht to do
- Should targetLayers be altered while doing opportunistic locking
- Should targetLayers be altered in any other path than stream allocator path?
- Lock to layer as long as it is <= opportunistic layer
- When not congested, opportunistic can be highest
- When congested, opportunistic could be nil or lowest if paused is not allowed
- When muting, can we hold on to current layers (or keep it as previous) and
restore on unmute.
- Store current/target in forwarder state and restore on seeding
- Watch for looking for targetLayers, etc. when looking to insert padding
packets. There may be an assumption about restarting on key frame and hence
okay to insert padding when target layers are invalid. This may not be true
any more when doing opportunistic forwarding.
- Can we distinguish between publisher mute or dynacast (i. e. publisher side
stopping) vs subscriber mute and do something useful? Publisher side mute
could mean continuity in sequence numbers on a restart (might be able to
catch it with opportunistic forwarding). But, there is the challenge of
unmute from publisher via signalling channel vs media. If media is arriving,
should subscribers do opportunistic forwarding before publisher mute state
update happens?
- Maybe introduce a mode where forwarding continues to a frame end (of course
with a time limit just in case the end of frame packet is lost) and then
insert silence/padding packets?
- Ensure that audio blank frame insertion does not suffer from frame boundary
issues.
* pub/sub mute separate + more notes on things to check
* WIP commit, more notes
* WIP commit
* WIP commit
* WIP commit
* WIP commit
* WIP commit
* WIP commit
* clean up
* slightly better comments
* Do not stop on unmute
* do not inject blank frames when pub muted
* do not forward on audio publisher mute
* WIP commit
* comment
* clean up
* remove unused stuff
* cleaner comment
* remove unused stuff
* remove unused stuff
* more comments
* TrackSender method to handle RTCP sender report data
* fix test
* push rtcp sender report data to down tracks
* Need payload type for codec id mapping in relay protocol
* rename variable a bit
* Notify max layer taking into account overshoot.
An attempt to handle case of FF stopping layer 0, but not layer 1.
When max subscribed layer is layer 0, server allows overshoot to layer 1
to ensure continued streaming when the channel is not congested.
But, dynacast could have reported maximum subscribed layer as layer 0.
This is a very simple attempt to address that by taking overshoot
into account. Needs testing if this works well or not.
NOTE: When subsriber/down track is unmuted, it will report
the max subscribed layer as the max required layer. In those cases,
if the client does not start layer 0, there will still be an issue.
IOW, server is not keeping track of client behaviour that the client
has stopped layer 0 and publishing only layer 1. Server is just
accommodating overshoot with this change.
* Fix a couple of tests and change reflect.DeepEqual -> require.Equal as
much as possible
* Allow overshooting maximum when there are no bandwidth constraints.
Some clients prioritize sending higher layer of video (for e.g.
Firefox). They may get into a state where congestion does not allow
sending lower layers.
That combined with adaptive streaming capping the max layer at a
certain level, it is possible to get into a state where video is
not streamed.
To make this experience better, allow overshoot beyond max layers
in case of optimal (i. e. no congestion) allocation.
NOTE: This means that the video could freeze when there is congestion
even if `AllowPause` is not disabled as in congested state, we do
not overshoot the max layer.
* log about allowing overshoot
* Inject silence opus frames on mute.
If not, under certain circumstances, residual noise
seems to trigger comfort noise generation at the decoder
which is not all that comfortable.
* inject silence only when muting
* Add comment
* augment comment
* Delete blank line
* Adjust TS offset on blank frames
* Remove debug
* Do not modify `lastTS` as it affects timestamp on next switch.
* Trying more stuff for DTX
* - Use a go routine to send blank frames
- Use duration instead of number of frames and calculate number of
frames
* augment comment
* Remove debug
* Return a chan from writeBlankFrameRTP
* Use a long duration for mute
* add comment
* Incorporate suggestion from Jie
Should be okay as that is usually only three elements max.
Stream tracker + manager send available layers. As stream trackers
run in different go routines, need to think some more about chances
of out-of-order operations. But, making a copy in forwarder so
that it is not referncing moving data.
Also, when setting up provisional allocation, make a copy
(that was the intention, i. e. a provisional allocation should have
stable data to work with).
* Inject silence opus frames on mute.
If not, under certain circumstances, residual noise
seems to trigger comfort noise generation at the decoder
which is not all that comfortable.
* inject silence only when muting
* Add comment
* augment comment
* Delete blank line
* Adjust TS offset on blank frames
* Remove debug
* Do not modify `lastTS` as it affects timestamp on next switch.
* use duration to calculate number of blank frames
* log errors in blank frames write path
* Remove `Head` field from `ExtPacket` structure.
Although we do not intend to, but if packets get out-of-order
in the forwarding path (maybe reading in multiple goroutines
or using some worker pool to distribute packets), the `Head`
indicator could lead to wrong behaviour. It is possible that
at the receiver, the order is
- Seq Num N, Head = true
- N + 1, Head = true
If the forwarding path sees `N + 1` first, the Head flag
when it sees `N` packet is incorrect and will lead to incorrect
behaviour.
The alternative check is very simple. So, remove `Head` flag.
* Remove unused field