Commit Graph

224 Commits

Author SHA1 Message Date
Raja Subramanian 9032db857c Connection quality clean up (#766)
* WIP commit

* WIP commit

* Remove debug

* Revert to reduce diff

* Fix tests

* Determine spatial layer from track info quality if non-simulcast

* Adjust for invalid layer on no rid, previously that function was returning 0 for no rid case

* Fall back to top level width/height if there are no layers

* Use duration from RTPDeltaInfo
2022-06-18 21:58:47 +05:30
Raja Subramanian bbd5d3739f Use logger with context (#764) 2022-06-17 11:12:29 +05:30
Raja Subramanian 568d6c5432 Do not munge VP8 header in place. (#763)
As the buffer could be used to send to remote nodes, the publisher
side buffer should be treated as read-only.
2022-06-16 20:25:26 +05:30
cnderrauber d15725c662 publish new codec when subscriber needed (#762)
* publish new codec as needed
2022-06-16 18:16:34 +08:00
Raja Subramanian 83b71fce50 Set SSRC for RTCP (#760)
- SenderSSRC was not set for NACK, RTCP_RR - so SRTP context
was using SSRC = 0 which is not bad, but let us set the SSRC properly.
- PLI was using a random SSRC on every PLI. So, that would have
created a new SRTP stream (not bad as that stream context is small)
on every PLI. It is wasteful. So, set the SenderSSRC to the mediaSSRC.
- Reduce re-start of higher layers to 10 seconds. That is long enough
to declare that a stream layer has restarted.
2022-06-16 12:07:21 +05:30
cnderrauber 15da445fd7 fix subscription update on codec change (#757) 2022-06-10 12:31:34 +08:00
Raja Subramanian 4701119885 Proto clone VideoLayer (#756)
Otherwise, there are warnings about copying locks.
2022-06-09 09:32:57 +05:30
Raja Subramanian e72fd80ca0 Move rtp stats log to info (#752) 2022-06-06 22:20:48 +05:30
shishirng 2927521b8b Send publisher max layer change info to telemetry (#748)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-06-02 14:15:23 -04:00
cnderrauber 6ba034feae shorten the time cost for subscriber get media tracks (#747)
* set DownTrack's initial codec to first codec of potential codecs

* faststart on subscribe
2022-06-02 10:00:42 +08:00
cnderrauber 62308b2ce0 set DownTrack's initial codec to first codec of potential codecs (#740) 2022-05-30 16:34:55 +08:00
Raja Subramanian 10b0c9b9ff Allow overshooting maximum when there are no bandwidth constraints. (#739)
* Allow overshooting maximum when there are no bandwidth constraints.

Some clients prioritize sending higher layer of video (for e.g.
Firefox). They may get into a state where congestion does not allow
sending lower layers.

That combined with adaptive streaming capping the max layer at a
certain level, it is possible to get into a state where video is
not streamed.

To make this experience better, allow overshoot beyond max layers
in case of optimal (i. e. no congestion) allocation.

NOTE: This means that the video could freeze when there is congestion
even if `AllowPause` is not disabled as in congested state, we do
not overshoot the max layer.

* log about allowing overshoot
2022-05-30 13:51:59 +05:30
David Zhao 73808a1623 Remove TrackSender.ParticipantIdentity since it's unused (#738)
Just introduced this in #736 and realized the interface change comes
with a bunch of unintended consequences, so reverting this addition.
2022-05-29 23:22:03 -07:00
David Zhao 7ad51f49f1 Fixed unclean DownTrack close when removed before bound. (#736)
* Fixed unclean DownTrack close when removed before bound.

When a DownTrack is closed before it had a chance to be bound to a
transceiver, we'd skip close and leave it hanging. This is unlikely in
normal operations. However, it can be seen with permissions and
subscription APIs.

* remove remaining peerID references
2022-05-29 22:09:02 -07:00
shishirng cb9f0d37c2 Use rtcscore-go to calculate audio/video score (#689)
* Use rtcscore-go to calculate audio/video score

Signed-off-by: shishir gowda <shishir@livekit.io>

* Get max expected layer and find max actual layer from stream

Signed-off-by: shishir gowda <shishir@livekit.io>

* Cleanup unused methods

Signed-off-by: shishir gowda <shishir@livekit.io>

* Cleanup code - address review comments

Signed-off-by: shishir gowda <shishir@livekit.io>

* get expected layer info instead of just quality

Signed-off-by: shishir gowda <shishir@livekit.io>

* Move SpatialLayerForQuality to utils/helpers

method is required in rtc,sfu and connectionstats pkg
Moved to utils/helpers.go to remove cyclic deps

Signed-off-by: shishir gowda <shishir@livekit.io>

* update tests

Signed-off-by: shishir gowda <shishir@livekit.io>

* Pick stream stats with max layer

Signed-off-by: shishir gowda <shishir@livekit.io>

* Update rtcscore-go pkg to make rtt/jitter optional

when passing 0, rtcscore-go was setting default values

Signed-off-by: shishir gowda <shishir@livekit.io>

* update score to rating

Signed-off-by: shishir gowda <shishir@livekit.io>

* Update rtcscore-go pkg to use simulcast layer info for score

Signed-off-by: shishir gowda <shishir@livekit.io>

* Update score ratings to reflect rtcscore range

Signed-off-by: shishir gowda <shishir@livekit.io>

* update test params for new rtcscore

Signed-off-by: shishir gowda <shishir@livekit.io>

* Delay sending scores to connections only till full data is available

first interval can have partial data leading to lower scores

Signed-off-by: shishir gowda <shishir@livekit.io>

* Check for inf values in quality params

Signed-off-by: shishir gowda <shishir@livekit.io>

* Clean up initial score calculation. Default to 5

Signed-off-by: shishir gowda <shishir@livekit.io>

Co-authored-by: David Zhao <dz@livekit.io>
2022-05-27 14:58:26 -04:00
Raja Subramanian a1c7332943 Use a shadow for receivers for safe range (#725)
* Use a shadow for receivers for safe range

* RLock layerSSRCs

* access max subscribed quality inside lock

* Fix go vet errors

* use defer
2022-05-27 22:11:34 +05:30
cnderrauber f958fbcc1c simulcast codecs support (#720)
simulcast codecs support 

Co-authored-by: David Zhao <dz@livekit.io>
2022-05-27 19:55:50 +08:00
Raja Subramanian d77a8ad916 Use proper type cast (#719) 2022-05-26 13:58:46 +05:30
Raja Subramanian 77b8e9eecb Fix a couple of more races (#717)
* Use grants clone

* Fix a couple of more races

Use a shadow copy of down tracks in DownTrackSpreader.
Read always uses the shadow.
On Add/Delete of down track, make a new copy.
Copying is done only on add/delete.
If somebody is holding reference to a shadow, it will be in tact as Add/Delete create a new slice.

With this, not seeing any more races in test. So, enabling CI tests with `-race`.

Also fixing another race reported in #603

There are a couple of more races in that bug report that needs to be
chased down.

* Use env suggested in https://lifesaver.codes/answer/runtime-race-detector-sigabrt-or-sigsegv-on-macos-monterey-49138

* staticcheck, did not fail locally, but reported by CI

* use API to get down tracks
2022-05-25 16:14:07 +05:30
Raja Subramanian 11fb079a4f Catching a few more races reported by go test -race ./... (#713)
* Fix a few more races

* Make sure room store on leave has correct count

* Revert logger change to reduce diff
2022-05-25 07:32:25 +05:30
Raja Subramanian 9d0e4375f2 Use blank frames at frame rate (#710)
* Inject silence opus frames on mute.

If not, under certain circumstances, residual noise
seems to trigger comfort noise generation at the decoder
which is not all that comfortable.

* inject silence only when muting

* Add comment

* augment comment

* Delete blank line

* Adjust TS offset on blank frames

* Remove debug

* Do not modify `lastTS` as it affects timestamp on next switch.

* Trying more stuff for DTX

* - Use a go routine to send blank frames
- Use duration instead of number of frames and calculate number of
  frames

* augment comment

* Remove debug

* Return a chan from writeBlankFrameRTP

* Use a long duration for mute

* add comment

* Incorporate suggestion from Jie
2022-05-24 13:25:19 +05:30
Raja Subramanian 33032f6c4b Fix some test races and other things found with go test -race (#711) 2022-05-24 10:16:36 +05:30
Raja Subramanian a1e264bf34 Make a copy of available layers in forwarder. (#709)
Should be okay as that is usually only three elements max.

Stream tracker + manager send available layers. As stream trackers
run in different go routines, need to think some more about chances
of out-of-order operations. But, making a copy in forwarder so
that it is not referncing moving data.

Also, when setting up provisional allocation, make a copy
(that was the intention, i. e. a provisional allocation should have
stable data to work with).
2022-05-23 17:41:19 +05:30
Raja Subramanian 96b095504b Fix some races reported by go -race (#706)
* Fix some races reported by go -race

* Avoid copy
2022-05-23 11:51:13 +05:30
Raja Subramanian 547ab908f5 Coalesce track events into stream allocator. (#701) 2022-05-21 00:40:02 +05:30
Raja Subramanian 8d0998ed1d Bump up stream allocator event queue size (#700)
This probably needs more work
- Maybe some events should not be queued
- Reduce number of events

Bump up the size any way so that the queue does not overflow
when subscribing to a lot of tracks.
2022-05-20 09:54:10 +05:30
Raja Subramanian d828e0fbd8 Inject silence opus frames on mute (#682)
* Inject silence opus frames on mute.

If not, under certain circumstances, residual noise
seems to trigger comfort noise generation at the decoder
which is not all that comfortable.

* inject silence only when muting

* Add comment

* augment comment

* Delete blank line

* Adjust TS offset on blank frames

* Remove debug

* Do not modify `lastTS` as it affects timestamp on next switch.

* use duration to calculate number of blank frames

* log errors in blank frames write path
2022-05-15 11:29:40 +05:30
cnderrauber fa53da18e7 update pion, export vp8munger header (#684)
* update pion, export vp8munger header

* fix test
2022-05-13 17:20:52 +08:00
Raja Subramanian 710e06b7da Tweak screen share stream tracker. (#683)
* Tweak screen share stream tracker.

Screen share uses extremely low frame rate when content is static.
This leads to stream tracker changes and layer switches.
Use a longer measurement interval for screen share tracks.
Setting it to 4 seconds. The screen share frame rate is
approx. 1 fps and with two temporal layers, it should get
a frame every 2 seconds or so in every layer. But, that
does not work. Even three second measurement window was
reporting 0 rate. So, it is getting clumped. 4 seconds works
fine.

Also modifying the bit rate availability change to trigger
on a temporal layer becoming available or going away.

With these changes, on a local test, not seeing any unnecessary
PLIs from client and hence no unnecessary key frames from publisher.

* Fix test
2022-05-13 10:18:18 +05:30
David Zhao bd7e3beda4 Improve frequency of stats update (#673)
* Improve frequency of stats update

Prometheus stats are updated as the data becomes available, instead of
aggregated along with telemetry batches. Node availability decisions can
now react much faster to these stats.

* use the same intervals for connection quality updates
2022-05-09 08:55:06 -07:00
Raja Subramanian 16407ea180 Change state to JOINED before sending JoinResponse (#674)
* Send room metadata as long as participant is not disconnected

* Change state to JOINED before sending join response
2022-05-09 13:28:56 +05:30
cnderrauber 07b93e2e5b add support for av1 svc (#669)
* add support for av1 svc
2022-05-06 18:24:29 +08:00
cnderrauber 7f38164ef6 fix twcc panic on packet lost (#668) 2022-05-05 17:04:02 +08:00
Raja Subramanian 0d8848cfcd Do not count padding packets in stream tracker. (#667)
* Do not count padding packets in stream tracker.

There are cases where publisher uses padding only packets
in a layer to probe the channel. Treating those as valid
layer packets makes stream tracker declare that the layer
is active where in reality, it is not.

Also, removing check for out-of-order packets. Out-of-order
packets can happen and should be counted in bitrate calculation.

There is the extreme case of heavy loss which might skew it, but
that is an extreme case.

* Fix test
2022-05-05 13:09:59 +05:30
Raja Subramanian 1665f51bd0 Stats of NACKs acked and number of repeated NACKs. (#664)
* Stats of NACKs acked and number of repeated NACKs.

Also making a change in delta stat to drop negative packet loss
counts to 0. Because of windowing it is a legitimate case.
The receiver could have seen a loss in window we are measuring
and in the subsequent window, the receiver could have gotten
a retransmission and reduced the packet loss count resulting
in a negative delta. When we report negative delta, it could
get dropped by analytics validator. That will be lost data.
Avoid that.

* Remove unused code

* Pick up latest protocol
2022-05-02 22:43:21 +05:30
Raja Subramanian d863b45dc1 Remove Head field from ExtPacket structure. (#662)
* Remove `Head` field from `ExtPacket` structure.

Although we do not intend to, but if packets get out-of-order
in the forwarding path (maybe reading in multiple goroutines
or using some worker pool to distribute packets), the `Head`
indicator could lead to wrong behaviour. It is possible that
at the receiver, the order is
- Seq Num N, Head = true
- N + 1, Head = true

If the forwarding path sees `N + 1` first, the Head flag
when it sees `N` packet is incorrect and will lead to incorrect
behaviour.

The alternative check is very simple. So, remove `Head` flag.

* Remove unused field
2022-05-02 10:16:17 +05:30
David Zhao 289d63ac53 Fix node ip parameter not being used (#661)
* Use node-ip if provided

* formatting
2022-04-29 15:26:37 -07:00
Raja Subramanian ea61b588a2 Simplifying SN info cache in RTPStats module (#660)
* Simplifying SN info cache in RTPStats module

* Remove unnecessary field
2022-04-28 18:48:59 +05:30
Raja Subramanian ca6ad7ec7a Use seq num offsets cache instead of missing seq num map. (#658)
* Use seq num offsets cache instead of missing seq num map.

Map operations can be costly. Use a fixed size array to store offsets.
4096 sequence numbers should be more than 16 seconds for 720p video
which should be plenty to look up offset of out-of-order packets.
Packets out-of-order more than that will probably be useless anyway.

* Move offset cache population to only when we are forwarding

* add some debug logs

* Remove debug
2022-04-28 15:16:42 +05:30
Raja Subramanian a83bd5c2f6 Split out load balancer into a separate module (#657) 2022-04-26 15:46:30 +05:30
Raja Subramanian 2f902a3edc Remove callbacks queue from sfu/DownTrack (#655)
* Remove callbacks queue from sfu/DownTrack

- Connection stats callback was happening in connection stats go routine
- RTT update launches a goroutine on the receive side as it affects the
  subscriber. So, no need to queue it.
- Changed two things
  o Move close handler to goroutine. It is better that way as it touches
the subscriber as well
  o Move max layer handling into a goroutine also so that the callback
does minimal work.

With this all the send side callback queues are removed.

* small clean up
2022-04-25 13:28:24 +05:30
Raja Subramanian f0f66a2d75 Remove ops queue from stream tracker. (#652)
It is not needed as the callback happens in a goroutine.
Also, had missed included bitrate availability change in the queue
which was inconsistent.
2022-04-25 12:51:29 +05:30
Raja Subramanian 4d7950ec2f Do not send down track callbacks to streamallocator via queue (#651) 2022-04-24 21:41:58 +05:30
Raja Subramanian 0a802077dd Remove callbacks queue from sfu/buffer. (#650) 2022-04-24 21:39:28 +05:30
Raja Subramanian d38566850a Do not post close callback in ops queue if not started. (#649)
* Do not post close callback in ops queue if not started.

Ops queue is started in `Bind()`. If `Close()` is called
when bind did not happen (because the underlying peer connection
closed before bind), the close callback does not run.

Check if ops queue is running before posting close callback
into the queue.

Not pretty, but covers this case. Need to think about it more.

* correct check
2022-04-24 11:40:22 +05:30
Raja Subramanian 2e182afb61 Reduce memory used by RTPStats. (#645)
Keep track of last 2048 sequence numbers instead of full range
to reduce memory usage.
2022-04-22 17:17:37 +05:30
Raja Subramanian 2b6a304b27 Increase size of RTCP channel just to be safe. (#641)
Also, using select to log warnings when queue is full.
2022-04-21 12:21:21 +05:30
Raja Subramanian 43d0573693 Moving smoothing into the audio level module. (#636) 2022-04-20 23:59:51 +05:30
Raja Subramanian 6a53891f9f Process header extensions in line (#635)
* WIP commit

* Pass audio config

* Fix test compile
2022-04-20 18:20:28 +05:30
Raja Subramanian 1fe485ef0c Use cache for packet level processing. (#632)
* Use cache for packet level processing.

Do not overload ops queue for packet level operations.

* byte -> uint8
2022-04-20 13:19:10 +05:30