1329 Commits

Author SHA1 Message Date
David Zhao f3d05a9068 Do not sample per participant to reduce memory usage 2023-04-03 15:08:27 -07:00
Paul Wells 6107c002ae fix signal client message buffer size (#1561)
* fix signal client message buffer size

* update psrpc dep
2023-03-29 16:39:44 -07:00
David Colburn 191a9e8014 update core to 0.0.5 (#1540)
* update core

* sort imports

* fix typos

* redundant types
2023-03-22 16:53:23 -07:00
Raja Subramanian 0ea88e4025 Ensure sequence number continuity (#1539)
* Ensure sequence number continuity

When using Go SDK (livekit-cli or egress) as a client,
SFU sends blank frames when audio track is muted to ensure that
Pion OnTrack fires on GoSDK side. That resulted in a huge sequence
number/time stamp jump when the real stream started.

Ensure continuity by creating random sequence number/time stamp when
starting with blank frames. And when sequence number/time stamp is
initialized using SetLastSnTs, continue sequence if it was already
initialized.

* remove debug
2023-03-22 23:08:26 +05:30
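
A minimal sketch of the continuity idea in #1539, not the actual SFU code. The `sequencer` type and its methods are hypothetical. Only the name `SetLastSnTs` appears in the commit message and its real signature is not shown there, so the lowercase `setLastSnTs` below is an assumed stand-in.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sequencer is a hypothetical holder for the outgoing RTP sequence number
// and timestamp of one forwarded stream.
type sequencer struct {
	initialized bool
	lastSN      uint16
	lastTS      uint32
}

// startBlank picks a random starting point when the stream begins with
// blank frames, unless a starting point already exists.
func (s *sequencer) startBlank() {
	if s.initialized {
		return
	}
	s.lastSN = uint16(rand.Intn(1 << 16))
	s.lastTS = rand.Uint32()
	s.initialized = true
}

// setLastSnTs seeds the state from the real stream only if nothing was
// sent before; otherwise the existing sequence simply continues.
func (s *sequencer) setLastSnTs(sn uint16, ts uint32) {
	if s.initialized {
		return
	}
	s.lastSN, s.lastTS, s.initialized = sn, ts, true
}

// next returns the sequence number and timestamp for the next packet.
// uint16/uint32 arithmetic wraps around, as RTP expects.
func (s *sequencer) next(tsDelta uint32) (uint16, uint32) {
	s.lastSN++
	s.lastTS += tsDelta
	return s.lastSN, s.lastTS
}

func main() {
	s := &sequencer{}
	s.startBlank()
	blankSN, _ := s.next(960) // blank frame sent while muted
	s.setLastSnTs(1, 0)       // real stream starts; existing state is kept
	realSN, _ := s.next(960)
	fmt.Println(realSN-blankSN == 1) // prints true
}
```

The difference is computed in `uint16`, so the check also holds when the random start lands just before a wrap-around.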
Raja Subramanian 23c03f6add Fix av1 forwarding. (#1538) 2023-03-22 15:35:24 +05:30
Raja Subramanian e7c5872758 Dependent RTT/jitter control. (#1537) 2023-03-22 11:59:32 +05:30
Raja Subramanian f782c8956d Extend range of GOOD scores. (#1536)
Empirically, the experience is not bad for a larger range.
So, triggering POOR too early causes confusion.
2023-03-22 11:36:30 +05:30
Raja Subramanian c76c35474c Init RTT/jitter in snapshot, else get 0 sometimes (#1534) 2023-03-21 11:50:47 +05:30
cnderrauber 1a78dba3e0 Detect client short ice connection (#1532) 2023-03-21 09:50:43 +08:00
David Colburn e8c7506d60 update deprecated egress client warning (#1533) 2023-03-20 13:46:47 -07:00
Raja Subramanian 65ad4b2c43 Doing a pass at demoting logs (#1531)
A few more candidates to think about demoting
- Publisher mute changes
- Forwarder -> layer lock/upgrade/downgrade/overshoot adjusting
- StreamAllocator
2023-03-20 12:22:08 +05:30
Raja Subramanian f770f0cb67 Use pointer to struct in logging (#1530) 2023-03-19 21:57:35 +05:30
Raja Subramanian aeefbb080e Account for time before measurement available in connection quality. (#1528) 2023-03-19 18:34:56 +05:30
Raja Subramanian bbba3f8168 With opportunistic forwarding, no need to not remove layer 0 (#1529) 2023-03-19 18:19:15 +05:30
Raja Subramanian 7d857c9557 Discount upstream + processing jitter from down stream jitter. (#1527)
* Discount upstream + processing jitter from down stream jitter.

Jitter in RTCP Receiver Report from down stream tracks includes
jitter from up stream tracks and any processing in forwarding path.
As packets are forwarded without any buffering (i. e. no de-jittering)
in the SFU, any up stream jitter will carry forward.

While taking delta stats (which is used for connection quality and
reporting to analytics), discount the up stream + processing jitter so
that connection quality score of down stream track is not penalized
due to up stream + processing jitter.

NOTE: Not discounting it in RTP stats ToString/ToProto methods as
that information is useful to have for analysis/debugging.

* fix typo
2023-03-18 10:25:20 +05:30
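
A rough sketch of the discounting described in #1527. The function name, its parameters, and the clamp at zero are assumptions for illustration; the commit message does not give the exact formula.

```go
package main

import "fmt"

// discountJitter returns the down stream jitter with the up stream and
// processing contributions removed. Clamping at zero is an assumption so
// that a noisy up stream measurement cannot produce negative jitter.
func discountJitter(downstream, upstream, processing float64) float64 {
	d := downstream - upstream - processing
	if d < 0 {
		return 0
	}
	return d
}

func main() {
	fmt.Println(discountJitter(30, 12, 3)) // prints 15
	fmt.Println(discountJitter(5, 12, 3))  // prints 0
}
```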
Raja Subramanian 8635b0652f Start bit rate worker only for video tracks (#1523) 2023-03-15 17:40:09 +05:30
Raja Subramanian ed2eaaabb2 Add layer mute notification (#1522)
* Layer mute

* clean up

* clean up

* set max temporal layer seen on down track add
2023-03-15 15:24:17 +05:30
Raja Subramanian 582adda97c Fix connection quality in constrained up stream (#1521)
A few things
1. Have to use expected layer in upstream distance to desired. Using
min(published, expected) means if expected is higher than published, it was not caught as a missed layer.
2. Forgot to remove layer transition update in one place. It was still constrained to screen share.
This caused quality to not pick up after constraint is released.
3. Switching to max layer cannot be marked on max published. Same as point #1 above. Otherwise,
dynacast would kick in and turn off highest layer.
2023-03-15 13:27:27 +05:30
Raja Subramanian 5bef98dc2a Switching to layer based quality for camera tracks also. (#1520)
There are cases where the layer bit rate configuration is such that
the expected bitrate difference is very high. For example,
setting up layer 2 (f) for 1.7 Mbps and layer 1 (h) for 180 kbps.
With bitrate based quality, a layer drop results in going to `POOR`
quality rating. With layer based, it will drop one level only.

Also, cleaning up the distance to desired calculation a bit.
2023-03-15 11:51:14 +05:30
Paul Wells 04150c044b count active signal sessions (#1519)
* count active signal sessions

* fix

* generate fake
2023-03-14 17:35:32 -07:00
David Colburn b23a0e7f39 add active filter to ListEgress (#1517)
* add active filter to ListEgress

* update test

* missed a filter
2023-03-14 13:07:00 -07:00
Raja Subramanian c2335968de Prevent evaluation over small window. (#1516)
With push model (i. e. connection quality evaluation triggered
by reception of RTCP receiver report), it is possible that a report
is received quickly after a track is started (especially with video).
Those should not trigger a quality evaluation.

Set `lastStatsAt` in `Start` routine and ensure that start has been
called and enough time has passed since last stats time to avoid
small windows.
2023-03-14 16:27:39 +05:30
Raja Subramanian e0495f6cab Do not calculate distance if max layers are not valid (#1515) 2023-03-14 15:19:50 +05:30
Raja Subramanian 75eb0e01ec Missed return after adding layer transition for screen share (#1514) 2023-03-14 15:06:59 +05:30
Raja Subramanian fd27a70fe2 stream allocator <-> down track misc changes/clean up (#1512) 2023-03-13 07:45:59 +05:30
David Zhao 5ff72a99b9 Report publish & subscribe RTPStats as Telemetry events (#1506) 2023-03-10 10:28:54 -08:00
Raja Subramanian e7e8bbe72c Use an interface instead of a lot of callbacks. (#1510) 2023-03-10 23:22:22 +05:30
Raja Subramanian c70aa616a9 Expected vs actual Layer based connection quality. (#1509)
* Expected vs actual Layer based connection quality.

With VBR streams (like screen share), bit rate is not a good indicator
of whether desired layer (spatial/temporal) is achieved due to high
variance.

Using expected vs actual layer (i. e. distance to desired) can capture
any short fall and include it in quality scoring.

This PR uses distance to desired, i. e. how many steps it would take to
go from actual spatial/temporal -> desired spatial/temporal and that
distance is proportionally used (currently it is just linear) to decrease
score.

* wire up layer transitions for screen share tracks
2023-03-10 13:08:36 +05:30
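
The "distance to desired" idea from #1509 could be sketched as follows. All names and constants here (`layer`, `temporalLayersPerSpatial`, `maxScore`, `penaltyPerStep`) are hypothetical and chosen only to make the linear decrease visible; the actual values are not stated in the commit message.

```go
package main

import "fmt"

// layer is a hypothetical spatial/temporal layer pair.
type layer struct{ spatial, temporal int32 }

const (
	temporalLayersPerSpatial = 4     // assumption: steps needed to cross one spatial layer
	maxScore                 = 100.0 // assumption: best possible score
	penaltyPerStep           = 5.0   // assumption: linear penalty per step
)

// distanceToDesired counts how many steps separate the actual layer from
// the desired one. Being at or above the desired layer counts as zero.
func distanceToDesired(actual, desired layer) int32 {
	d := (desired.spatial-actual.spatial)*temporalLayersPerSpatial +
		(desired.temporal - actual.temporal)
	if d < 0 {
		return 0
	}
	return d
}

// layerScore decreases the score linearly with the distance, floored at 0.
func layerScore(actual, desired layer) float64 {
	s := maxScore - penaltyPerStep*float64(distanceToDesired(actual, desired))
	if s < 0 {
		return 0
	}
	return s
}

func main() {
	desired := layer{spatial: 2, temporal: 3}
	fmt.Println(layerScore(layer{2, 3}, desired)) // prints 100
	fmt.Println(layerScore(layer{1, 3}, desired)) // prints 80
}
```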
Raja Subramanian e893d30fd0 Use EWMA (Exponentially Weighted Moving Average) for score updates. (#1507)
* Use EWMA (Exponentially Weighted Moving Average) for score updates.

Makes code simpler, but makes it harder to test as the inflection points
are not exact.

Score falls a bit slower to be conservative on dropping quality too
quickly. Still fall factor is higher (i. e. newer scores get more
weight) than rise factor (i. e. newer scores get lower weight).
Slower rise factor to introduce hysteresis on things climbing back too
quickly.

In the extreme case, asymptotic conditions could cause unexpected
results. For example, having 4% loss of video continuously will never
drop quality to `POOR`. It will get close to 60, but it will always
stay above 60 forever and hence quality will never drop to POOR.
Maybe, need some sort of variable thresholding to deal with that. But,
that is an extreme case and may not happen in real life.

* remove unused stuff
2023-03-09 13:52:01 +05:30
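
A small sketch of an asymmetric EWMA as described in #1507, including the asymptotic case the message warns about. The factor values and the `updateScore` name are assumptions for illustration, not the values used in the server.

```go
package main

import "fmt"

const (
	fallFactor = 0.5  // assumption: weight of a new, lower sample
	riseFactor = 0.25 // assumption: weight of a new, higher sample
)

// updateScore blends a new sample into the previous score. Falling uses a
// larger factor than rising, so the score drops faster than it recovers.
func updateScore(prev, sample float64) float64 {
	factor := riseFactor
	if sample < prev {
		factor = fallFactor
	}
	return factor*sample + (1-factor)*prev
}

func main() {
	// A constant sample of 60 pulls the score towards 60, but after a
	// bounded number of updates it is still strictly above 60.
	score := 100.0
	for i := 0; i < 20; i++ {
		score = updateScore(score, 60)
	}
	fmt.Println(score > 60) // prints true
}
```

With a fixed threshold at 60, a score that only approaches 60 from above never crosses it, which is the reason the message suggests some form of variable thresholding.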
Raja Subramanian 14b0b48b15 Push/pull for connection stats/quality scoring. (#1505)
* Push/pull for connection stats/quality scoring.

Was not happy with the pure pull method missing a window because
RTCP RR timing is slightly off for audio and then using a much
larger window of data in the next update.

That also resulted in RTP stats getting some bits of code.
As that is per-packet processing, was not a good idea.

Switching to push-pull method.
For up track, it is pull, i. e. connection stats worker will pull stats.

For down track, there is a new notification about receiver report
reception. Using this to check for time to run stats. And adding a bit
of tolerance for processing window (currently set so that as long as it
is > 95% of usual processing interval). This allows two things
- for video, RTCP RR are more frequent, but we will still not process
  till enough time has passed
- for audio, RTCP RR could be once in 5 seconds or so. Can process when
  it is available rather than miss a window and use a much larger window
  later.

* uber atomic
2023-03-09 11:51:20 +05:30
Paul Wells 54bf7e0dac allow configuring signal message buffer size (#1504)
* allow configuring signal message buffer size

* update psrpc
2023-03-08 17:34:14 -08:00
Paul Wells 2c93d55e5c add stream retry middleware for signalling (#1503) 2023-03-08 00:51:19 -08:00
imcdd 1f4fd6aafe 1. Fix wrong atomic pkg from go1.19 std sync/atomic to go.uber.org/atomic (#1479)
2. Fix CI buildtest config '>=1.18' to '1.18',ensure compatibility with go1.18
2023-03-07 23:27:26 -08:00
cnderrauber 11ae7fdbb6 Don't switch candidate if signal closed when pc failed (#1498)
* Don't switch candidate if signal closed when pc failed

* change comment

* test case
2023-03-08 15:16:40 +08:00
lukasIO 958d2f8284 Add topics to data channel messages (#1489)
* Add topics to data channel messages

* update protocol
2023-03-07 10:41:37 +01:00
Raja Subramanian 99601e6d41 Handle the case of no packets in down stream tracks better. (#1500) 2023-03-07 14:32:43 +05:30
cnderrauber 38deab8991 Send room update while client reconnecting (#1499) 2023-03-07 15:59:40 +08:00
Raja Subramanian d2e7818eca Do not enable bitrate based scoring for screen share. (#1497) 2023-03-07 09:59:10 +05:30
Raja Subramanian 04269c100c Connection quality misc changes (#1496)
* Connection quality misc changes

1. Call scorer.Update() with nil stat when no data available so that
   scorer can synthesise window with proper window time.
2. Subtract out loss in interval to account for packets not sent at
   all.
3. Fix `packetsNotFound` variable in `getIntervalStats`. I remember this
   working at some point. Not sure if I fat fingered in another PR and
   deleted the increment line.
4. Logging a bit more when no packets expected. Those can get noisy
   especially when track is muted. But, seeing some unexplained
   instances of no packets leading to quality drop. So, temporary logging
   to get a bit more information.

* correct spelling

* Limit packet score minimum to 0.0
2023-03-07 09:08:19 +05:30
cnderrauber 48cf30ba23 Send disconnected participant update for reconnecting user (#1495)
* Send disconnected participant update for reconnecting user

* clean code
2023-03-07 09:13:15 +08:00
Raja Subramanian e2ebb22b3a Do not log TURN errors with prefix "error when handling datagram" (#1494)
These could happen normally in a poor network.
2023-03-06 12:12:42 +05:30
Raja Subramanian c3b9849328 Return high quality when there are no tracks. (#1493) 2023-03-06 09:08:02 +05:30
Raja Subramanian 15eae2119c prevent data race (#1492) 2023-03-05 17:32:56 +05:30
Raja Subramanian ea1a467191 Bitrate based quality tracking for DownTrack (#1491)
* Make connection quality not too optimistic.

With score normalization, the quality indicator showed good
under conditions which should normally have shown some badness.

So, a few things in this PR
- Do not normalize scores
- Pick the weakest link as the representative score (moving away from
  averaging)
- For down track direction, when reporting delta stats, take the number
  of packets sent actually. If there are holes in the feed (upstream
  packet loss), down tracks should not be penalised for that loss.

State of things in connection quality feature
- Audio uses rtcscore-go (with a change to accommodate RED codec). This
  follows the E-model.
- Camera uses rtcscore-go. No change here. NOTE: The rtcscore here is
  purely based on bits per pixel per frame (bpf). This has the following
  existing issues (no change, these were already there)
  o Does not take packet loss, jitter, rtt into account
  o Expected frame rate is not available. So, measured frame rate is
    used as expected frame rate also. If expected frame rate were available,
    the score could be reduced for lower frame rates.
- Screen share tracks: No change. This uses the very old simple loss
  based thresholding for scoring. As the bit rate varies a lot based on
  content and rtcscore video algorithm used for camera relies on
  bits per pixel per frame, this could produce a very low value
  (large width/height encoded in a small number of bits because of static content)
  and hence a low score. So, the old loss based thresholding is used.

* clean up

* update rtcscore pointer

* fix tests

* log lines reformat

* WIP commit

* WIP commit

* update mute of receiver

* WIP commit

* WIP commit

* start adding tests

* take min score if quality matches

* start adding bytes based scoring

* clean up

* more clean up

* Use Fuse

* log quality drop

* Periodically report bitrate to down track.

For connection quality based on bitrate for down tracks,
the measured rate should be used. That is to ensure that
down track quality measurement does not get affected by
publisher side changes negatively (or positively).
Report the optimal bit rate to connection quality scorer
every second so that scorer has a continuously updating
picture of the stream and can compare the actual bit rate
against expected optimal bitrate more reliably.

Doing it at time like allocation, the bitrate may not be
accurate (or may not even be available). So, a periodic update
is necessary.

* add transition at allocation times

* clean up debug log

* - Use number of windows for wait to make things simpler
- track no layer expected case
- always update transition
- always call updateScore
2023-03-05 14:10:19 +05:30
Raja Subramanian 9e327b1f3c Connection quality (#1490)
* Make connection quality not too optimistic.

With score normalization, the quality indicator showed good
under conditions which should normally have shown some badness.

So, a few things in this PR
- Do not normalize scores
- Pick the weakest link as the representative score (moving away from
  averaging)
- For down track direction, when reporting delta stats, take the number
  of packets sent actually. If there are holes in the feed (upstream
  packet loss), down tracks should not be penalised for that loss.

State of things in connection quality feature
- Audio uses rtcscore-go (with a change to accommodate RED codec). This
  follows the E-model.
- Camera uses rtcscore-go. No change here. NOTE: The rtcscore here is
  purely based on bits per pixel per frame (bpf). This has the following
  existing issues (no change, these were already there)
  o Does not take packet loss, jitter, rtt into account
  o Expected frame rate is not available. So, measured frame rate is
    used as expected frame rate also. If expected frame rate were available,
    the score could be reduced for lower frame rates.
- Screen share tracks: No change. This uses the very old simple loss
  based thresholding for scoring. As the bit rate varies a lot based on
  content and rtcscore video algorithm used for camera relies on
  bits per pixel per frame, this could produce a very low value
  (large width/height encoded in a small number of bits because of static content)
  and hence a low score. So, the old loss based thresholding is used.

* clean up

* update rtcscore pointer

* fix tests

* log lines reformat

* WIP commit

* WIP commit

* update mute of receiver

* WIP commit

* WIP commit

* start adding tests

* take min score if quality matches

* start adding bytes based scoring

* clean up

* more clean up

* Use Fuse

* log quality drop

* clean up debug log

* - Use number of windows for wait to make things simpler
- track no layer expected case
- always update transition
- always call updateScore
2023-03-05 12:55:04 +05:30
Paul Wells e22de045ba add signal psrpc service (#1485)
* add signal psrpc service

* update protocol dep

* refactor for cloud

* update psrpc

* pr feedback
2023-03-03 15:49:46 -08:00
Raja Subramanian e48c818532 Resync on pub muted for audio to avoid jump in sequence numbers on (#1487)
unmute.
2023-03-03 12:18:25 +05:30
cnderrauber 4277699600 Add option to enable skip tcp ice if tcp rtt is high (#1484)
* Add option to switch tcp ice only if tcp works well

* solve comment

* rename and remove config change
2023-03-01 16:45:39 +08:00
Raja Subramanian a35eecd03d Fix a case of changing video quality not succeeding. (#1483)
In the following order, got the wrong layer
- Max layer is 0, max published is 0, request layer is 0
- Current locks to 0.
- Max changes to 1. Nothing changes as 1 is not published yet.
- Max published changes to 1.
- As current layer is valid, available and locked to request layer, it
  was kept. But, it should have checked if the request layer changed
  and updated accordingly.
2023-03-01 11:12:54 +05:30
Raja Subramanian ab098d951e Prevent PLI layer lock getting stuck. (#1481)
In the following scenario, PLI layer lock got stuck at the wrong layer
- target is at 2 to allow overshoot
- current gets to 1, but can't get higher because publisher is not
  publishing higher layer
- max layer changed to 0

Because of adjusting for overshoot only when current == target, it never
happened and layer lock PLI kept asking for layer 0. Although, key
frames were received, switch did not happen.

Always check for overshoot adjustment possibility against current layer.
2023-03-01 06:35:41 +05:30