Commit Graph

211 Commits

Author SHA1 Message Date
cnderrauber
f7a1776f4c Add control of playout delay (#1838)
* Add control of playout delay

Add config to enable playout delay. The delay will be limited by
[min,max] in the config option and calculated by upstream & downstream
RTT.

* check protocol version to enable playout delay

* Move config to room, limit playout-delay update interval, solve comments

* Remove adaptive playout-delay

* Remove unused config
2023-08-02 16:12:23 +08:00
David Zhao
981fb7cac7 Adding license notices (#1913)
* Adding license notices

* remove from config
2023-07-27 16:43:19 -07:00
Raja Subramanian
fc7d4bd01e E2EE trailer for server injected packets. (#1908)
* Ability to use trailer with server injected frames

A 32-byte trailer generated per room.
Trailer appended when track encryption is enabled.

* E2EE trailer for server injected packets.

- Generate a 32-byte per room trailer. Two reasons for longer length
  o Laziness: utils generates a 32 byte string.
  o Longer length random string reduces chances of colliding with real data.
- Trailer sent in JoinResponse
- Trailer added to server injected frames (not to padding only packets)

* generate

* add a length check

* pass trailer in as an argument
2023-07-27 16:50:18 +05:30
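
The trailer handling described in fc7d4bd01e above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function names, the use of `crypto/rand`, the exact placement of the length check, and the receiver-side `hasTrailer` helper are all assumptions made for the example.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"errors"
)

// trailerLen matches the 32-byte length named in the commit message.
const trailerLen = 32

var errTrailerLength = errors.New("trailer has unexpected length")

// newRoomTrailer creates one random trailer, meant to be generated once per room.
// crypto/rand is an assumption; the commit only says a utils helper is used.
func newRoomTrailer() ([]byte, error) {
	t := make([]byte, trailerLen)
	if _, err := rand.Read(t); err != nil {
		return nil, err
	}
	return t, nil
}

// appendTrailer returns a copy of payload with the trailer appended when track
// encryption is enabled. Padding-only packets are returned unchanged.
func appendTrailer(payload, trailer []byte, encrypted, paddingOnly bool) ([]byte, error) {
	if !encrypted || paddingOnly {
		return payload, nil
	}
	// Hypothetical placement of the "length check" mentioned in the commit.
	if len(trailer) != trailerLen {
		return nil, errTrailerLength
	}
	out := make([]byte, 0, len(payload)+len(trailer))
	out = append(out, payload...)
	out = append(out, trailer...)
	return out, nil
}

// hasTrailer is a hypothetical receiver-side check: a frame ending in the
// room trailer is treated as server injected.
func hasTrailer(payload, trailer []byte) bool {
	return len(trailer) == trailerLen && bytes.HasSuffix(payload, trailer)
}
```

A longer random trailer makes an accidental match with the tail of real media data less likely, which is the second reason the commit gives for the 32-byte length.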
Paul Wells
3980d049c9 close disconnected participants when signal channel fails (#1895)
* close disconnected participants when signal channel fails

* fix typefake

* update reason
2023-07-20 19:23:35 -07:00
Raja Subramanian
eaf70d5549 Pacer in down stream path. (#1835)
* Pacer interface to send packets

* notify outside lock

* use select

* use pass through pacer

* add error to OnSent

* Remove log which could get noisy

* Starting TWCC work (#1727)

* add packet time

* WIP commit

* WIP commit

* WIP commit

* minor comments

* Some measurements (#1736)

* WIP commit

* some notes

* WIP commit

* variable name change and do not post to closed channel

* unlock

* clean up

* comment

* Hooking up some more bits for TWCC (#1752)

* wake under lock

* Pacer in down stream path.

Splitting out only the pacer from a feature branch to
introduce the concept of pacer.

Currently, there should be no difference in functionality
as a pass through pacer is used.

Another implementation exists which just puts packets in a queue and sends
them from one goroutine.

A potential implementation to try would be data paced by bandwidth
estimate. That could include priority queues and such.

But, the main goal here is to introduce the notion of a pacer in the down
stream path and prepare for more congestion control possibilities down
the line.

* Don't need peak detector

* remove throttling of write IO errors
2023-06-28 13:22:44 +05:30
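
The two pacer variants mentioned in eaf70d5549 above (pass through, and a queue drained by a single goroutine) might look something like the sketch below. The `Pacer` interface, `Packet` type, `writeFunc` and all method names are hypothetical and chosen for illustration; they are not the interface introduced by the PR.

```go
package main

import "sync"

// Packet is a stand-in for whatever the down stream path sends.
type Packet struct {
	Payload []byte
	// OnSent, if set, is called after the write attempt with any write error.
	OnSent func(err error)
}

// Pacer decides when an enqueued packet is actually written.
type Pacer interface {
	Enqueue(p Packet)
	Stop()
}

type writeFunc func(payload []byte) error

// PassThrough writes immediately on the caller's goroutine, so behaviour is
// the same as having no pacer at all.
type PassThrough struct{ write writeFunc }

func (pt *PassThrough) Enqueue(p Packet) {
	err := pt.write(p.Payload)
	if p.OnSent != nil {
		p.OnSent(err)
	}
}

func (pt *PassThrough) Stop() {}

// QueuePacer buffers packets and sends them in order from one goroutine.
type QueuePacer struct {
	write writeFunc
	ch    chan Packet
	done  chan struct{}
	once  sync.Once
	wg    sync.WaitGroup
}

func NewQueuePacer(write writeFunc, depth int) *QueuePacer {
	q := &QueuePacer{write: write, ch: make(chan Packet, depth), done: make(chan struct{})}
	q.wg.Add(1)
	go q.run()
	return q
}

func (q *QueuePacer) run() {
	defer q.wg.Done()
	for {
		select {
		case p := <-q.ch:
			err := q.write(p.Payload)
			if p.OnSent != nil {
				p.OnSent(err)
			}
		case <-q.done:
			return // packets still queued at this point are dropped
		}
	}
}

// Enqueue uses select so that a stopped pacer never blocks the caller and
// the data channel itself is never closed.
func (q *QueuePacer) Enqueue(p Packet) {
	select {
	case q.ch <- p:
	case <-q.done:
	}
}

func (q *QueuePacer) Stop() {
	q.once.Do(func() { close(q.done) })
	q.wg.Wait()
}

// Compile-time checks that both variants satisfy the interface.
var (
	_ Pacer = (*PassThrough)(nil)
	_ Pacer = (*QueuePacer)(nil)
)
```

A bandwidth-estimate-driven pacer, as the commit suggests trying later, could implement the same interface and replace the plain channel with priority queues.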
Raja Subramanian
352bb1d204 Add GetClientInfo interface, to be used to decide migration vs full-reconnect (#1827) 2023-06-26 23:15:53 +05:30
Raja Subramanian
00558dee5c Close participant on full reconnect. (#1818)
* Close participant on full reconnect.

A full reconnect == irrecoverable error. Participant cannot continue.
So, close the participant when issuing a full reconnect.
That should prevent subscription manager reconcile till the participant
is finally closed down when participant is stale.

* format
2023-06-22 10:09:10 +05:30
Raja Subramanian
12db469297 Better tracking of signalling connection. (#1794)
* Better tracking of signalling connection.

- Reason for closing signaling channel.
- ConnectionID attached to request source/response sink

* Tests
2023-06-15 12:53:34 +05:30
Raja Subramanian
7ed3af193a No proof that this helps (#1772) 2023-06-06 11:28:13 +05:30
cnderrauber
c1842cb54f Avoid reconnect loop for unsupported downtrack (#1754)
* Avoid reconnect loop for unsupported downtrack

If the client subscribes to a track whose codec is unsupported by the
client, the SFU will trigger a negotiation failure and issue a full reconnect
after receiving the client answer. If the client then tries to subscribe to
that track again, it will get a full reconnect again. That causes an infinite
reconnect loop until the client stops subscribing to that track. This PR
unsubscribes the failing track for the client and sends a
SubscriptionResponse containing a reason that indicates the track's
codec is not supported, which avoids the reconnect loop.
2023-05-31 11:41:22 +08:00
Raja Subramanian
3fb93135f5 Experimental flag to try time stamp adjustment to control drift. (#1687)
* Experimental flag to try time stamp adjustment to control drift.

There is a config to enable this.

Using a PID controller to try and keep the sample rate at the expected
value. It remains to be seen whether this works well. Adjustments are limited
to 25 ms max at a time to ensure there are no large jumps.
And it is applied when doing RTCP sender report which happens
once in 5 seconds currently for both audio and video tracks.

A nice introduction to PID controllers - https://alphaville.github.io/qub/pid-101/#/
Implementation borrowed from - https://github.com/pms67/PID

A few things TODO
1. PID controller tuning is a process. The values were picked from the tests
   in the implementation above. They may not be the best. Need to try.
2. Can potentially run this more often. Rather than running it only when
   running RTCP sender report (which is once in 5 seconds now), can
   potentially run it every second and limit the amount of change to
   something like 10 ms max.

* remove unused variable

* debug log a bit more
2023-05-06 11:52:57 +05:30
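
For the drift control described in 3fb93135f5 above, a textbook PID step with a clamped output could be sketched as below. This is not the borrowed pms67/PID implementation (which this sketch does not attempt to reproduce) and not the code from the PR; the type, field names and gains are assumptions. The output limits would correspond to the 25 ms cap, and `dt` to the interval between RTCP sender reports.

```go
package main

// PID is a basic proportional-integral-derivative controller.
// Gains are placeholders and would need tuning, as the commit notes.
type PID struct {
	Kp, Ki, Kd     float64
	OutMin, OutMax float64 // output clamp, e.g. -25 and +25 (milliseconds)

	integral float64
	prevErr  float64
	primed   bool // false until the first sample has been seen
}

// Update takes the expected value, the measured value and the elapsed time
// in seconds since the previous call, and returns the clamped adjustment.
// No anti-windup is applied to the integral term in this sketch.
func (c *PID) Update(setpoint, measured, dt float64) float64 {
	err := setpoint - measured
	c.integral += err * dt

	deriv := 0.0
	if c.primed && dt > 0 {
		deriv = (err - c.prevErr) / dt
	}
	c.prevErr = err
	c.primed = true

	out := c.Kp*err + c.Ki*c.integral + c.Kd*deriv
	if out > c.OutMax {
		out = c.OutMax
	} else if out < c.OutMin {
		out = c.OutMin
	}
	return out
}
```

Running the controller more often with a tighter clamp, as the second TODO item suggests, would only change `dt` and the `OutMin`/`OutMax` values passed in.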
David Zhao
5fcd682fb0 Refactor participant metadata updates to avoid duplication (#1679)
* Refactor participant metadata updates to avoid duplication

* generated fakes
2023-05-03 13:50:45 -07:00
Raja Subramanian
35b8319b08 Remove disallowed subscriptions on close. (#1668)
With subscription manager, there is no need to tell a publisher
about a subscriber going away. Before subscription manager,
the up track manager of a participant (i. e. the publisher side)
was holding a list of pending subscriptions for its published tracks
and that had to be cleaned up if one of the subscribers went away.
That is not the case any more.

Also set publisherID early so that subscription permission update has
the right publisherID. In fact, saw an empty ID in the logs and saw
that we still have the disallowed subscription handling which is not
necessary any more.
2023-04-29 09:18:07 +05:30
Paul Wells
11eedf4514 update participant to support signal broadcast skipping (#1657)
* update participant to support signal broadcast skipping

* cleanup

* lock

* feedback

* order

* update requireBroadcast in SetPermissions
2023-04-26 17:11:33 -07:00
David Zhao
3f64828a77 Send Room updates when participant counts change (#1647)
Reduces the number of unneeded generation with ProtoProxy
2023-04-22 21:08:59 -07:00
Paul Wells
745410bd69 only increment participant version after updates (#1646)
* only increment participant version after updates

* fix test util

* cleanup

* test uptrackmanager permission update version check
2023-04-22 17:48:10 -07:00
Raja Subramanian
d2bf8f0ba1 Support simulating subscriber bandwidth. (#1609)
* Support simulating subscriber bandwidth.

When non-zero, a full allocation is triggered.
Also, probes are stopped.

When set to zero, normal probing mechanism should catch up.

Adding `allowPause` override which can be a connection option.

* fix log

* allowPause in participant params
2023-04-13 13:59:24 +05:30
cnderrauber
c70a5c831f Refine transport fallback for client resuming (#1597)
* reset fallback after ice restart

* Configure ice for reconnect before send response
2023-04-10 15:12:05 +08:00
David Zhao
e03f75d6a1 Implements source-specific permissions and client-driven metadata updates (#1590)
Closes #1565
2023-04-07 23:47:49 -07:00
cnderrauber
11ae7fdbb6 Don't switch candidate if signal closed when pc failed (#1498)
* Don't switch candidate if signal closed when pc failed

* change comment

* test case
2023-03-08 15:16:40 +08:00
cnderrauber
48cf30ba23 Send disconnected participant update for reconnecting user (#1495)
* Send disconnected participant update for reconnecting user

* clean code
2023-03-07 09:13:15 +08:00
Raja Subramanian
9e327b1f3c Connection quality (#1490)
* Make connection quality not too optimistic.

With score normalization, the quality indicator showed good
under conditions which should normally have shown some badness.

So, a few things in this PR
- Do not normalize scores
- Pick the weakest link as the representative score (moving away from
  averaging)
- For down track direction, when reporting delta stats, take the number
  of packets sent actually. If there are holes in the feed (upstream
  packet loss), down tracks should not be penalised for that loss.

State of things in connection quality feature
- Audio uses rtcscore-go (with a change to accommodate RED codec). This
  follows the E-model.
- Camera uses rtcscore-go. No change here. NOTE: The rtcscore here is
  purely based on bits per pixel per frame (bpf). This has the following
  existing issues (no change, these were already there)
  o Does not take packet loss, jitter, rtt into account
  o Expected frame rate is not available. So, measured frame rate is
    used as expected frame rate also. If expected frame rate were available,
    the score could be reduced for lower frame rates.
- Screen share tracks: No change. This uses the very old simple loss
  based thresholding for scoring. As the bit rate varies a lot based on
  content and rtcscore video algorithm used for camera relies on
  bits per pixel per frame, this could produce a very low value
  (large width/height encoded in a small number of bits because of static content)
  and hence a low score. So, the old loss based thresholding is used.

* clean up

* update rtcscore pointer

* fix tests

* log lines reformat

* WIP commit

* WIP commit

* update mute of receiver

* WIP commit

* WIP commit

* start adding tests

* take min score if quality matches

* start adding bytes based scoring

* clean up

* more clean up

* Use Fuse

* log quality drop

* clean up debug log

* - Use number of windows for wait to make things simpler
- track no layer expected case
- always update transition
- always call updateScore
2023-03-05 12:55:04 +05:30
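
The "weakest link" selection and the down track loss accounting from 9e327b1f3c above can be illustrated with a short sketch. The types, the three quality levels and the function names are assumptions for the example and are not taken from the PR.

```go
package main

// Quality is a coarse quality bucket; higher is better.
type Quality int

const (
	QualityPoor Quality = iota
	QualityGood
	QualityExcellent
)

// TrackScore is a hypothetical per-track result.
type TrackScore struct {
	TrackID string
	Quality Quality
	Score   float64
}

// weakestLink returns the worst track instead of an average. When two tracks
// share the same quality bucket, the lower score wins. The boolean is false
// when there are no tracks.
func weakestLink(tracks []TrackScore) (TrackScore, bool) {
	if len(tracks) == 0 {
		return TrackScore{}, false
	}
	worst := tracks[0]
	for _, t := range tracks[1:] {
		if t.Quality < worst.Quality ||
			(t.Quality == worst.Quality && t.Score < worst.Score) {
			worst = t
		}
	}
	return worst, true
}

// downTrackLossRatio uses packets actually sent as the denominator, so holes
// caused by upstream loss do not count against the down track.
func downTrackLossRatio(packetsSent, packetsLost uint32) float64 {
	if packetsSent == 0 {
		return 0
	}
	return float64(packetsLost) / float64(packetsSent)
}
```

With averaging, one poor track among several excellent ones could still report a good overall quality; taking the minimum avoids that optimism.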
David Zhao
8c43b7b48f Fix unsubscribed speakers stuck as speaking to clients (#1475)
When we unsubscribe from a speaker, SendSpeakerUpdates will drop updates
from that speaker. This has the side effect of dropping the "clearing"
message that we are sending as well.
2023-02-26 23:56:09 -08:00
David Zhao
e855620379 Prevent subscribing to track that's closing (#1454)
Due to the order of events in MediaTrackReceiver and friends, SubscribedTrack
will be closed before the track is removed from RoomTrackManager.

Because of this, when a track is unpublished, it's possible to be subscribed
to the track as it's closing.

By introducing a closing state, we'd prevent accidental subscription to
closing tracks.
2023-02-22 01:14:49 -08:00
Raja Subramanian
9f94fc8347 Callback support for migrate state change. (#1435)
This can be used to detect changes in migrate state and signal
migration completion to remote nodes.
2023-02-17 13:13:01 +05:30
Raja Subramanian
6cb46107c8 Delete signal de-duper. (#1427)
Not a good design. There is not an easy way to filter messages
before they hit the media node. Without that, there is not a lot
of advantage.

And there are sequences that are not handled correctly in this
deleted implementation.

So, deleting code to prevent use.
2023-02-16 09:32:48 +05:30
cnderrauber
4367e93855 parallel writing for data packet broadcast (#1425) 2023-02-15 17:18:43 +08:00
David Zhao
2851a8ac98 Improved robustness of subscription stack (#1382)
UpdateSubscription had a shortcoming where when it couldn't find the
participant, it ignored the request.

This PR further removes the reliance of current publisher state from
subscribers.
- SubscribeToTrack only takes in a trackID
- Introduced RoomTrackManager to maintain all published tracks to a room
- Added TrackUnpublished event to clearly indicate when a track has been removed
- SubscribeRequested event no longer includes information about the publisher
2023-02-06 18:08:26 -08:00
cnderrauber
8b6dab780c Add reconnect reason and signal rtt calculation (#1381)
* Add connect reason and signal rtt calculation

* Update protocol

* solve comment
2023-02-06 11:12:25 +08:00
David Zhao
be4764b93b Improve panic recovery to use participant logger. (#1375)
Also made IssueFullReconnect public
2023-02-02 14:55:50 -08:00
cnderrauber
7e5ba6a3b0 Improve connectivity check (#1366)
* Add Timer to detect dtls failure quickly

* Fix pc state check in timeout after ice

* More strict conditions to switch candidate type

* log for signal interrupt

* typo
2023-02-01 20:00:34 +08:00
David Zhao
cd6b8b80b9 feat: SubscriptionManager to consolidate subscription handling (#1317)
Added a new manager to handle all subscription needs. Implemented using reconciler pattern. The goals are:

- improve subscription resilience by separating desired state and current state
- reduce complexity of synchronous processing
- better detect failures with the ability to trigger full reconnect
2023-01-24 23:06:16 -08:00
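
The reconciler pattern named in cd6b8b80b9 above keeps a desired set and a current set and repeatedly moves the latter toward the former. The sketch below shows the general idea only; the `Reconciler` type, its fields, the failure threshold and the callbacks are hypothetical and do not reflect the actual SubscriptionManager.

```go
package main

import "errors"

// errTrackNotReady is an example error a subscribe attempt might return.
var errTrackNotReady = errors.New("track not ready")

// Reconciler separates what the subscriber wants (Desired) from what is
// actually subscribed (Current).
type Reconciler struct {
	Desired map[string]bool
	Current map[string]bool

	failures    map[string]int
	maxFailures int
	subscribe   func(trackID string) error
	unsubscribe func(trackID string) error
	onGiveUp    func(trackID string) // e.g. trigger a full reconnect
}

func NewReconciler(maxFailures int, subscribe, unsubscribe func(string) error, onGiveUp func(string)) *Reconciler {
	return &Reconciler{
		Desired:     map[string]bool{},
		Current:     map[string]bool{},
		failures:    map[string]int{},
		maxFailures: maxFailures,
		subscribe:   subscribe,
		unsubscribe: unsubscribe,
		onGiveUp:    onGiveUp,
	}
}

// Reconcile performs one pass. It is meant to be called periodically or
// whenever the desired state changes, so a failed attempt is simply retried
// on the next pass.
func (r *Reconciler) Reconcile() {
	for id := range r.Desired {
		if r.Current[id] {
			continue
		}
		if err := r.subscribe(id); err != nil {
			r.failures[id]++
			if r.failures[id] >= r.maxFailures {
				r.onGiveUp(id)
			}
			continue
		}
		delete(r.failures, id)
		r.Current[id] = true
	}
	for id := range r.Current {
		if r.Desired[id] {
			continue
		}
		if err := r.unsubscribe(id); err == nil {
			delete(r.Current, id)
		}
	}
}
```

Callers only ever mutate `Desired`; all side effects happen inside `Reconcile`, which is what removes the need for synchronous processing at request time.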
cnderrauber
55962e300c enable track level audio nack config (#1306) 2023-01-13 17:07:06 +08:00
cnderrauber
81fb1c5ef0 Add idle check for participant (#1303) 2023-01-12 17:26:53 +08:00
cnderrauber
25debc6d35 add reconnect response to update configuration while reconnecting (#1300)
* add reconnect response to update configuration while reconnecting

* fix test
2023-01-11 17:40:12 +08:00
Raja Subramanian
4ba7e57683 Make an IsDisconnected interface and use it (#1278) 2022-12-31 12:53:02 +05:30
Raja Subramanian
1a48cc6a8b Track subscription operations per source track. (#1248) 2022-12-23 12:23:26 +05:30
Raja Subramanian
f24c1b95c2 Initial commit of signal deduper. (#1243)
* Initial commit of signal deduper.

Idea is to protect against signal storm from misbehaving clients.

Design:
- SignalDeduper interface with one method to handle a SignalRequest and
  return if dupe or not.
- Signal specific deduper. Could have made a single de-duper which could
  handle all signal message types, but making it per type so that the
  code is cleaner.
- Some module (like the router) can instantiate whatever signal types
  it wants to de-dupe. When a signal message is received, that module
  can run the signal message through the list of de-dupers and
  potentially drop the message if any of the de-dupers declare that the
  message is a dupe. Making it a list makes things a little bit
  inefficient, but keeps things cleaner. Hopefully, not many de-dupers
  will be needed so that the inefficiency is not pronounced.

* re-arrange comments

* helper function

* add ParticipantClosed
2022-12-21 09:29:56 +05:30
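
The design listed in f24c1b95c2 above (one interface, one de-duper per signal type, a list run by some module such as the router) could be sketched as follows. Note that this de-duper was later deleted in 6cb46107c8, so this only illustrates the design as described. `SignalRequest` here is a simplified stand-in, and the "same payload as last time" rule is an assumed dedupe criterion, not the one from the PR.

```go
package main

// SignalRequest is a simplified stand-in for the protocol message.
type SignalRequest struct {
	Kind           string // signal message type
	ParticipantKey string
	Payload        string
}

// SignalDeduper has a single method that reports whether a request is a dupe.
type SignalDeduper interface {
	Dedupe(req SignalRequest) bool
}

// lastSeenDeduper handles exactly one signal type and flags a request that
// repeats the previous payload from the same participant.
type lastSeenDeduper struct {
	kind string
	last map[string]string
}

func newLastSeenDeduper(kind string) *lastSeenDeduper {
	return &lastSeenDeduper{kind: kind, last: map[string]string{}}
}

func (d *lastSeenDeduper) Dedupe(req SignalRequest) bool {
	if req.Kind != d.kind {
		return false // not this de-duper's signal type
	}
	if prev, ok := d.last[req.ParticipantKey]; ok && prev == req.Payload {
		return true
	}
	d.last[req.ParticipantKey] = req.Payload
	return false
}

// ParticipantClosed drops state for a participant that has gone away.
func (d *lastSeenDeduper) ParticipantClosed(participantKey string) {
	delete(d.last, participantKey)
}

// isDupe runs the request through every de-duper and reports a dupe if any
// of them says so. The linear scan is the small inefficiency the commit
// accepts in exchange for keeping each de-duper simple.
func isDupe(dedupers []SignalDeduper, req SignalRequest) bool {
	for _, d := range dedupers {
		if d.Dedupe(req) {
			return true
		}
	}
	return false
}
```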
Raja Subramanian
50e39b9985 Check participant SID also while removing a participant. (#1237) 2022-12-19 22:53:11 +05:30
Raja Subramanian
241a7120f5 ICE config using protocol model (#1233)
* ICE config using protocol model

* use pointers consistently

* protocol pointer

* mage generate
2022-12-19 10:25:08 +05:30
David Zhao
33902a9f2a Do not send ParticipantLeft webhook event unless connected successfully. (#1234)
Fixes #1130
2022-12-18 17:37:55 -08:00
Haibo Chen
8a6c6de1db update name of participant (#1213) 2022-12-15 22:03:59 -08:00
Raja Subramanian
6bd5504bff Add option to issue full reconnect on a publication error. (#1214)
* Add option to issue full reconnect on a publication error.

Leaving the publication error timeout at 30 seconds as there
are some publications taking long. Also, there are cases
where the peer connection fails after 30 seconds. The peer
connection failure happens after publication error is detected.
But, 30 seconds is a good amount of time for publication to establish.

* prevent recursive lock
2022-12-06 14:46:59 +05:30
cnderrauber
3c907ed460 Add stats for data channel and signal (#1198)
* Add stats for data channel and signal

* Solve comment
2022-11-30 14:53:19 +08:00
cnderrauber
aaeb3c933c Fix rtcp lost for downtrack used incorrect buffer factory (#1195)
* Fix rtcp lost for downtrack used incorrect buffer factory

Since the buffer factory change (#1173), every participant has its own
buffer factory, so the publisher's buffer factory can't be used to create
a DownTrack

* clean code
2022-11-28 13:04:56 +08:00
Raja Subramanian
086009f05a Do not forward media till peer connection is connected. (#1194)
There were some failures with missing media. The only thing I could
see between working and non-working case is when media forwarding
starts. So, delay media forwarding till peer connection is connected.

Also, add a subscribe op only if a subscribe/unsubscribe queuing is
successful. There was a recent change to not queue a subscribe when
the participant is closed/disconnected. This got the subscribe op
counter out of whack.
2022-11-26 21:42:19 +05:30
cnderrauber
0310aa9250 Make sure client get participant info before track fired (#1147) 2022-11-07 14:50:45 +08:00
cnderrauber
5edb42a9fd experiment fallback to tcp when udp unstable (#1119)
* fallback to tcp when udp unstable
2022-10-31 09:40:20 +08:00
cnderrauber
7a7fc09372 Add fps calculator for VP8 and DependencyDescriptor (#1110)
* Add fps calculator for VP8 and DependencyDescriptor

* clean code

* unit test

* clean code

* solve comment
2022-10-26 09:28:28 +08:00
cnderrauber
8fd3e8fe2d Support track level stereo and red setting (#1086)
* Support track level stereo and red setting

* fix test client
2022-10-17 10:48:11 +08:00