192 Commits

Author SHA1 Message Date
cnderrauber
11ae7fdbb6 Don't switch candidate if signal closed when pc failed (#1498)
* Don't switch candidate if signal closed when pc failed

* change comment

* test case
2023-03-08 15:16:40 +08:00
cnderrauber
48cf30ba23 Send disconnected participant update for reconnecting user (#1495)
* Send disconnected participant update for reconnecting user

* clean code
2023-03-07 09:13:15 +08:00
Raja Subramanian
9e327b1f3c Connection quality (#1490)
* Make connection quality not too optimistic.

With score normalization, the quality indicator showed good
under conditions which should normally have shown some badness.

So, a few things in this PR
- Do not normalize scores
- Pick the weakest link as the representative score (moving away from
  averaging)
- For down track direction, when reporting delta stats, take the number
  of packets actually sent. If there are holes in the feed (upstream
  packet loss), down tracks should not be penalised for that loss.

State of things in connection quality feature
- Audio uses rtcscore-go (with a change to accommodate RED codec). This
  follows the E-model.
- Camera uses rtcscore-go. No change here. NOTE: The rtcscore here is
  purely based on bits per pixel per frame (bpf). This has the following
  existing issues (no change, these were already there)
  o Does not take packet loss, jitter, rtt into account
  o Expected frame rate is not available. So, measured frame rate is
    used as expected frame rate also. If expected frame rate were available,
    the score could be reduced for lower frame rates.
- Screen share tracks: No change. This uses the very old simple loss
  based thresholding for scoring. As the bit rate varies a lot based on
  content and rtcscore video algorithm used for camera relies on
  bits per pixel per frame, this could produce a very low value
  (large width/height encoded in a small number of bits because of static content)
  and hence a low score. So, the old loss based thresholding is used.

* clean up

* update rtcscore pointer

* fix tests

* log lines reformat

* WIP commit

* WIP commit

* update mute of receiver

* WIP commit

* WIP commit

* start adding tests

* take min score if quality matches

* start adding bytes based scoring

* clean up

* more clean up

* Use Fuse

* log quality drop

* clean up debug log

* - Use number of windows for wait to make things simpler
- track no layer expected case
- always update transition
- always call updateScore
2023-03-05 12:55:04 +05:30
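The "weakest link" choice described in #1490 can be illustrated with a small Go sketch. This is not the code from the commit: `weakestLink` and `average` are hypothetical helpers, and the score values are made up. It only shows why taking the minimum is less optimistic than averaging when one track is doing badly.

```go
package main

import "fmt"

// weakestLink returns the lowest score in scores.
// ok is false when there are no scores to compare.
func weakestLink(scores []float64) (lowest float64, ok bool) {
	if len(scores) == 0 {
		return 0, false
	}
	lowest = scores[0]
	for _, s := range scores[1:] {
		if s < lowest {
			lowest = s
		}
	}
	return lowest, true
}

// average is the strategy the commit message says it moves away from.
func average(scores []float64) (mean float64, ok bool) {
	if len(scores) == 0 {
		return 0, false
	}
	sum := 0.0
	for _, s := range scores {
		sum += s
	}
	return sum / float64(len(scores)), true
}

func main() {
	// Hypothetical per-track scores: two healthy tracks, one poor one.
	scores := []float64{4.5, 4.4, 2.0}
	w, _ := weakestLink(scores)
	a, _ := average(scores)
	fmt.Printf("weakest=%.2f average=%.2f\n", w, a)
}
```

With averaging, the two healthy tracks mask the poor one; with the minimum, the poor track alone determines the reported quality.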
David Zhao
8c43b7b48f Fix unsubscribed speakers stuck as speaking to clients (#1475)
When we unsubscribe from a speaker, SendSpeakerUpdates will drop updates
from that speaker. This has the side effect of dropping the "clearing"
message that we are sending as well.
2023-02-26 23:56:09 -08:00
David Zhao
e855620379 Prevent subscribing to track that's closing (#1454)
Due to the order of events in MediaTrackReceiver and friends, SubscribedTrack
will be closed before the track is removed from RoomTrackManager.

Because of this, when a track is unpublished, it's possible to be subscribed
to the track as it's closing.

By introducing a closing state, we'd prevent accidental subscription to
closing tracks.
2023-02-22 01:14:49 -08:00
Raja Subramanian
9f94fc8347 Callback support for migrate state change. (#1435)
This can be used to detect changes in migrate state and signal
migration completion to remote nodes.
2023-02-17 13:13:01 +05:30
Raja Subramanian
6cb46107c8 Delete signal de-duper. (#1427)
Not a good design. There is no easy way to filter messages
before they hit the media node. Without that, there is not a lot
of advantage.

And there are sequences that are not handled correctly in this
deleted implementation.

So, deleting code to prevent use.
2023-02-16 09:32:48 +05:30
cnderrauber
4367e93855 parallel writing for data packet broadcast (#1425) 2023-02-15 17:18:43 +08:00
David Zhao
2851a8ac98 Improved robustness of subscription stack (#1382)
UpdateSubscription had a shortcoming where when it couldn't find the
participant, it ignored the request.

This PR further removes the reliance of current publisher state from
subscribers.
- SubscribeToTrack only takes in a trackID
- Introduced RoomTrackManager to maintain all published tracks to a room
- Added TrackUnpublished event to clearly indicate when a track has been removed
- SubscribeRequested event no longer includes information about the publisher
2023-02-06 18:08:26 -08:00
cnderrauber
8b6dab780c Add reconnect reason and signal rtt calculation (#1381)
* Add connect reason and signal rtt calculate

* Update protocol

* solve comment
2023-02-06 11:12:25 +08:00
David Zhao
be4764b93b Improve panic recovery to use participant logger. (#1375)
Also made IssueFullReconnect public
2023-02-02 14:55:50 -08:00
cnderrauber
7e5ba6a3b0 Improve connectivity check (#1366)
* Add Timer to detect dtls failure quickly

* Fix pc state check in timeout after ice

* More strict conditions to switch candidate type

* log for signal interrupt

* typo
2023-02-01 20:00:34 +08:00
David Zhao
cd6b8b80b9 feat: SubscriptionManager to consolidate subscription handling (#1317)
Added a new manager to handle all subscription needs. Implemented using reconciler pattern. The goals are:

- improve subscription resilience by separating desired state and current state
- reduce complexity of synchronous processing
- better detect failures with the ability to trigger full reconnect
2023-01-24 23:06:16 -08:00
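The reconciler pattern mentioned in #1317 keeps a desired state and a current state and computes the actions that move one toward the other. A minimal sketch in Go, assuming track IDs are plain strings; the `reconcile` function and its map-based state are hypothetical and not taken from the SubscriptionManager code:

```go
package main

import (
	"fmt"
	"sort"
)

// reconcile compares desired and current subscriptions (keyed by track ID)
// and returns the track IDs to subscribe to and to unsubscribe from.
// Results are sorted so the output is deterministic.
func reconcile(desired, current map[string]bool) (toSubscribe, toUnsubscribe []string) {
	for id, want := range desired {
		if want && !current[id] {
			toSubscribe = append(toSubscribe, id)
		}
	}
	for id, have := range current {
		if have && !desired[id] {
			toUnsubscribe = append(toUnsubscribe, id)
		}
	}
	sort.Strings(toSubscribe)
	sort.Strings(toUnsubscribe)
	return toSubscribe, toUnsubscribe
}

func main() {
	desired := map[string]bool{"TR_a": true, "TR_b": true}
	current := map[string]bool{"TR_b": true, "TR_c": true}
	sub, unsub := reconcile(desired, current)
	fmt.Println("subscribe:", sub, "unsubscribe:", unsub)
}
```

Because the function is a pure comparison of two states, it can be re-run after any failure without tracking which individual request was lost, which is the resilience property the commit message names as a goal.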
cnderrauber
55962e300c enable track level audio nack config (#1306) 2023-01-13 17:07:06 +08:00
cnderrauber
81fb1c5ef0 Add idle check for participant (#1303) 2023-01-12 17:26:53 +08:00
cnderrauber
25debc6d35 add reconnect response to update configuration while reconnecting (#1300)
* add reconnect response to update configuration while reconnecting

* fix test
2023-01-11 17:40:12 +08:00
Raja Subramanian
4ba7e57683 Make an IsDisconnected interface and use it (#1278) 2022-12-31 12:53:02 +05:30
Raja Subramanian
1a48cc6a8b Track subscription operations per source track. (#1248) 2022-12-23 12:23:26 +05:30
Raja Subramanian
f24c1b95c2 Initial commit of signal deduper. (#1243)
* Initial commit of signal deduper.

The idea is to protect against signal storms from misbehaving clients.

Design:
- SignalDeduper interface with one method to handle a SignalRequest and
  return if dupe or not.
- Signal specific deduper. Could have made a single de-duper which could
  handle all signal message types, but making it per type so that the
  code is cleaner.
- Some module (like the router) can instantiate whatever signal types
  it wants to de-dupe. When a signal message is received, that module
  can run the signal message through the list of de-dupers and
  potentially drop the message if any of the de-dupers declare that the
  message is a dupe. Making it a list makes things a little bit
  inefficient, but keeps things cleaner. Hopefully, not many de-dupers
  will be needed so that the inefficiency is not pronounced.

* re-arrange comments

* helper function

* add ParticipantClosed
2022-12-21 09:29:56 +05:30
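The design described in #1243 (a one-method de-duper interface, one de-duper per signal type, and a list that each message is run through) might look roughly like the following Go sketch. Everything here is an assumption made for illustration: `SignalRequest` is a local stand-in struct rather than the protocol message, and the method name `Dedupe`, the type `lastPayloadDeduper`, and the helper `isDupe` are hypothetical.

```go
package main

import "fmt"

// SignalRequest is a simplified stand-in for the real signal message.
type SignalRequest struct {
	Kind    string // hypothetical message type tag
	Payload string // hypothetical serialized body
}

// SignalDeduper has a single method that reports whether req is a duplicate.
type SignalDeduper interface {
	Dedupe(req SignalRequest) bool
}

// lastPayloadDeduper handles one message kind and flags a request as a
// duplicate when it repeats the previous payload of that kind.
type lastPayloadDeduper struct {
	kind string
	last string
	seen bool
}

func (d *lastPayloadDeduper) Dedupe(req SignalRequest) bool {
	if req.Kind != d.kind {
		return false // not this de-duper's message type
	}
	if d.seen && d.last == req.Payload {
		return true
	}
	d.last, d.seen = req.Payload, true
	return false
}

// isDupe runs req through every de-duper and reports true if any of them
// declares it a duplicate.
func isDupe(dedupers []SignalDeduper, req SignalRequest) bool {
	for _, d := range dedupers {
		if d.Dedupe(req) {
			return true
		}
	}
	return false
}

func main() {
	dedupers := []SignalDeduper{&lastPayloadDeduper{kind: "mute"}}
	req := SignalRequest{Kind: "mute", Payload: "track=TR_a muted=true"}
	fmt.Println(isDupe(dedupers, req)) // first occurrence
	fmt.Println(isDupe(dedupers, req)) // immediate repeat
}
```

The linear scan over the list is the inefficiency the commit message acknowledges; it stays cheap as long as only a few de-dupers are registered.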
Raja Subramanian
50e39b9985 Check participant SID also while removing a participant. (#1237) 2022-12-19 22:53:11 +05:30
Raja Subramanian
241a7120f5 ICE config using protocol model (#1233)
* ICE config using protocol model

* use pointers consistently

* protocol pointer

* mage generate
2022-12-19 10:25:08 +05:30
David Zhao
33902a9f2a Do not send ParticipantLeft webhook event unless connected successfully. (#1234)
Fixes #1130
2022-12-18 17:37:55 -08:00
Haibo Chen
8a6c6de1db update name of participant (#1213) 2022-12-15 22:03:59 -08:00
Raja Subramanian
6bd5504bff Add option to issue full reconnect on a publication error. (#1214)
* Add option to issue full reconnect on a publication error.

Leaving the publication error timeout at 30 seconds as there
are some publications taking long. Also, there are cases
where the peer connection fails after 30 seconds. The peer
connection failure happens after publication error is detected.
But, 30 seconds is a good amount of time for publication to establish.

* prevent recursive lock
2022-12-06 14:46:59 +05:30
cnderrauber
3c907ed460 Add stats for data channel and signal (#1198)
* Add stats for data channel and signal

* Solve comment
2022-11-30 14:53:19 +08:00
cnderrauber
aaeb3c933c Fix rtcp lost for downtrack used incorrect buffer factory (#1195)
* Fix rtcp lost for downtrack used incorrect buffer factory

In the buffer factory change (#1173), every participant has its own
buffer factory, so the publisher's buffer factory cannot be used to
create DownTrack

* clean code
2022-11-28 13:04:56 +08:00
Raja Subramanian
086009f05a Do not forward media till peer connection is connected. (#1194)
There were some failures with missing media. The only thing I could
see between working and non-working case is when media forwarding
starts. So, delay media forwarding till peer connection is connected.

Also, add a subscribe op only if a subscribe/unsubscribe queuing is
successful. There was a recent change to not queue a subscribe when
the participant is closed/disconnected. This got the subscribe op
counter out of whack.
2022-11-26 21:42:19 +05:30
cnderrauber
0310aa9250 Make sure client get participant info before track fired (#1147) 2022-11-07 14:50:45 +08:00
cnderrauber
5edb42a9fd experiment fallback to tcp when udp unstable (#1119)
* fallback to tcp when udp unstable
2022-10-31 09:40:20 +08:00
cnderrauber
7a7fc09372 Add fps calculator for VP8 and DependencyDescriptor (#1110)
* Add fps calculator for VP8 and DependencyDescriptor

* clean code

* unit test

* clean code

* solve comment
2022-10-26 09:28:28 +08:00
cnderrauber
8fd3e8fe2d Support track level stereo and red setting (#1086)
* Support track level stereo and red setting

* fix test client
2022-10-17 10:48:11 +08:00
Raja Subramanian
573850261a Cache RTPStats and seed on re-use (#1080)
* Cache RTPStats and seed on re-use

When a cached down track is re-used, RTPStats was not cached.
This caused sender reports getting out-of-sync with the remote side.
Cache RTPStats and seed it on re-use.

* staticcheck
2022-10-12 09:10:17 +05:30
Raja Subramanian
30e5037418 Minor clean up of media track & friends module (#1067) 2022-10-04 05:23:18 +05:30
Raja Subramanian
b3bd403316 Small clean up - remove unused participant close reason (#1055) 2022-09-29 21:53:18 +05:30
Raja Subramanian
b3e148771a Tweaks to reduce supervisor error logs (#1039)
Seeing some supervisor error logs under two conditions
- Issuing a full reconnect - client should close this session and
  form a new one. So, supervisor errors on the to-be-closed session
  are not useful.
- Sometimes it takes a long time for publisher PC to establish.
  If the publish monitor timer starts when a pending track is added,
  the timeout fires before ICE/DTLS is established. So, include
  a condition to start the timer on publication monitor only after
  peer connection is connected.
2022-09-27 08:20:06 +05:30
Raja Subramanian
dfc71d5bf8 Add a flag to signal need to close underlying media track. (#1038)
With migration in, once the local track is published, the
remote track should be closed. Add a flag to `RemovePublishedTrack`
to control the close behaviour. Invoke `Close` if specified.

Without it, the remote track is not closed if it is waiting to resolve,
i.e. not yet attached. That remote track is left hanging.
2022-09-26 15:32:22 +05:30
Raja Subramanian
33f782a99b Use PostEvent to avoid casting to concrete type (#1006) 2022-09-15 12:22:13 +05:30
Raja Subramanian
07c43e0972 Supervisor beginnings (#1005)
* Remove VP9 from media engine set up.

* Remove vp9 from config sample

* Supervisor beginnings

Eventual goal is to have a reconciler which moves state from
actual -> desired. First step along the way is to observe/monitor.
The first step even in that is an initial implementation to get
feedback on the direction.

This PR is a start in that direction
- Concept of a supervisor at local participant level
- This supervisor will be responsible for periodically monitoring
  actual vs desired (this is the one which will eventually trigger
  other things to reconcile, but for now it just logs on error)
- A new interface `OperationMonitor` which requires two methods
  o Check() returns an error based on actual vs desired state.
  o IsIdle() returns bool. Returns true if the monitor is idle.
- The supervisor maintains a list of monitors and does periodic check.

In the above framework, starting with list of
subscriptions/unsubscriptions. There is a new module
`SubscriptionMonitor` which checks subscription transitions.
A subscription transition is queued on subscribe/unsubscribe.
The transition can be satisfied when a subscribedTrack is added OR
removed. Error condition is when a transition is not satisfied for
10 seconds. Idle is when the transition queue is empty and
subscribedTrack is nil, i.e. the last transition would have been
unsubscribe and subscribed track removed (unsubscribe satisfied).

The idea is individual monitors can check on different things.
Some more things that I am thinking about are
- PublishedTrackMonitor - started when an add track happens,
  satisfied when OnTrack happens, error if `OnTrack` does not
  fire for a while and track is not muted, idle when there is
  nothing pending.
- PublishedTrackStreamingMonitor - to ensure that a published track
  is receiving media at the server (accounting for dynacast, mute, etc)
- SubscribedTrackStreamingMonitor - to ensure down track is sending
  data unless muted.

* Remove debug

* Protect against early casting errors

* Adding PublicationMonitor
2022-09-15 11:16:37 +05:30
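The supervisor framework outlined in #1005 can be sketched in Go as below. Only the `OperationMonitor` interface with `Check()` and `IsIdle()` comes from the commit message. The rest is assumed for illustration: `countingMonitor` counts supervisor passes instead of measuring the 10-second wall-clock timeout so the sketch stays deterministic, and dropping idle monitors in `CheckOnce` is a guess at how idleness might be used, not something the message states.

```go
package main

import (
	"errors"
	"fmt"
)

// OperationMonitor mirrors the two methods named in the commit message.
type OperationMonitor interface {
	Check() error // error when actual state has not reached desired state
	IsIdle() bool // true when the monitor has nothing pending
}

// countingMonitor is a hypothetical monitor for a single pending transition.
// It reports an error once the transition has stayed unsatisfied for more
// than maxChecks supervisor passes.
type countingMonitor struct {
	name      string
	satisfied bool
	checks    int
	maxChecks int
}

func (m *countingMonitor) Check() error {
	if m.satisfied {
		return nil
	}
	m.checks++
	if m.checks > m.maxChecks {
		return errors.New(m.name + ": transition not satisfied in time")
	}
	return nil
}

func (m *countingMonitor) IsIdle() bool { return m.satisfied }

// Supervisor keeps a list of monitors and checks them periodically.
type Supervisor struct {
	monitors []OperationMonitor
}

func (s *Supervisor) Add(m OperationMonitor) {
	s.monitors = append(s.monitors, m)
}

// CheckOnce runs a single pass: it collects errors from every monitor and
// keeps only the monitors that are not idle (an assumption of this sketch).
func (s *Supervisor) CheckOnce() []error {
	var errs []error
	active := s.monitors[:0]
	for _, m := range s.monitors {
		if err := m.Check(); err != nil {
			errs = append(errs, err)
		}
		if !m.IsIdle() {
			active = append(active, m)
		}
	}
	s.monitors = active
	return errs
}

func main() {
	s := &Supervisor{}
	s.Add(&countingMonitor{name: "subscribe track A", maxChecks: 2})
	for pass := 1; pass <= 3; pass++ {
		for _, err := range s.CheckOnce() {
			fmt.Printf("pass %d: %v\n", pass, err)
		}
	}
}
```

In a running server the passes would be driven by a ticker, and, as the commit message notes, the first version only logs the errors rather than acting on them.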
cnderrauber
f1915feb1a keep mid unchanged after migration for subscribed track (#995) 2022-09-09 17:39:09 +08:00
Raja Subramanian
d13c4be923 Close subscriber PC after a wait to aid in migration. (#979)
* Close subscriber PC after a wait to aid in migration.

* mage generate
2022-09-03 01:16:51 +05:30
David Zhao
69bf31944e Send connection type to telemetry (#964)
* Send connection type to telemetry

When connected, determine how the participant's primary connection is
connected and report it in ParticipantActive event.

* address feedback

* fixed case where prflx is reported instead of relay

* incorporate comments
2022-08-29 23:17:13 -07:00
Raja Subramanian
9b0539eb43 Need this for clean up during migration (#965) 2022-08-29 13:19:58 +05:30
David Zhao
747089a005 Additional closure reasons (#958) 2022-08-25 19:36:47 -07:00
cnderrauber
1350400c3a fallback to turn over tls when tcp short connection happen (#950)
* fallback to tls when tcp failed

* go mod

* magefile
2022-08-24 20:42:56 +08:00
Raja Subramanian
aaa3a5b46e Transport restructure (#944)
* WIP commit

* WIP commit

* fix copy pasta

* setting PC with previous answer has to happen synchronously

* static check

* WIP commit

* WIP commit

* fixing transport tests

* fix tests and clean up

* minor renaming

* Fix test race

* log event when channel is full
2022-08-24 14:31:45 +05:30
Raja Subramanian
70422c0267 Export CloseSignalConnection (#936)
* Export CloseSignalConnection

There are a few places where that close pattern is repeated.
Export it and use that function in other places directly.

* fix test
2022-08-21 11:33:35 +05:30
Raja Subramanian
0cd9c87dc9 Misc clean up (#931)
* Start RTCP workers after peer connection connects

* Move more things into transport module

* Start RTCP workers only on connected

* Test needs PeerConnection() method

* adjust comment
2022-08-19 11:49:12 +05:30
Raja Subramanian
f5627c3859 Prevent track subscriptions/adding receivers after close (#924)
* Prevent track subscriptions/adding receivers after close

With subscribe/unsubscribe queuing, a subscribe may be
attempted after a call to `RemoveAllSubscribers`.
So, renaming `RemoveAllSubscribers` to `InitiateClose`
and maintaining state that track is in the process of closing.

* Mime specific remove

* Remove unused error

* do not add receiver when closing
2022-08-17 13:07:59 +05:30
Raja Subramanian
641f8d4519 Transport refactor (#907)
* WIP commit

* WIP commit

* WIP commit

* WIP commit

* WIP commit

* Clean up

* fix tests

* debug logs

* Remove comments

* Fix data channel creation on migration and clean up unused stuff

* log offer/answer send/receive
2022-08-12 11:20:54 +05:30
David Zhao
f09885825e Return ServerInfo to clients on join (#904)
* checkpoint

* Return ServerInfo in join response

* also include node information

* less verbose quality score

* update go modules
2022-08-10 17:04:17 -07:00