* Use a worker to report signal/data stats.
Was checking if reporting is needed on every update.
The check is wasted work if volume of signal/data messages is high
as reporting happens only once in 10 seconds.
Changing to a worker based on a timer. And also aligning with
telemetry reporting interval which defaults to 30 seconds.
* Remove unused constant
Logging expected WS close at Infow to understand reasons for closure.
Moving "read from ws" to Debugw as it happens when signalling closes.
Also filter out a data channel abort chunk log as it shows a bunch of
errors, but those are expected though.
* let panics crash
* Revert "let panics crash"
This reverts commit 8027cccadd.
* catch and log panics then os.Exit
* Recover only recovers, caller can exit
* only exit on pacic, still need Recover calls in goroutines
* Add Timer to detect dtls failure quickly
* Fix pc state check in timeout after ice
* More strict conditions to switch candidate type
* log for signal interuppt
* typo
It's not always possible for WebSocket clients to obtain status code or
error messages returned during WS Upgrade. Moving autocreation validation
to an explicit interface in the validation step so the /rtc/validate
would be able to return an appropriate message.
Sometimes the initial selected node could fail. In that case, we'll give it a few more attempts to locate a media node for the session instead of failing it after the first try.
* initial commit
* add correct label
* clean up
* more cleanup on adding stats
* cleanup
* move things to pub and sub monitors, ensure stats are correctly updated
* fix merge conflict
* Fix panic on MacOS (#1296)
* fixing last feedback
Co-authored-by: Raja Subramanian <raja.gobi@tutanota.com>
In extreme cases, media nodes could take more than 5s to spin up the
session. Increasing this timeout to 10s reduces the number of disconnections
due to edge cases.
* Remove VP9 from media engine set up.
* Remove vp9 from config sample
* Supervisor beginnings
Eventual goal is to have a reconciler which moves state from
actual -> desired. First step along the way is to observe/monitor.
The first step even in that is an initial implementation to get
feedback on the direction.
This PR is a start in that direction
- Concept of a supervisor at local participant level
- This supervisor will be responsible for periodically monitor
actual vs desired (this is the one which will eventually trigger
other things to reconcile, but for now it just logs on error)
- A new interface `OperationMonitor` which requires two methods
o Check() returns an error based on actual vs desired state.
o IsIdle() returns bool. Returns true if the monitor is idle.
- The supervisor maintains a list of monitors and does periodic check.
In the above framework, starting with list of
subscriptions/unsubscriptions. There is a new module
`SubscriptionMonitor` which checks subscription transitions.
A subscription transition is queued on subscribe/unsubscribe.
The transition can be satisfied when a subscribedTrack is added OR
removed. Error condition is when a transition is not satisfied for
10 seconds. Idle is when the transition queue is empty and
subscribedTrack is nil, i. e. the last transition would have been
unsubscribe and subscribed track removed (unsubscribe satisfied).
The idea is individual monitors can check on different things.
Some more things that I am thinking about are
- PublishedTrackMonitor - started when an add track happens,
satisfied when OnTrack happens, error if `OnTrack` does not
fire for a while and track is not muted, idle when there is
nothing pending.
- PublishedTrackStreamingMonitor - to ensure that a published track
is receiving media at the server (accounting for dynacast, mute, etc)
- SubscribedTrackStreamingMonitor - to ensure down track is sending
data unless muted.
* Remove debug
* Protect against early casting errors
* Adding PublicationMonitor
When the instance handling the signal request did not respond to the
initial connection, we will fail the connection attempt instead of
having it hang forever.