* Rework node stats a bit.
Related protocol PR - https://github.com/livekit/protocol/pull/1023
- Make a config for node stats measurements. Wanted to put the config in
`routing` package, but a circular dependency forced me to put in
config.go
- Make rate calculations explicit, i. e. requested via config.
Previously, it had some odd checks to decide when to calculate rate
and it would have been calculating over different windows.
- Report signal/data channel bytes every 5 seconds to stats collection
module. Previously, it was doing it every 30 seconds and that meant
some windows could have had a large spike
NOTE: Still need to think about this for load calculations as a large
number of participants leaving could flush in a small window and that
could report a large spike in bytes/packets. Maybe need to ignore
signal bytes for load calculation?
* deps
* use default node stats config if given config is nil
* split out node stats into a struct for re-use
* update config
* De-centralize some configs to where they are used.
And make default variables.
Renaming a bit, but these are all internal config and have not been
added to documented config.
* Keep documented config as is.
* test
* typo
* Add counter for pub&sub time metrics
The pub&sub shows large value in migration related case like
muted/disabled migration, the subscription time depends on
the time when publisher unmute the track(sending rtp packet
after migration), add a counter to distinguish since we
can't control the time in such cases and the first subscription
attemps also is more meaningful than those cases.
* Add info log for high publish delay
* Record out-of-packet count/rate in prom.
Adding a field to AnalyticsStream to make this easier to report.
Let me know if adding to AnalyticsStream is not ok.
Will set up a protocol PR if it is okay.
* deps
* speed up track publication
Add metrics for track publication and subscription
Return EnabledCodecs in JoinResponse so client can
choose codec without server side codec fallback
Cache remote webrtc track without AddTrackRequest to
let client send publisher offer before AddTrackRequest response
* go mod
* clean code
Because we aren't able to get CPU count/load info on Windows, they are
stubbed out to return placeholders. This restores compatibility to run
on Windows.
Sometimes the initial selected node could fail. In that case, we'll give it a few more attempts to locate a media node for the session instead of failing it after the first try.
Added a new manager to handle all subscription needs. Implemented using reconciler pattern. The goals are:
improve subscription resilience by separating desired state and current state
reduce complexity of synchronous processing
better detect failures with the ability to trigger full reconnect
* add prometheus stats for rtt/jitter/packet loss
* add track source to metrics
* better packet loss bins
* add track type to metrics
* remove source from AnalyticsStat
* regenerate telemetry service fake
* compute loss from per stream packet count
* initial commit
* add correct label
* clean up
* more cleanup on adding stats
* cleanup
* move things to pub and sub monitors, ensure stats are correctly updated
* fix merge conflict
* Fix panic on MacOS (#1296)
* fixing last feedback
Co-authored-by: Raja Subramanian <raja.gobi@tutanota.com>