Commit Graph

84 Commits

Author SHA1 Message Date
Raja Subramanian
4ed9b5f90e Revert "Using shadow pattern for stats workers (#742)" (#744)
This reverts commit 2b561d2bad.
2022-05-31 11:06:44 +05:30
Raja Subramanian
f19815754c Do not re-compute average on real time metric change (#743) 2022-05-31 10:33:17 +05:30
Raja Subramanian
2b561d2bad Using shadow pattern for stats workers (#742) 2022-05-31 10:32:54 +05:30
Raja Subramanian
508aa471a9 Track participant join total + rate in node stats (#741)
* Track participant join total + rate in node stats

* update protocol
2022-05-30 15:58:30 +05:30
cnderrauber
f958fbcc1c simulcast codecs support (#720)
simulcast codecs support 

Co-authored-by: David Zhao <dz@livekit.io>
2022-05-27 19:55:50 +08:00
Raja Subramanian
33032f6c4b Fix some test races and other things found with go test -race (#711) 2022-05-24 10:16:36 +05:30
Raja Subramanian
8ef53037eb Lock stats worker maps (#704) 2022-05-21 10:36:49 +05:30
David Zhao
79296d0939 Fixed concurrent modification to map (#702)
Synchronizes access to stats worker maps. Previously they were accessed
from both the OpsQueue goroutine and the run() worker.
2022-05-20 13:45:13 -07:00
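The fix described in the commit above (serializing access to a map shared by two goroutines) can be sketched as follows. This is a minimal illustration, not the repository's code: the `statsWorkers` type, its fields, and the `Add`/`Get` methods are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// statsWorkers is a hypothetical registry keyed by participant ID.
// The mutex guards every read and write of the map.
type statsWorkers struct {
	mu      sync.RWMutex
	workers map[string]int
}

func newStatsWorkers() *statsWorkers {
	return &statsWorkers{workers: make(map[string]int)}
}

// Add increments the counter for id; safe to call from any goroutine.
func (s *statsWorkers) Add(id string, n int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.workers[id] += n
}

// Get returns the current counter for id (0 if the id is unknown).
func (s *statsWorkers) Get(id string) int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.workers[id]
}

func main() {
	s := newStatsWorkers()
	var wg sync.WaitGroup
	// 100 goroutines write to the same key concurrently.
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.Add("p1", 1)
		}()
	}
	wg.Wait()
	fmt.Println(s.Get("p1")) // prints 100
}
```

Without the lock, the concurrent writes would be a data race that `go test -race` reports and that can abort the program with a "concurrent map writes" error.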
Raja Subramanian
012337c96a Fix sense of transmission label (#692) 2022-05-18 12:52:05 +05:30
David Zhao
7eb3362d0a Keep track of retransmissions in NodeStats (#677) 2022-05-10 15:25:24 -07:00
David Zhao
bd7e3beda4 Improve frequency of stats update (#673)
* Improve frequency of stats update

Prometheus stats are updated as the data becomes available, instead of
aggregated along with telemetry batches. Node availability decisions can
now react much faster to these stats.

* use the same intervals for connection quality updates
2022-05-09 08:55:06 -07:00
Raja Subramanian
081b97142f Variable collision killed stats workers (#670) 2022-05-06 23:42:40 +05:30
Raja Subramanian
c6f895db15 Prevent concurrent access of stats worker map (#666) 2022-05-04 23:00:20 +05:30
David Zhao
4e5863496c Set numCPUs correctly in non-linux environment (#653) 2022-04-24 23:25:33 -07:00
David Zhao
3c53b843c5 Fixes bps and pps average computation. (#639)
Exclude NACK count from being a trigger to refresh stats.
Since NACKs are updated instantaneously without having to wait for
Telemetry updates that occur every 10s, having even a single NACK
could cause us to compute averages prematurely.
2022-04-20 19:17:02 -07:00
David Zhao
b821a0997d Use common logging init functions (#633)
* Use common logging init functions

* update protocol commit

* fix tests
2022-04-20 00:15:11 -07:00
David Zhao
431069af95 Rename StatsUpdateFrequency -> StatsUpdateInterval 2022-04-19 22:22:58 -07:00
David Zhao
282e2aed49 Increase frequency of status updates and longer availability threshold (#628)
* Increase frequency of status updates and longer avail. threshold.

* better fix.

* fix room close test failure due to slow peer connection Close

* Perform avg computation more frequently if data has changed
2022-04-19 22:18:00 -07:00
Raja Subramanian
a19ca69f5f Prevent stats update if the deltas are empty (#619)
* Prevent stats update if the deltas are empty

* increase force interval

* static check

* Change max delay to 30 seconds
2022-04-18 22:51:34 +05:30
Raja Subramanian
a98d955284 Delta stats throughout (#615)
* Use delta stats throughout and avoid calculating deltas in telemetry

* Fix a few things after testing

* Remove debug

* Fix tests

* delete instead of setting to nil

* Point to the latest protocol
2022-04-16 21:11:32 +05:30
Raja Subramanian
92009b6428 Consistently stop tickers (#593) 2022-04-05 20:42:06 +05:30
David Colburn
0b8a180554 Code inspection (#581)
* Code inspection

* fix [4]int64 conversion
2022-03-30 13:49:53 -07:00
shishirng
a6bb59b159 handle deltas being null leading to crash (#567)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-03-25 19:18:32 -04:00
shishirng
579d3d1a19 Check if current stats < prev and guard against underflow (#563)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-03-23 15:16:59 -04:00
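The underflow guard named in the commit above matters because subtracting unsigned counters wraps around instead of going negative. A minimal sketch, assuming the policy is to report a zero delta when the current value is below the previous one (the commit title does not say what the actual code returns; `safeDelta` is a hypothetical helper):

```go
package main

import "fmt"

// safeDelta returns cur - prev for counters that normally only grow.
// If cur < prev (for example after a counter reset), it returns 0 rather
// than letting the unsigned subtraction wrap. Returning 0 is an assumption
// made for this sketch.
func safeDelta(cur, prev uint64) uint64 {
	if cur < prev {
		return 0
	}
	return cur - prev
}

func main() {
	fmt.Println(safeDelta(4, 1)) // prints 3
	fmt.Println(safeDelta(1, 4)) // prints 0; unguarded, 1 - 4 would wrap to 18446744073709551613
}
```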
Raja Subramanian
076eb1c8ae Dampen stream allocator (#551)
* WIP commit

* WIP commit

* WIP commit

* format

* NACK window

* Remove layer when it is expected to stop

* Remove debug
2022-03-22 22:23:22 +05:30
Mathew Kamkar
cf63da2e64 prometheus livekit_room_total node_id label 2022-03-21 16:43:01 -07:00
David Zhao
f14c452f8c Telemetry and webhook improvements. (#535)
* Telemetry and webhook improvements.

* avoid blocking on telemetry channel - increase channel size and drop when full
* send ParticipantJoined webhook when fully joined (i.e. on ParticipantActive)
* send TrackPublished & TrackUnpublished webhooks
* increase number of parallel webhook workers to 50

* update protocol
2022-03-18 23:20:33 -07:00
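The "increase channel size and drop when full" item in the commit above corresponds to a common Go idiom: a buffered channel combined with a `select` that has a `default` branch. The sketch below is an illustration under that assumption; the `event` type and `tryEnqueue` function are hypothetical, not names from the repository.

```go
package main

import "fmt"

// event is a hypothetical telemetry event.
type event struct{ name string }

// tryEnqueue attempts a non-blocking send. It returns false when the
// buffer is full and the event was dropped.
func tryEnqueue(ch chan<- event, e event) bool {
	select {
	case ch <- e:
		return true
	default:
		return false
	}
}

func main() {
	ch := make(chan event, 2) // small buffer so the drop is easy to see
	for i := 0; i < 3; i++ {
		ok := tryEnqueue(ch, event{name: fmt.Sprintf("e%d", i)})
		fmt.Println(ok) // prints true, true, false
	}
}
```

The trade-off is that the producer never stalls, at the cost of losing events under sustained backpressure, so callers may want to count or log drops.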
Mathew Kamkar
cac6d22a72 store cpu load in node stats (#524)
* store cpu load in node stats

* num cpus uint32

* cpu load selector test

* dep update
2022-03-16 14:51:22 -07:00
shishirng
cd2a7c2447 Telemetry: send video layers in TrackPublishedUpdate event (#500)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-03-10 14:49:01 -05:00
shishirng
c34b907d58 Add checks to prevent bytes/packet counts from going -ve (#499)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-03-09 16:51:23 -05:00
shishirng
57ecec73d7 Send participantInfo on participant left event to store identity (#498)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-03-09 14:35:01 -05:00
shishirng
c3a3fb569d add track publisher info in track subscribed event (#473)
* add track publisher info in track subscribed event

Signed-off-by: shishir gowda <shishir@livekit.io>

* update protocol ver

Signed-off-by: shishir gowda <shishir@livekit.io>
2022-02-28 13:48:02 -05:00
Raja Subramanian
2706dc130f Replace sync/atomic usage with uber/atomic (#471) 2022-02-28 09:57:17 +05:30
Raja Subramanian
0170cc1cb6 Staticcheck (#464)
Using `go get -u honnef.co/go/tools/cmd/staticcheck`
Unearthed a couple of real bugs
2022-02-25 12:04:08 +05:30
David Colburn
20f21cce2b Egress (#455)
* egress updates

* pass egressInfo to delete

* update typefakes

* export StartEgress

* update protocol

* new rpc, rename stores

* add json tag

* update tests

* update protocol
2022-02-24 14:57:14 -08:00
shishirng
3e7fae96ea Add telemetry method to capture max video_quality (#457)
* Add telemetry method to capture max video_quality

Signed-off-by: shishir gowda <shishir@livekit.io>

* Telemetry fakes

Signed-off-by: shishir gowda <shishir@livekit.io>

* Update go mod dep

Signed-off-by: shishir gowda <shishir@livekit.io>
2022-02-22 19:08:49 -05:00
shishirng
7fcb887eb8 use delta bytes in window to identify max layer (#442)
total_bytes is cumulative: when we switch from a higher layer to a lower
layer, it takes time for the lower layer's total_bytes to catch up to
that of the stopped higher layer

Signed-off-by: shishir gowda <shishir@livekit.io>
2022-02-17 15:15:10 -05:00
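The reasoning in the commit above can be illustrated with a small sketch: a layer that has stopped still holds the largest cumulative byte count, but its delta within the window is zero. The function name `maxActiveLayer` and the slice-per-layer representation are assumptions made for this example, not the repository's data structures.

```go
package main

import "fmt"

// maxActiveLayer returns the highest layer index whose cumulative byte
// count grew between two snapshots, or -1 if no layer sent data.
// Each slice holds one cumulative counter per layer, indexed by layer.
func maxActiveLayer(prevTotals, curTotals []uint64) int {
	maxLayer := -1
	for layer := range curTotals {
		var prev uint64
		if layer < len(prevTotals) {
			prev = prevTotals[layer]
		}
		// Only layers with new bytes in this window count as active.
		if curTotals[layer] > prev {
			maxLayer = layer
		}
	}
	return maxLayer
}

func main() {
	// Layer 2 has the largest total but stopped sending; layers 0 and 1
	// are still active.
	prev := []uint64{1000, 5000, 90000}
	cur := []uint64{1500, 5600, 90000}
	fmt.Println(maxActiveLayer(prev, cur)) // prints 1
}
```

Picking the layer with the largest total would report layer 2 here, which is the stale result the commit message describes.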
shishirng
c534099e3a fix connection_scores not being sent to telemetry during delta calc (#439)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-02-16 19:31:59 -05:00
shishirng
8680f6fd23 Send trackInfo object in TRACK_SUBSCRIBED event (#431)
Need track details in subscribed events

Signed-off-by: shishir gowda <shishir@livekit.io>
2022-02-10 16:48:16 -05:00
shishirng
e96e8e7f97 Clean up closed tracks stats and handle -ve packet_lost (#430)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-02-10 11:21:17 -05:00
shishirng
6f7e6c4556 Compute delta stats to send downstream (#426)
* Compute delta stats to send downstream

Signed-off-by: shishir gowda <shishir@livekit.io>

* Update tests: total_packets should be diff between 2 packets

First packet was 1, second was 4. diff should be 3

Signed-off-by: shishir gowda <shishir@livekit.io>

* If there are no videoLayers, do not send them in Stats

For audio and Downstream tracks, we do not get layers

Signed-off-by: shishir gowda <shishir@livekit.io>

* Use prev Max layer for current delta and update layer info for next round
2022-02-09 20:45:53 -05:00
Raja Subramanian
222b02aa73 RTT (#420)
* Consolidating PLI throttle

Use the throttler in `sfu.WebRTCReceiver`.

Does change shape of config object.

* Move PLIThrottleConfig to sfu.WebRTCReceiver

* fix test compile

* Cleaning up unused stuff

* improve readability

* RTT

- Calculate down track RTT using RTCP Receiver report
- Surface it back to the participant
- Participant updates all its published tracks
  (throttled to limit update to once in 5 seconds)
- That propagates to all the upstream sfu.Buffer and the nacker.
  So, we will have RTT throttled NACKs.

* rtt callback
2022-02-09 09:34:40 +05:30
Raja Subramanian
36289bbca7 FPS (#410)
* WIP commit

* WIP commit

* WIP commit

* WIP commit

* WIP commit

* WIP commit

* Clean up

* Clean up

* Store RTT in stats

* spelling mistake

* Make tests compile

* Fix test compilation error

* fix tests

* clone

* latest protocol
2022-02-08 12:53:14 +05:30
David Zhao
a6eb4290d3 Generate telemetry stubs (#412) 2022-02-07 23:15:24 -08:00
shishirng
32b56e0fd6 Add ParticipantActive telemetry method (#411)
* Add ParticipantActive telemetry method

Signed-off-by: shishir gowda <shishir@livekit.io>

* fix test

Signed-off-by: shishir gowda <shishir@livekit.io>

* Update go mod

Signed-off-by: shishir gowda <shishir@livekit.io>
2022-02-07 17:23:39 -05:00
David Colburn
7bbd238188 clean up logs and imports (#400) 2022-02-03 14:20:19 -07:00
David Colburn
3d132730f9 replace entire nodeStats object (#393) 2022-01-31 17:09:36 -07:00
shishirng
1e156025b4 Store client meta on participant join (#380)
* Store client meta on participant join

capture region, time_to_connect, ip, node

Signed-off-by: shishir gowda <shishir@livekit.io>

* Update proto dep

Signed-off-by: shishir gowda <shishir@livekit.io>
2022-01-27 15:44:03 -05:00
shishirng
26eea78b54 Telemetry connection scores (#377)
* octets - total bytes needs to be uint64

uint32 wraps at 4GB

Signed-off-by: shishir gowda <shishir@livekit.io>

* Cleanup stats handler to use connectionQuality stats

remove per packet rtcp handlers, buffer stats

* cleanup connection stats

* Update mediatrack to store rtcp stats in connection stats

* Update downstream handling of connection stats and telemetry

* Update telemetry tests

Signed-off-by: shishir gowda <shishir@livekit.io>

* Misc fixes

Signed-off-by: shishir gowda <shishir@livekit.io>

* Minor fix to avoid accessing buffer before it's allocated

Signed-off-by: shishir gowda <shishir@livekit.io>

* start updateStats worker in AddReciever()

Signed-off-by: shishir gowda <shishir@livekit.io>

* Use previous score to calculate avg scores

* Restructure connectionStats

Signed-off-by: shishir gowda <shishir@livekit.io>
2022-01-27 11:24:54 -05:00
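The "uint32 wraps at 4GB" note in the commit above is easy to demonstrate: an unsigned 32-bit byte counter silently restarts near zero once it passes 4,294,967,295. A minimal sketch, with hypothetical helper names `addOctets32` and `addOctets64`:

```go
package main

import (
	"fmt"
	"math"
)

// addOctets32 accumulates bytes in a 32-bit counter; it wraps past 4 GiB.
func addOctets32(total, n uint32) uint32 {
	return total + n
}

// addOctets64 accumulates bytes in a 64-bit counter, which does not wrap
// at any realistic traffic volume.
func addOctets64(total, n uint64) uint64 {
	return total + n
}

func main() {
	fmt.Println(addOctets32(math.MaxUint32, 1500)) // prints 1499
	fmt.Println(addOctets64(math.MaxUint32, 1500)) // prints 4294968795
}
```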
shishirng
56ebd521f9 Telemetry capture published track updates (#367)
* Telemetry capture published track updates

Signed-off-by: shishir gowda <shishir@livekit.io>

* Updated OnVideoLayerUpdate to take slice of layers

Signed-off-by: shishir gowda <shishir@livekit.io>

* Update proto dep

Signed-off-by: shishir gowda <shishir@livekit.io>
2022-01-24 14:38:04 -05:00