95 Commits

Author SHA1 Message Date
David Zhao
3da908302a Do not warn when notifier isn't configured (#1043)
By default there are no webhook URLs to notify, so a notifier isn't created.
2022-09-26 13:30:27 -07:00
David Colburn
803046b882 Auto egress (#1011)
* auto egress

* fix room service test

* reuse StartTrackEgress

* add timestamp

* update prefixed filename explicitly

* update protocol

* clean up telemetry

* fix telemetry tests

* separate room internal storage

* auto participant egress

* remove custom template url

* fix internal key

* use map for stats workers

* remove sync.Map

* remove participant composite
2022-09-21 12:04:19 -07:00
cnderrauber
c401ca58af turn packet and bytes stats used for telemetry and load control (#969)
* stats for turn

* add connections stats

* stats for standalone turn server only

* wire update
2022-08-31 11:00:27 +08:00
Mathew Kamkar
767d660809 Use LocalNode ID in Prometheus metrics (#959) 2022-08-25 22:16:20 -07:00
shishirng
79cf614783 Send egressInfo in telemetry event (#941)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-08-23 08:18:12 -04:00
David Zhao
b8bda3f14b Separate calls to Telemetry vs Prometheus room lifecycle (#935)
* Separate calls to Telemetry vs Prometheus room lifecycle

* remove unused import
2022-08-20 20:22:16 -07:00
shishirng
a3e8304b56 send participant info/identity during track_published event (#846)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-07-21 17:34:52 -04:00
Raja Subramanian
29039b4e76 Use a go routine to clean up stats workers. (#836)
* Use a go routine to clean up stats workers.

Certain events (like TrackUnpublished) can happen after the
participant is closed. Webhooks for those events still need
details such as room name/id. So, reap stats workers a little
while after the participant left event happens.

* handle data race report

* log analytics worker reap

* debug log
2022-07-18 11:47:43 +05:30
Mathew Kamkar
e0676132d4 Packet stats from TC (#832)
* system level packet stats from tc

* drop percent

* test fix

* formatting

* formatting/wording

* prometheus metrics

* update livekit protocol go module
2022-07-15 10:41:40 -07:00
David Colburn
fbbcbe77df Remove recording (#811)
* remove recorder service

* update protocol
2022-07-05 18:39:32 -07:00
David Zhao
b316698409 Release with GoReleaser. Allow start without key configuration (#788) 2022-06-26 12:27:43 -07:00
Raja Subramanian
4ed9b5f90e Revert "Using shadow pattern for stats workers (#742)" (#744)
This reverts commit 2b561d2bad.
2022-05-31 11:06:44 +05:30
Raja Subramanian
f19815754c Do not re-compute average on real time metric change (#743) 2022-05-31 10:33:17 +05:30
Raja Subramanian
2b561d2bad Using shadow pattern for stats workers (#742) 2022-05-31 10:32:54 +05:30
Raja Subramanian
508aa471a9 Track participant join total + rate in node stats (#741)
* Track participant join total + rate in node stats

* update protocol
2022-05-30 15:58:30 +05:30
cnderrauber
f958fbcc1c simulcast codecs support (#720)
simulcast codecs support 

Co-authored-by: David Zhao <dz@livekit.io>
2022-05-27 19:55:50 +08:00
Raja Subramanian
33032f6c4b Fix some test races and other things found with go test -race (#711) 2022-05-24 10:16:36 +05:30
Raja Subramanian
8ef53037eb Lock stats worker maps (#704) 2022-05-21 10:36:49 +05:30
David Zhao
79296d0939 Fixed concurrent modification to map (#702)
Synchronizes access to stats worker maps. Previously they were accessed
from both the OpsQueue goroutine and the run() worker.
2022-05-20 13:45:13 -07:00
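Note on #702: the kind of fix described above can be sketched as a map guarded by a mutex, so that two goroutines never touch it at the same time. This is only an illustrative sketch, not the code from the commit; `workerRegistry` and its methods are hypothetical names, and the `int` value stands in for whatever a stats worker actually is.

```go
package main

import (
	"fmt"
	"sync"
)

// workerRegistry is a hypothetical stand-in for the stats worker maps.
// Every access goes through mu, so concurrent callers cannot race.
type workerRegistry struct {
	mu      sync.RWMutex
	workers map[string]int
}

func newWorkerRegistry() *workerRegistry {
	return &workerRegistry{workers: make(map[string]int)}
}

// set stores a worker under the given participant ID.
func (r *workerRegistry) set(id string, w int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.workers[id] = w
}

// get returns the worker for id and whether it was present.
func (r *workerRegistry) get(id string) (int, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	w, ok := r.workers[id]
	return w, ok
}

// remove deletes the worker for id, if any.
func (r *workerRegistry) remove(id string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.workers, id)
}

// size reports how many workers are registered.
func (r *workerRegistry) size() int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return len(r.workers)
}

func main() {
	r := newWorkerRegistry()
	var wg sync.WaitGroup
	// Eight goroutines write concurrently; the mutex serializes them.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			r.set(fmt.Sprintf("p%d", n), n)
		}(i)
	}
	wg.Wait()
	fmt.Println(r.size())
}
```

Running a sketch like this under `go test -race` or `go run -race` is one way to confirm that no unsynchronized access remains.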
Raja Subramanian
012337c96a Fix sense of transmission label (#692) 2022-05-18 12:52:05 +05:30
David Zhao
7eb3362d0a Keep track of retransmissions in NodeStats (#677) 2022-05-10 15:25:24 -07:00
David Zhao
bd7e3beda4 Improve frequency of stats update (#673)
* Improve frequency of stats update

Prometheus stats are updated as the data becomes available, instead of
aggregated along with telemetry batches. Node availability decisions can
now react much faster to these stats.

* use the same intervals for connection quality updates
2022-05-09 08:55:06 -07:00
Raja Subramanian
081b97142f Variable collision killed stats workers (#670) 2022-05-06 23:42:40 +05:30
Raja Subramanian
c6f895db15 Prevent concurrent access of stats worker map (#666) 2022-05-04 23:00:20 +05:30
David Zhao
4e5863496c Set numCPUs correctly in non-linux environment (#653) 2022-04-24 23:25:33 -07:00
David Zhao
3c53b843c5 Fixes bps and pps average computation. (#639)
Exclude NACK count from being a trigger to refresh stats.
Since NACKs are updated instantaneously, without waiting for the
Telemetry updates that occur every 10s, even a single NACK
could cause us to compute averages prematurely.
2022-04-20 19:17:02 -07:00
David Zhao
b821a0997d Use common logging init functions (#633)
* Use common logging init functions

* update protocol commit

* fix tests
2022-04-20 00:15:11 -07:00
David Zhao
431069af95 Rename StatsUpdateFrequency -> StatsUpdateInterval 2022-04-19 22:22:58 -07:00
David Zhao
282e2aed49 Increase frequency of status updates and longer availability threshold (#628)
* Increase frequency of status updates and longer avail. threshold.

* better fix.

* fix room close test failure due to slow peer connection Close

* Perform avg computation more frequently if data has changed
2022-04-19 22:18:00 -07:00
Raja Subramanian
a19ca69f5f Prevent stats update if the deltas are empty (#619)
* Prevent stats update if the deltas are empty

* increase force interval

* static check

* Change max delay to 30 seconds
2022-04-18 22:51:34 +05:30
Raja Subramanian
a98d955284 Delta stats throughout (#615)
* Use delta stats throughout and avoid calculating deltas in telemetry

* Fix a few things after testing

* Remove debug

* Fix tests

* delete instead of setting to nil

* Point to the latest protocol
2022-04-16 21:11:32 +05:30
Raja Subramanian
92009b6428 Consistently stop tickers (#593) 2022-04-05 20:42:06 +05:30
David Colburn
0b8a180554 Code inspection (#581)
* Code inspection

* fix [4]int64 conversion
2022-03-30 13:49:53 -07:00
shishirng
a6bb59b159 handle deltas being null leading to crash (#567)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-03-25 19:18:32 -04:00
shishirng
579d3d1a19 Check if current stats < prev and guard against underflow (#563)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-03-23 15:16:59 -04:00
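Note on #563: with unsigned counters, subtracting a larger previous value from a smaller current one wraps around to a huge number. A minimal sketch of the guard named in the title follows. The helper name `safeDelta` is hypothetical, and returning 0 when the counter goes backwards is an assumption; the commit itself may handle that case differently.

```go
package main

import "fmt"

// safeDelta returns cur-prev for unsigned counters.
// Assumption: when cur < prev (for example after a counter reset),
// the delta is reported as 0 instead of wrapping around.
func safeDelta(cur, prev uint64) uint64 {
	if cur < prev {
		return 0
	}
	return cur - prev
}

func main() {
	fmt.Println(safeDelta(150, 100)) // normal case
	fmt.Println(safeDelta(100, 150)) // counter went backwards
}
```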
Raja Subramanian
076eb1c8ae Dampen stream allocator (#551)
* WIP commit

* WIP commit

* WIP commit

* format

* NACK window

* Remove layer when it is expected to stop

* Remove debug
2022-03-22 22:23:22 +05:30
Mathew Kamkar
cf63da2e64 prometheus livekit_room_total node_id label 2022-03-21 16:43:01 -07:00
David Zhao
f14c452f8c Telemetry and webhook improvements. (#535)
* Telemetry and webhook improvements.

* avoid blocking on telemetry channel - increase channel size and drop when full
* send ParticipantJoined webhook when fully joined (i.e. on ParticipantActive)
* send TrackPublished & TrackUnpublished webhooks
* increase number of parallel webhook workers to 50

* update protocol
2022-03-18 23:20:33 -07:00
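Note on #535: "increase channel size and drop when full" describes a common Go pattern, a non-blocking send on a buffered channel. The sketch below shows the pattern in isolation; the `event` type, the `tryEnqueue` helper and the buffer size are hypothetical and are not taken from the commit.

```go
package main

import "fmt"

// event is a hypothetical stand-in for a telemetry event.
type event struct{ name string }

// tryEnqueue attempts a non-blocking send. It returns false when the
// channel buffer is full, in which case the event is dropped.
func tryEnqueue(ch chan<- event, ev event) bool {
	select {
	case ch <- ev:
		return true
	default:
		return false
	}
}

func main() {
	// A small buffer so the drop path is easy to see.
	ch := make(chan event, 2)
	for _, n := range []string{"a", "b", "c"} {
		if !tryEnqueue(ch, event{name: n}) {
			fmt.Println("dropped", n)
		}
	}
}
```

The trade-off is that the producer never stalls, at the cost of losing events when the consumer falls behind, so a larger buffer makes drops less likely.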
Mathew Kamkar
cac6d22a72 store cpu load in node stats (#524)
* store cpu load in node stats

* num cpus uint32

* cpu load selector test

* dep update
2022-03-16 14:51:22 -07:00
shishirng
cd2a7c2447 Telemetry: send video layers in TrackPublishedUpdate event (#500)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-03-10 14:49:01 -05:00
shishirng
c34b907d58 Add checks to prevent bytes/packet counts from going -ve (#499)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-03-09 16:51:23 -05:00
shishirng
57ecec73d7 Send participantInfo on participant left event to store identity (#498)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-03-09 14:35:01 -05:00
shishirng
c3a3fb569d add track publisher info in track subscribed event (#473)
* add track publisher info in track subscribed event

Signed-off-by: shishir gowda <shishir@livekit.io>

* update protocol ver

Signed-off-by: shishir gowda <shishir@livekit.io>
2022-02-28 13:48:02 -05:00
Raja Subramanian
2706dc130f Replace sync/atomic usage with uber/atomic (#471) 2022-02-28 09:57:17 +05:30
Raja Subramanian
0170cc1cb6 Staticcheck (#464)
Using `go get -u honnef.co/go/tools/cmd/staticcheck`
Unearthed a couple of real bugs
2022-02-25 12:04:08 +05:30
David Colburn
20f21cce2b Egress (#455)
* egress updates

* pass egressInfo to delete

* update typefakes

* export StartEgress

* update protocol

* new rpc, rename stores

* add json tag

* update tests

* update protocol
2022-02-24 14:57:14 -08:00
shishirng
3e7fae96ea Add telemetry method to capture max video_quality (#457)
* Add telemetry method to capture max video_quality

Signed-off-by: shishir gowda <shishir@livekit.io>

* Telemetry fakes

Signed-off-by: shishir gowda <shishir@livekit.io>

* Update go mod dep

Signed-off-by: shishir gowda <shishir@livekit.io>
2022-02-22 19:08:49 -05:00
shishirng
7fcb887eb8 use delta bytes in window to identify max layer (#442)
total_bytes is an aggregate. When we switch from a higher layer to a
lower layer, it takes time for the lower layer's total_bytes to catch
up to that of the stopped higher layer.

Signed-off-by: shishir gowda <shishir@livekit.io>
2022-02-17 15:15:10 -05:00
shishirng
c534099e3a fix connection_scores not being sent to telemetry during delta calc (#439)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-02-16 19:31:59 -05:00
shishirng
8680f6fd23 Send trackInfo object in TRACK_SUBSCRIBED event (#431)
Need track details in subscribed events

Signed-off-by: shishir gowda <shishir@livekit.io>
2022-02-10 16:48:16 -05:00