96 Commits

Author SHA1 Message Date
shishirng 8dc5a899a9 Create stats worker for participant on Active if not exists (#1059)
On migration, participants don't send a JOIN event, so create the stats worker
on the PARTICIPANT_ACTIVE event.
2022-09-29 17:26:43 -04:00
David Zhao 3da908302a Do not warn when notifier isn't configured (#1043)
By default there are no webhook URLs to notify, so a notifier isn't created.
2022-09-26 13:30:27 -07:00
David Colburn 803046b882 Auto egress (#1011)
* auto egress

* fix room service test

* reuse StartTrackEgress

* add timestamp

* update prefixed filename explicitly

* update protocol

* clean up telemetry

* fix telemetry tests

* separate room internal storage

* auto participant egress

* remove custom template url

* fix internal key

* use map for stats workers

* remove sync.Map

* remove participant composite
2022-09-21 12:04:19 -07:00
cnderrauber c401ca58af turn packet and bytes stats used for telemetry and load control (#969)
* stats for turn

* add connections stats

* stats for standalone turn server only

* wire update
2022-08-31 11:00:27 +08:00
Mathew Kamkar 767d660809 Use LocalNode ID in Prometheus metrics (#959) 2022-08-25 22:16:20 -07:00
shishirng 79cf614783 Send egressInfo in telemetry event (#941)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-08-23 08:18:12 -04:00
David Zhao b8bda3f14b Separate calls to Telemetry vs Prometheus room lifecycle (#935)
* Separate calls to Telemetry vs Prometheus room lifecycle

* remove unused import
2022-08-20 20:22:16 -07:00
shishirng a3e8304b56 send participant info/identity during track_published event (#846)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-07-21 17:34:52 -04:00
Raja Subramanian 29039b4e76 Use a go routine to clean up stats workers. (#836)
* Use a go routine to clean up stats workers.

It is possible that certain events (like TrackUnpublished) can
happen after the participant is closed. Webhooks pertaining
to those events need details like room name/id. So, reap stats
workers a little while after the participant left event happens.

* handle data race report

* log analytics worker reap

* debug log
2022-07-18 11:47:43 +05:30
Mathew Kamkar e0676132d4 Packet stats from TC (#832)
* system level packet stats from tc

* drop percent

* test fix

* formatting

* formatting/wording

* prometheus metrics

* update livekit protocol go module
2022-07-15 10:41:40 -07:00
David Colburn fbbcbe77df Remove recording (#811)
* remove recorder service

* update protocol
2022-07-05 18:39:32 -07:00
David Zhao b316698409 Release with GoReleaser. Allow start without key configuration (#788) 2022-06-26 12:27:43 -07:00
Raja Subramanian 4ed9b5f90e Revert "Using shadow pattern for stats workers (#742)" (#744)
This reverts commit 2b561d2bad.
2022-05-31 11:06:44 +05:30
Raja Subramanian f19815754c Do not re-compute average on real time metric change (#743) 2022-05-31 10:33:17 +05:30
Raja Subramanian 2b561d2bad Using shadow pattern for stats workers (#742) 2022-05-31 10:32:54 +05:30
Raja Subramanian 508aa471a9 Track participant join total + rate in node stats (#741)
* Track participant join total + rate in node stats

* update protocol
2022-05-30 15:58:30 +05:30
cnderrauber f958fbcc1c simulcast codecs support (#720)
simulcast codecs support 

Co-authored-by: David Zhao <dz@livekit.io>
2022-05-27 19:55:50 +08:00
Raja Subramanian 33032f6c4b Fix some test races and other things found with go test -race (#711) 2022-05-24 10:16:36 +05:30
Raja Subramanian 8ef53037eb Lock stats worker maps (#704) 2022-05-21 10:36:49 +05:30
David Zhao 79296d0939 Fixed concurrent modification to map (#702)
Synchronizes access to stats worker maps. Previously they were accessed
from both the OpsQueue goroutine and the run() worker.
2022-05-20 13:45:13 -07:00
Raja Subramanian 012337c96a Fix sense of transmission label (#692) 2022-05-18 12:52:05 +05:30
David Zhao 7eb3362d0a Keep track of retransmissions in NodeStats (#677) 2022-05-10 15:25:24 -07:00
David Zhao bd7e3beda4 Improve frequency of stats update (#673)
* Improve frequency of stats update

Prometheus stats are updated as the data becomes available, instead of
being aggregated along with telemetry batches. Node availability decisions
can now react much faster to these stats.

* use the same intervals for connection quality updates
2022-05-09 08:55:06 -07:00
Raja Subramanian 081b97142f Variable collision killed stats workers (#670) 2022-05-06 23:42:40 +05:30
Raja Subramanian c6f895db15 Prevent concurrent access of stats worker map (#666) 2022-05-04 23:00:20 +05:30
David Zhao 4e5863496c Set numCPUs correctly in non-linux environment (#653) 2022-04-24 23:25:33 -07:00
David Zhao 3c53b843c5 Fixes bps and pps average computation. (#639)
Exclude NACK count from being a trigger to refresh stats.
Since NACKs are updated instantaneously without having to wait for
Telemetry updates that occur every 10s, having even a single NACK
could cause us to compute averages prematurely.
2022-04-20 19:17:02 -07:00
David Zhao b821a0997d Use common logging init functions (#633)
* Use common logging init functions

* update protocol commit

* fix tests
2022-04-20 00:15:11 -07:00
David Zhao 431069af95 Rename StatsUpdateFrequency -> StatsUpdateInterval 2022-04-19 22:22:58 -07:00
David Zhao 282e2aed49 Increase frequency of status updates and longer availability threshold (#628)
* Increase frequency of status updates and longer avail. threshold.

* better fix.

* fix room close test failure due to slow peer connection Close

* Perform avg computation more frequently if data has changed
2022-04-19 22:18:00 -07:00
Raja Subramanian a19ca69f5f Prevent stats update if the deltas are empty (#619)
* Prevent stats update if the deltas are empty

* increase force interval

* static check

* Change max delay to 30 seconds
2022-04-18 22:51:34 +05:30
Raja Subramanian a98d955284 Delta stats throughout (#615)
* Use delta stats throughout and avoid calculating deltas in telemetry

* Fix a few things after testing

* Remove debug

* Fix tests

* delete instead of setting to nil

* Point to the latest protocol
2022-04-16 21:11:32 +05:30
Raja Subramanian 92009b6428 Consistently stop tickers (#593) 2022-04-05 20:42:06 +05:30
David Colburn 0b8a180554 Code inspection (#581)
* Code inspection

* fix [4]int64 conversion
2022-03-30 13:49:53 -07:00
shishirng a6bb59b159 handle deltas being null leading to crash (#567)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-03-25 19:18:32 -04:00
shishirng 579d3d1a19 Check if current stats < prev and guard against underflow (#563)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-03-23 15:16:59 -04:00
Raja Subramanian 076eb1c8ae Dampen stream allocator (#551)
* WIP commit

* WIP commit

* WIP commit

* format

* NACK window

* Remove layer when it is expected to stop

* Remove debug
2022-03-22 22:23:22 +05:30
Mathew Kamkar cf63da2e64 prometheus livekit_room_total node_id label 2022-03-21 16:43:01 -07:00
David Zhao f14c452f8c Telemetry and webhook improvements. (#535)
* Telemetry and webhook improvements.

* avoid blocking on telemetry channel - increase channel size and drop when full
* send ParticipantJoined webhook when fully joined (i.e. on ParticipantActive)
* send TrackPublished & TrackUnpublished webhooks
* increase number of parallel webhook workers to 50

* update protocol
2022-03-18 23:20:33 -07:00
Mathew Kamkar cac6d22a72 store cpu load in node stats (#524)
* store cpu load in node stats

* num cpus uint32

* cpu load selector test

* dep update
2022-03-16 14:51:22 -07:00
shishirng cd2a7c2447 Telemetry: send video layers in TrackPublishedUpdate event (#500)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-03-10 14:49:01 -05:00
shishirng c34b907d58 Add checks to prevent bytes/packet counts from going -ve (#499)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-03-09 16:51:23 -05:00
shishirng 57ecec73d7 Send participantInfo on participant left event to store identity (#498)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-03-09 14:35:01 -05:00
shishirng c3a3fb569d add track publisher info in track subscribed event (#473)
* add track publisher info in track subscribed event

Signed-off-by: shishir gowda <shishir@livekit.io>

* update protocol ver

Signed-off-by: shishir gowda <shishir@livekit.io>
2022-02-28 13:48:02 -05:00
Raja Subramanian 2706dc130f Replace sync/atomic usage with uber/atomic (#471) 2022-02-28 09:57:17 +05:30
Raja Subramanian 0170cc1cb6 Staticcheck (#464)
Using `go get -u honnef.co/go/tools/cmd/staticcheck`
Unearthed a couple of real bugs
2022-02-25 12:04:08 +05:30
David Colburn 20f21cce2b Egress (#455)
* egress updates

* pass egressInfo to delete

* update typefakes

* export StartEgress

* update protocol

* new rpc, rename stores

* add json tag

* update tests

* update protocol
2022-02-24 14:57:14 -08:00
shishirng 3e7fae96ea Add telemetry method to capture max video_quality (#457)
* Add telemetry method to capture max video_quality

Signed-off-by: shishir gowda <shishir@livekit.io>

* Telemetry fakes

Signed-off-by: shishir gowda <shishir@livekit.io>

* Update go mod dep

Signed-off-by: shishir gowda <shishir@livekit.io>
2022-02-22 19:08:49 -05:00
shishirng 7fcb887eb8 use delta bytes in window to identify max layer (#442)
total_bytes is an aggregate; when we switch from a higher layer to a lower
layer, it takes time for the lower layer's total_bytes to catch up to
the stopped higher layer's.

Signed-off-by: shishir gowda <shishir@livekit.io>
2022-02-17 15:15:10 -05:00
shishirng c534099e3a fix connection_scores not being sent to telemetry during delta calc (#439)
Signed-off-by: shishir gowda <shishir@livekit.io>
2022-02-16 19:31:59 -05:00