Commit Graph

171 Commits

Author SHA1 Message Date
Raja Subramanian
787b8450e9 Record out-of-packet count/rate in prom. (#2980)
* Record out-of-packet count/rate in prom.

Adding a field to AnalyticsStream to make this easier to report.
Let me know if adding to AnalyticsStream is not ok.

Will set up a protocol PR if it is okay.

* deps
2024-09-07 00:19:54 +05:30
Raja Subramanian
bec7453a1f Recreate stats worker on resume if needed. (#2982)
* Ref count the stats worker.

NOTE: Don't liek this much, but wanted to open this get some 👀 on
this and get feedback.

There are two entities, one for counting signal bytes and another for
media stats. They both send `ParticipantJoined` and `ParticipantLeft`
event.

In the case of a participant resume, as the old web socket
connection is closed, that triggers a signal stats counter close. That
would call `ParticipantLeft` and that would close the stats worker.

The closed stats worker got reaped in `FlushStats` after three minutes.

So, all events after that did not have a worker and hence went
unreported including missing participant_left webhook because it relied
on checking if a participant was ever connected and that needed to check
the worker state.

Using a ref count to keep track of join/leaves. And not close the worker
until ref count goes down to 0.

* create a stats worker on resume

* revert incorrect changes

* transfer connected state

* transfer connected state when creating worker

* resolve participant on a resume
2024-09-06 23:58:03 +05:30
cnderrauber
947e8f5909 Speed up track publication (#2952)
* speed up track publication

Add metrics for track publication and subscription

Return EnabledCodecs in JoinResponse so client can
choose codec without server side codec fallback

Cache remote webrtc track without AddTrackRequest to
let client send publisher offer before AddTrackRequest response

* go mod

* clean code
2024-08-23 18:38:32 +08:00
Paul Wells
afda860162 prevent race in telemetry worker cleanup (#2879) 2024-07-18 03:37:45 -07:00
Lukas Herman
8a229fda9d add participant session duration metric (#2801) 2024-06-17 17:52:08 -04:00
David Zhao
ecf1175832 Generate and send uuid with analytics (#2790)
* Generate and send uuid with analytics

* go mod
2024-06-13 23:00:50 -07:00
Paul Wells
d95b59de58 update protocol (#2764)
* update protocol

* deps
2024-06-05 23:50:54 -07:00
Paul Wells
f1886ece42 update protocol (#2760)
* update protocol

* deps
2024-06-05 19:46:34 -07:00
David Zhao
b99650aaf6 Send NodeID with analytics events (#2749) 2024-06-02 09:09:55 -07:00
cnderrauber
7ed1284b96 report average forward metrics (#2737)
* report average forward metrics

* unused parameter
2024-05-28 17:03:18 +08:00
cnderrauber
2288e402ac register forward metrics (#2735) 2024-05-27 15:47:01 +08:00
Paul Wells
38470f378b add message bytes metric (#2731) 2024-05-26 14:01:13 -07:00
cnderrauber
e6aa36fdd6 Add forward stats (#2725)
* Add forward metrics

* ignore packets was not forwarded

* rename
2024-05-24 17:43:28 +08:00
Paul Wells
9a5db132eb add room/participant name limit (#2704)
* add room/participant name limit

* defaults

* simplify

* omitempty

* handle 0 config

* fix race

* unlock

* tidy
2024-05-06 17:25:18 -07:00
Paul Wells
ac1b0e38ca store active stats workers in list (#2690)
* store active stats workers in list

* test

* single node cleanup

* cleanup

* cleanup

* cleanup
2024-04-26 17:24:10 -07:00
cnderrauber
f239f8bff1 Fix SubParticipant twice when paticipant left (#2672) 2024-04-23 16:09:02 +08:00
Mathew Kamkar
10c8582a6b get cpu stats from cgroup, remove env (#2636)
* get cpu stats from cgroup, remove env

* undo rand seed removal

* tests
2024-04-08 21:15:17 -07:00
Raja Subramanian
63b1fba082 Add start/end time to AnalyticsStream. (#2618)
* Add start/end time to AnalyticsStream.

* fix test
2024-04-03 12:23:18 +05:30
Paul Wells
f1c991c547 skip logging retry message when ws disconnections before signal finishes (#2604) 2024-03-29 06:30:12 -07:00
Raja Subramanian
14321f21bf Make OpsQueueParams to make it easier to understand args. (#2578) 2024-03-14 10:27:24 +05:30
Paul Wells
ad341d41f5 start telemetry participant worker to collect signal stats (#2538)
* start telemetry participant worker to collect signal stats

* format

* resolve room

* tidy
2024-03-03 02:47:51 -08:00
Denys Smirnov
f5eb6c8a95 Update usage of core.Fuse. (#2519) 2024-02-28 03:48:58 +02:00
Raja Subramanian
7649e4ffab Post data and signal stats once in 5 minutes (#2518) 2024-02-27 15:45:32 +05:30
Paul Wells
e5b8e25064 use shared psrpc utils (#2506)
* use shared psrpc utils

* fix

* deps
2024-02-24 00:38:49 -08:00
Raja Subramanian
5ac5bd236a Let track events go through after participant close. (#2487)
* Let track events go through after participant close.

Also, reducing lock scope in telemetry service.

* use shadow
2024-02-17 13:40:07 +05:30
Mathew Kamkar
7508560fde larger buckets for jitter prometheus histogram (#2468) 2024-02-09 12:09:51 -08:00
Raja Subramanian
b71d373f4a Use Deque in ops queue. (#2418)
* Use Seque in ops queue.

Standardizing some uses
- Change OpsQueue to use Deque so that it can grow/shrink as necessary and
  need not worry about channel getting full and dropping events.
- Change StreamAllocator and TelemetryService to use OpsQueue so that
  they also need not worry about channel size and overflows.

* Address feedback

* delete obvious comment

* clean up
2024-01-28 13:48:30 +05:30
Paul Wells
c726cbf2ba increase max session start time bin size (#2380) 2024-01-12 03:49:23 -08:00
Paul Wells
2fe2a9c9f2 add session start time metric (#2377) 2024-01-11 23:23:51 -08:00
shishirng
3770fbce64 Analytics: send local node room state/info (#2335)
* Analytics: send local node room state/info

Signed-off-by: shishir gowda <shishir@livekit.io>
2023-12-22 18:59:04 -05:00
Raja Subramanian
dcff75a516 Record number of data messages in prometheus. (#2282) 2023-12-01 16:10:57 +05:30
Raja Subramanian
53542b09a0 Participant traffic load. (#2262)
* Participant traffic load.

Capturing information about participant traffic
- Upstream/Downstream
- Audio/Video/Data
- Packets/Bytes

This captures a notion of how much traffic load a participant is
generating.

Can be used to make allocation decisions.

* Clean up

* SIP patches

* reporter goroutine

* unlock

* move traffic stats from protocol

* check type
2023-11-26 23:05:00 +05:30
Raja Subramanian
56dd399684 Use a worker to report signal/data stats. (#2260)
* Use a worker to report signal/data stats.

Was checking if reporting is needed on every update.
The check is wasted work if volume of signal/data messages is high
as reporting happens only once in 10 seconds.

Changing to a worker based on a timer. And also aligning with
telemetry reporting interval which defaults to 30 seconds.

* Remove unused constant
2023-11-22 11:47:15 +05:30
Paul Wells
f4a984d446 preallocate prometheus packet counters (#1942) 2023-08-08 01:06:14 -07:00
David Zhao
981fb7cac7 Adding license notices (#1913)
* Adding license notices

* remove from config
2023-07-27 16:43:19 -07:00
Benjamin Pracht
552e3758d5 Add IngressUpdated event (#1775) 2023-06-16 10:58:49 -07:00
David Zhao
f71544e27a Do not send ParticipantJoined webhook if connection was resumed (#1795)
* Do not send ParticipantJoined webhook if connection was resumed

* isResume -> isMigration
2023-06-15 15:39:04 -07:00
shishirng
2dd4e1365b Send EgressUpdated event (#1792)
Signed-off-by: shishir gowda <shishir@livekit.io>
2023-06-14 18:56:07 -04:00
David Zhao
7e5a7ae79f Fixed windows build (#1768) 2023-06-04 00:17:25 -07:00
Benjamin Pracht
e7879a46fc Add ingress telemetry support (#1763) 2023-06-02 17:38:19 -07:00
David Zhao
956735ae05 Fix node stats updates on Windows (#1748)
Because we aren't able to get CPU count/load info on Windows, they are
stubbed out to return placeholders. This restores compatibility to run
on Windows.
2023-05-29 10:53:08 -07:00
shishirng
3de51181ec Fix setting minscore - initialized to 0 (#1725)
Signed-off-by: shishir gowda <shishir@livekit.io>
2023-05-19 11:00:32 -04:00
shishirng
2e93d386fe send min/median connection score along with avg (#1720)
* send min/median connection score along with avg
* guard against divide by zero for avg score calculation
* update median calculation

Signed-off-by: shishir gowda <shishir@livekit.io>
2023-05-18 13:50:54 -04:00
Raja Subramanian
a085afc6ee Send quality stats to prometheus. (#1708) 2023-05-12 09:44:03 +05:30
David Colburn
2ccee369a6 update notifier (#1702) 2023-05-09 20:52:22 -07:00
David Colburn
ab6c994db4 update protocol/psrpc (#1643)
* update protocol/psrpc

* metadata references
2023-04-21 12:43:20 -07:00
David Zhao
40ceddd18b Integrate QueuedNotifier, fixes out-of-order delivery (#1615) 2023-04-15 01:20:23 -07:00
Paul Wells
6636e37664 add prometheus psrpc metrics observer (#1571)
* add prometheus psrpc metrics observer

* record rpc error counts

* update psrpc

* update protocol
2023-04-05 03:50:43 -07:00
David Colburn
108b251045 egress updated webhook (#1555) 2023-03-27 16:34:44 -07:00
David Colburn
191a9e8014 update core to 0.0.5 (#1540)
* update core

* sort imports

* fix typos

* redundant types
2023-03-22 16:53:23 -07:00