livekit

mirror of https://github.com/livekit/livekit.git synced 2026-05-24 19:05:36 +00:00

Author	SHA1	Message	Date
Raja Subramanian	cfe3178542	Reconcile RTP stats with RTX data. (#3252 ) * RTX RTPStats * WIP * RTCP RTX handler * reconcile rtx * cache size * clean up * test * clean up	2024-12-15 14:33:02 +05:30
cnderrauber	54f9f7de51	upgrade to pion/webrtc v4 (#3213 )	2024-11-28 16:05:38 +08:00
Raja Subramanian	9ac48e2984	Grab time under lock. (#3100 ) Revert part of my previous commit. I vaguely remembered there was a reason for having code like that, but did not remember the details and ended up consolidating. The issue is that time needs to be grabbed under lock so that two events happening close to each other do not get order swapped.	2024-10-15 22:39:43 +05:30
Raja Subramanian	8b604df32a	Set FEC enabled properly in connection stats module. (#3098 ) * Set FEC enabled properly in connection stats module. With RED, the FEC indication is in primary codec. Also, clean up some bits that were not necessary (TrackInfoAvailable is not needed) TODO: There are still a couple of things to figure out - If codec is RED, Opus is added as second codec synthetically using https://github.com/livekit/livekit/blob/33098337fc17705bbdb3283c7a7034aa6b2f3745/pkg/rtc/mediaengine.go#L31 which hard-codes FEC enabled. Ideally, we should get the primary codec parameters from SDP offer. - The WebRTCReceiver does not have information about primary codec. For now, just setting FEC to true when RED is enabled. It is okay as it just affects when we declare quality drops, but ideally the primary codec should be retrieved from SDP offer. * clean up and comment * full prop check	2024-10-15 17:39:42 +05:30
Raja Subramanian	d052caa104	Use PPS mode rather than max to adjust packet loss weight. (#3095 )	2024-10-14 20:16:19 +05:30
Raja Subramanian	a8da4872b1	Drop quality a bit faster on score trending lower to be more responsive. (#3093 ) Also, logging a bit more about quality changes to understand why high(ish) loss does not drop quality. Will remove the loss thresholded logging after collecting some data.	2024-10-14 17:21:42 +05:30
Raja Subramanian	f154b236b5	Fix down stream packet loss reporting. (#3092 ) * Fix down stream packet loss reporting. * format	2024-10-14 11:08:10 +05:30
Raja Subramanian	8ac33a868c	Splitting out rtp stats stuff into its own package. (#3060 ) * Splitting out rtp stats stuff into its own package. Going to be making some lighter versions of these. Will be cleaner to have all of these grouped together. So, as a first step, just making a package for it. * tests	2024-10-03 15:51:24 +05:30
Raja Subramanian	d53f732ada	Do not take padding packets into account in max pps calculation (#2990 )	2024-09-09 11:08:50 +05:30
Raja Subramanian	787b8450e9	Record out-of-packet count/rate in prom. (#2980 ) * Record out-of-packet count/rate in prom. Adding a field to AnalyticsStream to make this easier to report. Let me know if adding to AnalyticsStream is not ok. Will set up a protocol PR if it is okay. * deps	2024-09-07 00:19:54 +05:30
Raja Subramanian	f9f761b223	Demote some less useful/noisy logs. (#2743 )	2024-05-29 12:05:18 +05:30
Raja Subramanian	71b5ffed93	Less confusing variable name (#2706 )	2024-05-08 16:10:49 +05:30
Raja Subramanian	af0b0c4734	Connection quality LOST only if RTCP is also not available. (#2670 ) * Connection quality LOST only if RTCP is also not available. It is possible that sender stops all layers of video due to some constraint (CPU or bandwidth). Packet reception going dry due to that should not trigger `LOST` quality. Add last received RTCP time also to distinguish the case of real `LOST` and sender stopping traffic. Some bits to watch for - With audio, RTCP reports could be more than 5 seconds apart (5 seconds is the default interval for connection quality scorer), but audio senders usually send silence packets even when there is no input. So audio completely stopping can be considered `LOST`. - With video, have to observe if all clients continue to send RTCP even if all layers are stopped. - RTCP bandwidth is not supposed to exceed the primary stream bandwidth. libwebrtc calculates that and spaces out RTCP reports accordingly. That is the reason why audio reports are that far apart. If a video stream is encoded at a very low bit rate, it could also be sending RTCP rarely. So, there is the case of LOST being indistinguishable from sender stopping all layers. But, this should be a rare case. * typo	2024-04-21 23:35:24 +05:30
Raja Subramanian	ec41d20f81	Reduce RED weight in half. (#2648 )	2024-04-12 20:39:53 +05:30
wanshuangcheng	e1b68012a1	chore: fix typos in comment (#2634 ) Signed-off-by: wanshuangcheng <wanshuangcheng@outlook.com>	2024-04-10 09:27:48 -07:00
Raja Subramanian	63b1fba082	Add start/end time to AnalyticsStream. (#2618 ) * Add start/end time to AnalyticsStream. * fix test	2024-04-03 12:23:18 +05:30
Raja Subramanian	45581433cc	Add option to enable bitrate based scoring (#2600 )	2024-03-27 18:45:53 +05:30
Raja Subramanian	ea66eae9f5	Start moving things to structured logging (#2527 )	2024-02-29 14:35:19 +05:30
Denys Smirnov	f5eb6c8a95	Update usage of core.Fuse. (#2519 )	2024-02-28 03:48:58 +02:00
Raja Subramanian	174e69c81d	Restore min score to 30. (#2435 ) Was at 20 when LOST was introduced, but was going to 20 even when under not LOST conditions. When there are packets, want the min to be at 30. Going down to 20 resulted in reporting LOST quality even when packets were flowing (although they were experiencing heavy loss and quality would have been very bad, yet they are not lost). Also, sample warning about adding packet to bucket even more.	2024-02-02 08:52:52 +05:30
Raja Subramanian	a2053dfd94	ConnectionQuality DISCONNECTED -> LOST (#2276 )	2023-11-29 23:17:17 +05:30
Raja Subramanian	396371312b	Use variables for score -> quality mapping (#2268 ) * Use variables for score -> quality mapping * spelling	2023-11-28 11:51:21 +05:30
Raja Subramanian	5f76d1adcc	Introduce `DISCONNECTED` connection quality. (#2265 ) * Introduce `DISCONNECTED` connection quality. Currently, this state happens when any up stream track does not send any packets in an analysis window when it is expected to send packets. This can be used by participants to know the quality of a potentially disconnected participant. Previously, it took 20 - 30 seconds for the stale timeout to kick in and disconnect the limbo participant which triggered a participant update through which other participants knew about it. Previously, `POOR` quality was also overloaded to denote that the up stream is not sending any packets. With this change, that is a separate indicator, i. e. `DISCONNECTED`. * clean up * Update deps * spelling	2023-11-27 23:06:53 +05:30
Raja Subramanian	2cf751d261	Use timer in scorer lock scope. (#2066 ) Using time from outside makes anachronous samples in expected distance/bit rate measurement. So, have to let the time be snapshotted in scorer lock scope.	2023-09-13 01:38:34 +05:30
Raja Subramanian	254a35543d	Fix down stream stats. (#2063 ) Need to pass in the correct time. Previously streaming start was determined by another delta snapshot which was removed for efficiency. Did not realise that we were passing in zero time for stats. Also, revert of the change (the part which did not re-pause) from this PR (https://github.com/livekit/livekit/pull/2037). That change affects other paths. The edge it was trying to fix is more rare. Need to think about a way which covers all cases.	2023-09-12 08:34:28 +05:30
Raja Subramanian	1b20b8f1ac	Make interface for connection stats. (#2056 ) * Make interface for connection stats. Implement suggestion from @paulwe to clean that up a bit. * fix test	2023-09-11 08:39:33 +05:30
Raja Subramanian	c09d8d0878	Split RTPStats into receiver and sender. (#2055 ) * Split RTPStats into receiver and sender. For receiver, short types are input and need to calculate extended type. For sender (subscriber), it can operate only in extended type. This makes the subscriber side a little simpler and should make it more efficient as it can do simple comparisons in extended type space. There was also an issue with subscriber using shorter type and calculating extended type. When subscriber starts after the publisher has already rolled over in sequence number OR timestamp, when subsequent publisher side sender reports are used to adjust subscriber time stamps, they were out of whack. Using extended type on subscriber does not face that. * fix test * extended types from sequencer * log	2023-09-11 07:33:39 +05:30
Raja Subramanian	b95670f56b	Removing one snapshot in down track. (#2047 ) Profiling showed updating jitter going through the snapshot maps. With the reduction of one, there should only be one snapshot and hopefully that should gain some cycles back.	2023-09-07 22:22:00 +05:30
David Zhao	981fb7cac7	Adding license notices (#1913 ) * Adding license notices * remove from config	2023-07-27 16:43:19 -07:00
Raja Subramanian	5459bd2931	Push track quality to poor on a bandwidth constrained pause. (#1867 ) * Push track quality to poor on a bandwidth constrained pause. * add tests * scale distance by divisor * fix test distance to desired * wait longer for subscription manager to reconcile	2023-07-11 15:29:35 +05:30
Raja Subramanian	e6f5f2f344	Prevent anachronous sample reading. (#1863 ) * Prevent anachronous sample reading. Not so pretty way of solving this. Please let me know if you have thoughts. Passing in time makes testing easier. But, that also leads to time reversal problems. Example scenario 1. Connection stats worker gets a time and initiates quality calculation. 2. A layer transition is recorded after that. 3. By the time scorer is called to calculate score with time from Step 1, there is time reversal and that results in an anachronous sample. One option is to use a scorer lock in connection stats module and wrap all calls to scorer in that lock, but that does not prevent the passed in time stamps themselves getting out of order. Also, standalone use of scorer in some other context will be problematic. Doing the hybrid thing of taking current time in scorer if passed in time is zero so that scorer lock domain controls it. * use zero time everywhere in normal flow * make APIs with and without time passed in as Paul suggested	2023-07-10 08:39:52 +05:30
Raja Subramanian	e3954d1d64	Use timed aggregator. (#1843 ) * Use timed aggregator. For aggregate bitrate and average distance from desired. Also, clean up debug added to track leak. * update deps	2023-07-01 10:21:15 +05:30
Raja Subramanian	496656627e	Logging more to understand layer transition leak better. (#1840 )	2023-06-30 11:59:53 +05:30
Raja Subramanian	cea41e4189	Discount out-of-order packets in downstream score. (#1831 ) * Discount out-of-order packets in downstream score. More notes inline. * correct comment * clean up comment	2023-06-27 17:44:53 +05:30
Raja Subramanian	72ed5b19f7	Use receiver report stats for loss/rtt/jitter. (#1781 ) * Use receiver report stats for loss/rtt/jitter. Reversing a bit of https://github.com/livekit/livekit/pull/1664. That PR did two snapshots (one based on what SFU is sending and one based on combination of what SFU is sending reconciled with stats reported from client via RTCP Receiver Report). That PR reported SFU only view to analytics. But, that view does not have information about loss seen by client in the downstream. Also, that does not have RTT/jitter information. The rationale behind using SFU only view is that SFU should report what it sends irrespective of whether client is receiving or not. But, that view did not have proper loss/RTT/jitter. So, switch back to reporting SFU + receiver report reconciled view. The down side is that when receiver reports are not received, packets sent/bytes sent will not be reported to analytics. An option is to report SFU only view if there are no receiver reports. But, it becomes complex because of the offset. Receiver report would acknowledge certain range whereas SFU only view could be different because of propagation delay. To simplify, just using the reconciled view to report to analytics. Using the available view will require a bunch more work to produce accurate data. (NOTE: all this started due to a bug where RTCP was not restarted on a track resume which killed receiver reports and we went on this path to distinguish between publisher stopping vs RTCP receiver report not happening) One optimisation to do here concerns the check to see if publisher is sending data. Using a full DeltaInfo for that is an overkill. Can do a lighter weight check for that later. * return available streams * fix test	2023-06-09 23:31:25 +05:30
Raja Subramanian	1d3faefc5e	More scoring tweaks (#1719 ) 1. Completely removing RTT and jitter from score calculation. Need to do more work there. a. Jitter is slow moving (RFC 3550 formula is designed that way). But, we still get high values at times. Ideally, that should penalise the score, but due to jitter buffer, effect may not be too bad. b. Need to smooth RTT. It is based on receiver report and if one sample causes a high number, score could be penalised (this was being used in down track direction only). One option is to smooth it like the jitter formula above and try using it. But, for now, disabling that also. 2. When receiving lesser number of packets (for example DTX), reduce the weight of packet loss with a quadratic relationship to packet loss ratio. Previously using a square root and it was potentially weighting it too high. For example, if only 5 packets were received due to DTX instead of 50, we were still giving 30% weight (sqrt(0.1)). Now, it gets 1% weight. So, if one of those 5 packets were lost (20% packet loss ratio), it still does not get much weight as the number of packets is low. 3. Slightly slower decrease in score (in EWMA) 4. When using RED, increase packet loss weight thresholds to be able to take more loss before penalizing score.	2023-05-18 20:16:43 +05:30
Raja Subramanian	28a8a808f2	Do not add empty video layers in stats. (#1685 )	2023-05-05 08:59:08 +05:30
Raja Subramanian	50ab72a5f8	DownTrack scoring when RR is not received. (#1664 )	2023-04-28 14:50:06 +05:30
Raja Subramanian	c1c4e8aea0	Include packetsMissing field in string representation (#1659 ) * Include packetsMissing field in string representation * do not set stub directly	2023-04-27 14:39:05 +05:30
Raja Subramanian	9db46bb866	Avoid divide-by-zero and NaN (#1656 )	2023-04-26 21:29:25 +05:30
Raja Subramanian	09c0b25787	Ensure that RR is not received for a while before running scorer on nil (#1653 ) data. Without the check, it was getting tripped by publisher not publishing any data. Both conditions returned nil, but in one case, the receiver report should have been received, but no movement in number of packets.	2023-04-24 23:39:30 +05:30
Raja Subramanian	a9fe9f331c	Run quality scorer when there are no streams. (#1633 ) * Run quality scorer when there are no streams. In the down stream direction, receiver report is used for scoring. If there are no receiver reports, it should go to `dry` state and report poor quality. Update scorer on dry condition only when update score has not happened for longer than some multiple of update interval. Cannot update on every interval when there are no streams as receiver report might be just missed. Waiting for longer to ensure that report is definitely not received. * update last stats time	2023-04-19 13:05:43 +05:30
Raja Subramanian	59961c1992	Aggregate method for RTPDeltaInfo (#1562 )	2023-03-30 13:34:52 +05:30
Raja Subramanian	de86ccb3df	Calculate stats duration. (#1554 )	2023-03-28 07:18:31 +05:30
Raja Subramanian	db2e9f1f8b	Use layer 2 for SVC always. (#1542 ) This fixes the case of screen share forwarding. We should probably also look at proper AddTrack. The problem was that - AddTrack used two layers for screen share from JS sample app - Track was published with rid = f. Given that and the track info, consistent layer mapping set the layer as 1. - `getBufferLocked` always uses the highest layer for SVC - Between the two, when down track was requesting PLI, there was no buffer at the requested layer and hence no PLI went out. A few other notes - Tried locking SVC to layer 0 (instead of layer 2), but that resulted in PLI layer lock spamming. It did not happen in v1.3.0 of the server though. Not sure what causes that. Need to investigate later. But, that does not happen when using layer 2 buffer as SVC buffer. - When using layer 2 for SVC, the PLI throttle config will be using that of layer 2. Is that okay? - `buffer` structure should maintain more stats about spatial layers for SVC case so that layer stats can be reported to analytics/scoring etc. - In general, `buffer` may need some more hooks to make it SVC aware so that it can handle various spatial layer aware/specific bits.	2023-03-23 13:16:14 +05:30
David Colburn	191a9e8014	update core to 0.0.5 (#1540 ) * update core * sort imports * fix typos * redundant types	2023-03-22 16:53:23 -07:00
Raja Subramanian	e7c5872758	Dependent RTT/jitter control. (#1537 )	2023-03-22 11:59:32 +05:30
Raja Subramanian	f782c8956d	Extend range of `GOOD` scores. (#1536 ) Empirically, the experience is not bad for a larger range. So, triggering POOR too early causes confusion.	2023-03-22 11:36:30 +05:30
Raja Subramanian	f770f0cb67	Use pointer to struct in logging (#1530 )	2023-03-19 21:57:35 +05:30
Raja Subramanian	aeefbb080e	Account for time before measurement available in connection quality. (#1528 )	2023-03-19 18:34:56 +05:30


93 Commits