Commit Graph

2868 Commits

Author SHA1 Message Date
Raja Subramanian 47324abd0e Drop run away receiver reports. (#4170) 2025-12-17 21:58:47 +05:30
Paul Wells 462ec324be prevent uint overflow setting packet not found count (#4169) 2025-12-17 06:54:23 -08:00
Raja Subramanian 5c841b8ea1 Some logging changes. (#4168)
* Some logging changes.

Trying to chase a case of large sequence number gap on subscriber side
where packets are sent after a long time.

* return values instead of logging
2025-12-17 18:05:29 +05:30
Paul Wells 2f2d0a5735 skip lost sequence number ranges in getIntervalStats (#4166) 2025-12-17 00:02:51 -08:00
Paul Wells 898ebe058c clean up manual roomservice log redaction (#4165)
* clean up manual roomservice log redaction

* deps
2025-12-16 23:02:45 -08:00
Paul Wells 3e41725395 move delete to oss service store (#4164) 2025-12-16 22:00:43 -08:00
Raja Subramanian 5964efbba5 Ensure subscribe data track handles are unique (#4162) 2025-12-16 13:52:05 +05:30
Raja Subramanian a26c48304a Add support for RTP stream restart. (#4161)
* Add support for RTP stream restart.

When an unhandled packet is encountered, try a restart sequence.
Restart happens when 5 packets with contiguous sequence numbers and same
or increasing time stamps are received. Note that this does not work for
B-frame type of scenarios, but that is true for receive path handling
even before this. As WebRTC does not use B-frames, it is fine. But,
needs to be looked at again if B-frames are necessary.

It is controlled by a config that is disabled by default.

* clean up

* debug log
2025-12-16 13:21:39 +05:30
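The restart rule described in a26c48304a (#4161) can be illustrated with a small sketch. It is not the code from that commit: `restartDetector`, `observe`, and `restartThreshold` are hypothetical names. It assumes only what the message states, namely that a restart is declared after 5 packets with contiguous sequence numbers and same or increasing timestamps.

```go
package main

import "fmt"

// restartThreshold is the packet count named in the commit message.
const restartThreshold = 5

// restartDetector is a hypothetical helper. It counts consecutive packets
// whose sequence numbers are contiguous and whose timestamps do not go back.
type restartDetector struct {
	count  int
	lastSN uint16
	lastTS uint32
}

// observe records one unhandled packet and reports whether enough
// well-ordered packets have been seen to declare a restart.
func (d *restartDetector) observe(sn uint16, ts uint32) bool {
	// uint16 addition wraps, so 65535 followed by 0 counts as contiguous.
	contiguous := d.count > 0 && sn == d.lastSN+1
	// The signed difference tolerates 32-bit timestamp wrap-around.
	notOlder := int32(ts-d.lastTS) >= 0
	if contiguous && notOlder {
		d.count++
	} else {
		d.count = 1 // this packet starts a new candidate run
	}
	d.lastSN, d.lastTS = sn, ts
	return d.count >= restartThreshold
}

func main() {
	d := &restartDetector{}
	sns := []uint16{65533, 65534, 65535, 0, 1}
	tss := []uint32{1000, 1000, 4000, 4000, 7000}
	for i := range sns {
		fmt.Println(sns[i], d.observe(sns[i], tss[i]))
	}
}
```

Because the check requires timestamps that never decrease in arrival order, a stream with B-frames would keep resetting the run, which matches the limitation noted in the commit message.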
Raja Subramanian 386f0b3822 fix typo in clearing index when removing track from room track manager (#4158) 2025-12-14 19:57:39 +05:30
Paul Wells 0abfb2515c deregister observability function when participant is closed (#4157)
* deregister observability function when participant is closed

* tidy
2025-12-14 04:21:07 -08:00
Raja Subramanian 97aba5e77b Consistently undo update to sequence number and timestamp when the (#4156)
incoming packet cannot be sequenced.
2025-12-13 15:46:04 +05:30
Raja Subramanian 2317c29531 Fix panic while removing track from room track manager. (#4153)
Cannot range and use `slices.Delete` in the loop.
2025-12-12 14:19:18 +05:30
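The panic fixed in 2317c29531 (#4153) comes from a general Go pitfall: a `range` loop fixes its iteration count up front, so calling `slices.Delete` on the same slice inside the loop can leave a later index beyond the shortened length. The sketch below shows one safe alternative using `slices.DeleteFunc` from the standard library. The `track` type and `removeTrack` function are hypothetical, and the listing does not show how the actual fix was written.

```go
package main

import (
	"fmt"
	"slices"
)

// track is a hypothetical stand-in for whatever the room track manager stores.
type track struct {
	ID string
}

// removeTrack returns tracks without any entry whose ID matches id.
// slices.DeleteFunc compacts the slice in a single pass, so there is no
// loop index that can go stale after an element is removed.
func removeTrack(tracks []track, id string) []track {
	return slices.DeleteFunc(tracks, func(t track) bool {
		return t.ID == id
	})
}

func main() {
	tracks := []track{{ID: "a"}, {ID: "b"}, {ID: "b"}, {ID: "c"}}
	tracks = removeTrack(tracks, "b")
	fmt.Println(len(tracks), tracks[0].ID, tracks[1].ID) // prints: 2 a c
}
```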
Raja Subramanian a0a28ac5e3 Avoid duplicate track add to room track manager. (#4152)
* Avoid duplicate track add to room track manager.

Don't have proof that this happens, but in the leak chase, this is
another component at room level and holds references to tracks. Guessing
this is not cleaning tracks till room is closed.

* add a report
2025-12-12 12:42:04 +05:30
Raja Subramanian f01008f876 Revert telemetry stats worker wait configuration. (#4151)
Mostly reverting https://github.com/livekit/livekit/pull/4148. Leaving
the one bit to pass in a wait time to `Flush`.
2025-12-12 10:56:25 +05:30
Raja Subramanian ca4b56d2d5 Handle case of sequence number jump just after start. (#4150)
It is possible that the stream stops just after start and
restarts much later introducing a large gap in sequence number.
That could look like an unhandled case because the wrap back handler
does not have enough packets yet.

Let other checks based on time stamp gap take effect and only if that
also leaves the sequence number unhandled, drop the packet.
2025-12-12 00:29:15 +05:30
Raja Subramanian 97099cae3e Configurable telemetry stats worker clean up wait. (#4148)
* Configurable telemetry stats worker clean up wait.

* make worker clean up wait setting atomic
2025-12-11 11:25:32 +05:30
changgesi d7db7cb389 chore: fix a large number of spelling issues (#4147)
Signed-off-by: changgesi <changgesi@outlook.com>
2025-12-11 09:34:13 +05:30
Raja Subramanian 498304cdd9 defensive nil check (#4144) 2025-12-10 13:33:08 +05:30
Raja Subramanian 20f6a49780 Store ddParser in atomic.Pointer (#4143)
* Store ddParser in atomic.Pointer

as release is handled outside lock

* log space

* make non-struct methods to release packets
2025-12-10 13:01:17 +05:30
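As a rough illustration of the idea in 20f6a49780 (#4143), the sketch below keeps a parser in a `sync/atomic` `Pointer` so that it can be detached and released without holding a lock. Everything except `atomic.Pointer` and `atomic.Bool` is an assumption: the `ddParser` shape, its `Release` method, and the `buffer` owner are hypothetical and are not taken from the commit.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// ddParser is a hypothetical stand-in for the dependency descriptor parser.
type ddParser struct {
	released atomic.Bool
}

// Release marks the parser as released. A real parser would free resources here.
func (p *ddParser) Release() { p.released.Store(true) }

// buffer is a hypothetical owner of the parser.
type buffer struct {
	parser atomic.Pointer[ddParser]
}

// releaseParser detaches the parser without taking any lock. Swap is atomic,
// so only one caller receives the non-nil pointer even if several race.
func (b *buffer) releaseParser() bool {
	p := b.parser.Swap(nil)
	if p == nil {
		return false
	}
	p.Release()
	return true
}

func main() {
	b := &buffer{}
	b.parser.Store(&ddParser{})
	fmt.Println(b.releaseParser(), b.releaseParser()) // prints: true false
}
```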
Raja Subramanian 037cb9062f release ext packet if patching fails (#4142) 2025-12-10 12:09:49 +05:30
Raja Subramanian dd598ef23f Release ExtPacket if dependency descriptor or other parsing fails (#4141) 2025-12-10 11:05:19 +05:30
Raja Subramanian 1c1a836c3c Mark RTCP buffer Write as noinline. (#4138)
Seeing a bunch of objects in ReadStreamSRTP.write which does not make
a lot of sense as the function does not allocate anything
(https://github.com/pion/srtp/blob/8fe528a0c4ebb5c46d40a9fd5b77e5b6655fa919/stream_srtp.go#L68-L77)

RTP buffer was marked noinline in an earlier PR.
Marking RTCP buffer write also as noinline to check if heap reporting
changes.
2025-12-08 22:30:30 +05:30
Raja Subramanian 64f3d1e972 switch participant callbacks to room to listener interface (#4136)
* switch participant callbacks to room to listener interface

* mage generate

* clean up

* clear listener

* clean up

* use interface in up data track manager

* tweaks

* Paul feedback - should reduce the diff as this keeps the room handlers as is except making methods for a couple of anonymous handlers

* clean up
2025-12-08 15:59:45 +05:30
Paul Wells c6e6c0215f add debug metric for tracking references (#4134) 2025-12-07 11:39:21 -08:00
Raja Subramanian a30c79fa6d Use isEnding to indicate if down track could be resumed. (#4132)
There is no need to cache down track if participant is going away.
2025-12-06 19:55:20 +05:30
Raja Subramanian 8c241ecf12 Fix RTCP reader leak in DownTrack. (#4131)
When a participant is closing, RTCP readers should be cleaned up from
factory even if the participant is expected to resume. The resumed
participant will be a new participant session and peer connection(s) and
everything will be set up again.
2025-12-06 17:49:23 +05:30
Raja Subramanian 3eef869a68 Do not pause rid in SDP (#4129) 2025-12-05 15:57:31 +05:30
Raja Subramanian 7c1a0fab7c Fix concurrent map access. (#4127)
https://github.com/livekit/livekit/issues/4126
2025-12-05 10:48:10 +05:30
Raja Subramanian 14446b1cc1 Let participant close remove the published tracks. (#4125) 2025-12-04 22:37:08 +05:30
cnderrauber fa0633aa3e move utils.WrapAround to mediatransportutil (#4124) 2025-12-04 17:45:11 +08:00
Raja Subramanian 7954748d7a Data tracks (#4089)
* WIP

* WIP

* Starting to add some signalling integration testing.

* Working tests.

* fix tests

* Forward data packets (#4096)

* WIP commit

* WIP

* WIP

* fix forwarding

* address PR comments

* move some methods from LocalParticipant to Participant interface

* handle subscription update

* add extensions and tests

* more packet tests

* add test for replace extension and fix a bug

* update protocol and add config
2025-12-04 10:44:34 +05:30
Raja Subramanian 7158d98366 log bucket growth (#4122) 2025-12-03 18:48:02 +05:30
Raja Subramanian 64c651431e Update mediatransportutil (#4115)
- New bucket API to pass in max packet size and sequence number offset
  and sequence number size generic type
- Move OWD estimator to mediatransportutil.
2025-11-28 21:51:53 +05:30
Raja Subramanian 0a2943bbc5 Clean up bits added to debug peer connection close hang. (#4114) 2025-11-28 10:30:39 +05:30
Raja Subramanian bd5382daaa Splitting transport close timeout logs. (#4108)
After adding more fields in
https://github.com/livekit/livekit/pull/4105/files, it was not even
logging. Access to one of the added fields must have ended up waiting on
a lock and blocked.

Unfortunately, the deadlock fix in https://github.com/pion/ice/pull/840
did not address the peer connection close hang.

Splitting the logs so that the base log still happens. Ordering after
looking at the code and guessing what could still log to see if we get
more of the logs and learn more about the state and which lock ends up
the first blocking one.
2025-11-27 10:02:01 +05:30
Raja Subramanian a6418ae219 Log more peer connection state on close timeout. (#4105) 2025-11-26 19:58:31 +05:30
Raja Subramanian 06d999748f Check for cancel on unsubscription/source track going away. (#4104) 2025-11-25 21:32:21 +05:30
Raja Subramanian 7f10e18bac Record join/publish/subscribe cancellations. (#4102)
To get better picture of success/failure rate.
2025-11-25 14:06:02 +05:30
Raja Subramanian 402936324c Clear stereo=1 if stereo is not enabled. (#4101) 2025-11-24 21:31:56 +05:30
Raja Subramanian 70f6def39d Add checks for participant and sub-components close. (#4100)
* Add checks for participant and sub-components close.

Looks like there might be some memory leak with participant sessions not
getting closed properly. Adding checks (to be cleaned up later) to see
if there is a consistent place where things might hang.

* init with right type

* Remove unnecessary goroutine, thank you @milos-lk

* clean up
2025-11-24 18:07:33 +05:30
Raja Subramanian ffbabcc772 Switch forwarding latency log to Debugw (#4098) 2025-11-23 11:22:10 +05:30
aleb_the_flash 27d82a724e Fix "address" typo in transport logs (addddress → address) (#4097)
Correct triple-d spelling of "address" field in transport logs.

I’m not sure whether this was intentional, but I noticed it
while creating Grafana queries and filters. This matters because
anyone filtering logs using the correct spelling may
unintentionally miss relevant data, leading to incomplete or
misleading analysis.
2025-11-22 21:30:02 +05:30
Raja Subramanian 37a06821e2 logger proto redaction. (#4090)
Unfortunately, this could not be used for twirp/analytics redaction.

Probably worth writing a proto clone utility which will filter out based
on tags.
2025-11-18 14:15:17 +05:30
cnderrauber 54cf7d46c8 Control latency of lossy data channel (#4088)
* Control latency of lossy data channel

* remove log

* test
2025-11-18 16:30:16 +08:00
Raja Subramanian d510fff1e7 Downgrade x/tools to be able to make a release (#4084) 2025-11-15 18:56:22 +05:30
Raja Subramanian c3964ba2eb Use sync.Pool for objects in packet path. (#4066)
* Use sync.Pool for objects in packet path.

Seeing cases of forwarding latency spikes that align with GC.

This might be a bit overkill, but using sync.Pool for small +
short-lived objects in packet path.

Before this, all these were increasing in alloc_space heap profile
samples over time. With these, there is no increase (actually the lines
corresponding to getting from pool do not even show up in heap
accounting when doing `list` in `pprof`)

* merge

* Paul feedback
2025-11-14 16:13:23 +05:30
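The pooling approach described in c3964ba2eb (#4066) follows the usual `sync.Pool` pattern for small, short-lived objects. A minimal sketch, assuming a hypothetical `packetMeta` type and hypothetical `getPacketMeta` / `putPacketMeta` helpers (none of these names come from the commit):

```go
package main

import (
	"fmt"
	"sync"
)

// packetMeta is a hypothetical small, short-lived per-packet object.
type packetMeta struct {
	SequenceNumber uint16
	Timestamp      uint32
	Payload        []byte
}

// packetMetaPool hands out reusable packetMeta values so the packet path
// does not allocate a fresh object for every packet.
var packetMetaPool = sync.Pool{
	New: func() any { return &packetMeta{} },
}

// getPacketMeta fetches an object from the pool, allocating only when the pool is empty.
func getPacketMeta() *packetMeta {
	return packetMetaPool.Get().(*packetMeta)
}

// putPacketMeta clears the object before returning it, so a later user
// never sees stale fields or keeps the old payload alive.
func putPacketMeta(m *packetMeta) {
	*m = packetMeta{}
	packetMetaPool.Put(m)
}

func main() {
	m := getPacketMeta()
	m.SequenceNumber, m.Timestamp = 42, 90000
	fmt.Println(m.SequenceNumber, m.Timestamp)
	putPacketMeta(m) // m must not be used after this point
}
```

Resetting on `Put` rather than on `Get` is a design choice in this sketch. Either works, as long as every caller stops using the object once it has been returned to the pool.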
Raja Subramanian f8b994d491 Forwarding latency measurement tweaks. (#4080)
* Forwarding latency measurement tweaks.

- prom transmission type public
- do not measure short term values as it is not used and saves some lock
  contention time in packet path potentially. Adding a separate method
  for that.
- Change latency/jitter summary reporting to `ns` also to match the
  histogram.

* add GetShortStats
2025-11-13 18:39:49 +05:30
cnderrauber 2d5054ad01 kind details for connector (#4072) 2025-11-11 21:50:48 +08:00
Raja Subramanian a272e28ae0 Log reason for subscriber not being able to determine codec. (#4071) 2025-11-11 16:42:42 +05:30
Raja Subramanian 4ce07bedeb Higher resolution forwarding latency histogram. (#4067)
* Higher resolution forwarding latency histogram.

Was using the average latency/jitter of last second to populate
forwarding latency/jitter histogram. But, it is too coarse, i.e. the
average value of latency/jitter is very low and those summarised samples
end up in the lowest bucket always.

A few things to address it
- record per packet forwarding latency in histogram
- adjust histogram bins to include smaller values
- Drop jitter histogram

This is a per packet call, but prometheus histogram is supposedly
fast/light weight. Would be good to get better resolution histograms.
Hence doing this. Please let me know if there are performance concerns.

* typo

* one more typo
2025-11-09 17:29:40 +05:30