Commit Graph

73 Commits

Author SHA1 Message Date
Raja Subramanian 2974ba879f Unsubscribe from data track on close (#4443)
* Unsubscribe from data track on close

* clean up
2026-04-10 15:29:25 +05:30
Raja Subramanian 934f8598e2 Clean up data track observers on unsubscribe. (#4421)
Media track clean up fixed some leaks. There are more when the
participants thrash. This is not the issue, but doing this to match
media tracks.
2026-04-02 11:55:46 +05:30
Paul Wells 9ab8c1d522 clear track notifier observers on subscription teardown (#4413)
When a subscriber disconnects, observer closures registered on the
publisher's TrackChangedNotifier and TrackRemovedNotifier were never
removed. These closures capture the SubscriptionManager, which holds
the ParticipantImpl, preventing the entire participant object graph
(PCTransport, SDPs, RTP stats, DownTracks) from being garbage collected.

In rooms with many participants that disconnect and reconnect frequently,
this causes unbounded memory growth proportional to the number of
disconnect events. The leaked memory is not recoverable while the room
remains open.

Clear notifiers in both handleSubscribedTrackClose (individual
subscription teardown) and SubscriptionManager.Close (full participant
teardown), matching the existing cleanup in handleSourceTrackRemoved.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 20:48:48 -07:00
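The leak described in the commit above can be illustrated with a minimal sketch. The `notifier` and `subscriber` types below are hypothetical stand-ins, not the actual TrackChangedNotifier or SubscriptionManager; they only show why an observer closure left registered keeps its captured owner reachable, and how removing it on teardown releases that reference.

```go
package main

import (
	"fmt"
	"sync"
)

// notifier is a hypothetical stand-in for a change notifier that stores
// observer callbacks keyed by an observer ID.
type notifier struct {
	mu        sync.Mutex
	observers map[string]func()
}

func newNotifier() *notifier {
	return &notifier{observers: make(map[string]func())}
}

// AddObserver registers a callback. From this point on the notifier holds a
// reference to everything the closure captures.
func (n *notifier) AddObserver(id string, cb func()) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.observers[id] = cb
}

// RemoveObserver drops the callback so the captured objects become collectable.
func (n *notifier) RemoveObserver(id string) {
	n.mu.Lock()
	defer n.mu.Unlock()
	delete(n.observers, id)
}

// Len reports how many observers are still registered.
func (n *notifier) Len() int {
	n.mu.Lock()
	defer n.mu.Unlock()
	return len(n.observers)
}

// subscriber is a hypothetical stand-in for a subscription manager.
type subscriber struct {
	id       string
	notifier *notifier
}

// Subscribe registers a closure that captures s, so the publisher-side
// notifier keeps s (and anything s references) alive.
func (s *subscriber) Subscribe() {
	s.notifier.AddObserver(s.id, func() { _ = s })
}

// Close removes the observer; without this call the notifier would retain s
// for as long as the notifier itself lives.
func (s *subscriber) Close() {
	s.notifier.RemoveObserver(s.id)
}

func main() {
	n := newNotifier()
	s := &subscriber{id: "sub-1", notifier: n}
	s.Subscribe()
	fmt.Println("observers after subscribe:", n.Len()) // 1
	s.Close()
	fmt.Println("observers after close:", n.Len()) // 0
}
```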
Raja Subramanian b81bac0ec3 Key telemetry stats worker using combination of roomID, participantID (#4323)
* Key telemetry stats worker using combination of roomID, participantID

With forwarded participants, the same participantID can exist in two
rooms.

NOTE: This does not yet allow a participant session to report its
events/track stats into multiple rooms. That would require registering
multiple listeners (from rooms a participant is forwarded to).

* missed file

* data channel stats

* PR comments + pass in room name so that telemetry events have proper room name also
2026-02-16 13:56:13 +05:30
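Keying on the (roomID, participantID) pair can be sketched with a struct used as a map key. This is an illustration only: `workerKey`, `statsWorker`, and `registry` are hypothetical names, not the telemetry service's actual types, and locking is omitted for brevity.

```go
package main

import "fmt"

// workerKey is a hypothetical composite key. Go structs whose fields are all
// comparable can be used directly as map keys.
type workerKey struct {
	roomID        string
	participantID string
}

// statsWorker is a hypothetical per-participant-per-room stats holder.
type statsWorker struct {
	events int
}

// registry maps composite keys to workers (illustrative, not thread-safe).
type registry struct {
	workers map[workerKey]*statsWorker
}

func newRegistry() *registry {
	return &registry{workers: make(map[workerKey]*statsWorker)}
}

// getOrCreate returns the worker for the given pair, creating it on first use.
func (r *registry) getOrCreate(roomID, participantID string) *statsWorker {
	key := workerKey{roomID: roomID, participantID: participantID}
	w, ok := r.workers[key]
	if !ok {
		w = &statsWorker{}
		r.workers[key] = w
	}
	return w
}

func main() {
	r := newRegistry()
	// The same participantID in two rooms resolves to two distinct workers.
	r.getOrCreate("RM_a", "PA_1").events++
	r.getOrCreate("RM_b", "PA_1").events++
	fmt.Println("workers:", len(r.workers)) // 2
}
```

Keying on participantID alone would make the second call above reuse the first worker, which is the collision the commit describes for forwarded participants.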
Raja Subramanian 08793bea89 Use active at time to check for track not bound timeout. (#4206) 2025-12-29 20:32:08 +05:30
Raja Subramanian 3606ce542f Do not warn about track not bound if participant is not ready. (#4205)
Analysed half a dozen cases and all of them were due to the participant
not being active yet.

Also, some misc logging changes.
2025-12-29 18:16:11 +05:30
changgesi d7db7cb389 chore: fix a large number of spelling issues (#4147)
Signed-off-by: changgesi <changgesi@outlook.com>
2025-12-11 09:34:13 +05:30
Raja Subramanian 8c241ecf12 Fix RTCP reader leak in DownTrack. (#4131)
When a participant is closing, RTCP readers should be cleaned up from
factory even if the participant is expected to resume. The resumed
participant will be a new participant session and peer connection(s) and
everything will be set up again.
2025-12-06 17:49:23 +05:30
Raja Subramanian 7954748d7a Data tracks (#4089)
* WIP

* WIP

* Starting to add some signalling integration testing.

* Working tests.

* fix tests

* Forward data packets (#4096)

* WIP commit

* WIP

* WIP

* fix forwarding

* address PR comments

* move some methods from LocalParticipant to Participant interface

* handle subscription update

* add extensions and tests

* more packet tests

* add test for replace extension and fix a bug

* update protocol and add config
2025-12-04 10:44:34 +05:30
Raja Subramanian 0a2943bbc5 Clean up bits added to debug peer connection close hang. (#4114) 2025-11-28 10:30:39 +05:30
Raja Subramanian 06d999748f Check for cancel on unsubscription/source track going away. (#4104) 2025-11-25 21:32:21 +05:30
Raja Subramanian 7f10e18bac Record join/publish/subscribe cancellations. (#4102)
To get better picture of success/failure rate.
2025-11-25 14:06:02 +05:30
Raja Subramanian 70f6def39d Add checks for participant and sub-components close. (#4100)
* Add checks for participant and sub-components close.

Looks like there might be some memory leak with participant sessions not
getting closed properly. Adding checks (to be cleaned up later) to see
if there is a consistent place where things might hang.

* init with right type

* Remove unnecessary goroutine, thank you @milos-lk

* clean up
2025-11-24 18:07:33 +05:30
Raja Subramanian 5ca1626439 Support join request as proto + base64 encoded query param (#3836)
* Support join request as proto + base64 encoded query param

* joinPublish

* staticcheck

* deps

* tests

* gzip

* test

* deps

* clean up
2025-08-07 11:13:27 +05:30
Raja Subramanian 10103449c5 Add country label to edge prom stats. (#3816)
* Add country label to edge prom stats.

* data channel country stats

* test

* pub/sub time country
2025-07-24 13:23:05 +05:30
Raja Subramanian 2a6a9b8a4a Grouping all signal messages into participant_signal. (#3801)
Currently, it is a bit of a mish-mash
- some compose the message fully and just call send()
- some give parameters and the message is composed in
  participant_signal.go

Was thinking about making an interface for signalling and having v1/v2
impls, but did not want to repeat composing messages if there are common
messages. Also, some of those functions reach into the `ParticipantImpl` object
and use information (a simple example is p.IsReady()), which would become
more elaborate if the signaller is split out into its own struct.

Maybe we just need to make an interface for the sink and send to the
correct sink based on v1/v2 signal transport.

But, for now, just grouping all signal messages in one file
so that it is easier to manage later.
2025-07-18 15:24:52 +05:30
Raja Subramanian b9a44c3fbf Signalling V2 protocol implementation start (#3794)
* WIP

* name

* refactor validate

* WIP

* WIP

* signal cache initial impl

* HandleConnect in room manager

* generate subscriber offer

* handle ConnectRequest as stand alone

* segmentation, reassembly

* clean up

* rearrange

* lock scope

* support metadata in connect request

* prom

* add SifTrailer to ConnectResponse

* prom for get offer error counter

* RtcInit counter

* Jie feedback

* signal client

* consolidate v1 and v2 into SignalClient

* clean up

* comment

* deps

* mage generate

* fix tests

* pass around roomName and participantIdentity

* mage generate
2025-07-18 00:01:21 +05:30
cnderrauber dbb70e0f06 Fix dynacast quality for moving out tracks (#3664)
* Make sure moving out track has been unsubscribed

Remove the start time check in the subscription manager,
as we always use a new track ID for a republished track since #3020,
so there is no race condition now.

Also call RemoveSubscriber for moving out tracks for safety.
The subscription manager will handle the removed event, but
calling RemoveSubscriber again does no harm.

* Clear subscriber node max quality for moving out tracks
2025-05-14 12:54:33 +08:00
cnderrauber 793b383a52 Add Moving participant to another room (#3648)
* Add Moving participant to another room

It is implemented in cloud only, since the destination
room can be on a different node from the source room.

* Update pkg/service/errors.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* rename

* test panic

* fake LocalParticipantHelper

* revert delete line

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-05-08 12:58:24 +08:00
Raja Subramanian 9551c52c85 Try 2 to consolidate mime type (#3407)
* Normalize mime type and add utilities.

An attempt to normalize mime types and avoid string compares that rely on
remembering to do a case-insensitive search.

Not the best solution. Open to ideas. But, defined our own mime types
(just in case Pion changes things; Pion also does not define a RED mime
type, though that should be easy to add) and tried to use them everywhere.
But, as we get a bunch of callbacks and info from Pion, conversion was needed in
more places than I anticipated. It also makes it necessary to carry
the cognitive load of knowing what comes from Pion and needs to be processed
properly.

* more locations

* test

* Paul feedback

* MimeType type

* more consolidation

* Remove unused

* test

* test

* mime type as int

* use string method

* Pass error details and timeouts. (#3402)

* go mod tidy (#3408)

* Rename CHANGELOG to CHANGELOG.md (#3391)

Enables markdown features in this otherwise already markdown'ish formatted document

* Update config.go to properly process bool env vars (#3382)

Fixes issue https://github.com/livekit/livekit/issues/3381

* fix(deps): update go deps (#3341)

Generated by renovateBot

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Use a Twirp server hook to send API call details to telemetry. (#3401)

* Use a Twirp server hook to send API call details to telemetry.

* mage generate and clean up

* Add project_id

* deps

* - Redact requests
- Do not store responses
- Extract top level fields room_name, room_id, participant_identity,
  participant_id, track_id as appropriate
- Store status as int

* deps

* Update pkg/sfu/mime/mimetype.go

* Fix prefer codec test

* handle down track mime changes

---------

Co-authored-by: Denys Smirnov <dennwc@pm.me>
Co-authored-by: Philzen <Philzen@users.noreply.github.com>
Co-authored-by: Pablo Fuente Pérez <pablofuenteperez@gmail.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Paul Wells <paulwe@gmail.com>
Co-authored-by: cnderrauber <zengjie9004@gmail.com>
2025-02-10 10:44:15 +05:30
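The idea of an integer mime type with a string method and case-insensitive normalization might look roughly like the sketch below. The constant names, the name table, and `Normalize` are assumptions made for illustration; they are not the contents of pkg/sfu/mime/mimetype.go.

```go
package main

import (
	"fmt"
	"strings"
)

// MimeType is an integer-backed mime type, so comparisons are plain integer
// compares rather than case-sensitive string compares.
type MimeType int

// Illustrative values only; the real set of types is not shown in the log.
const (
	MimeTypeUnknown MimeType = iota
	MimeTypeOpus
	MimeTypeRED
	MimeTypeVP8
	MimeTypeH264
)

// mimeTypeNames holds the canonical spelling for each known type.
var mimeTypeNames = map[MimeType]string{
	MimeTypeOpus: "audio/opus",
	MimeTypeRED:  "audio/red",
	MimeTypeVP8:  "video/VP8",
	MimeTypeH264: "video/H264",
}

// String returns the canonical spelling, or "unknown" for unmapped values.
func (m MimeType) String() string {
	if s, ok := mimeTypeNames[m]; ok {
		return s
	}
	return "unknown"
}

// Normalize converts a string received from elsewhere (for example, from a
// WebRTC library callback) into a MimeType, ignoring case.
func Normalize(s string) MimeType {
	for m, name := range mimeTypeNames {
		if strings.EqualFold(s, name) {
			return m
		}
	}
	return MimeTypeUnknown
}

func main() {
	// Differently cased inputs normalize to the same value.
	fmt.Println(Normalize("VIDEO/vp8") == Normalize("video/VP8")) // true
	fmt.Println(Normalize("Audio/RED"))                           // audio/red
}
```

The conversion happens once at the boundary; after that, callers compare `MimeType` values directly and never need to remember case-insensitive matching.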
Raja Subramanian 28c39efa06 Exempt egress participant from track permissions. (#3322)
* Exempt egress participant from track permissions.

* test
2025-01-11 12:50:34 +05:30
Raja Subramanian 7f0c14306f One shot signalling mode fixes (#3223)
* set desired on synchronous track

* debug

* debug

* direction

* reuse

* clean up
2024-11-30 14:55:36 -08:00
cnderrauber 54f9f7de51 upgrade to pion/webrtc v4 (#3213) 2024-11-28 16:05:38 +08:00
Raja Subramanian 3498e53650 Participant method to check a track by name is subscribed. (#3192)
* Set down track connected flag in one-shot-signalling mode.

Also, added maintaining ICE candidates for info purposes.
And doing analytics events (have to maintain the subscription inside
subscriptionmanager to get list of subscribed tracks, so added enough
bits from the async path into sync path to get the analytics bits also)

* comment typo

* method to check if a track name is subscribed
2024-11-22 07:43:38 +05:30
Raja Subramanian 31d6dd7107 Set down track connected flag in one-shot-signalling mode. (#3191)
* Set down track connected flag in one-shot-signalling mode.

Also, added maintaining ICE candidates for info purposes.
And doing analytics events (have to maintain the subscription inside
subscriptionmanager to get list of subscribed tracks, so added enough
bits from the async path into sync path to get the analytics bits also)

* comment typo
2024-11-21 18:41:33 +05:30
Raja Subramanian 9f25603213 One shot signalling mode (#3188)
* WIP

* comment

* Verify method on LocalParticipant

* cleanup

* clean up

* pass in one-shot-mode to StartSession

* null message source and sink

* feedback and also remove check in ParticipantImpl for one-shot-mode-filtering as a null sink can be used for that
2024-11-21 09:33:28 +05:30
cnderrauber cf59267631 Add counter for pub&sub time metrics (#3084)
* Add counter for pub&sub time metrics

The pub&sub time shows large values in migration-related cases such as
muted/disabled migration, where the subscription time depends on
when the publisher unmutes the track (sending RTP packets
after migration). Add a counter to distinguish these cases, since we
can't control the time in them, and the first subscription
attempt is also more meaningful than those cases.

* Add info log for high publish delay
2024-10-11 12:07:24 +08:00
cnderrauber c8dbe8e977 reset subscription time when downtrack closed and expect resume (#3083) 2024-10-10 16:06:44 +08:00
cnderrauber eed925fddf avoid race condition on downtrack.Codec (#3032) 2024-09-22 14:27:26 +08:00
Raja Subramanian 7df6f86693 Initial plumbing for metrics. (#2950)
* Initial plumbing for metrics.

This implements
- metrics received from participant.
- callback to room.
- room distributes it to all other participants (excluding the sending
  participant).
- other participants forward to client.
- counting metrics bytes in data channel stats

TODO:
  - recording/processing/batching
  - should recording/processing/batching happen on publisher side or
    subscriber side?
  - should metrics be echoed back to publisher?
  - grants to publish/subscribe metrics.

* mage generate

* clear OnMetrics on close

* - CanSubscribeMetrics permission.
- Echo back to sender.

* update deps

* No destination identities for metrics

* WIP

* use normalized timestamp for server injected timestamps

* compile

* debug log metrics batch

* correct comment

* add baseTime to wire

* protocol dep

* Scope metrics forwarding to only participants that a participant is
subscribed to.

Also remove the participant_metrics.go file as it was not doing anything
useful.

* update comment

* utils.ErrorIsOneOf

* couple of more utils.CloneProto
2024-09-19 11:42:31 +05:30
cnderrauber 978db00034 Add sdk, participant_kind to pub sub metrics (#3023)
* exclude go client from track publication metric

* add sdk,participant_kind labels

* fix test
2024-09-19 10:42:47 +08:00
Raja Subramanian 098aa78ab7 Do not remove from subscription map on unsubscribe. (#3002)
* Do not remove from subscription map on unsubscribe.

Notes in line as to why.

* Avoiding the extra check. It is fine to check under lock for clean up.
It is done in other places.

* comments
2024-09-14 12:56:48 +05:30
cnderrauber 947e8f5909 Speed up track publication (#2952)
* speed up track publication

Add metrics for track publication and subscription

Return EnabledCodecs in JoinResponse so client can
choose codec without server side codec fallback

Cache remote webrtc track without AddTrackRequest to
let client send publisher offer before AddTrackRequest response

* go mod

* clean code
2024-08-23 18:38:32 +08:00
Raja Subramanian 8c323330b6 Store subscriber forwarder state (#2907)
* Forwarder state for migrating participant.

* clean up

* update protocol deps

* cleanup debug
2024-08-05 21:13:07 +05:30
Raja Subramanian ef838e4fa2 Indicate if track is expected to be resumed in onClose callback. (#2800)
That is the main change. Changed variable name to `isExpectedToResume`
everywhere to be consistent.

Planning to use the callback value in relays to determine if the down
track should be closed or switched to a different up track.
2024-06-17 23:51:00 +05:30
Raja Subramanian 95f5c94b4d Notify initial permissions (#2595)
* Notify initial permissions

NOTE: This does add an initial subscription permission notification
which should be fine, but something to watch for.

A stress test combining
- mute/unmute on publisher side.
- allowing/revoking permission for subscriber from publisher side.
- subscribing/unsubscribing from subscriber side.
results in a scenario where a subscription permission update of
`not_allowed` is sent and, on a re-subscribe, an `allowed` update does
not happen.

It happens like so
- Subscription revoke closes the down track of subscriber.
- The subscription is still desired.
- So, a subscription reconcile runs and sees `permission: false`. This
  sends subscription permission of `not_allowed`.
- Unsubscribe request comes in and sets `desired: false`.
- Reconciler runs again and sees `desired: false` and `subscribedTrack:
  nil`. This cleans up the subscription.
- Publisher grants permission for the subscriber.
- Subscriber subscribes to the track again. A new subscription is
  created.
- Reconciler runs and sees `permission: true`, but there is no
  permission change as it is a new subscription object. So, `allowed`
  subscription permission update is not sent and the client is stuck at
  `not_allowed`.

Fix: maintain whether permission has been initialized. This has the effect of
sending an initial update, which should be fine.

* clean up comment

* no default
2024-03-22 23:22:20 +05:30
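The fix described above (remember whether permission has ever been set, so the first value is always reported) can be sketched as follows. The `subscription` struct, its field names, and `setPermission` are hypothetical and only illustrate the technique, not the actual implementation.

```go
package main

import "fmt"

// subscription is a hypothetical, stripped-down subscription record.
type subscription struct {
	hasPermission         bool
	permissionInitialized bool // assumed field: set once any value has been applied
}

// setPermission stores the value and reports whether a permission update
// should be sent to the client. The first call always reports true, even if
// the incoming value equals the zero value of hasPermission; later calls
// report true only on an actual change.
func (s *subscription) setPermission(allowed bool) bool {
	if s.permissionInitialized && s.hasPermission == allowed {
		return false
	}
	s.hasPermission = allowed
	s.permissionInitialized = true
	return true
}

func main() {
	// A brand-new subscription object, as created on re-subscribe.
	s := &subscription{}
	fmt.Println(s.setPermission(true))  // true: initial update is sent
	fmt.Println(s.setPermission(true))  // false: nothing changed
	fmt.Println(s.setPermission(false)) // true: revoked
}
```

Without the initialized flag, a fresh object whose stored default happens to equal the incoming value would report no change, which matches the stuck `not_allowed` state in the scenario above.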
Raja Subramanian 8442b2b37c Maintain subscription count. (#2515)
* Maintain subscription count.

Does not affect function as it is not decremented only if limits are
configured. But, good to maintain proper count anyway.

* wire
2024-02-27 12:11:24 +05:30
Raja Subramanian 5ac5bd236a Let track events go through after participant close. (#2487)
* Let track events go through after participant close.

Also, reducing lock scope in telemetry service.

* use shadow
2024-02-17 13:40:07 +05:30
Raja Subramanian d216f94ac1 Remove some logs. (#2484)
* Remove some logs.

Also, changing Errorw -> Warnw in a bunch of places.
Going to move towards using `Errorw` for cases where a functionally
unexpected condition happens, i.e. scenarios where, by design, a condition
should not happen and yet it triggered.

* log error
2024-02-15 18:05:50 +05:30
Raja Subramanian f95194c833 Fixes to sync state disabled tracks. (#2459)
* Fixes to sync state disabled tracks.

* test
2024-02-07 13:52:57 +05:30
Raja Subramanian bcf9fe3f0f Use a participant worker queue in room. (#2420)
* Use a participant worker queue in room.

Removes selectively needing to call things in goroutine from
participant.

Also, a bit of drive-by clean up.

* spelling

* prevent race

* don't need to remove in goroutine as it is already running in the worker

* worker will get cleaned up in state change callback

* create participant worker only if not created already

* ref count participant worker

* maintain participant list

* clean up oldState
2024-01-28 22:10:35 +05:30
Raja Subramanian bf0e88dea4 Squelch only the log, not the error return. (#2379) 2024-01-12 16:58:23 +05:30
Raja Subramanian 3687396d84 Squelch error logs while waiting for track resolve. (#2376) 2024-01-12 12:16:19 +05:30
Raja Subramanian bdcd142c0d Adding some logs in subscribe path. (#2343)
Trying to chase down an older client sometimes failing to subscribe.
2023-12-25 14:12:08 +05:30
David Zhao 3fe124c87f Log cleanup pass (#2285)
* Log cleanup pass

Demoted a bunch of logs to DEBUG, consolidated logs.

* use context logger and fix context var usage

* moved common error types, fixed tests
2023-12-02 15:07:31 -08:00
David Zhao 65934e6486 Fix ICE connection fallback (#2144)
* Fix ICE connection fallback

Short connection detection relied on iceFailedTimeout, which previously
had been misinterpreted. Since we've reduced iceFailedTimeout, it is
creating false negatives.

We'll instead use PingTimeout since clients are expected to keep the
signal connection active.

* reduce ping interval to align with total ice failure timeout
2023-10-15 14:36:12 -07:00
Raja Subramanian 0c34f12fa1 Demote some high frequency logs to Debugw (#1925) 2023-08-02 00:03:38 +05:30
Raja Subramanian 1cb74b9e1b Check for desired before clean up. (#1865)
Fix a potential race between needsCleanup checking and a re-subscribe
setting desired back to true.
2023-07-10 13:20:57 +05:30
Raja Subramanian 869f23a054 Close subscriptions promptly (#1845)
* Close subscriptions promptly

Two things:
-----------
1. Because desired is not changed, the notifiers are not notified
that the subscription is no longer observing. So, that holds
a reference to the subscription manager.

Address the above by setting `setDesired` to false on all subscriptions
when subscription manager closes. That will remove observer from the
notifiers.

2. When subscription manager is closed, the down track close
is invoked which flows back (with onClose callback of downtrack) to
subscription manager "handleSubscribedTrackClose". That callback
handler sets the subscribed track to nil for that subscription.

A couple of scenarios here
a. Without the above change, desired could have been true and it would
have looked that the track needs to try subscription again because
`needsSubscribe == true` (desired == true && subscribedTrack == nil)

b. Even with the change above, there is a new condition of
`desired == false && subscribedTrack == nil` and there was no handler
for that condition in the reconciler.

Address this by adding a `needsCleanup` function and deleting the subscription
from the map. Note that the reconciler may not be running to execute
this action, as the subscription manager would have closed `closeCh`, but
adding the code in the interest of proper clean up.

* clean up
2023-07-01 12:31:51 +05:30
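The state conditions discussed in the commit above can be written out as a small sketch. Only the two predicates `needsSubscribe` and `needsCleanup` come from the commit message; the `manager` type, the boolean `subscribed` field (standing in for `subscribedTrack != nil`), and the shape of `reconcile` are assumptions for illustration.

```go
package main

import "fmt"

// subscription is a hypothetical record; subscribed stands in for
// "subscribedTrack != nil".
type subscription struct {
	desired    bool
	subscribed bool
}

// needsSubscribe: wanted, but no subscribed track yet.
func (s *subscription) needsSubscribe() bool { return s.desired && !s.subscribed }

// needsCleanup: not wanted and nothing subscribed, so the entry can be dropped.
func (s *subscription) needsCleanup() bool { return !s.desired && !s.subscribed }

// manager is a hypothetical holder of subscriptions keyed by track ID.
type manager struct {
	subscriptions map[string]*subscription
}

// close marks every subscription as no longer desired and, in this sketch,
// also treats the down track as already closed.
func (m *manager) close() {
	for _, s := range m.subscriptions {
		s.desired = false
		s.subscribed = false
	}
}

// reconcile deletes entries that need cleanup and returns the track IDs that
// still need a subscribe attempt. Deleting during range is safe in Go.
func (m *manager) reconcile() []string {
	var toSubscribe []string
	for id, s := range m.subscriptions {
		switch {
		case s.needsCleanup():
			delete(m.subscriptions, id)
		case s.needsSubscribe():
			toSubscribe = append(toSubscribe, id)
		}
	}
	return toSubscribe
}

func main() {
	m := &manager{subscriptions: map[string]*subscription{
		"TR_a": {desired: true, subscribed: false},
		"TR_b": {desired: true, subscribed: true},
	}}
	fmt.Println("to subscribe:", m.reconcile()) // [TR_a]
	m.close()
	m.reconcile()
	fmt.Println("remaining:", len(m.subscriptions)) // 0
}
```

If `close` left `desired` as true, the first predicate would fire after the down track closes and the reconciler would try to subscribe again, which is scenario (a) in the commit message.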
Raja Subramanian 583648a1ed Avoid closure to reduce life span of objects. (#1809)
A subscription in subscription manager could live till the source
track goes away even though the participant with that subscription
is long gone due to closure on source track removal. Handle it by using
trackID to look up on source track removal.

Also, logging SDPs when a negotiation failure happens to check
if there are any mismatches.
2023-06-20 19:06:01 +05:30