Commit Graph

2559 Commits

Author SHA1 Message Date
Raja Subramanian
20bddfea1e Clean up published track on participant removal. (#3527)
Clean up the tracks in the synchronous path and remove track from track
manager. This is not strictly required in a single node case. But,
multi-node needs this. So, doing this here for consistency.
2025-03-14 16:09:22 +05:30
Raja Subramanian
65d8aa2847 Handle subscribe race with track close better. (#3526)
There are two very rare edge case scenarios this is trying to address.

Scenario 1:
-----------
- both pA and pB migrating
- pA migrates first and subscribes to pB via remote track of pB
- while the above subscribe is happening, pB also migrates and
  closes the remote track
- by the time the subscribe set up completes, it realises that
  the remote track is not open any more and removes itself as
  subscriber
- but that removal is using the wrong `isExpectedToResume` as clearing
  all receivers has not run yet which is what caches the
  `isExpectedToResume`.
- That meant, the down track transceiver is not cached and hence not
  re-used when re-subscribing via pB's local track
- Fix it by caching the expected to resume when changing receiver state
  to `closing`.

Scenario 2:
-----------
- both pA and pB migrating
- pA migrates first and subscribes to pB via remote track of pB
- while the above subscribe is happening, pB also migrates and
  closes the remote track
- pB's local track is published before the remote track can be fully
  closed and all the subscribers removed. That local track gets added
  to track manager.
- While the remote track is cleaning up, subscription manager triggers
  again for pA to subscribe to pB's track. The track manager now
  resolves to the local track.
- Local track subscription progresses. As the remote track clean up is
  not finished, the transceiver is not cached. So, the local track based
  subscription creates a new transceiver and that ends up causing
  duplicate tracks in the SDP offer.
- Fix it by creating a FIFO in track manager and only resolve using the
  first one. So, in the above case, till the remote track is fully
  cleaned up, the track manager will resolve to that. Yes, the
  subscription itself will fail as the track is not in open state (i.e.
  it might be in `closing` state), but that is fine as subscription
  manager will eventually resolve to the local track and proper
  transceiver re-use can happen.
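
A minimal Go sketch of the FIFO resolution described in Scenario 2 above. It is an illustration only, not the code from the commit: `trackEntry`, `trackFIFO`, and the track ID `TR_pB` are hypothetical names. The point it shows is that the oldest registered track keeps winning resolution until it is removed, even while it is closing.

```go
package main

import "fmt"

// trackEntry is a hypothetical stand-in for a published track.
type trackEntry struct {
	name string
	open bool
}

// trackFIFO keeps every track registered under one track ID in arrival order.
type trackFIFO struct {
	byID map[string][]*trackEntry
}

func newTrackFIFO() *trackFIFO {
	return &trackFIFO{byID: make(map[string][]*trackEntry)}
}

// Add appends a track to the back of the queue for id.
func (f *trackFIFO) Add(id string, t *trackEntry) {
	f.byID[id] = append(f.byID[id], t)
}

// Remove drops a fully cleaned up track from the queue for id.
func (f *trackFIFO) Remove(id string, t *trackEntry) {
	q := f.byID[id]
	for i, e := range q {
		if e == t {
			f.byID[id] = append(q[:i], q[i+1:]...)
			break
		}
	}
	if len(f.byID[id]) == 0 {
		delete(f.byID, id)
	}
}

// Resolve always returns the oldest entry, even if it is not open.
func (f *trackFIFO) Resolve(id string) *trackEntry {
	q := f.byID[id]
	if len(q) == 0 {
		return nil
	}
	return q[0]
}

func main() {
	m := newTrackFIFO()
	remote := &trackEntry{name: "remote", open: false} // still closing
	local := &trackEntry{name: "local", open: true}
	m.Add("TR_pB", remote)
	m.Add("TR_pB", local)

	fmt.Println(m.Resolve("TR_pB").name) // remote: subscription fails and is retried
	m.Remove("TR_pB", remote)            // remote clean up finished
	fmt.Println(m.Resolve("TR_pB").name) // local: transceiver re-use can happen
}
```

Resolving to a closing track makes the subscription attempt fail, which is the intended behaviour here: the retry lands on the local track only after the remote track's transceiver has been cached.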
2025-03-14 14:37:37 +05:30
Raja Subramanian
a6cb00b31e Reduce seeder duration to 30s and also do not force send PLI. (#3525)
Can use the normal PLI throttle cadence.
2025-03-13 10:41:42 +05:30
Raja Subramanian
c823320528 Add a key frame seeder in up track. (#3524) 2025-03-12 22:11:27 +05:30
Raja Subramanian
0f61ff3a2f Remove redundant log (#3523) 2025-03-12 15:31:08 +05:30
Raja Subramanian
7685cd25fd Log ParticipantInit on signal start to get a picture of join params (#3522) 2025-03-12 14:55:16 +05:30
Paul Wells
ac9e62ef05 add server agent load threshold config (#3520)
* remove agent worker load threshold

* cleanup
2025-03-11 21:07:01 -07:00
Raja Subramanian
cd5d32f005 Add pID and connID to log context to make it easier to search using pID. (#3518) 2025-03-11 22:33:08 +05:30
shishirng
2d9aa6dde4 Update api call info method (#3515)
* register RequestRouted handler for updating method

Signed-off-by: shishir gowda <shishir@livekit.io>

* pass room to telemetry in DeleteRoom api to extract roomID

Signed-off-by: shishir gowda <shishir@livekit.io>

---------

Signed-off-by: shishir gowda <shishir@livekit.io>
2025-03-11 05:56:30 -04:00
Raja Subramanian
b3779a9086 WebHookConfig (#3517)
* default webhook config

* WebHookConfig

* fix test

* protocol with yaml tags
2025-03-11 13:49:29 +05:30
cnderrauber
6121b9af5e Check ForwardParticipant room name (#3514) 2025-03-11 10:07:05 +08:00
Raja Subramanian
50ab47c11b Log packet drops/forward. (#3510)
Seeing an error in an e2e test, after migration, no packets are
forwarded. The only reason seems to be payload type mismatch (assuming
there are no errors in the forwarding loop pulling packets from buffer).

So, logging some packet stats in forwarding loop.
2025-03-10 16:36:25 +05:30
cnderrauber
139d1b139c Add ForwardParticipant method to room service (#3507)
It returns unimplemented error now.
2025-03-10 14:08:38 +08:00
Raja Subramanian
6c04909f88 Use atomic to store codec. (#3505)
* Use atomic to store codec.

It can change on up stream codec change, but not seeing any racy
behaviour with atomic access.

Reverting the previous change to mute with this change.

* no mime arg
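
A minimal Go sketch of storing a codec atomically so that readers do not need the bind lock. The types `codecInfo` and `downTrack` and their fields are hypothetical, and `atomic.Pointer` (Go 1.19+) is one possible choice, not necessarily what the commit uses.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// codecInfo is a hypothetical, immutable snapshot of the negotiated codec.
type codecInfo struct {
	mime        string
	payloadType uint8
}

// downTrack holds the codec behind an atomic pointer instead of a mutex.
type downTrack struct {
	codec atomic.Pointer[codecInfo]
}

// setCodec publishes a new snapshot, e.g. on an up stream codec change.
func (d *downTrack) setCodec(c codecInfo) {
	d.codec.Store(&c)
}

// mime reads the current snapshot without taking any lock.
func (d *downTrack) mime() string {
	if c := d.codec.Load(); c != nil {
		return c.mime
	}
	return ""
}

func main() {
	var d downTrack
	d.setCodec(codecInfo{mime: "audio/opus", payloadType: 111})

	// Concurrent readers see either the old or the new snapshot, never a mix.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = d.mime()
		}()
	}
	d.setCodec(codecInfo{mime: "audio/red", payloadType: 63})
	wg.Wait()

	fmt.Println(d.mime()) // audio/red
}
```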
2025-03-09 11:51:48 +05:30
Raja Subramanian
7f6afe05ad Prevent bind lock deadlock on muted. (#3504)
Need to re-visit the bind lock scope and maybe make the codec/mime
atomic and access them without bind lock. But, doing a whack-a-mole a
bit first to move things forward. Will look at making them atomics.
2025-03-09 11:21:09 +05:30
Paul Wells
48063df5b8 load mime type before calling writeBlankFrameRTP (#3502) 2025-03-07 23:12:20 -08:00
Raja Subramanian
d2e6cd150e Do not bind lock across flush which could take time (#3501) 2025-03-08 11:13:35 +05:30
Denys Smirnov
47896f50e3 Update protocol and IO service. (#3499) 2025-03-08 01:19:42 +05:30
Raja Subramanian
3a35cbc401 Log migration complete only when coming from sync (#3496) 2025-03-07 19:02:39 +05:30
NinaLua
c2f17a1072 refactor: using slices.Contains to simplify the code (#3495)
Signed-off-by: NinaLua <iturf@sina.cn>
2025-03-07 14:08:47 +05:30
tiaoxizhan
01e51dbd7f fix: fix the wrong error return value (#3493)
Signed-off-by: tiaoxizhan <tiaoxizhan@outlook.com>
2025-03-06 15:53:36 +05:30
cnderrauber
ff9115b228 Disable dd parser for vp8 if extension is not found (#3492)
Browser would not send dd extension for vp8 in some cases even if
it is negotiated.
2025-03-06 17:20:00 +08:00
Benjamin Pracht
c3e06f0523 Do not attempt to create objects for URL ingresses as the ingress service will do so (#3491) 2025-03-05 15:15:02 -08:00
Raja Subramanian
f0edfbba8d Fix receiver rtt/jitter. (#3487) 2025-03-04 21:22:17 +05:30
Raja Subramanian
05dfd30d5b Take RTT and jitter from receiver view while reporting track stats for (#3483)
* Take RTT and jitter from receiver view while reporting track stats for
down stream tracks.

* adjust jitter in aggregate
2025-03-03 18:48:37 +05:30
cnderrauber
04ed56835e Don't issue TrackPublished/Unpublished event on migrated track (#3482) 2025-03-03 15:32:17 +08:00
Raja Subramanian
1cffe30cd0 Use a RED transformer to consolidate both RED -> Opus OR Opus -> RED (#3481)
* Use a RED transformer to consolidate both RED -> Opus OR Opus -> RED

* public

* clean up

* clean up debug
2025-03-02 13:29:56 +05:30
Raja Subramanian
591888f712 Fix missing RTCP sender report when forwarding RED as Opus. (#3480)
With publish RED and subscribe Opus, the RTCP sender reports were not
sent to down track as publisher sender reports were not forwarded to the
down track.
2025-03-02 11:52:17 +05:30
cnderrauber
900da73e6d Add ice candidates logs for failed peerconnection (#3473) 2025-02-27 14:47:28 +08:00
Raja Subramanian
83a839811c Transfer metadata cache over flow counter. (#3472)
* Transfer metadata cache over flow counter.

Without that, logging was not getting sampled.

* sender
2025-02-27 11:21:11 +05:30
Raja Subramanian
6d44e433f4 Fix panic with invalid layer. (#3470)
* Fix panic with invalid layer.

Log an error so that we can understand which track produces that.

* drop bad layer packet without forwarding
2025-02-27 09:48:13 +05:30
Raja Subramanian
fcb05e97c5 Properly initialise DD layer selector. (#3467) 2025-02-26 09:48:54 +05:30
Paul Wells
43bd251575 simplify vls base access (#3465) 2025-02-26 09:34:59 +05:30
Raja Subramanian
7350e99331 transfer from non-null for codec change (#3464) 2025-02-25 23:31:14 +05:30
Raja Subramanian
ca4526048c structured logging (#3461) 2025-02-24 11:47:35 +05:30
Raja Subramanian
83b94b31ac Do not revoke track subscription on permission update for exempt (#3458)
participants.
2025-02-21 11:25:29 +05:30
Denys Smirnov
60a09cb4be Implement SIP iterators. (#3332) 2025-02-20 13:13:21 +02:00
cnderrauber
363353d6e5 Fix codec regression failed after migration (#3455)
The backup codec could arrive before primary
in migration.
2025-02-20 15:54:11 +08:00
David Zhao
1c69a9eeed Dependent participants should not trigger count towards FirstJoinedAt (#3448)
* Dependent participants should not trigger count towards FirstJoinedAt

According to the API, empty timeout should be honored as long as no
independent participant joins the room. If we counted Agents and Egress
as part of FirstJoinedAt, it would have the side effect of using
departureTimeout instead of emptyTimeout for idle calculations.

* use Room logger
2025-02-20 00:57:40 -06:00
Paul Wells
3167266495 add datapacket stream metrics (#3450)
* add datapacket stream metrics

* normalize mime type
2025-02-19 22:28:10 -08:00
Raja Subramanian
69c8d0d165 Log migration complete (#3454) 2025-02-20 11:31:10 +05:30
cnderrauber
4b04b26a73 fix data channel slow reader test (#3453) 2025-02-20 10:40:25 +08:00
Paul Wells
f49103a003 add participant job type (#3443)
* add participant job type

* cleanup

* deps
2025-02-18 00:40:56 -08:00
cnderrauber
b2a54729f5 Don't drop message if calculate duration is too small (#3442)
* Don't drop message if calculate duration is too small

* fix test
2025-02-18 14:41:41 +08:00
Raja Subramanian
b3da3ff2cb Give more cache for RTX. (#3438)
- With probing the packet rate can get high suddenly and remote may not
  have sent receiver report as it might be sending for the non-spikey
  rate. That causes metadata cache overflows. So, give RTX more cache.
- Don't need a large cache for primary as either reports come in
  regularly (or they are missing for a long time and having a bigger
  cache is not the solution for that, so reduce primary cache size)
- Check for receiver report falling exactly back by (1 << 16). Had done
  that change in the inside for loop, but missed the top level check :-(
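
A minimal Go sketch of why the exact `(1 << 16)` case matters. It assumes the check compares extended sequence numbers in whole 16-bit cycles; the function `catchUp` and its parameters are hypothetical, and the actual code in the commit is not shown in this log.

```go
package main

import "fmt"

// cycle is one full wrap of a 16-bit RTP sequence number.
const cycle = uint64(1) << 16

// catchUp advances a receiver-reported extended sequence number in whole
// cycles until it is less than one cycle behind the sender's value.
// Using >= rather than > also covers a report that is exactly one cycle back.
func catchUp(reported, sent uint64) uint64 {
	for sent >= reported && sent-reported >= cycle {
		reported += cycle
	}
	return reported
}

func main() {
	fmt.Println(catchUp(100, 100+cycle))     // 65636
	fmt.Println(catchUp(100, 100+2*cycle+5)) // 131172
	fmt.Println(catchUp(100, 90))            // 100
}
```

With a strict `>` comparison, the first call would leave the report at 100, a full cycle behind the sender.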
2025-02-15 22:43:28 +05:30
Raja Subramanian
0c966e6a7e Move a few logs to Debugw (#3437) 2025-02-15 22:16:17 +05:30
Raja Subramanian
56a61b6ce2 Safe access of proto fields. (#3436) 2025-02-15 05:51:53 +05:30
Raja Subramanian
5589637152 Seed on receiving forwarder state. (#3435)
This is mostly to clean up forwarder state cache for already started
tracks.

A scenario like the following could apply the seed twice and end up with
an incorrect state resulting in a large jump
- Let's say Participant A is the one showing the problem
- Participant A migrates first. So, it tries to restore its down track states by querying state from the previous node.
- But, its down tracks start before the response can be received. However, it remains in the cache.
- Participant B migrates from a different node to where Participant A is. So, the down track of Participant A gets switched from relay up track publisher -> local up track publisher.
- I am guessing the seeding gets applied twice in this case and the cached value from step 3 above causes the huge jump.

In those cases, the cache needs to be cleaned up.

(NOTE: I think this seeding of down track on migration is not necessary
as the SSRC of down track changes and the remote side seems to be
treating it like a fresh start because of that. But, doing this step
first and will remove the related parts after observing for a bit more)

Also, moving fetching forwarder state to a goroutine as it involves a
network call to the previous node via Director.
2025-02-14 15:46:08 +05:30
Raja Subramanian
9fd80c8919 Catch up if the diff is exactly (1 << 16) also. (#3433) 2025-02-14 12:50:52 +05:30
Raja Subramanian
dc0ff45fd7 Fix panic due to nil Egress (#3431) 2025-02-14 10:17:32 +05:30