* rtc: add RestartSessionTimer to re-anchor participant session duration
Exposes ParticipantImpl.RestartSessionTimer so the session timer can be
re-anchored to the actual join time. Duration is only ever emitted once
the participant becomes active, so re-anchoring at join keeps pre-join
wall-clock out of the reported/billed duration. Adds the method to the
LocalParticipant interface (fake regenerated) and a local protocol
replace to pick up SessionTimer.Reset.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* tidy
* update protocol
* report ended at for inactive sessions
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Paul Wells <paulwe@gmail.com>
* Add Prometheus metrics for peer connection state.
By direction (PUBLISHER vs SUBSCRIBER) and state ("started" ->
"connected"). This gives a way to track peer connections failing to
finish establishment.
The RTC active count is useful for the primary peer connection, but not
for non-primary ones. This counter can track any peer connection and,
more generally, helps in understanding the success/failure rate of peer
connection establishment.
* add a couple more states
* clean up and avoid duplicate reporting of fully established
* staticcheck
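A sketch of counting peer connection states by direction while reporting each state at most once per connection. In the real code this would presumably be a Prometheus `CounterVec` labelled by direction and state (incremented via `WithLabelValues(direction, state).Inc()`). Here a stdlib map stands in so the example is self-contained, and `pcCounter`, `pcTracker` and `successRate` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// pcCounter stands in for a CounterVec labelled by (direction, state).
type pcCounter struct {
	mu     sync.Mutex
	counts map[[2]string]int
}

func newPCCounter() *pcCounter { return &pcCounter{counts: make(map[[2]string]int)} }

func (c *pcCounter) inc(direction, state string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[[2]string{direction, state}]++
}

func (c *pcCounter) get(direction, state string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[[2]string{direction, state}]
}

// pcTracker reports each state at most once per peer connection, so a
// repeated callback (e.g. a second "connected" event) is not double counted.
type pcTracker struct {
	direction string
	reported  map[string]bool
	counter   *pcCounter
}

func (t *pcTracker) report(state string) {
	if t.reported[state] {
		return
	}
	t.reported[state] = true
	t.counter.inc(t.direction, state)
}

// successRate is connected/started for one direction (0 when nothing started).
func successRate(c *pcCounter, direction string) float64 {
	started := c.get(direction, "started")
	if started == 0 {
		return 0
	}
	return float64(c.get(direction, "connected")) / float64(started)
}

func main() {
	c := newPCCounter()
	for i := 0; i < 4; i++ {
		t := &pcTracker{direction: "SUBSCRIBER", reported: map[string]bool{}, counter: c}
		t.report("started")
		if i < 3 {
			t.report("connected")
			t.report("connected") // duplicate callback, ignored
		}
	}
	fmt.Println(successRate(c, "SUBSCRIBER")) // 0.75
}
```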
* Update go deps to v4
Generated by renovateBot
* update dockertest to v4
* fix
---------
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: David Zhao <dz@livekit.io>
MoveToRoom resets the participant reporter resolver to receive new
(room, participant_session) keys for the destination, but the source
room's participant_session row never gets an end_time, because the periodic
duration scrape only emits one once disconnectedAt is set, and a move
doesn't transition the participant to DISCONNECTED. Report end_time
immediately before the reset so the row is closed out cleanly.
* Metrics for participant active, i.e. fully established.
- Egress stub for v2 API
- Fix the participant canceled counter 🤦
- Add active counter -> this is incremented when a participant becomes
active, i.e. the primary peer connection is established. Can be used to
monitor per-node connection establishment issues.
- Add signalling validation fail counter.
With this, we have
- signalling validation fail
- signalling failed --> this is when the `startSession` fails
- signalling connected -> signalling is successful and can send back
joinResponse to the client
On the media connection side:
- rtc_init -> start
- rtc_connected -> participant session created (joined)
- rtc_active -> primary peer connection established
- rtc_canceled -> could not proceed with RTC connection due to not being
able to resume.
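One way to read the RTC counters above as a funnel, sketched under the assumption that they are scraped per node into a map. `rtcFunnel` and `dropOff` are hypothetical helpers; the counter names come from the list above.

```go
package main

import "fmt"

// rtcFunnel lists the media-side stages in order, using the counter names above.
var rtcFunnel = []string{"rtc_init", "rtc_connected", "rtc_active"}

// dropOff returns, for each consecutive pair of stages, how many sessions
// entered the earlier stage but have not (yet) reached the next one.
func dropOff(counts map[string]int) map[string]int {
	out := make(map[string]int)
	for i := 0; i+1 < len(rtcFunnel); i++ {
		from, to := rtcFunnel[i], rtcFunnel[i+1]
		out[from+"->"+to] = counts[from] - counts[to]
	}
	return out
}

func main() {
	node := map[string]int{"rtc_init": 100, "rtc_connected": 97, "rtc_active": 90, "rtc_canceled": 2}
	d := dropOff(node)
	fmt.Println(d["rtc_init->rtc_connected"], d["rtc_connected->rtc_active"]) // 3 7
}
```

Because the counters are cumulative, a single snapshot also counts sessions that are still in progress. Comparing rates over a window is more robust than looking at raw differences.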
* signalling counters deps
* revert pion/webrtc to 4.2.12 to get SCTP without interleaving
* go back to pion/webrtc 4.2.11 and sctp 1.9.5
* telemetry: split webhook-processed hook registration out of NewTelemetryService
NewTelemetryService used to register a notifier processed-hook on the inner
*telemetryService directly. That made it impossible for downstream wrappers
(e.g. cloud's TelemetryService that overrides Webhook to fan out to a v3
observability pipeline) to intercept webhook events without double-firing
the legacy emission.
Lift the registration into a new exported helper RegisterWebhookHook, and
have the standalone server's wire provider createTelemetryService call it
right after construction so behavior is unchanged for callers that don't
wrap the service.
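A sketch of why moving the registration out of the constructor helps. The caller decides whether the inner service or a wrapper is registered, so a wrapper that overrides `Webhook` sees each event once and delegates to the legacy emission exactly once. `notifier`, `addProcessedHook` and `wrapper` are hypothetical. `RegisterWebhookHook` is named after the helper in the message, but its signature here is assumed.

```go
package main

import "fmt"

// notifier is a hypothetical stand-in for the webhook notifier's processed-hook API.
type notifier struct{ hooks []func(event string) }

func (n *notifier) addProcessedHook(h func(event string)) { n.hooks = append(n.hooks, h) }

func (n *notifier) process(event string) {
	for _, h := range n.hooks {
		h(event)
	}
}

type webhooker interface{ Webhook(event string) }

// telemetryService no longer registers itself in its constructor.
type telemetryService struct{ emitted []string }

func newTelemetryService() *telemetryService { return &telemetryService{} }

func (t *telemetryService) Webhook(event string) { t.emitted = append(t.emitted, "legacy:"+event) }

// RegisterWebhookHook lets the caller choose which service receives events.
func RegisterWebhookHook(n *notifier, s webhooker) { n.addProcessedHook(s.Webhook) }

// wrapper overrides Webhook to fan out, then delegates to the inner service once.
type wrapper struct {
	*telemetryService
	v3 []string
}

func (w *wrapper) Webhook(event string) {
	w.v3 = append(w.v3, event)
	w.telemetryService.Webhook(event)
}

func main() {
	n := &notifier{}
	w := &wrapper{telemetryService: newTelemetryService()}
	RegisterWebhookHook(n, w) // the wrapper, not the inner service, is registered
	n.process("participant_joined")
	fmt.Println(len(w.emitted), len(w.v3)) // 1 1
}
```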
When a client hits /rtc/v[01]/validate with a base64 WrappedJoinRequest
whose embedded JoinRequest.ClientInfo is unset, validateInternal called
AugmentClientInfo with a nil *ClientInfo and panicked at ci.Address =
GetClientIP(req). The non-wrapped branch already allocates via
ParseClientInfo; do the same here so pi.Client always gets at least the
resolved client Address.
How is it different from existing participant attributes?
1. Async attributes can be added one at a time.
2. These are not included in `ParticipantInfo`.
3. An attribute can be fetched by participant identity and async attribute
ID as and when needed.
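A minimal sketch of the three properties listed above. Attributes are set individually, kept outside any `ParticipantInfo`-like struct, and looked up on demand by (identity, attribute ID). `asyncAttributes` and its methods are hypothetical names, not the actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// asyncAttributes is a hypothetical store keyed by identity, then attribute ID.
type asyncAttributes struct {
	mu   sync.RWMutex
	byID map[string]map[string]string
}

func newAsyncAttributes() *asyncAttributes {
	return &asyncAttributes{byID: make(map[string]map[string]string)}
}

// Set adds or replaces a single attribute without touching the others.
func (a *asyncAttributes) Set(identity, id, value string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	m, ok := a.byID[identity]
	if !ok {
		m = make(map[string]string)
		a.byID[identity] = m
	}
	m[id] = value
}

// Get fetches one attribute on demand.
func (a *asyncAttributes) Get(identity, id string) (string, bool) {
	a.mu.RLock()
	defer a.mu.RUnlock()
	v, ok := a.byID[identity][id]
	return v, ok
}

func main() {
	s := newAsyncAttributes()
	s.Set("alice", "transcript_lang", "en")
	v, ok := s.Get("alice", "transcript_lang")
	fmt.Println(v, ok) // en true
}
```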
Data tracks (the new _data_track datachannel) previously only updated a
private dataTrackStats that logged a single summary at Close. Bytes never
reached the OnTrackStats -> TelemetryService.TrackStats pipeline that
media tracks and signal channels feed.
Wire DataTrack (UPSTREAM, publisher-home) and DataDownTrack (DOWNSTREAM,
per-subscriber) into BytesTrackStats on the same 5s cadence, mirroring
the media-track convention: subscriber's country and ID with publisher's
track ID for DOWNSTREAM. Cross-region proxy DataTracks leave the stats
pointer nil (no publisher reporter on that node, and relayed bytes would
double-count). Legacy dataTrackStats packet-loss/frame counters are
preserved.
Shared helper for callers that need to distinguish intentional/expected
participant closures (client leave, admin action, room teardown, migration)
from connection failures. Extracted from cloud's IsClosedIntentionally
switch so cloud-side code paths can share a single source of truth.
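A sketch of the shape such a helper might take. The `closeReason` enum and its values are hypothetical; the real helper works on the protocol's disconnect reasons, whose full set is not listed here. Only the four intentional categories come from the text above.

```go
package main

import "fmt"

// closeReason is a hypothetical enum of participant closure reasons.
type closeReason int

const (
	reasonClientLeave closeReason = iota
	reasonAdminRemoved
	reasonRoomClosed
	reasonMigration
	reasonSignalTimeout
	reasonPeerConnectionFailed
)

// isClosedIntentionally separates expected closures from connection failures.
func isClosedIntentionally(r closeReason) bool {
	switch r {
	case reasonClientLeave, reasonAdminRemoved, reasonRoomClosed, reasonMigration:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(isClosedIntentionally(reasonMigration), isClosedIntentionally(reasonSignalTimeout)) // true false
}
```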
Reverting
- https://github.com/livekit/livekit/pull/4521
- https://github.com/livekit/livekit/pull/4525
There are TWCC feedback packets that are larger than the MTU. This seems
to happen under a couple of conditions:
1. Bad client data, i.e. severely out-of-order packets, bad sequence
numbers, etc.
2. On an ICE restart. This is rare, but it seemed to be a flaky network,
with some packets arriving and some not, causing a lot of gaps.
In either case, there is not much to do. If fragmentation/reassembly back
to the publisher works, the feedback will make it through. If not, the
feedback will be missed and clients have to work with some missing data,
which is not unexpected and which the protocol is designed to handle.
Still, filed a pion/interceptor issue just in case: https://github.com/pion/interceptor/issues/416
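A rough estimate of why gappy or reordered streams inflate feedback size, using the transport-wide CC layout from draft-holmer-rmcat-transport-wide-cc-extensions-01. The layout has 20 fixed bytes, 2-byte packet chunks, and 1-byte (small) or 2-byte (large or negative) receive deltas, padded to 32 bits. The estimate assumes the worst-case chunk encoding (2-bit status vectors, 7 packets per chunk). Mixed statuses force this encoding because they defeat run-length chunks. `twccUpperBound` is a hypothetical helper, and the 1200-byte budget in the comment is only an example.

```go
package main

import "fmt"

// twccUpperBound estimates an upper bound on one transport-wide CC feedback
// packet covering small + large + lost packets:
//   - 20 bytes of fixed header fields,
//   - 2 bytes per chunk, assuming 2-bit status vectors (7 packets per chunk),
//   - 1 byte per small delta, 2 bytes per large or negative delta, 0 per lost,
//   - padding to a 32-bit boundary.
func twccUpperBound(small, large, lost int) int {
	n := small + large + lost
	chunks := (n + 6) / 7
	size := 20 + 2*chunks + small + 2*large
	return (size + 3) &^ 3
}

func main() {
	// Well-behaved stream: small deltas only.
	fmt.Println(twccUpperBound(100, 0, 0)) // 152
	// Reordered/gappy stream: large or negative deltas plus losses.
	fmt.Println(twccUpperBound(200, 400, 100)) // 1220, over an example 1200-byte budget
}
```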
* Don't require media sections when joining
Clients other than browsers (rust/libwebrtc is a known case) may fail
to fire the ontrack event when an extra media section is reused to
subscribe to a track, so disable this feature on the server side and let
the client decide whether extra media sections are needed.
* lint
* rtc: report participant kind code and details
Plumb ParticipantKind and KindDetails through MediaTrack and
BytesTrackStats so track-level reporting can record the numeric kind
code plus details codes on every participant_session aggregation,
alongside the existing Kind string. Also picks up the new kind fields
on resolved BytesSignalStats participants.
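A sketch of carrying the numeric kind code and details codes alongside the existing Kind string in each track-level record. `trackStatsRecord`, `participantKind` and `newRecord` are hypothetical, and the numeric values in the example are illustrative. The copy of `KindDetails` is a design choice of the sketch: it keeps an already-emitted record from changing if the participant's slice is later mutated.

```go
package main

import "fmt"

// participantKind is a hypothetical mirror of the protocol's kind enum.
type participantKind int32

// trackStatsRecord carries kind information next to the existing Kind string.
type trackStatsRecord struct {
	Kind        string
	KindCode    int32
	KindDetails []int32
	Bytes       uint64
}

// newRecord copies KindDetails so later changes to the source slice do not
// leak into stats that were already built.
func newRecord(kindName string, kind participantKind, details []int32, bytes uint64) trackStatsRecord {
	return trackStatsRecord{
		Kind:        kindName,
		KindCode:    int32(kind),
		KindDetails: append([]int32(nil), details...),
		Bytes:       bytes,
	}
}

func main() {
	details := []int32{1}
	rec := newRecord("AGENT", participantKind(4), details, 2048)
	details[0] = 99 // mutation after the record was built
	fmt.Println(rec.KindCode, rec.KindDetails) // 4 [1]
}
```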
Adds deployment/agentID/version to the agent worker logger.
The SFU falls back to retransmitting packets on the media stream SSRC
if RTX is not negotiated (i.e. the client doesn't support it), so we
should not disable RTX explicitly (via codec config).
Fix #4519
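A sketch of the fallback. With RTX negotiated, a NACKed packet is resent per RFC 4588 on the RTX SSRC and payload type, with the original sequence number (OSN) prepended. Without RTX, it is resent as-is on the media SSRC. This is why RTX does not need to be disabled for clients that lack it. `retransmitTarget` and `planRetransmit` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// retransmitTarget describes where a NACKed packet is resent.
type retransmitTarget struct {
	SSRC        uint32
	PayloadType uint8
	Payload     []byte
}

// planRetransmit uses RTX (RFC 4588) when negotiated, else the media stream.
func planRetransmit(mediaSSRC uint32, mediaPT uint8, seq uint16, payload []byte,
	rtxNegotiated bool, rtxSSRC uint32, rtxPT uint8) retransmitTarget {
	if !rtxNegotiated {
		// Fallback: resend the original packet on the media SSRC.
		return retransmitTarget{SSRC: mediaSSRC, PayloadType: mediaPT, Payload: payload}
	}
	// RTX payload = 2-byte original sequence number followed by the original payload.
	osn := make([]byte, 2, 2+len(payload))
	binary.BigEndian.PutUint16(osn, seq)
	return retransmitTarget{SSRC: rtxSSRC, PayloadType: rtxPT, Payload: append(osn, payload...)}
}

func main() {
	p := []byte{0xAA}
	noRTX := planRetransmit(1111, 96, 500, p, false, 0, 0)
	withRTX := planRetransmit(1111, 96, 500, p, true, 2222, 97)
	fmt.Println(noRTX.SSRC, withRTX.SSRC, withRTX.Payload) // 1111 2222 [1 244 170]
}
```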
Not a major issue, but this avoids creating the NACK module twice.
RTCP feedback of `nack` and `nack pli` both end up getting treated as
`nack`, so the module was being created twice.
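A sketch of de-duplicating by feedback type, so `nack` and `nack pli` lead to a single NACK module. `rtcpFeedback` mirrors the Type/Parameter shape of pion/webrtc's `RTCPFeedback`. `interceptorsFor` and the module names it returns are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// rtcpFeedback mirrors an SDP rtcp-fb entry: type plus optional parameter.
type rtcpFeedback struct {
	Type      string
	Parameter string
}

// interceptorsFor returns one module per feedback type, ignoring parameters,
// so "nack" and "nack pli" do not create the NACK module twice.
func interceptorsFor(fbs []rtcpFeedback) []string {
	seen := make(map[string]bool)
	var modules []string
	for _, fb := range fbs {
		t := strings.ToLower(fb.Type)
		if seen[t] {
			continue
		}
		seen[t] = true
		modules = append(modules, t)
	}
	return modules
}

func main() {
	fbs := []rtcpFeedback{{"nack", ""}, {"nack", "pli"}, {"transport-cc", ""}}
	fmt.Println(interceptorsFor(fbs)) // [nack transport-cc]
}
```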
* Apply TTL check only when authenticating allocation creation
The TTL check added in security enhancement #4505 could reject
allocation/permission refreshes, causing long-lived sessions to
disconnect once the TURN credential expired.
Only check the TTL on allocation creation, to prevent abuse of leaked
credentials while keeping long-lived sessions working.