* Initial plumbing for metrics.
This implements
- metrics received from participant.
- callback to room.
- room distributes it to all other participants (excluding the sending
participant).
- other participants forward to client.
- counting metrics bytes in data channel stats
TODO:
- recording/processing/batching
- should recording/processing/batching happen on publisher side or
subscriber side?
- should metrics be echoed back to publisher?
- grants to publish/subscribe metrics.
* mage generate
* clear OnMetrics on close
* - CanSubscribeMetrics permission.
- Echo back to sender.
* update deps
* No destination identities for metrics
* WIP
* use normalized timestamp for server injected timestamps
* compile
* debug log metrics batch
* correct comment
* add baseTime to wire
* protocol dep
* Scope metrics forwarding to only participants that a participant is
subscribed to.
Also remove the participant_metrics.go file as it was not doing anything
useful.
* update comment
* utils.ErrorIsOneOf
* couple of more utils.CloneProto
* Use new track id for republishing
Always generates new track id for track publishing
to make the behavior more consistent. A republishing of
clien track will be considered as a new track publication
and always trigger trackUnpublished & trackPublished event
on subscriber side.
* remove test
* Log only when not nil.
Default logging confuses debugging as we call using nil as well to make
the call site simpler. And logging a nil makes it look like it is
incorrect seeding. `nil` fields do not seed. So, don't log when `nil`.
* log SDP
With protobuf marshalling/unmarshalling, the default struct was getting
used and nil checks for non existence were failing. That means the right
layers were not reported active on migration.
Also, restore the wrapped loggers as some fields needs some conversion
before logging.
Also, log when RTCP sender report for reference layer is received.
Cleaning up a bunch of wrapped logger calls as we already have
logger.Proto which I forgot about while doing those wrappers.
* Do not remove from subscription map on unsubscribe.
Notes in line as to why.
* Avoiding the extra check. It is fine to check under lock for clean up.
It is done in other places.
* comments
To increase visibility of ICE reconnect, logging reconnected at Infow
level. Otherwise, it is hard to see if an ICE restart finished
successfully.
Also, cleaning up ICEConnectionDetails a bit. Just separate out
read-only fields into its own struct and use it for read-only export.
* Cache RTCP sender report in forwarder state.
To be used in migration.
TODO: need to check more places to operate pure in unix nano rather than
converting.
* match name
When a user includes a trailing slash in LIVEKIT_URL, it would produce
double slashes in the path, i.e. `https://myhost.livekit.cloud//twirp/RoomService.ListRooms`
Currently the server will send a 302 MOVED response, causing Twirp requests to fail.
We now remove the double slash in front within the middleware.
* Record out-of-packet count/rate in prom.
Adding a field to AnalyticsStream to make this easier to report.
Let me know if adding to AnalyticsStream is not ok.
Will set up a protocol PR if it is okay.
* deps
* Ref count the stats worker.
NOTE: Don't liek this much, but wanted to open this get some 👀 on
this and get feedback.
There are two entities, one for counting signal bytes and another for
media stats. They both send `ParticipantJoined` and `ParticipantLeft`
event.
In the case of a participant resume, as the old web socket
connection is closed, that triggers a signal stats counter close. That
would call `ParticipantLeft` and that would close the stats worker.
The closed stats worker got reaped in `FlushStats` after three minutes.
So, all events after that did not have a worker and hence went
unreported including missing participant_left webhook because it relied
on checking if a participant was ever connected and that needed to check
the worker state.
Using a ref count to keep track of join/leaves. And not close the worker
until ref count goes down to 0.
* create a stats worker on resume
* revert incorrect changes
* transfer connected state
* transfer connected state when creating worker
* resolve participant on a resume
* exponential backoff when calling CreateRoom
* dz+bb review: check ctx cancellation, remove retry max
* bb review: fix loop condition
* raja review: use timer, bring back max tries
But, do not record first packet time on an out-of-order packet.
It so happens that packets get out-of-order a lot more across relay.
And it turns out with some H.264 stream, the first few packets of a key
frame are very small (may be SPS/PPS, haven't checked), they get
out-of-oder quite a lot, so much so a down track never starts even it
has 20 - 25 key frames have passed through.
* Negotiate downttrack for subscriber before receiver is ready
This change will save 1 round sdp negotiation time for
subscribing to simulcast-codec or remote node track
* solve comment
* Fix simulcast-codec case
* Do not ICE restart on an idle/not yet started peer connection.
* Skip ICE restart on unestablished peer connection.
For publish only participants, the subscriber peer connection is not
negotiated. So, ICE restart was hitting an error while trying to restart
the SUBSCRIBER peer connection.
* use ICE gathering state as peer connection state may not have changed if first offer/answer was missed