Commit Graph

967 Commits

Author SHA1 Message Date
Raja Subramanian
ea2fa30cf8 Plug worker leaks (#2422)
Thank you @paulwe
2024-01-28 23:12:33 +05:30
Raja Subramanian
bcf9fe3f0f Use a participant worker queue in room. (#2420)
* Use a participant worker queue in room.

Removes selectively needing to call things in goroutine from
participant.

Also, a bit of drive-by clean up.

* spelling

* prevent race

* don't need to remove in goroutine as it is already running in the worker

* worker will get cleaned up in state change callback

* create participant worker only if not created already

* ref count participant worker

* maintain participant list

* clean up oldState
2024-01-28 22:10:35 +05:30
Raja Subramanian
38352b6125 Change transport queue. (#2419)
From a channel to OpsQueue. Have seen extreme cases (with a ton of
candidates) overflowing the channel.
2024-01-28 14:28:29 +05:30
Raja Subramanian
b71d373f4a Use Deque in ops queue. (#2418)
* Use Deque in ops queue.

Standardizing some uses
- Change OpsQueue to use Deque so that it can grow/shrink as necessary and
  need not worry about channel getting full and dropping events.
- Change StreamAllocator and TelemetryService to use OpsQueue so that
  they also need not worry about channel size and overflows.

* Address feedback

* delete obvious comment

* clean up
2024-01-28 13:48:30 +05:30
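The motivation in b71d373f4a (a queue that grows instead of a fixed-size channel that can fill up and drop events) can be illustrated with a minimal sketch. This is not the project's `OpsQueue`: the type `opsQueue`, its methods, and the helper `runOps` are hypothetical names, and a plain slice stands in for a real deque.

```go
package main

import (
	"fmt"
	"sync"
)

// opsQueue is a hypothetical, simplified stand-in for the OpsQueue named in
// the commit message. Pending operations live in a slice that grows as
// needed, so Enqueue never blocks and never drops an operation.
type opsQueue struct {
	mu      sync.Mutex
	ops     []func()
	stopped bool
	wake    chan struct{} // capacity 1: coalesces wake-ups for the worker
	done    chan struct{} // closed when the worker exits
}

func newOpsQueue() *opsQueue {
	q := &opsQueue{wake: make(chan struct{}, 1), done: make(chan struct{})}
	go q.run()
	return q
}

// Enqueue appends op and nudges the worker. It reports false after Stop.
func (q *opsQueue) Enqueue(op func()) bool {
	q.mu.Lock()
	if q.stopped {
		q.mu.Unlock()
		return false
	}
	q.ops = append(q.ops, op)
	q.mu.Unlock()
	q.notify()
	return true
}

// Stop lets the worker drain what is already queued, then waits for it to exit.
func (q *opsQueue) Stop() {
	q.mu.Lock()
	q.stopped = true
	q.mu.Unlock()
	q.notify()
	<-q.done
}

func (q *opsQueue) notify() {
	select {
	case q.wake <- struct{}{}:
	default: // a wake-up is already pending
	}
}

func (q *opsQueue) run() {
	defer close(q.done)
	for range q.wake {
		for {
			q.mu.Lock()
			if len(q.ops) == 0 {
				stopped := q.stopped
				q.mu.Unlock()
				if stopped {
					return
				}
				break // go back to waiting for the next wake-up
			}
			op := q.ops[0]
			q.ops[0] = nil // drop the reference so it can be collected
			q.ops = q.ops[1:]
			q.mu.Unlock()
			op() // run outside the lock so ops may enqueue more ops
		}
	}
}

// runOps enqueues n operations in a burst and returns how many actually ran.
func runOps(n int) int {
	q := newOpsQueue()
	ran := 0 // only touched by the worker; read after Stop returns
	for i := 0; i < n; i++ {
		q.Enqueue(func() { ran++ })
	}
	q.Stop()
	return ran
}

func main() {
	fmt.Println(runOps(10000))
}
```

A burst of any size is accepted because the backing storage grows; the trade-off is unbounded memory if the consumer stalls, which a bounded channel would have surfaced as dropped events instead.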
Raja Subramanian
d3da94c45e Augment LeaveRequest with alternate regions to connect. (#2408)
* Augment LeaveRequest with alternate regions to connect.

* update protocol and issue resume action on close if expected to resume

* use current protocol in tests

* address feedback
2024-01-25 22:22:46 +05:30
Raja Subramanian
43a40eb52d Using minimal TrackInfo when reporting to telemetry. (#2407)
Used the full TrackInfo in my previous PR, but telemetry might be
relying on top level Width/Height. So, make a pared down TrackInfo to
report to telemetry.

Also, correct some spelling/comments.
2024-01-25 10:27:55 +05:30
Raja Subramanian
79cdc2df2e Unify muted and unmuted migration paths. (#2406)
* Unify muted and unmuted migration paths.

If dynacast had disabled all layers, after a migration, the client did
not restart publish (it is akin to muted track). That failed migration
because migration state machine waits for unmuted tracks to be published
(i.e. server has to receive packets).

If a migrating track is in muted state, server does not wait for
packets. It synthesises the published event and catches up later when
packets actually come in.

Just treating all migrations as the erstwhile muted case. Synthesise
publish whether track is muted or not. In the unmuted case, packets
might arrive soon after whereas in muted case, it will depend on when
unmute happens.

This is tricky stuff. So, will need good testing.

* use muted from track info
2024-01-25 01:24:09 +05:30
Pablo Fuente Pérez
f6608977f0 Fix race condition on Participant.updateState (#2401)
The comparison between the last and current ParticipantInfo_State wasn't atomic. This sometimes resulted in two calls to the onStateChange method for the same participant state. In the end this was reflected in two ACTIVE events being generated for the same participant at exactly the same moment. The fix uses the atomic method Swap to properly protect the "compare and set" operation and avoid any race condition.
2024-01-22 17:11:34 -08:00
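The idea behind f6608977f0 can be sketched as follows, assuming Go 1.19+ for `atomic.Int32`. The `participant` type, the `state` constants, and the helper `countCallbacks` are hypothetical and are not the project's actual code. The point is only that `Swap` returns the previous value atomically, so exactly one concurrent caller observes the transition.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// state is a hypothetical stand-in for ParticipantInfo_State.
type state int32

const (
	stateJoined state = iota
	stateActive
)

type participant struct {
	state         atomic.Int32
	onStateChange func(old, cur state)
}

// updateState swaps in the new state and fires the callback only for the
// caller that actually changed it. A separate Load followed by Store would
// let two callers both see the old state and both fire the callback.
func (p *participant) updateState(next state) bool {
	old := state(p.state.Swap(int32(next)))
	if old == next {
		return false
	}
	p.onStateChange(old, next)
	return true
}

// countCallbacks has `callers` goroutines move one participant to ACTIVE
// concurrently and returns how many state-change callbacks fired.
func countCallbacks(callers int) int32 {
	var fired atomic.Int32
	p := &participant{onStateChange: func(_, _ state) { fired.Add(1) }}

	var wg sync.WaitGroup
	for i := 0; i < callers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.updateState(stateActive)
		}()
	}
	wg.Wait()
	return fired.Load()
}

func main() {
	fmt.Println(countCallbacks(64)) // one transition, so one callback
}
```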
Raja Subramanian
899067ba0f Simulation scenarios to disable signal channel on resume (#2389)
* Add a simulation scenario to disconnect signal channel on resume

- Requesting that scenario adds the participant to a map with a timeout
  of 5 seconds.
- If a resume (reconnect = 1) happens before the timeout, the signalling
  channel is closed immediately on resume.
- There is a clean up worker which will remove entries from the map when
  they time out.
- The participant is also removed from the map if the disconnect on
  resume is invoked once.

* simulate disconnect signal on resume no messages

* comment

* comment

* Close all retries

* update deps

* abort resume only if simulation applied

* Revert SIP change
2024-01-17 20:44:05 +05:30
Paul Wells
78bf642d67 record session once (#2381) 2024-01-12 05:55:56 -08:00
Raja Subramanian
bf0e88dea4 Squelch only the log, not the error return. (#2379) 2024-01-12 16:58:23 +05:30
Paul Wells
2fe2a9c9f2 add session start time metric (#2377) 2024-01-11 23:23:51 -08:00
Raja Subramanian
3687396d84 Squelch error logs while waiting for track resolve. (#2376) 2024-01-12 12:16:19 +05:30
Raja Subramanian
6ff7b0faba Pass mock track in track update callback (#2373) 2024-01-11 07:54:26 +05:30
Raja Subramanian
32bd75648f Wait for metadata update. (#2363) 2024-01-05 17:07:44 +05:30
Paul Wells
a10830a995 add protocol version 12 helper (#2358) 2024-01-03 00:06:43 -08:00
Raja Subramanian
7c8b60431d Update mute if necessary when updating track info. (#2357) 2024-01-02 23:32:43 +05:30
Raja Subramanian
2f1a2ff39d Protect against stats getting reset. (#2351)
A reset would make `after` look like it is `before` and the diff will be
large unsigned numbers.
2023-12-28 01:16:59 +05:30
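The failure mode named in 2f1a2ff39d comes from unsigned subtraction wrapping around. A minimal sketch, assuming cumulative `uint64` counters and a hypothetical helper `packetsDelta` (the actual stats types and the project's handling of a reset are not shown in the commit message):

```go
package main

import "fmt"

// packetsDelta returns after-before for two cumulative counter snapshots.
// If the counter was reset between snapshots, after is smaller than before
// and plain unsigned subtraction would wrap to a huge value, so the helper
// reports ok=false instead of a bogus delta.
func packetsDelta(before, after uint64) (delta uint64, ok bool) {
	if after < before {
		return 0, false
	}
	return after - before, true
}

func main() {
	before, after := uint64(100), uint64(40) // counter was reset in between

	// Unguarded: wraps around to 2^64 - 60.
	fmt.Println(after - before)

	// Guarded: the caller can skip this sample or re-baseline.
	fmt.Println(packetsDelta(before, after))
}
```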
Raja Subramanian
5eea679589 Include packets out-of-order in TrafficStats. (#2350)
PacketsLost may not be useful if repairs are discounting the loss.
So, out-of-order packets are an indication of loss and maybe subsequent
repair. Note that out-of-order could be just out-of-order by a short
amount of time, but a lot of that happening is not good either.
So, out-of-order could provide a decent view of link quality.
2023-12-27 23:15:24 +05:30
Raja Subramanian
15500d8a18 Add padding and lost packets to traffic stats (#2349)
* Add padding and lost packets to traffic stats

* aggregate padding packets
2023-12-27 17:40:59 +05:30
Raja Subramanian
bdfc684cd7 Prevent race of new track and new receiver. (#2345)
* Prevent race of new track and new receiver.

Two different concepts
1. Creation of a new media track
2. Creation of a new receiver inside the media track
collided and caused track published to not be fired.

Unify to mark creation of new receiver as the source of truth.
With simulcast codecs, creation of a new receiver should be treated as a
new published track.

* Fire onTrackPublished only on new track
2023-12-25 23:05:59 +05:30
Raja Subramanian
bdcd142c0d Adding some logs in subscribe path. (#2343)
Trying to chase down an older client failing to subscribe sometimes.
2023-12-25 14:12:08 +05:30
Raja Subramanian
ee1a167c3e Correct logger field (#2341) 2023-12-24 12:25:05 +05:30
Raja Subramanian
6cac17affe Add some debug logs around track publish (#2340) 2023-12-23 18:55:24 +05:30
Raja Subramanian
26c96ec283 Synthesise codec when adding pending track for no simulcast case also. (#2339)
* Synthesise codec when adding pending track for no simulcast case also.

Older clients not using simulcast codecs were failing e2e migration
tests. Problem is that they did not have layer information and hence
SSRC could not be set on migration.

A codec was getting added later (when OnTrack was received). I missed
adding layers in that code. Could have cloned layers there and added it.
But, simplifying and adding at the start itself.

Also, cleaning up code in `MediaTrackReceiver` for no codecs case as it
should not happen any more.

* clone per layer

* fix priority determination
2023-12-22 17:09:49 +05:30
Paul Wells
01f90d185f copy receivers on write (#2336)
* copy receivers on write

* cleanup

* cleanup

* test
2023-12-21 08:23:22 -08:00
Raja Subramanian
4c1047d8c3 Populate simulcast codec layers. (#2334)
Previously, it was done on read. Missed populating it on write
in the TrackInfo consolidation effort.

Fix by populating layers when adding pending track itself.
As all codecs will have the same layers, clone the top level layers and add
them to all codecs.
2023-12-21 18:52:49 +05:30
Raja Subramanian
a4888fcf8f Prevent unsafe access (hopefully). (#2332)
* Prevent unsafe access (hopefully).

Thank you @paulwe for catching it.

* prevent recursive locks
2023-12-21 16:02:10 +05:30
Raja Subramanian
faff67162b Consolidate TrackInfo. (#2331)
* Consolidate TrackInfo.

TrackInfo was spread across a bit. Consolidating it.

* TODO comments

* test

* update TrackInfo on SSRC change

* further consolidation

* log mimes only

* update receivers on SSRC set

* clone proto on return

* feedback: break loop on mime match

* prevent data race
2023-12-21 09:56:54 +05:30
Raja Subramanian
d0c36aa6cc Make UpdateTrackInfo an interface. (#2327) 2023-12-20 09:58:08 +05:30
Raja Subramanian
7c841e8895 Only assign TrackInfo Version on fresh publish. (#2325)
* Only assign TrackInfo Version on fresh publish.

* remove redundant nil check
2023-12-20 09:48:13 +05:30
Raja Subramanian
37539fdf76 Add Version to TrackInfo. (#2324)
* Add Version to TrackInfo.

Set when a track is published.

* update protocol
2023-12-19 11:50:48 +05:30
Raja Subramanian
5ee307952e Reduce a couple of logs to Debugw. Small saving. (#2322) 2023-12-18 14:27:55 +05:30
Raja Subramanian
3cf4fbc6a9 Store identity in participant update cache. (#2320)
Need to store identity of other participant in cache so that it
can be sent with the disconnected participant update.
Side note: Feels like the cache can be made to hold the full proto
to make things simpler, but just adding a field for now.
2023-12-15 15:40:10 +05:30
cnderrauber
a150eaf697 Fix mid info lost when migrating multi-codec simulcast track (#2315)
* Fix mid info lost when migrating multi-codec simulcast track

* update pion
2023-12-15 00:02:27 +08:00
Raja Subramanian
0478af449f Do not error on end-of-candidates candidate (#2314) 2023-12-14 15:22:57 +05:30
lukasIO
23b46042cc Populate disconnect updates with participant identity (#2310) 2023-12-13 15:32:25 +01:00
Raja Subramanian
dfcafff955 Log track info when media published. (#2306)
With pending track added moved to Debugw, will be good to have this when
track is published.
2023-12-11 11:20:25 +05:30
Raja Subramanian
c766676d36 Handle nil pair (#2305) 2023-12-10 21:44:16 +05:30
Raja Subramanian
83efa9258e Bump up protocol for connection quality LOST. (#2297)
Also log trackID/trackInfo in layer mapping.
2023-12-06 16:59:05 +05:30
David Zhao
02c28a5946 Fix Selected attribute not being copied (#2289) 2023-12-03 22:05:23 -08:00
David Zhao
98c81b92bb Helper function to remove address from ClientInfo (#2288) 2023-12-03 10:48:33 -08:00
David Zhao
37e1864df8 Expose detailed connection info with ICEConnectionDetails (#2287)
* Expose detailed connection info with ICEConnectionDetails

* clone to avoid data race

* lower transport

* simplify

* address feedback
2023-12-03 10:03:41 -08:00
David Zhao
3fe124c87f Log cleanup pass (#2285)
* Log cleanup pass

Demoted a bunch of logs to DEBUG, consolidated logs.

* use context logger and fix context var usage

* moved common error types, fixed tests
2023-12-02 15:07:31 -08:00
Raja Subramanian
d866b5110f Restrict scope of negotiation time out error logs (#2283)
* Restrict scope of negotiation time out error logs

1. Log "negotiation failed" only if signal channel was active
within half window of negotiation timeout. Negotiation timeout currently
is at 15 seconds. Signal pings are every 10 seconds.
2. In transport.go, do not report negotiation timed out and do not
callback negotiation failure if the peer connection state is not
connected. Goal of negotiation failure tracker is to take remedial
action when an in-session negotiation fails. Seeing a bunch of cases
of this hitting even without an ICE connection forming. Negotiation
timer is not intended for those cases.

* fix test
2023-12-02 12:44:37 +05:30
Raja Subramanian
7b778c50eb Group SDES items for one SSRC in the same chunk. (#2280) 2023-12-01 11:37:14 +05:30
Raja Subramanian
2ee5aa7c98 Add optional supervisor disable. (#2277)
* Add optional supervisor disable.

Used `DisableSupervisor` so that default can be enabled and
it can be disabled explicitly. But, open to defaulting to disable
(i.e. change param to `EnableSupervisor`).

* Move nil check to call site
2023-11-30 13:04:31 +05:30
Raja Subramanian
fa061b47fc Logging adjustments (#2273) 2023-11-29 15:40:01 +05:30
Paul Wells
890f0bfc67 initialize prometheus metrics in test files (#2267) 2023-11-27 21:31:39 -08:00
Raja Subramanian
5f76d1adcc Introduce DISCONNECTED connection quality. (#2265)
* Introduce `DISCONNECTED` connection quality.

Currently, this state happens when any up stream track does not
send any packets in an analysis window when it is expected to send
packets.

This can be used by participants to know the quality of a potentially
disconnected participant. Previously, it took 20 - 30 seconds for
the stale timeout to kick in and disconnect the limbo participant which
triggered a participant update through which other participants knew
about it.

Previously, `POOR` quality was also overloaded to denote that the
up stream is not sending any packets. With this change, that is a
separate indicator, i.e. `DISCONNECTED`.

* clean up

* Update deps

* spelling
2023-11-27 23:06:53 +05:30