Commit Graph

1513 Commits

Author SHA1 Message Date
Raja Subramanian
69a1e572be Attempt to reduce disruption due to probe. (#1839)
* Make congestion controller probe config

* Wait for enough estimate samples

* fixes

* format

* limit number of times a packet is ACKed

* ramp up probe duration

* go format

* correct comment

* restore default

* add float64 type to generated CLI
2023-06-30 11:09:46 +05:30
David Zhao
7be9e2258d Upgrade to Pion 3.0.11, disable active TCP (#1836) 2023-06-28 16:53:58 -07:00
Juan Navarro
2668073c29 Honor bind address passed as --bind also for RTC ports (#1815)
* Use net.JoinHostPort to build "host:port" strings for `net.Listen`

net.JoinHostPort provides a unified way of building strings of the form
"Host:Port", abstracting the particular syntax requirements of some
methods in the `net` package (namely, that IPv4 addresses can be given
as-is to `net.Listen`, but IPv6 addresses must be given enclosed in
square brackets).

This change makes sense because an address such as `[::1]` is *not* a
valid IPv6 address; the square brackets are just a detail particular to
the Go `net` library. As such, this syntax shouldn't be exposed to the
user, and configuration should just accept valid IPv6 addresses and
convert them as needed for usage within the code.
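
As a minimal sketch of the behavior described above (the helper name `listenAddr` and the port number are illustrative assumptions, not taken from this change), `net.JoinHostPort` adds the square brackets only when the host is an IPv6 literal:

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// listenAddr is a hypothetical helper: it builds the "host:port" string
// that net.Listen expects from a plain host and a numeric port.
func listenAddr(host string, port int) string {
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	fmt.Println(listenAddr("127.0.0.1", 7880)) // 127.0.0.1:7880
	fmt.Println(listenAddr("::1", 7880))       // [::1]:7880
}
```

The user-facing configuration can then carry `::1` as-is, and the brackets appear only at the point where the string is handed to the `net` package.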

* Use '--bind' CLI flag to also filter RTC bind address

The local address passed to a command such as

    livekit-server --dev --bind 127.0.0.1

was being used as binding address for the TCP WebSocket port, but was
being ignored for RTC connections.

With `--dev`, the conf.RTC.UDPPort config is set to 7882, which enables the
"UDP muxing" mechanism. Without interface or address filtering, Pion
would try to bind to port 7882 on *all* interfaces.

This was failing on a system with IPv6 enabled, when trying to bind to
an IPv6 address of the `docker0` interface. It seems to make sense that
the user-passed bind addresses are also honored for the RTC port
bindings.
2023-06-28 16:52:43 -07:00
Raja Subramanian
eaf70d5549 Pacer in down stream path. (#1835)
* Pacer interface to send packets

* notify outside lock

* use select

* use pass through pacer

* add error to OnSent

* Remove log which could get noisy

* Starting TWCC work (#1727)

* add packet time

* WIP commit

* WIP commit

* WIP commit

* minor comments

* Some measurements (#1736)

* WIP commit

* some notes

* WIP commit

* variable name change and do not post to closed channel

* unlock

* clean up

* comment

* Hooking up some more bits for TWCC (#1752)

* wake under lock

* Pacer in down stream path.

Splitting out only the pacer from a feature branch to
introduce the concept of pacer.

Currently, there should be no difference in functionality
as a pass through pacer is used.

Another implementation exists which just puts packets in a queue and sends
them from one goroutine.

A potential implementation to try would be data paced by bandwidth
estimate. That could include priority queues and such.

But, the main goal here is to introduce notion of pacer in the down
stream path and prepare for more congestion control possibilities down
the line.
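
The two pacer variants mentioned above could look roughly like the following. This is only a sketch: the `Packet` and `Pacer` types, their method names, and the queue depth are assumptions made for illustration and are not the interfaces introduced by this commit.

```go
package main

import (
	"fmt"
	"sync"
)

// Packet is a hypothetical unit of work: a payload plus the function
// that actually writes it to the network.
type Packet struct {
	Payload []byte
	Send    func([]byte) error
}

// Pacer is a hypothetical interface for the down stream send path.
type Pacer interface {
	Enqueue(p Packet)
	Stop()
}

// PassThrough sends every packet immediately on the caller's goroutine,
// so it does not change behavior compared to having no pacer.
type PassThrough struct{}

func (PassThrough) Enqueue(p Packet) { _ = p.Send(p.Payload) }
func (PassThrough) Stop()            {}

// Queued hands packets to a single goroutine that sends them in order.
type Queued struct {
	ch   chan Packet
	done sync.WaitGroup
}

func NewQueued(depth int) *Queued {
	q := &Queued{ch: make(chan Packet, depth)}
	q.done.Add(1)
	go func() {
		defer q.done.Done()
		for p := range q.ch {
			_ = p.Send(p.Payload)
		}
	}()
	return q
}

func (q *Queued) Enqueue(p Packet) { q.ch <- p }

// Stop closes the queue and waits until queued packets have been sent.
// Enqueue must not be called after Stop.
func (q *Queued) Stop() {
	close(q.ch)
	q.done.Wait()
}

func main() {
	var mu sync.Mutex
	var sent []string
	send := func(b []byte) error {
		mu.Lock()
		defer mu.Unlock()
		sent = append(sent, string(b))
		return nil
	}
	for _, p := range []Pacer{PassThrough{}, NewQueued(8)} {
		p.Enqueue(Packet{Payload: []byte("a"), Send: send})
		p.Enqueue(Packet{Payload: []byte("b"), Send: send})
		p.Stop()
	}
	fmt.Println(sent) // [a b a b]
}
```

A bandwidth-paced variant would replace the plain loop in the queue goroutine with one that delays sends according to the current estimate; that part is deliberately left out here.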

* Don't need peak detector

* remove throttling of write IO errors
2023-06-28 13:22:44 +05:30
Raja Subramanian
2b0a470474 Less flapping in probe. (#1834)
- Increase max interval between probes to 2 minutes.
- Use a minimum probe rate of 200 kbps. This is to ensure that
the probe rate is decent and can produce a stronger signal.
2023-06-28 12:48:38 +05:30
Raja Subramanian
cea41e4189 Discount out-of-order packets in downstream score. (#1831)
* Discount out-of-order packets in downstream score.

More notes inline.

* correct comment

* clean up comment
2023-06-27 17:44:53 +05:30
cnderrauber
5b975af55f Refine dependency descriptor based selection forwarder (#1808)
* Don't update dependency info if unordered packet received

* Trace all active svc chains for downtrack

* Try to keep lower decode target decodable

* remove comments

* Test case

* clean code

* solve comments
2023-06-27 15:11:06 +08:00
Raja Subramanian
2896aeb126 Set potential codecs for tracks without simulcast codecs. (#1828)
When migrating muted track, need to set potential codecs.
For audio, there may not be `simulcast_codecs` in `AddTrack`.
Hence when migrating a muted track, the potential codecs are not set.
That results in no receivers in relay up track (because all this
could happen before the audio track is unmuted).

So, look at MimeType in TrackInfo (this will be set in OnTrack) and
use that as potential codec.
2023-06-27 04:34:41 +05:30
Raja Subramanian
352bb1d204 Add GetClientInfo interface, to be used to decide migration vs full-reconnect (#1827) 2023-06-26 23:15:53 +05:30
Raja Subramanian
95f360bbce Do not process events after participant close. (#1824)
* Do not process events after participant close.

Avoid processing transport events after participant/transport close.
It causes error logs which are not really errors, but distracting noise.

* correct comment
2023-06-25 09:26:14 +05:30
Raja Subramanian
81f41aca20 Full reconnect on publication mismatch on resume. (#1823)
* Full reconnect on publication mismatch on resume.

It is possible that publications mismatch on resume. An example sequence
- Client sends `AddTrack` for `trackA`
- Server never receives it due to signalling connection breakage.
- Client could do a resume (reconnect=1) noticing signalling connection
  breakage.
- Client's view thinks that `trackA` is known to server, but server does
  not know about it.
- A subsequent offer containing `trackA` triggers `trackInfo not
  available before track publish` and the track does not get published.

Detect the case of missing track and issue a full reconnect.

* UpdateSubscriptions from sync state a la cloud

* add missing shouldReconnect
2023-06-24 19:18:05 +05:30
Raja Subramanian
8ac394c5bb Removing commented out short cut path, don't need more debug data. (#1822) 2023-06-23 14:18:55 +05:30
Benjamin Pracht
1f6efedd31 Send updated events on state updates (#1819) 2023-06-22 09:20:58 -07:00
Paul Wells
c38791ff0a stop retrying signal connection if the request context is closed (#1820) 2023-06-22 07:09:34 -07:00
Raja Subramanian
00558dee5c Close participant on full reconnect. (#1818)
* Close participant on full reconnect.

A full reconnect == irrecoverable error. Participant cannot continue.
So, close the participant when issuing a full reconnect.
That should prevent subscription manager reconcile till the participant
is finally closed down when participant is stale.

* format
2023-06-22 10:09:10 +05:30
Raja Subramanian
2438058474 Drop error logs due to pipe close (#1813) 2023-06-21 14:11:17 +05:30
Raja Subramanian
84994b39ab Make the samples string more readable. (#1810) 2023-06-21 11:35:38 +05:30
Raja Subramanian
583648a1ed Avoid closure to reduce life span of objects. (#1809)
A subscription in subscription manager could live till the source
track goes away even though the participant with that subscription
is long gone due to closure on source track removal. Handle it by using
trackID to look up on source track removal.
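
The general pattern can be sketched as follows. All types and method names here are hypothetical and much simpler than the real subscription manager: the point is only that the removal handler receives a track ID and looks the subscription up, instead of a closure holding a pointer to the subscription object.

```go
package main

import (
	"fmt"
	"sync"
)

// subscription is a hypothetical stand-in for per-track subscription state.
type subscription struct {
	trackID string
}

// manager is a hypothetical registry of subscriptions keyed by track ID.
type manager struct {
	mu   sync.Mutex
	subs map[string]*subscription
}

func newManager() *manager {
	return &manager{subs: map[string]*subscription{}}
}

func (m *manager) subscribe(trackID string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.subs[trackID] = &subscription{trackID: trackID}
}

// unsubscribe drops the only reference to the subscription, so nothing
// registered on the source track keeps the object alive.
func (m *manager) unsubscribe(trackID string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.subs, trackID)
}

// onSourceTrackRemoved is called with an ID rather than via a closure over
// a *subscription. It reports whether a live subscription was found.
func (m *manager) onSourceTrackRemoved(trackID string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.subs[trackID]; !ok {
		return false // already gone, nothing to clean up
	}
	delete(m.subs, trackID)
	return true
}

func main() {
	m := newManager()
	m.subscribe("trackA")
	m.subscribe("trackB")
	m.unsubscribe("trackA")
	fmt.Println(m.onSourceTrackRemoved("trackA")) // false
	fmt.Println(m.onSourceTrackRemoved("trackB")) // true
}
```

In this sketch the handler still needs some way to reach the registry; how the actual change resolves the ID is not shown in the message above.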

Also, logging SDPs when a negotiation failure happens to check
if there are any mismatches.
2023-06-20 19:06:01 +05:30
Raja Subramanian
27051e9999 It is possible that pipe is closed before blank frame send, do not warn (#1807) 2023-06-20 11:58:01 +05:30
Raja Subramanian
f11a7a229f Remove unnecessary check (#1806) 2023-06-19 16:40:05 +05:30
Paul Wells
a6d091a810 update protocol (#1803) 2023-06-18 18:13:34 -07:00
Raja Subramanian
40f5902d36 Consistently use connID as log tag (#1801) 2023-06-17 21:02:02 +05:30
Raja Subramanian
2383234f6e Simplify sliding window collapse. (#1802)
* Simplify sliding window collapse.

Keep the same value collapsing simple.
Add it to sliding window as long as same value is received for longer
than collapse threshold.
But, add a prune with three conditions to process the sliding window
to ensure only valid samples are kept.

* flip the order of validity window and same value pruning

* increase collapse threshold to 0.5 seconds during non-probe
2023-06-17 18:56:38 +05:30
Raja Subramanian
395f403132 Small stream allocator tweaks. (#1800)
1. Probe end time needs to include the probe cluster running time also.
2. Apply collapse window only within the sliding window. This is to
   prevent cases of some old data declaring congestion. For example,
   an estimate could have fallen 15 seconds ago and there might have
   been a bunch of estimates at that fallen value. And the whole
   sliding window could have that value at some point. But, a further
   drop may trigger congestion detection. But, that might be acting too
   fast, i.e. on one instance of value fall. Change it so that we
   detect if there is a fall within the sliding window and apply
   collapse based on that.
2023-06-17 12:35:29 +05:30
Benjamin Pracht
552e3758d5 Add IngressUpdated event (#1775) 2023-06-16 10:58:49 -07:00
Raja Subramanian
cadf3bf649 Simulate muted audio track publish on migration. (#1799)
Till now only video was using simulated publish when migrating on mute.
But, with `pauseUpstream() + replaceTrack(null)`, it is possible that
client does not send any data when muted.

I do not think there is a problem with doing this (even when the client is
actually using mute which sends silence frames).
2023-06-16 22:00:38 +05:30
Raja Subramanian
908b7a9bb1 Promote some migration logs to Infow (#1798) 2023-06-16 19:00:17 +05:30
Raja Subramanian
6946d0a3a1 Do not mute forwarder when paused due to bandwidth congestion. (#1796)
* Do not mute forwarder when paused due to bandwidth congestion.

Detailed notes in code.

* remove word
2023-06-16 12:08:01 +05:30
David Zhao
f71544e27a Do not send ParticipantJoined webhook if connection was resumed (#1795)
* Do not send ParticipantJoined webhook if connection was resumed

* isResume -> isMigration
2023-06-15 15:39:04 -07:00
paulwe
0dab55556d add drain function to rtc service 2023-06-15 11:56:41 -07:00
Raja Subramanian
12db469297 Better tracking of signalling connection. (#1794)
* Better tracking of signalling connection.

- Reason for closing signaling channel.
- ConnectionID attached to request source/response sink

* Tests
2023-06-15 12:53:34 +05:30
shishirng
2dd4e1365b Send EgressUpdated event (#1792)
Signed-off-by: shishir gowda <shishir@livekit.io>
2023-06-14 18:56:07 -04:00
Raja Subramanian
afa7733748 Promote switch logs to Infow. (#1790) 2023-06-12 17:30:56 +05:30
Raja Subramanian
9809b8bc3a Use nack queue params. (#1789)
* Use nack queue params.

* fix test
2023-06-12 13:01:02 +05:30
cnderrauber
c91889edfd Add dependency descriptor stream tracker for svc codecs (#1788)
* Add dependency descriptor stream tracker for svc codecs

* Solve comments
2023-06-12 15:07:47 +08:00
Raja Subramanian
3d696ac39f Keep next timestamp on switch closer to ref. (#1784)
If ref is coming in slow (due to pacing), it is possible that
expected is ahead. Pulling next too far towards expected causes
warps in a subsequent report. Keep switches closer to ref.
2023-06-10 11:38:46 +05:30
Raja Subramanian
4805dec1f0 Create channel observer on probe reset. (#1783)
On a state change, it was possible an aborted probe was pending
finalize. When probe controller is reset, the probe channel
observer was not reset. Create a new non-probe channel observer
on state change to get a fresh start.

Also limit probe finalize wait to 10 seconds max. It is possible
that the estimate is very low and we have sent a bunch of probes.
Calculating wait based on that could lead to finalize waiting for
a long time (could be minutes).
2023-06-10 10:54:55 +05:30
Raja Subramanian
0e7bdeabcb Simplify probe done handling. (#1782)
* Simplify probe done handling.

Seeing a case where the channel observer is not re-created after
an aborted probe. Simplifying probe done (no callbacks, making it
synchronous).

* log more
2023-06-10 02:07:28 +05:30
Raja Subramanian
72ed5b19f7 Use receiver report stats for loss/rtt/jitter. (#1781)
* Use receiver report stats for loss/rtt/jitter.

Reversing a bit of https://github.com/livekit/livekit/pull/1664.
That PR did two snapshots (one based on what SFU is sending
and one based on combination of what SFU is sending reconciled with
stats reported from client via RTCP Receiver Report). That PR
reported SFU only view to analytics. But, that view does not have
information about loss seen by client in the downstream.
Also, that does not have RTT/jitter information. The rationale behind
using SFU only view is that SFU should report what it sends irrespective
of whether the client is receiving or not. But, that view did not have proper
loss/RTT/jitter.

So, switch back to reporting SFU + receiver report reconciled view.
The down side is that when receiver reports are not received,
packets sent/bytes sent will not be reported to analytics.

An option is to report SFU only view if there are no receiver reports.
But, it becomes complex because of the offset. Receiver report would
acknowledge certain range whereas SFU only view could be different
because of propagation delay. To simplify, just using the reconciled
view to report to analytics. Using the available view will require
a bunch more work to produce accurate data.
(NOTE: all this started due to a bug where RTCP was not restarted on
a track resume which killed receiver reports and we went on this path
to distinguish between publisher stopping vs RTCP receiver report not
happening)

One optimisation to do here concerns the check to see if publisher is sending data.
Using a full DeltaInfo for that is overkill. Can do a lighter-weight
check for that later.

* return available streams

* fix test
2023-06-09 23:31:25 +05:30
Raja Subramanian
f518f5d743 Log head SN when packet cannot be fetched (#1780) 2023-06-09 12:13:06 +05:30
David Colburn
8235310a92 don't save info after UpdateStream (#1779) 2023-06-07 16:27:37 -07:00
Raja Subramanian
22813cd2be Recreate channel observer irrespective of probe success/fail. (#1778) 2023-06-08 01:40:07 +05:30
Raja Subramanian
b591140d66 Ignore receiver report till initialized (#1773) 2023-06-06 21:43:49 +05:30
Raja Subramanian
7ed3af193a No proof that this helps (#1772) 2023-06-06 11:28:13 +05:30
Raja Subramanian
076d8cad73 Promote switch log to Infow (#1771) 2023-06-06 11:20:57 +05:30
David Zhao
7e5a7ae79f Fixed windows build (#1768) 2023-06-04 00:17:25 -07:00
Raja Subramanian
f5c5d4e079 Wait for a more stable measurement of sample rate. (#1764) 2023-06-03 14:26:26 +05:30
Benjamin Pracht
e7879a46fc Add ingress telemetry support (#1763) 2023-06-02 17:38:19 -07:00
Raja Subramanian
c2ae34151c Enable some debug logs to debug freeze (#1761)
* Enable some debug logs to debug freeze

* log receiver sender report also
2023-06-02 16:31:19 +05:30
David Zhao
b5c8fe5294 Perform unsubscribe in parallel to avoid blocking (#1760)
* Perform unsubscribe in parallel to avoid blocking

When unsubscribing from tracks, we flush a blank frame in order to prepare
the transceivers for re-use. This process is blocking for ~200ms. If
the unsubscribes are performed serially, it would prevent other subscribe
operation from continuing.

This PR parallelizes that operation, and ensures subsequent subscribe
operations could reuse the existing transceivers.

* also perform in parallel when uptrack close

* fix a few log fields
2023-06-02 00:13:18 -07:00