906 Commits

Author SHA1 Message Date
Raja Subramanian 57b3dfdcf4 Loss based congestion signal detector. (#3168)
* Loss based congestion signal detector.

It uses the same approach of thresholding + duration to detect
region of operation and further derive early warning/congested states.
A gutter is used for indeterminate region just like the queuing delay
based case.

The two approaches (queuing delay and loss) are treated independently,
i.e. packet groups have to satisfy the same type of condition (queuing
delay OR loss) to build up congestion.

The aggregate congestion signal is triggered if either one triggers.

Maybe there is a way to accept hybrid signalling (i.e. each group
satisfying either threshold adds up to congestion signal detection), but
that needs more experimentation. For now, keeping them separate.

* apply max threshold

* clean up

* spelling
2024-11-11 13:27:49 +05:30
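
The thresholding + duration approach with a gutter, as described in the commit message above, can be sketched roughly as follows. This is a minimal illustration and not the code from the commit: the type names, threshold values, the fixed group duration, and the choice to leave the accumulated duration untouched while in the gutter are all assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// State is a hypothetical congestion state; the names are illustrative.
type State int

const (
	StateNone State = iota
	StateEarlyWarning
	StateCongested
)

// Config holds assumed thresholds; none of these values come from the commit.
type Config struct {
	LowerLoss            float64       // below this, the group is clearly not congested
	UpperLoss            float64       // at or above this, the group counts toward congestion
	EarlyWarningDuration time.Duration // accumulated time needed for early warning
	CongestedDuration    time.Duration // accumulated time needed for congested
	GroupDuration        time.Duration // duration covered by one packet group
}

// DefaultConfig returns made-up values for the sketch.
func DefaultConfig() Config {
	return Config{
		LowerLoss:            0.05,
		UpperLoss:            0.10,
		EarlyWarningDuration: 100 * time.Millisecond,
		CongestedDuration:    200 * time.Millisecond,
		GroupDuration:        50 * time.Millisecond,
	}
}

// LossDetector accumulates how long loss has stayed above the upper threshold.
type LossDetector struct {
	cfg          Config
	congestedFor time.Duration
}

func NewLossDetector(cfg Config) *LossDetector {
	return &LossDetector{cfg: cfg}
}

// Observe feeds the loss ratio of one packet group and returns the state.
func (d *LossDetector) Observe(lossRatio float64) State {
	switch {
	case lossRatio >= d.cfg.UpperLoss:
		d.congestedFor += d.cfg.GroupDuration
	case lossRatio < d.cfg.LowerLoss:
		d.congestedFor = 0
	default:
		// Gutter: indeterminate region, so the accumulated duration is
		// left as-is (an assumption of this sketch).
	}

	switch {
	case d.congestedFor >= d.cfg.CongestedDuration:
		return StateCongested
	case d.congestedFor >= d.cfg.EarlyWarningDuration:
		return StateEarlyWarning
	default:
		return StateNone
	}
}

func main() {
	d := NewLossDetector(DefaultConfig())
	for _, loss := range []float64{0.2, 0.2, 0.07, 0.2, 0.2, 0.0} {
		fmt.Println(loss, d.Observe(loss))
	}
}
```

A queuing delay detector could follow the same shape with delay thresholds, and the aggregate signal would then trigger when either detector triggers.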
Raja Subramanian ceb8a70696 Use same components when logger is updated (#3166)
Logger in buffer can get updated when the layer is known. Use the same
components used in destructor.
2024-11-11 11:38:48 +05:30
Raja Subramanian 5109551262 Reduce lock scope. (#3167)
Also, do not close channel. stop fuse break will close
the worker and GC cleans up.
2024-11-11 11:38:32 +05:30
Raja Subramanian a3f2ca56f9 TWCC based congestion control - v0 (#3165)
* file output

* wake under lock

* keep track of RTX bytes separately

* packet group

* Packet group of 50ms

* Minor refactoring

* rate calculator

* send bit rate

* WIP

* comment

* reduce packet infos size

* extended twcc seq num

* fix packet info

* WIP

* queuing delay

* refactor

* config

* callbacks

* fixes

* clean up

* remove debug file, fix rate calculation

* fmt

* fix probes

* format

* notes

* check loss

* tweak detection settings

* 24-bit wrap

* clean up a bit

* limit symbol list to number of packets

* fmt

* clean up

* lost

* fixes

* fmt

* rename

* fixes

* fmt

* use min/max

* hold on early warning of congestion

* make note about need for all optimal allocation on hold release

* estimate trend in congested state

* tweaks

* quantized

* fmt

* TrendDetector generics

* CTR trend

* tweaks

* config

* config

* comments

* clean up

* consistent naming

* participant level setting

* log usage mode

* feedback
2024-11-11 10:24:47 +05:30
Raja Subramanian 653857e42b Split out audio level config. (#3163)
* Split out audio level config.

Inline it in yaml as it is exposed/documented config.

* test

* default congestion control enable
2024-11-08 21:36:38 +05:30
Raja Subramanian 86383b2271 De-centralize some configs to where they are used. (#3162)
* De-centralize some configs to where they are used.

And make default variables.

Renaming a bit, but these are all internal config and have not been
added to documented config.

* Keep documented config as is.

* test

* typo
2024-11-08 12:47:30 +05:30
Raja Subramanian f3a13569ee Use int64 nanoseconds and reduce conversion in a few places (#3159) 2024-11-06 12:28:30 +05:30
Raja Subramanian d341ee1ce8 Maintain RTT marker for calculations. (#3139)
* Maintain RTT marker for calculations.

Restore the drift logging change.

* remove unnecessary cast
2024-10-25 11:50:59 +05:30
Raja Subramanian 542620b486 Revert "Adjust drift calculations for pass through. (#3129)" (#3138)
This reverts commit 7ab6e5df09.
2024-10-25 11:11:21 +05:30
cnderrauber ca77df8212 warn for multiple dd ext (#3135)
* warn for multiple dd ext

* unused
2024-10-24 16:59:24 +08:00
Raja Subramanian 7ab6e5df09 Adjust drift calculations for pass through. (#3129)
No functional effect, but was logging more than expected drift in the
down stream direction. Reason is that when passing through, we could be
using an older report. But, the adjustment was applied to the monotonic
clock and not the RTP timestamp. So, it looked like more time had
elapsed for the same RTP clock elapsed, and a higher than expected
drift was logged. Correcting it so that the log is not misleading/confusing.
2024-10-23 11:03:43 +05:30
Raja Subramanian d4e3c63406 Seed duplicate packets and bytes. (#3124)
Had missed this before. This could have caused retransmit packets/bytes
to be high.
2024-10-21 23:58:41 +05:30
Raja Subramanian 45b2804df8 Skip divide-by-0. (#3119)
Does not crash, but does a NaN. Avoid that.
2024-10-19 16:21:23 +05:30
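
As background for the commit above: in Go, dividing a zero `float64` by a zero `float64` at run time does not panic, it yields NaN. A minimal sketch of the kind of guard involved is shown below. The helper name `lossRatio` and the choice to return 0 for an empty denominator are assumptions for illustration, not the code from the commit.

```go
package main

import (
	"fmt"
	"math"
)

// lossRatio is a hypothetical helper: it returns lost/total, or 0 when
// there is nothing to divide by, instead of producing NaN.
func lossRatio(lost, total float64) float64 {
	if total == 0 {
		return 0
	}
	return lost / total
}

func main() {
	var lost, total float64 // both zero

	// Unguarded float division: no crash, but the result is NaN.
	fmt.Println(math.IsNaN(lost / total)) // true

	// Guarded version returns a usable value.
	fmt.Println(lossRatio(lost, total)) // 0
}
```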
Raja Subramanian 40b10af960 Use monotonic time util. (#3112)
Thank you @paulwe for doing this. I was promising to do this for a
while, but just like other times, empty promises :-(
2024-10-17 10:49:24 +05:30
Raja Subramanian a66fff1576 Use pointers in unlikely so that values get de-referenced at log time (#3101)
* Use pointers in unlikely so that values get de-referenced at log time
after they have been filled in.

* instantiate logger at log time
2024-10-15 23:45:40 +05:30
Raja Subramanian 9ac48e2984 Grab time under lock. (#3100)
Revert part of my previous commit. I vaguely remembered there was a reason for
having code like that, but did not remember the details and ended up
consolidating. The issue is that time needs to be grabbed under lock so
that two events happening close to each other do not get order swapped.
2024-10-15 22:39:43 +05:30
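
The ordering concern in the commit above can be illustrated with a small sketch. The `eventLog` type and its methods are hypothetical and only show the general pattern: because the timestamp is taken while the lock is held, the order in which events are appended matches the order of their timestamps.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type event struct {
	name string
	at   time.Time
}

// eventLog is a hypothetical container used only for this illustration.
type eventLog struct {
	mu     sync.Mutex
	events []event
}

// record takes the timestamp while holding the lock. If time.Now() were
// called before Lock(), two goroutines could grab timestamps in one order
// and append in the other.
func (l *eventLog) record(name string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.events = append(l.events, event{name: name, at: time.Now()})
}

// ordered reports whether timestamps never go backwards in append order.
func (l *eventLog) ordered() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	for i := 1; i < len(l.events); i++ {
		if l.events[i].at.Before(l.events[i-1].at) {
			return false
		}
	}
	return true
}

// runConcurrent records events from several goroutines at once.
func runConcurrent(l *eventLog, goroutines, perGoroutine int) {
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < perGoroutine; i++ {
				l.record(fmt.Sprintf("g%d-e%d", id, i))
			}
		}(g)
	}
	wg.Wait()
}

func main() {
	l := &eventLog{}
	runConcurrent(l, 8, 100)
	fmt.Println("events:", len(l.events), "ordered:", l.ordered())
}
```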
Raja Subramanian 8b604df32a Set FEC enabled properly in connection stats module. (#3098)
* Set FEC enabled properly in connection stats module.

With RED, the FEC indication is in primary codec.

Also, clean up some bits that were not necessary (TrackInfoAvailable is
not needed)

TODO: There are still a couple of things to figure out
- If codec is RED, Opus is added as second codec synthetically using
  https://github.com/livekit/livekit/blob/33098337fc17705bbdb3283c7a7034aa6b2f3745/pkg/rtc/mediaengine.go#L31
  which hard codes FEC enabled. Ideally, we should get the primary
  codec parameters from SDP offer.
- The WebRTCReceiver does not have information about primary codec. For
  now, just setting FEC to true when RED is enabled. It is okay as it
  just affects when we declare quality drops, but ideally the primary
  codec should be retrieved from SDP offer.

* clean up and comment

* full prop check
2024-10-15 17:39:42 +05:30
Raja Subramanian d052caa104 Use PPS mode rather than max to adjust packet loss weight. (#3095) 2024-10-14 20:16:19 +05:30
Raja Subramanian a8da4872b1 Drop quality a bit faster on score trending lower to be more responsive. (#3093)
Also, logging a bit more about quality changes to understand why
high(ish) loss does not drop quality. Will remove the loss thresholded
logging after collecting some data.
2024-10-14 17:21:42 +05:30
Raja Subramanian f154b236b5 Fix down stream packet loss reporting. (#3092)
* Fix down stream packet loss reporting.

* format
2024-10-14 11:08:10 +05:30
Raja Subramanian 6a721efa7c Log of down track write stop. (#3087) 2024-10-12 04:19:27 +05:30
cnderrauber 76bc112649 Don't return bind error on unsupported codec (#3085)
pion will not start transports if Bind fails at first answer
2024-10-11 14:59:54 +08:00
cnderrauber 85c653f665 dd selector debug logs (#3082) 2024-10-10 12:37:28 +08:00
cnderrauber 64d89dc2f8 Use different debounce interval in negotiation (#3078)
Since #1929, the transport sends an offer immediately if the last
negotiation happened more than a debounce interval ago. That costs
two negotiations for a/v tracks if a publisher publishes two tracks
at the same time, like screenshare or enabling mic/camera. This
change uses a small debounce interval in this case to avoid the issue.
2024-10-09 21:13:05 +08:00
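
One way to read the debounce change in the commit above is as a choice between two delays. The sketch below is an interpretation of the commit message only: the function name, parameters, and the exact comparison are assumptions, not the transport's actual code.

```go
package main

import (
	"fmt"
	"time"
)

// negotiationDelay is a hypothetical helper. If the last negotiation is
// older than the normal debounce interval, wait a short interval rather
// than sending the offer immediately, so that a second track published at
// nearly the same time can share the negotiation. Otherwise use the
// normal interval.
func negotiationDelay(sinceLast, normal, short time.Duration) time.Duration {
	if sinceLast >= normal {
		return short
	}
	return normal
}

func main() {
	normal := time.Second           // assumed value
	short := 20 * time.Millisecond  // assumed value

	fmt.Println(negotiationDelay(5*time.Second, normal, short))
	fmt.Println(negotiationDelay(100*time.Millisecond, normal, short))
}
```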
Raja Subramanian 5e22582c66 Make a lite version of sender stats to be used in relay down track. (#3069) 2024-10-06 13:01:08 +05:30
Paul Wells 3261560098 api for agent worker job count (#3068)
* api for agent worker job count

* cleanup

* temp deps

* temp deps

* deps
2024-10-05 05:13:52 -07:00
Raja Subramanian 2491ee7c7c Make lite version of RTPStatsReceiver called RTPStatsReceiverLite. (#3065)
* Make lite version of RTPStatsReceiver called RTPStatsReceiverLite.

Refactor around that.

Will probably make some more flavors to have lighter versions still.

* update deps

* use MarshalLogArray

* use util
2024-10-05 10:50:25 +05:30
Paul Wells 99f7be7c1c clean up redundant String calls in logs (#3064) 2024-10-03 08:08:46 -07:00
Raja Subramanian 8ac33a868c Splitting out rtp stats stuff into its own package. (#3060)
* Splitting out rtp stats stuff into its own package.

Going to be making some lighter versions of these.
Will be cleaner to have all of these grouped together.
So, as a first step, just making a package for it.

* tests
2024-10-03 15:51:24 +05:30
Paul Wells 0b4fd32905 add unlikely logger (#3058) 2024-10-02 22:58:25 -07:00
Raja Subramanian 0656b623f7 use marshalled logger (#3057) 2024-10-03 10:27:47 +05:30
Raja Subramanian 4d7839bff3 Fix clock rate skew calculation. (#3055)
Cannot cast NTP timestamp diff to time.Duration.
That causes duration to appear more than it actually is.
Was causing a bunch of log spam.
2024-10-01 00:33:36 +05:30
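
The reason a direct cast inflates the duration, as the commit above notes, is that a 64-bit NTP timestamp is a fixed-point value (upper 32 bits are seconds, lower 32 bits are a fraction of a second), whereas `time.Duration` counts nanoseconds. A minimal sketch of a conversion follows. The helper name `ntpDiffToDuration` is hypothetical, and the sketch assumes `later >= earlier`.

```go
package main

import (
	"fmt"
	"time"
)

// ntpDiffToDuration converts the difference of two 64-bit NTP timestamps
// (32.32 fixed point) into a time.Duration. Hypothetical helper; assumes
// later >= earlier.
func ntpDiffToDuration(later, earlier uint64) time.Duration {
	diff := later - earlier
	sec := diff >> 32
	frac := diff & 0xFFFFFFFF
	// frac < 2^32 and 1e9 < 2^30, so the product fits in a uint64.
	nanos := (frac * uint64(time.Second)) >> 32
	return time.Duration(sec)*time.Second + time.Duration(nanos)
}

func main() {
	earlier := uint64(1) << 32               // 1.0 s in NTP fixed point
	later := uint64(3)<<32 | uint64(1)<<31   // 3.5 s in NTP fixed point

	// Direct cast treats fixed-point units as nanoseconds, which makes
	// the duration look roughly 4.29x larger than it is.
	fmt.Println("naive cast:", time.Duration(later-earlier))

	// Proper conversion.
	fmt.Println("converted: ", ntpDiffToDuration(later, earlier))
}
```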
cnderrauber 341d1e512c change bind log to debug (#3033) 2024-09-23 12:05:38 +08:00
cnderrauber eed925fddf avoid race condition on downtrack.Codec (#3032) 2024-09-22 14:27:26 +08:00
Raja Subramanian 191e8635e8 fix missed baseTime init (#3025) 2024-09-19 18:37:35 +05:30
Raja Subramanian 7df6f86693 Initial plumbing for metrics. (#2950)
* Initial plumbing for metrics.

This implements
- metrics received from participant.
- callback to room.
- room distributes it to all other participants (excluding the sending
  participant).
- other participants forward to client.
- counting metrics bytes in data channel stats

TODO:
  - recording/processing/batching
  - should recording/processing/batching happen on publisher side or
    subscriber side?
  - should metrics be echoed back to publisher?
  - grants to publish/subscribe metrics.

* mage generate

* clear OnMetrics on close

* - CanSubscribeMetrics permission.
- Echo back to sender.

* update deps

* No destination identities for metrics

* WIP

* use normalized timestamp for server injected timestamps

* compile

* debug log metrics batch

* correct comment

* add baseTime to wire

* protocol dep

* Scope metrics forwarding to only participants that a participant is
subscribed to.

Also remove the participant_metrics.go file as it was not doing anything
useful.

* update comment

* utils.ErrorIsOneOf

* couple of more utils.CloneProto
2024-09-19 11:42:31 +05:30
Paul Wells 4deaac2f3f replace proto.Clone calls (#3024)
* replace proto.Clone calls

* deps

* tests
2024-09-18 22:47:33 -07:00
Raja Subramanian f21bc84967 Log only when not nil. (#3015)
* Log only when not nil.

Default logging confuses debugging as we call using nil as well to make
the call site simpler. And logging a nil makes it look like it is
incorrect seeding. `nil` fields do not seed. So, don't log when `nil`.

* log SDP
2024-09-18 12:39:53 +05:30
Raja Subramanian 9a4ddf05d5 Fix forwarder panic defer of nil senderReport (#3011) 2024-09-17 08:44:09 +05:30
Raja Subramanian 914e5d6993 Set SenderReport to nil on seeding if empty. (#3008)
With protobuf marshalling/unmarshalling, the default struct was getting
used and nil checks for non existence were failing. That means the right
layers were not reported active on migration.

Also, restore the wrapped loggers as some fields need some conversion
before logging.
2024-09-16 23:56:13 +05:30
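
The nil-check problem described in the commit above can be sketched with plain Go structs. The types below are hypothetical stand-ins for the protobuf messages, and `normalize` is an assumed helper name. A real generated protobuf message carries internal state, so it would need a field-by-field or size-based emptiness check rather than `==`.

```go
package main

import "fmt"

// SenderReport is a hypothetical stand-in for the protobuf message.
type SenderReport struct {
	NTPTimestamp uint64
	RTPTimestamp uint32
}

// ForwarderState is a hypothetical stand-in for the seeded state.
type ForwarderState struct {
	SenderReport *SenderReport
}

// normalize sets SenderReport to nil when it only holds default values,
// so that later nil checks for "no sender report" behave as intended.
func normalize(s ForwarderState) ForwarderState {
	if s.SenderReport != nil && *s.SenderReport == (SenderReport{}) {
		s.SenderReport = nil
	}
	return s
}

func main() {
	// After a marshal/unmarshal round trip, an absent report may come
	// back as a non-nil default struct.
	decoded := ForwarderState{SenderReport: &SenderReport{}}

	fmt.Println(decoded.SenderReport != nil)            // true
	fmt.Println(normalize(decoded).SenderReport != nil) // false
}
```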
Raja Subramanian 3c8a7d7828 Log down track state on seeding. (#3003)
Also, log when RTCP sender report for reference layer is received.

Cleaning up a bunch of wrapped logger calls as we already have
logger.Proto which I forgot about while doing those wrappers.
2024-09-14 23:07:54 +05:30
Raja Subramanian 1b5bb4dddc Log ICE reconnected. (#2999)
To increase visibility of ICE reconnect, logging reconnected at Infow
level. Otherwise, it is hard to see if an ICE restart finished
successfully.

Also, cleaning up ICEConnectionDetails a bit. Just separate out
read-only fields into its own struct and use it for read-only export.
2024-09-12 13:16:41 +05:30
Raja Subramanian 182ab9a951 Log more details around layer switch (#2996)
* Log more details around layer switch

* split error
2024-09-11 09:23:42 +05:30
Raja Subramanian b678ccdd66 Cache RTCP sender report in forwarder state. (#2994)
* Cache RTCP sender report in forwarder state.

To be used in migration.

TODO: need to check more places to operate pure in unix nano rather than
converting.

* match name
2024-09-10 20:50:50 +05:30
Raja Subramanian d53f732ada Do not take padding packets into account in max pps calculation (#2990) 2024-09-09 11:08:50 +05:30
Raja Subramanian 787b8450e9 Record out-of-packet count/rate in prom. (#2980)
* Record out-of-packet count/rate in prom.

Adding a field to AnalyticsStream to make this easier to report.
Let me know if adding to AnalyticsStream is not ok.

Will set up a protocol PR if it is okay.

* deps
2024-09-07 00:19:54 +05:30
cnderrauber 4792e7e134 Revert "Add tracksubscribed event on downtrack added (#2934)" (#2975)
This reverts commit 8b47218270.
2024-09-04 10:42:18 +08:00
Raja Subramanian 6de871d4e8 Allow start streaming on an out-of-order packet. (#2971)
But, do not record first packet time on an out-of-order packet.
It so happens that packets get out-of-order a lot more across relay.
And it turns out that with some H.264 streams, the first few packets of
a key frame are very small (maybe SPS/PPS, haven't checked) and they get
out-of-order quite a lot, so much so that a down track never starts even
after 20 - 25 key frames have passed through.
2024-09-02 21:36:46 +05:30
cnderrauber efa85221b3 Negotiate downtrack for subscriber before receiver is ready (#2970)
* Negotiate downtrack for subscriber before receiver is ready

This change will save 1 round sdp negotiation time for
subscribing to simulcast-codec or remote node track

* solve comment

* Fix simulcast-codec case
2024-09-02 14:10:14 +08:00
Raja Subramanian 579f76cf7c Use 0 rollover when possible. (#2968) 2024-08-31 11:36:49 +05:30