* Don't wait for RTP packet to fire track
Create the track from the SDP instead of from the first RTP packet.
This is consistent with browser behavior and will accelerate track
publication.
* fix test
* WIP
* no worker
* fixes
* use congested packet groups
* oldest group
* markers
* WIP
* WIP
* WIP
* WIP
* WIP
* clean up
* fmt
* consolidate
* store last packet only for bwe extension cases
* Try up-allocation on neutral trend.
Some probes end up with a neutral trend because they get many estimates
of the same value. It is okay to try up-allocating in those cases.
Otherwise, the stream allocator sometimes gets stuck and does not
up-allocate at all, as all probes end up neutral.
Renaming the signal to `NotCongesting` to signify it is either neutral
or clearing.
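A minimal sketch of the signal mapping described above. The `NotCongesting` name comes from this commit; the trend values and function names are assumptions for illustration, not LiveKit's actual code.

```go
package main

import "fmt"

// trend is the outcome of a probe's trend detector (names assumed).
type trend int

const (
	trendNeutral trend = iota
	trendClearing
	trendCongesting
)

// signal is what the stream allocator acts on.
type signal int

const (
	signalNotCongesting signal = iota // neutral OR clearing: okay to try up-allocation
	signalCongesting
)

// toSignal maps a trend to an allocator signal. Probes that see many
// estimates of the same value end up neutral; treating neutral like
// clearing keeps the allocator from getting stuck and never up-allocating.
func toSignal(t trend) signal {
	if t == trendCongesting {
		return signalCongesting
	}
	return signalNotCongesting
}

func main() {
	fmt.Println(toSignal(trendNeutral) == signalNotCongesting)
	fmt.Println(toSignal(trendCongesting) == signalCongesting)
}
```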
* wait 5 RTT for probe to finalize
* trend detector object encoder
Reverting to pre-refactor behaviour. Was trying to avoid special
treatment when in probe, but REMB values are hard to predict and so are
the NACKs.
So, freeze updates when congesting during a probe until the probe is
done. Otherwise, further changes while the probe is finalising sometimes
cause an invalid signal and tracks are not up-allocated.
* Clean up remote BWE a bit.
- Had forgotten to start the worker; fix that.
- Ensure the correct type of channel observer (probe OR non-probe) based
  on probe state.
- Introduce a congested hangover state to see state transitions better.
  Does not really affect operation, but state transitions are clearer.
* prevent 0 ticker
* Simplify probe sleep calculations.
Splitting into buckets was problematic around the boundaries and the
code was ugly too. Simplify by setting up probes with a sleep after each
probe to achieve the desired interval/rate.
* continue after pop
* Add datastream packet type handling
* point to main in protocol
* Revert "point to main in protocol"
This reverts commit 2cc6ed6520.
* Update protocol
For applications with heavy data usage, accumulating data bytes over 5
minutes and then calculating the rate using a much shorter window (like
2 - 5 seconds) makes it look like there is a massive rate spike.
While this change is not a fix, it should soften the impact.
Need a better way to handle different parts of the system operating at
different frequencies. Could use the rate in the reporting window, but
that will miss the spikes. Maybe that is okay. For example, if the
reporting window is 5 minutes and there was a 100 Mbps spike for about
10 seconds of it, it would get smoothed out.
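The window-mismatch spike above can be shown with a little arithmetic; the traffic numbers are hypothetical.

```go
package main

import "fmt"

// rateBps computes a bit rate from a byte count and the window (in
// seconds) it is attributed to. Dividing bytes accumulated over a long
// window by a much shorter rate window inflates the computed rate by
// the ratio of the two windows.
func rateBps(bytes, windowSec float64) float64 {
	return bytes * 8 / windowSec
}

func main() {
	const accumulatedBytes = 15_000_000.0 // hypothetical: 15 MB sent steadily over 5 minutes

	trueRate := rateBps(accumulatedBytes, 300) // actual average over the 5-minute window
	apparent := rateBps(accumulatedBytes, 3)   // 100x "spike" when a 3 s window is used

	fmt.Printf("true: %.0f bps, apparent: %.0f bps\n", trueRate, apparent)
}
```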
* Sender side snapshot clean up and logging.
Seeing cases where sender snapshot packet loss is sometimes much higher
than the actual packets. Tracking a bit more to understand that better.
- Rename variables to clearly indicate what is coming from the feed side.
- Fixed an issue with wrong initialisation of feed side loss in the snapshot.
- Just use the loss from the receiver report as it can go back (the
  receiver would subtract on receiving an out-of-order packet).
- Keep track of reports in a snapshot (this is temporary for
  debugging/understanding it better and will be removed later).
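The "loss can go back" point above is worth a sketch: the cumulative packets-lost field in an RTCP receiver report can decrease when a previously-counted packet arrives out of order, so a delta-based update must allow negative deltas. The type and field names here are illustrative assumptions.

```go
package main

import "fmt"

// rtpStats tracks packet loss reported by RTCP receiver reports.
type rtpStats struct {
	packetsLost int64 // running loss total
	lastRRLost  int64 // cumulative lost from the previous receiver report
}

// updateLostFromRR applies the cumulative packets-lost value from a
// receiver report. The delta may be negative: the receiver subtracts
// from its cumulative count when an out-of-order packet shows up.
func (s *rtpStats) updateLostFromRR(cumulativeLost int64) {
	delta := cumulativeLost - s.lastRRLost
	s.lastRRLost = cumulativeLost
	s.packetsLost += delta
}

func main() {
	s := &rtpStats{}
	s.updateLostFromRR(10) // first RR reports 10 lost
	s.updateLostFromRR(8)  // out-of-order arrivals reduced the cumulative count
	fmt.Println(s.packetsLost)
}
```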
* remove check
* Set down track connected flag in one-shot-signalling mode.
Also added maintaining ICE candidates for info purposes.
And doing analytics events (have to maintain the subscription inside
subscriptionmanager to get the list of subscribed tracks, so added
enough bits from the async path into the sync path to get the analytics
bits also).
* comment typo
* method to check if a track name is subscribed
* Log more details of RTP stats snapshots.
Seeing cases of loss more than 100%. Logging snapshots to understand it
better.
* log message
* use delta to update packets lost from RR
* remove cast
* WIP
* comment
* Verify method on LocalParticipant
* cleanup
* clean up
* pass in one-shot-mode to StartSession
* null message source and sink
* feedback; also remove the check in ParticipantImpl for one-shot-mode filtering, as a null sink can be used for that
Cosmetic. While thinking through how to structure probing better,
noticing small things here and there. Cleaning up and making some small
PRs along the way.
* keep track of RTX bytes separately
* packet group
* Packet group of 50ms
* Minor refactoring
* rate calculator
* send bit rate
* WIP
* comment
* reduce packet infos size
* extended twcc seq num
* fix packet info
* WIP
* queuing delay
* refactor
* config
* callbacks
* fixes
* clean up
* remove debug file, fix rate calculation
* fmt
* fix probes
* format
* notes
* check loss
* tweak detection settings
* 24-bit wrap
* clean up a bit
* limit symbol list to number of packets
* fmt
* clean up
* lost
* fixes
* fmt
* rename
* fixes
* fmt
* use min/max
* hold on early warning of congestion
* make note about need for all optimal allocation on hold release
* estimate trend in congested state
* tweaks
* quantized
* fmt
* TrendDetector generics
* CTR trend
* tweaks
* config
* config
* comments
* clean up
* consistent naming
* participant level setting
* log usage mode
* probing hacks
* WIP
* no lock
* packet group config
* ctr trend refactor
* cleanup and fixes
* format
* debug
* format
* move prober to ccutils
* clean up
* clean up
On a resume, the signal stats will call `ParticipantLeft`. Although it
explicitly says not to send events, it could still close the stats
worker.
To handle that, we created a stats worker if needed in the
`ParticipantResume` notification in this PR
(https://github.com/livekit/livekit/pull/2982), but that is not enough,
as that event could happen before the previous signal connection closes
the stats worker.
A new stats worker does get created when `ParticipantJoined` is called
by the new signal connection, but it does not transfer connected state.
So, when the client leaves, `ParticipantLeft` is not sent.
I do not see why we should not always transfer connected state given
that it is the same participant SID/session. But I have a feeling that I
am missing some corner case. Please let me know if I am missing
something here.
* Fix header size calculation in stats.
With the pacer inserting some extensions, the header size used in stats
(and more importantly when probing for bandwidth estimation and
metering the bytes to control the probes) was incorrect. The size was
effectively that of the incoming extensions. It would have been close
enough though.
Anyhow, a bit of history:
- Initially was planning on packaging all the necessary fields into the
  pacer packet and having the pacer call back after sending, but that
  was not great for a couple of reasons:
  - had to send a bunch of useless data (as far as the pacer is
    concerned) into the pacer.
  - a callback on every packet (this is not bad, just a function call
    which happens in the forward path too, but had to lug around the
    above data).
- In the forward path, there is a rare edge case issue when calling the
  stats update after pacer.Enqueue() - details in
  https://github.com/livekit/livekit/pull/2085, but that is a rare case.
Because of those reasons, the update was placed in the forward path
before enqueue, but the header size issue was not noticed till now.
As a compromise, `pacer.Enqueue` returns the headerSize and payloadSize.
It uses a dummy header to calculate the size. The real extension will be
added just before sending the packet on the wire. pion/rtp replaces an
extension if one is already present, so the dummy would be replaced by
the real one before sending on the wire.
a21194ecfb/packet.go (L398)
This does reintroduce the second rare edge case, but it is very rare
and, even if it happens, not catastrophic.
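A minimal sketch of the dummy-extension sizing compromise described above. The real code marshals a pion/rtp header; the function names, the 3-byte dummy, and the size formula here are simplified assumptions (per-element headers of one-byte extensions are lumped into the extension byte count).

```go
package main

import "fmt"

// rtpHeaderSize approximates the on-wire RTP header size: 12 fixed
// bytes, 4 per CSRC, and, when extensions are present, a 4-byte
// extension header plus the extension data padded to a multiple of 4.
func rtpHeaderSize(numCSRC, extensionBytes int) int {
	size := 12 + 4*numCSRC
	if extensionBytes > 0 {
		size += 4 + (extensionBytes+3)/4*4
	}
	return size
}

// enqueueSizes mimics `pacer.Enqueue` returning header and payload
// sizes. The 3 dummy bytes stand in for an extension whose real value
// is written just before the packet goes on the wire, replacing the
// dummy so the calculated size matches what is actually sent.
func enqueueSizes(numCSRC, payloadLen int) (headerSize, payloadSize int) {
	const dummyExtensionBytes = 3
	return rtpHeaderSize(numCSRC, dummyExtensionBytes), payloadLen
}

func main() {
	h, p := enqueueSizes(0, 1200)
	fmt.Println(h, p) // 20 1200
}
```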
* cleanup
* add extensions and dummy as well in downtrack to make pacer cleaner
* Use weighted loss to detect loss based congestion signal.
- Increase JQR min loss to 0.25.
- Use a weighted loss ratio so that higher packet rates get higher
  weight. At the default config, 10 packets in 1/2 second will form a
  valid packet group for loss based congestion signal consideration. Two
  packets lost in that group may not be bad, so the JQR min loss was
  bumped up to 0.25. However, 20% loss (or even much less) could be
  problematic at higher packet rates (potentially multiple streams
  affected and there could be a lot of NACKs as a result). So, weight by
  packet rate so that higher packet rates enter JQR at lower losses.
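The weighting above can be sketched as follows; the 0.25 threshold comes from this commit, while the reference packet rate, function names, and weighting formula are illustrative assumptions.

```go
package main

import "fmt"

const (
	jqrMinLoss    = 0.25 // minimum (weighted) loss ratio to enter JQR
	refPacketRate = 20.0 // packets/sec at which the raw loss ratio is used as-is (assumed)
)

// weightedLossRatio scales the raw loss ratio by packet rate so that
// groups with higher packet rates cross the JQR threshold at lower raw
// losses; low-rate groups are never made more sensitive.
func weightedLossRatio(packetsLost, packetsTotal int, packetRate float64) float64 {
	if packetsTotal == 0 {
		return 0
	}
	raw := float64(packetsLost) / float64(packetsTotal)
	weight := packetRate / refPacketRate
	if weight < 1 {
		weight = 1
	}
	return raw * weight
}

func main() {
	// 2 lost of 10 at the reference rate: raw 0.20, below the 0.25 threshold.
	fmt.Println(weightedLossRatio(2, 10, 20) >= jqrMinLoss)
	// The same 20% loss at 4x the packet rate weighs in at 0.80, above it.
	fmt.Println(weightedLossRatio(16, 80, 80) >= jqrMinLoss)
}
```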
* WIP
* use aggregated loss