* Some logging changes.
Trying to chase a case of large sequence number gap on subscriber side
where packets are sent after a long time.
* return values instead of logging
* Add support for RTP stream restart.
When an unhandled packet is encountered, try a restart sequence.
Restart happens when 5 packets with contiguous sequence numbers and same
or increasing time stamps are received. Note that this does not work for
B-frame type of scenarios, but that is true for receive path handling
even before this. As WebRTC does not use B-frames, it is fine. But,
needs to be looked at again if B-frames are necessary.
It is controlled by a config that is disabled by default.
* clean up
* debug log
* Avoid duplicate track add to room track manager.
Don't have proof that this happens, but in the leak chase, this is
another component at room level and holds references to tracks. Guessing
this is not cleaning tracks till room is closed.
* add a report
It is possible that the stream stops just after start and
restarts much later introducing a large gap in sequence number.
That could look like an unhandled case because the wrap back handler
does not have enough packets yet.
Let other checks based on time stamp gap take effect and only if that
also leaves the sequence number unhandled, drop the packet.
* switch participant callbacks to room to listener interface
* mage generate
* clean up
* clear listener
* clean up
* use interface in up data track manager
* tweaks
* Paul feedback - should reduce the diff as this keeps the room handlers as is except making methods for a couple of anonymous handlers
* clean up
When a participant is closing, RTCP readers should be cleaned up from
factory even if the participant is expected to resume. The resumed
participant will be a new participant session and peer connection(s) and
everything will be set up again.
- New bucket API to pass in max packet size and sequence number offset
and seequence number size generic type
- Move OWD estimator to mediatransportutil.
After adding more fields in
https://github.com/livekit/livekit/pull/4105/files, it was not even
logging. Access to one of the added fields must have ended up waiting on
a lock and blocked.
Unfotunately, the deadlock fix in https://github.com/pion/ice/pull/840
did not address the peer connection close hang.
Splitting the logs so that the base log still happens. Ordering after
looking at the code and guessing what could still log to see if we get
more of the logs and learn more about the state and which lock ends up
the first blocking one.
* Add checks for participant and sub-components close.
Looks like there might be some memory leak with participant sessions not
getting closed properly. Adding checks (to be cleaned up later) to see
if there is a consistent place where things might hang.
* init with right type
* Remove unnecessary goroutine, thank you @milos-lk
* clean up
Correct triple-d spelling of "address" field in transport logs.
I’m not sure whether this was intentional, but I noticed it
while creating Grafana queries and filters. This matters because
anyone filtering logs using the correct spelling may
unintentionally miss relevant data, leading to incomplete or
misleading analysis.
* Use sync.Pool for objects in packet path.
Seeing cases of forwarding latency spikes that aling with GC.
This might be a bit overkill, but using sync.Pool for small +
short-lived objects in packet path.
Before this, all these were increasing in alloc_space heap profile
samples over time. With these, there is no increase (actually the lines
corresponding to geting from pool does not even show up in heap
accounting when doing `list` in `pprof`)
* merge
* Paul feedback
* Forwarding latency measurement tweaks.
- prom transmission type public
- do not measure short term values as it is not used and saves some lock
contention time in packet path potentially. Adding a separate method
for that.
- Change latency/jitter summary reporting to `ns` also to match the
histogram.
* add GetShortStats
* Higher resolution forwarding latency histogram.
Was using the average latency/jitter of last second to populate
forwarding latency/jitter histogram. But, it is too coarse, i. e. the
average value of latency/jitter is very low and those summarised samples
end up in the lowest bucket always.
A few things to address it
- record per packet forwarding latency in histogram
- adjust histogram bins to include smaller values
- Drop jitter histogram
This is a per packet call, but prometheus histogram is supposedly
fast/light weight. Would be good to get better resolution histograms.
Hence doing this. Please let me know if there are performance concerns.
* typo
* one more typo