Commit Graph

48 Commits

Author SHA1 Message Date
Raja Subramanian 14b0b48b15 Push/pull for connection stats/quality scoring. (#1505)
* Push/pull for connection stats/quality scoring.

Was not happy with pure pull method missing a window because
of RTCP RR timing is slightly off for audio and using a much
larger window of data in the next update.

That also resulted in RTP stats getting some bits of code.
As that is per-packet processing, was not a good idea.

Switching to push-pull method.
For up track, it is pull, i. e. connection stats worker will pull stats.

For down track, there is a new notification about receiver report
reception. Using this to check for time to run stats. And adding a bit
of tolerance for processing window (currently set so that as long as it
is > 95% of usual processing interval). This allows two things
- for video, RTCP RR are more frequent, but we will still not process
  till enough time has passed
- for audio, RTCP RR could be once in 5 seconds or so. Can process when
  it is available rather than miss a window and use a much larger window
  later.

* uber atomic
2023-03-09 11:51:20 +05:30
Raja Subramanian 99601e6d41 Handle the case of no packets in down stream tracks better. (#1500) 2023-03-07 14:32:43 +05:30
Raja Subramanian 04269c100c Connection quality misc changes (#1496)
* Connectino quality misc changes

1. Call scorer.Update() with nil stat when no data available so that
   scorer can synthesise window with proper window time.
2. Substract out loss in interval to account for packets not sent at
   all.
3. Fix `packetsNotFound` variable in `getIntervalStats`. I remember this
   working at some point. Not sure if I fat fingered in another PR and
   deleted the increment line.
4. Logging a bit more when no packets expected. Those can get noisy
   especially when track is muted. But, seeing some unexplained
   instances of no packets leading to quality drop. So, temporary logging
   to get a bit more information.

* correct spelling

* Limit packet score minimum to 0.0
2023-03-07 09:08:19 +05:30
Raja Subramanian 9e327b1f3c Connection quality (#1490)
* Make connection quality not too optimistic.

With score normalization, the quality indicator showed good
under conditions which should have normally showed some badness.

So, a few things in this PR
- Do not normalize scores
- Pick the weakest link as the representative score (moving away from
  averaging)
- For down track direction, when reporting delta stats, take the number
  of packets sent actually. If there are holes in the feed (upstream
  packet loss), down tracks should not be penalised for that loss.

State of things in connection quality feature
- Audio uses rtcscore-go (with a change to accommodate RED codec). This
  follows the E-model.
- Camera uses rtcscore-go. No change here. NOTE: THe rtscore here is
  purely based on bits per pixel per frame (bpf). This has the following
  existing issues (no change, these were already there)
  o Does not take packet loss, jitter, rtt into account
  o Expected frame rate is not available. So, measured frame rate is
    used as expected frame rate also. If expected frame rate were available,
    the score could be reduced for lower frame rates.
- Screen share tracks: No change. This uses the very old simple loss
  based thresholding for scoring. As the bit rate varies a lot based on
  content and rtcscore video algorithm used for camera relies on
  bits per pixel per frame, this could produce a very low value
  (large width/height encoded in a small number of bits because of static content)
  and hence a low score. So, the old loss based thresholding is used.

* clean up

* update rtcscore pointer

* fix tests

* log lines reformat

* WIP commit

* WIP commit

* update mute of receiver

* WIP commit

* WIP commit

* start adding tests

* take min score if quality matches

* start adding bytes based scoring

* clean up

* more clean up

* Use Fuse

* log quality drop

* clean up debug log

* - Use number of windows for wait to make things simpler
- track no layer expected case
- always update transition
- always call updateScore
2023-03-05 12:55:04 +05:30
Raja Subramanian 8e6bcdaffe Change lock scope of access to RTCP sender report data. (#1473)
* Change lock scope of access to RTCP sender report data.

Forwarder calls back to get time stamp offset.
Holding buffer lock is a much bigger scoped lock.

Reduce lock scope and cache latest sender report under its own lock.
And use that cache when calculating time stamp offset.

* move sr cache to stream tracker manager for re-use in relay

* cache before spread
2023-02-27 12:28:25 +05:30
Raja Subramanian 0dcd4e4856 Ensure temporal is not at -1 for non-simulcast streams (#1441) 2023-02-18 09:52:53 +05:30
Raja Subramanian 2671493870 Use purely RR based RTT. (#1351)
* Use purely RR based RTT.

With normalization of NTP time stamp to local time,
don't need to keep track of NTP time of publisher + local time of
when a report is sent. RTT calculations can happen with RR only.

Also, do not log errors when RTT cannot be calculated due to
no last SR. This can happen if the receiver sends an RR before
it receives an SR. As SFU is doing SRs once in 5 seconds, it is
possible some RRs happen before the first SR.

* use error type

* correct error name
2023-01-30 19:32:06 +05:30
Raja Subramanian c696626fe8 Use local time base for NTP in RTCP Sender Report for downtracks. (#1321)
* Use local time base for NTP in RTCP Sender Report for downtracks.

More details in comments in code.

* Remove debug

* RTCPSenderReportInfo -> RTCPSenderReportDataExt

* Get rid of sender report data pointer checks
2023-01-25 11:00:15 +05:30
Raja Subramanian b7263b7625 Do not use local time stamp when sending RTCP Sender Report (#1315)
* some additional logging

* Do not use local time stamp when sending RTCP Sender Report

As local time does not take into account the transmission delay
of publisher side sender report, using local time to calculate
offset is not accurate.

Calculate NTP time stamp based on difference in RTP time.
Notes in code about some shortcomings of this, but should
get better RTT numbers. I think RTT numbers were bloated because of
using local time stamp.
2023-01-19 23:32:50 +05:30
Raja Subramanian 7aad888e4e Normalize NTP time base when calculating RTT (#1297)
* Normalize NTP time base when calculating RTT

* seed last SR
2023-01-11 12:25:14 +05:30
Raja Subramanian 2b89c821ab An attempt to use publisher side RTCP sender report while forwarding (#1286)
* WIP commit

* comment

* clean up

* remove unused stuff

* cleaner comment

* remove unused stuff

* remove unused stuff

* more comments

* TrackSender method to handle RTCP sender report data

* fix test

* push rtcp sender report data to down tracks

* Need payload type for codec id mapping in relay protocol

* rename variable a  bit
2023-01-06 14:07:18 +05:30
Raja Subramanian c9cc45c8b0 Move log to debug as warn does not show anything bad (#1230) 2022-12-16 10:32:50 +05:30
Raja Subramanian 55718724a9 Check forwarder started when seeing. (#1191)
When switching from local -> remote or remote -> local,
the forwarder state is cached and restored after the switch
to ensure continuity in sequence number /time stamp.
But, if the forwarder had not started before the switch,
the sequence number always starts at 1 because of seeding.
So, do not see unless forwarder was started before the switch.
2022-11-26 01:05:29 +05:30
Raja Subramanian 4d480fc05b Avoid divide-by-zero (#1141) 2022-11-02 22:38:18 +05:30
Raja Subramanian 170d4b8629 Seed snapshots (#1128)
* Seed snapshots

- For one cycle after seeding, delta snap shot can get a huge gap
because of snapshot iitializing from start if not present. Not
a huge deal sa it should not affect functionality, but saving/restoring
(at least with down track) snap shot is a big deal. So just do it.
- Have been seeing a bunch of cases of delta stats getting a lot of
packets due to out-of-order (what seems like) receiver report. So,
save the receiver report and log it when out-of-order is detected
to understand if they are closely spaced or something else could be
happening.

* Remove comment that does not apply anymore

* log current time and RR
2022-10-28 08:53:21 +05:30
Raja Subramanian e43c72c91d Accept same highest sequence number as a puased track sends that (#1102) 2022-10-19 12:43:22 +05:30
Raja Subramanian f854201101 Warn on out-of-order RTCP RR. (#1101)
Have been seeing a few instances of "too many packets expected in delta"
when trying to generate RTCP SR on down track. Actual sequence numbers
indicate that start is after the end.

As down track RTPStats are driven by receiver report, wondering if we
are getting RTCP_RR out-of-order somehow causing this to happen.
Cannot find any other reason for this.

So, accepting RTCP_RR based update only if the sequence number is higher
than existing and also logging a warning with sequence numbers if they
look out-of-order.
2022-10-19 11:21:43 +05:30
Raja Subramanian 573850261a Cache RTPStats and seed on re-use (#1080)
* Cache RTPStats and seed on re-use

When a cached down track is re-used, RTPStats was not cached.
This caused sender reports getting out-of-sync with the remote side.
Cache RTPStats and seed it on re-use.

* staticcheck
2022-10-12 09:10:17 +05:30
Raja Subramanian 792349cc56 Split out mediatransportutil (#1071) 2022-10-06 23:55:59 +05:30
Raja Subramanian df189984f3 Add resyn on next packet to buffer.Bucket (#968) 2022-08-30 12:58:10 +05:30
Raja Subramanian 9d22225e92 A few misc changes (#915)
- Do not update jitter on padding only packet.
Padding only packet may not have proper timestamp.
If it does, it probably has the time stamp of the
last packet with payload. That will also affect
jitter calculation, i. e. wall clock time is moving,
but RTP time is the same.
- Do not send `onMaxLayer` changed on bind.
It was probably racing with update when max layer
is updated when adaptive stream is off. There is
no need to send that update as the default would
be OFF. It will be enabled when adaptive stream
subscription turns it on or when max layer is
set when down track bind happens and adaptive stream
is off.
2022-08-15 15:57:19 +05:30
Raja Subramanian dbcc53f04e Use media payload size in scoring. (#912)
* Use media payload size in scoring.

Subtract out header bytes when calculating score.
This does not seem to affect the score (under perfect conditions),
but, using header bytes will inflate the bit rate and
will affect scoring.

* Add header bytes to ToProto

* protocol pointer

* fix test
2022-08-14 13:22:58 +05:30
cnderrauber 997461a2b6 rtpstats add update last packet method (#858) 2022-07-29 15:29:36 +08:00
Raja Subramanian 4c7d3161a9 Record dynacast requirement of a subscriber synchronously. (#834)
With rapid changes to subscription settings, use of a goroutine
could end up processing dynacast needs for that subscriber in
a different order. So, record the susbcription needs of a subscriber
in the callback and process the data in a go routine.
2022-07-15 11:46:02 +05:30
Raja Subramanian 9032db857c Connection quality clean up (#766)
* WIP commit

* WIP commit

* Remove debug

* Revert to reduce diff

* Fix tests

* Determine spatial layer from track info quality if non-simulcast

* Adjust for invalid layer on no rid, previously that function was returning 0 for no rid case

* Fall back to top level width/height if there are no layers

* Use duration from RTPDeltaInfo
2022-06-18 21:58:47 +05:30
Raja Subramanian bbd5d3739f Use logger with context (#764) 2022-06-17 11:12:29 +05:30
Raja Subramanian 1665f51bd0 Stats of NACKs acked and number of repeated NACKs. (#664)
* Stats of NACKs acked and number of repeated NACKs.

Also making a change in delta stat to drop negative packet loss
counts to 0. Because of windowing it is a legitimate case.
The receiver could have seen a loss in window we are measuring
and in the subsequent window, the receiver could have gotten
a retransmission and reduced the packet loss count resulting
in a negative delta. When we report negative delta, it could
get dropped by analytics validator. That will be lost data.
Avoid that.

* Remove unused code

* Pick up latest protocol
2022-05-02 22:43:21 +05:30
Raja Subramanian d863b45dc1 Remove Head field from ExtPacket structure. (#662)
* Remove `Head` field from `ExtPacket` structure.

Although we do not intend to, but if packets get out-of-order
in the forwarding path (maybe reading in multiple goroutines
or using some worker pool to distribute packets), the `Head`
indicator could lead to wrong behaviour. It is possible that
at the receiver, the order is
- Seq Num N, Head = true
- N + 1, Head = true

If the forwarding path sees `N + 1` first, the Head flag
when it sees `N` packet is incorrect and will lead to incorrect
behaviour.

The alternative check is very simple. So, remove `Head` flag.

* Remove unused field
2022-05-02 10:16:17 +05:30
Raja Subramanian ea61b588a2 Simplifying SN info cache in RTPStats module (#660)
* Simplifying SN info cache in RTPStats module

* Remove unnecessary field
2022-04-28 18:48:59 +05:30
Raja Subramanian 2e182afb61 Reduce memory used by RTPStats. (#645)
Keep track of last 2048 sequence numbers instead of full range
to reduce memory usage.
2022-04-22 17:17:37 +05:30
Raja Subramanian ed2a0011d9 Lock to receiver report for senders (#616) 2022-04-17 08:43:50 +05:30
Raja Subramanian a98d955284 Delta stats throughout (#615)
* Use delta stats throughout and avoid calculating deltas in telemetry

* Fix a few things after testing

* Remove debug

* Fix tests

* delete instead of setting to nil

* Point to the latest protocol
2022-04-16 21:11:32 +05:30
Raja Subramanian 73ae58bb42 Reduce chatty logs (#592) 2022-04-06 06:30:26 +05:30
David Colburn 0b8a180554 Code inspection (#581)
* Code inspection

* fix [4]int64 conversiong
2022-03-30 13:49:53 -07:00
Raja Subramanian f293de054d Fix large reported loss in RTPStats (#564)
Had to check for half the range to see if start needed to be moved back.
2022-03-24 12:17:36 +05:30
Raja Subramanian ab7c63a08a Remove padding double counting (#562) 2022-03-24 06:36:17 +05:30
Raja Subramanian ed9234f71b Removing unused functions and adding more logs (#560)
* Removing unused functions and adding more logs

* Do not include padding packets in Packets
2022-03-23 22:26:34 +05:30
Raja Subramanian 06ea1d2ad3 Log rtp stats for debugging large gaps or all packets getting reported lost (#559) 2022-03-23 15:12:45 +05:30
Raja Subramanian 076eb1c8ae Dampen stream allocator (#551)
* WIP commit

* WIP commit

* WIP commit

* format

* NACK window

* Remove layer when it is expected to stop

* Remove debug
2022-03-22 22:23:22 +05:30
Raja Subramanian 641858832a Address edge case stream allocation (#544)
* Handle an edge in layer lock.

A very edge case
- Available layer: [0, 1, 2], but bitrate is not yet available.
We set it to layer 2 awaiting measurement.
- Measurement for layers 0 and 1 come through.
- Still no key frame for layer 2.
- Finalize layers runs and sees that bitrate is available for 0 and 1.
It finalizes layer 1.
- Layer 1 key frame comes (because we asked key frame of layer 2,
publisher sends key frame for all layers). Locks to layer 1.
- No more events happen to switch to layer 2.

Changes
-------
- Move bit rate measurement to StreamTrackerManager. Allows re-use
in relay.
- Make bit rate availability (from zero -> non-zero) an event
and let it flow through the stream allocation flow so that we
always have an event when bit rate measurement becomes available.
This gets rid of finalization which I was unhappy with it anyway as
it was polling every second.
- Removing REMB stuff from buffer. We do not use it.
It is incorrect anyway. REMB should be ay peer connection level.

* Fix test

* fix test

* Simplify allocate

* Simplify/clean up
2022-03-21 14:53:31 +05:30
Raja Subramanian 8c9c1fe837 Always do stats update and header extension processing (#540)
Also, use Errorw for tracking large gap to get a back trace.
2022-03-19 21:29:10 +05:30
Raja Subramanian ed00146937 Fix packets/packetRate mismatch (#534)
This still does not address root cause of large loss, but at least
does not display crazy thing like packets = 0, but packet rate is 45/s.

Also, RLock in ToString() as there are bits of structure used in
stringification.
2022-03-19 01:06:52 +05:30
Raja Subramanian 13083a143f More logs (#533) 2022-03-19 00:38:22 +05:30
Raja Subramanian 3ce4010e89 Fix bracket (#531) 2022-03-18 11:25:20 +05:30
Raja Subramanian 64f82a6a73 Fix off by one packets expected (#529) 2022-03-18 10:03:09 +05:30
Raja Subramanian 33f9726b79 Key frames (#522)
* Key frames

- Keep track of key frame stats
- Split out PLI from down track used for purpose of layer locking.
This will give us a good picture of down stream issues forcing a PLI.
- Use key frame requester whenever there is a layer lock required.
Not just the first key frame. With the synchronous thing, the counter
was just ridiculously high like 150 or something because of all
the initial padding packets. Also, use RTT in key frame requester.

* send first PLI before waiting

* Turn off key frame requester when disabled

* simplify
2022-03-16 19:55:12 +05:30
Raja Subramanian f3368a567b Use overridden packet loss (#519) 2022-03-16 11:36:54 +05:30
Raja Subramanian ae85e55fd4 Using RTPStats across the board (#515)
* WIP commit

* Clean up
2022-03-15 17:47:19 +05:30