With the push model (i.e. connection quality evaluation triggered
by reception of an RTCP receiver report), it is possible that a report
is received shortly after a track is started (especially with video).
Such reports should not trigger a quality evaluation.
Set `lastStatsAt` in the `Start` routine, and ensure that `Start` has
been called and enough time has passed since the last stats time, to
avoid small windows.
* Expected vs actual layer based connection quality.
With VBR streams (like screen share), bit rate is not a good indicator
of whether the desired layer (spatial/temporal) is achieved, due to high
variance.
Using expected vs actual layer (i.e. distance to desired) can capture
any shortfall and include it in quality scoring.
This PR uses distance to desired, i.e. how many steps it would take to
go from actual spatial/temporal -> desired spatial/temporal, and that
distance is used proportionally (currently just linearly) to decrease
the score.
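A minimal Go sketch of the distance-to-desired penalty described above.
The type `layer`, the functions `layerDistance` and `penalisedScore`,
the number of temporal layers per spatial layer and the per-step
penalty are all assumptions made for illustration, not the values or
names used in this PR.

```go
package main

import "fmt"

// layer identifies a spatial/temporal layer pair (hypothetical type).
type layer struct {
	spatial  int
	temporal int
}

const (
	// Assumed values, for illustration only.
	temporalLayersPerSpatial = 4
	maxScore                 = 100.0
	penaltyPerStep           = 5.0
)

// layerDistance returns how many single-layer steps separate actual from
// desired. It returns 0 when actual meets or exceeds desired.
func layerDistance(actual, desired layer) int {
	a := actual.spatial*temporalLayersPerSpatial + actual.temporal
	d := desired.spatial*temporalLayersPerSpatial + desired.temporal
	if a >= d {
		return 0
	}
	return d - a
}

// penalisedScore lowers score linearly with distance, clamped at 0.
func penalisedScore(score float64, distance int) float64 {
	s := score - float64(distance)*penaltyPerStep
	if s < 0 {
		return 0
	}
	return s
}

func main() {
	actual := layer{spatial: 1, temporal: 1}
	desired := layer{spatial: 2, temporal: 3}
	d := layerDistance(actual, desired)
	fmt.Println("distance:", d, "score:", penalisedScore(maxScore, d))
}
```

Flattening spatial/temporal into one index keeps the penalty linear in
the number of steps; a non-linear mapping could be swapped in inside
`penalisedScore` without touching the distance calculation.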
* wire up layer transitions for screen share tracks
* Use EWMA (Exponentially Weighted Moving Average) for score updates.
Makes code simpler, but makes it harder to test as the inflection points
are not exact.
Score falls a bit slower to be conservative on dropping quality too
quickly. Still, the fall factor is higher (i.e. newer scores get more
weight) than the rise factor (i.e. newer scores get lower weight).
The slower rise factor introduces hysteresis against things climbing
back too quickly.
In the extreme case, asymptotic conditions could cause unexpected
results. For example, having 4% loss of video continuously will never
drop quality to `POOR`. The score will get close to 60, but it will
always stay above 60, and hence quality will never drop to `POOR`.
Some sort of variable thresholding may be needed to deal with that. But
that is an extreme case and may not happen in real life.
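A small Go sketch of an EWMA with separate fall and rise factors, and of
the asymptotic behaviour mentioned above. The factor values, the
threshold constant and the function name `ewma` are assumptions for
illustration; the actual factors in this PR may differ.

```go
package main

import "fmt"

const (
	// Assumed weights given to the newest sample, for illustration only.
	fallFactor = 0.6 // score dropping: newer samples weigh more
	riseFactor = 0.3 // score recovering: newer samples weigh less

	// Assumed threshold below which quality would be reported as POOR.
	poorThreshold = 60.0
)

// ewma blends sample into prev, using the fall factor when the sample is
// lower than the current score and the rise factor otherwise.
func ewma(prev, sample float64) float64 {
	factor := riseFactor
	if sample < prev {
		factor = fallFactor
	}
	return factor*sample + (1-factor)*prev
}

func main() {
	score := 100.0
	// A constant sample sitting exactly at the threshold pulls the score
	// towards the threshold, but only ever approaches it from above.
	for i := 0; i < 10; i++ {
		score = ewma(score, poorThreshold)
	}
	fmt.Printf("score=%.4f below threshold=%v\n", score, score < poorThreshold)
}
```

After n updates the remaining gap to the sample is (1 - factor)^n times
the initial gap, which shrinks but is never negative. That is why a
steady impairment scoring right at a threshold does not cross it.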
* remove unused stuff
* Push/pull for connection stats/quality scoring.
Was not happy with the pure pull method missing a window because
RTCP RR timing is slightly off for audio, and then using a much
larger window of data in the next update.
That also resulted in RTP stats getting some bits of code.
As that is per-packet processing, it was not a good idea.
Switching to a push-pull method.
For up track, it is pull, i.e. connection stats worker will pull stats.
For down track, there is a new notification about receiver report
reception. Using this to check whether it is time to run stats, and
adding a bit of tolerance for the processing window (currently set so
that it runs as long as the elapsed time is > 95% of the usual
processing interval). This allows two things:
- for video, RTCP RR are more frequent, but we will still not process
till enough time has passed
- for audio, RTCP RR could be once in 5 seconds or so. Can process when
it is available rather than miss a window and use a much larger window
later.
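The down track check above could be sketched in Go roughly as follows.
The 5 second interval, the constant and function names, and the use of
a zero `lastStatsAt` to mean "not started" are assumptions for
illustration; only the 95% tolerance comes from the description above.

```go
package main

import (
	"fmt"
	"time"
)

const (
	// Assumed processing interval, for illustration only.
	updateInterval = 5 * time.Second
	// Fraction of the interval that must have elapsed before processing.
	intervalTolerance = 0.95
)

// enoughElapsed reports whether elapsed exceeds the tolerated fraction of
// the usual processing interval.
func enoughElapsed(elapsed time.Duration) bool {
	return float64(elapsed) > float64(updateInterval)*intervalTolerance
}

// shouldProcess is the check run on each receiver report notification.
// A zero lastStatsAt is treated as "track not started yet".
func shouldProcess(lastStatsAt, now time.Time) bool {
	if lastStatsAt.IsZero() {
		return false
	}
	return enoughElapsed(now.Sub(lastStatsAt))
}

func main() {
	start := time.Now()
	for _, d := range []time.Duration{time.Second, 4800 * time.Millisecond, 5 * time.Second} {
		fmt.Println(d, shouldProcess(start, start.Add(d)))
	}
	fmt.Println("not started:", shouldProcess(time.Time{}, start))
}
```

With these assumed values, frequent video reports arriving 1 second
apart are skipped, while an audio report arriving slightly before the
5 second mark is still processed instead of being deferred to the next
window.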
* uber atomic
* Connection quality misc changes
1. Call scorer.Update() with nil stat when no data available so that
scorer can synthesise window with proper window time.
2. Subtract out loss in interval to account for packets not sent at
all.
3. Fix `packetsNotFound` variable in `getIntervalStats`. I remember this
working at some point. Not sure if I fat fingered in another PR and
deleted the increment line.
4. Logging a bit more when no packets expected. Those can get noisy
especially when track is muted. But, seeing some unexplained
instances of no packets leading to quality drop. So, temporary logging
to get a bit more information.
* correct spelling
* Limit packet score minimum to 0.0
* Make connection quality not too optimistic.
With score normalization, the quality indicator showed good
under conditions which should normally have shown some badness.
So, a few things in this PR
- Do not normalize scores
- Pick the weakest link as the representative score (moving away from
averaging)
- For down track direction, when reporting delta stats, take the number
of packets sent actually. If there are holes in the feed (upstream
packet loss), down tracks should not be penalised for that loss.
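A short Go sketch of picking the weakest link instead of averaging. The
function name `weakestScore` and the sample scores are hypothetical.

```go
package main

import "fmt"

// weakestScore returns the lowest score in scores. The boolean is false
// when there are no scores to consider.
func weakestScore(scores []float64) (float64, bool) {
	if len(scores) == 0 {
		return 0, false
	}
	lowest := scores[0]
	for _, s := range scores[1:] {
		if s < lowest {
			lowest = s
		}
	}
	return lowest, true
}

func main() {
	// Hypothetical per-track scores for one participant.
	trackScores := []float64{4.6, 2.1, 4.4}

	sum := 0.0
	for _, s := range trackScores {
		sum += s
	}
	weakest, _ := weakestScore(trackScores)
	fmt.Printf("average=%.2f weakest=%.2f\n", sum/float64(len(trackScores)), weakest)
}
```

Averaging lets two healthy tracks hide one badly impaired track; taking
the minimum keeps the impaired track visible in the reported quality.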
State of things in connection quality feature
- Audio uses rtcscore-go (with a change to accommodate RED codec). This
follows the E-model.
- Camera uses rtcscore-go. No change here. NOTE: The rtcscore here is
purely based on bits per pixel per frame (bpf). This has the following
existing issues (no change, these were already there)
o Does not take packet loss, jitter, rtt into account
o Expected frame rate is not available. So, measured frame rate is
used as expected frame rate also. If expected frame rate were available,
the score could be reduced for lower frame rates.
- Screen share tracks: No change. This uses the very old simple loss
based thresholding for scoring. As the bit rate varies a lot based on
content and rtcscore video algorithm used for camera relies on
bits per pixel per frame, this could produce a very low value
(large width/height encoded in a small number of bits because of static content)
and hence a low score. So, the old loss based thresholding is used.
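To illustrate why bits per pixel per frame penalises static screen
share content, here is a rough Go sketch. This is not the rtcscore-go
implementation; the function name and all sample bit rates, resolutions
and frame rates are assumptions.

```go
package main

import "fmt"

// bitsPerPixelPerFrame divides the bit rate (bits per second) by the
// number of pixels sent per second. It returns 0 if any input is zero.
func bitsPerPixelPerFrame(bitrate float64, width, height uint32, fps float64) float64 {
	pixels := float64(width) * float64(height)
	if pixels == 0 || fps == 0 {
		return 0
	}
	return bitrate / (pixels * fps)
}

func main() {
	// Assumed camera stream: 720p at 30 fps and 1.5 Mbps.
	camera := bitsPerPixelPerFrame(1_500_000, 1280, 720, 30)
	// Assumed screen share with static content: 1080p at 5 fps and 50 kbps.
	screen := bitsPerPixelPerFrame(50_000, 1920, 1080, 5)
	fmt.Printf("camera bpf=%.4f screen share bpf=%.4f\n", camera, screen)
}
```

Under these assumed numbers the screen share value is roughly an order
of magnitude lower than the camera value even though nothing is wrong
with the stream, which is the reason loss based thresholding is kept
for screen share.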
* clean up
* update rtcscore pointer
* fix tests
* log lines reformat
* WIP commit
* WIP commit
* update mute of receiver
* WIP commit
* WIP commit
* start adding tests
* take min score if quality matches
* start adding bytes based scoring
* clean up
* more clean up
* Use Fuse
* log quality drop
* clean up debug log
* - Use number of windows for wait to make things simpler
- track no layer expected case
- always update transition
- always call updateScore
* WIP commit
* Connection quality changes
- Fix Firefox showing poor quality
o The issue was that we were using max available layer and
calculating quality. The rationale being that even if
server sends dynacast messages, client may not implement
dynacast and still stream all layers. But, with Firefox
(maybe a Firefox bug), it sends some small amount of
data on layer 2 even when that layer is disabled.
Guessing it is probing (or actually we might be using
some small value for high layers as Firefox cannot turn off
layers). That higher layer gets used in quality calculation.
As the bit rate on that layer is extremely low, it yields low
score.
Fixed by considering the max expected layer. That is of most
interest. Yes, clients may ignore dynacast and stream all layers,
but, max expected is the one of interest. So, look for
quality in the max expected layer and not max available layer.
- Lots of clean up around connection quality stuff
o Use a dynamic scaling thing to ensure that we do not get bitten
by absolute values. Calculate best possible scenario score and
map that to maximum MOS score. This will ensure that different
codecs, different settings do not mess up the scoring. For example,
a client might use 1 Mbps for 720p, but a different client could
use 2 Mbps for 720p. As an SFU/infrastructure middlebox, we do
not have control over quality at those rates. We can only ensure
that streaming happens smoothly at those rates. So, in that
example, for client 1, 1 Mbps will map to MOS 5.0 and for client 2,
2 Mbps will map to MOS 5.0. Any impairments after that will
reflect in the score.
o Penalise for missing target layer by one level for one layer missed.
o Move tests to connection quality directory. The participant test
was not super useful.
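A minimal Go sketch of the dynamic scaling idea above: compute the best
possible raw score for a stream and map it to the maximum MOS. The
names `normalisationFactor` and `normalise`, and the raw score values,
are hypothetical and only illustrate the mapping.

```go
package main

import "fmt"

const maxMOS = 5.0

// normalisationFactor returns the multiplier that maps the best
// achievable raw score for a stream onto maxMOS.
func normalisationFactor(bestRaw float64) float64 {
	if bestRaw <= 0 {
		return 1
	}
	return maxMOS / bestRaw
}

// normalise scales a raw score and clamps the result to [0, maxMOS].
func normalise(raw, factor float64) float64 {
	s := raw * factor
	if s < 0 {
		return 0
	}
	if s > maxMOS {
		return maxMOS
	}
	return s
}

func main() {
	// Hypothetical best-case raw scores for two clients publishing 720p
	// at different bit rates.
	for _, best := range []float64{3.8, 4.4} {
		factor := normalisationFactor(best)
		fmt.Printf("best=%.2f impaired=%.2f\n",
			normalise(best, factor), normalise(best*0.8, factor))
	}
}
```

Both clients map to the maximum under perfect conditions, and the same
relative impairment lowers both scores by the same proportion,
regardless of the absolute bit rate each client chose.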
* Add missed file
* Remove debug code
* use more constants and initialise normalisation factor
* rtcscore pointer
* Use media payload size in scoring.
Subtract out header bytes when calculating score.
This does not seem to affect the score (under perfect conditions),
but, using header bytes will inflate the bit rate and
will affect scoring.
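A small Go sketch of computing the bit rate from media payload only.
The function name `mediaBitrate` and the sample packet counts and sizes
are assumptions for illustration.

```go
package main

import "fmt"

// mediaBitrate returns the payload bit rate in bits per second over a
// window, excluding header bytes. It returns 0 for invalid input.
func mediaBitrate(totalBytes, headerBytes uint64, windowSeconds float64) float64 {
	if windowSeconds <= 0 || headerBytes > totalBytes {
		return 0
	}
	return float64(totalBytes-headerBytes) * 8 / windowSeconds
}

func main() {
	// Assumed audio window: 250 packets in 5 seconds, each carrying a
	// 12 byte RTP header and 80 bytes of payload.
	const packets = 250
	headerBytes := uint64(packets * 12)
	totalBytes := headerBytes + uint64(packets*80)

	fmt.Println("with headers:   ", mediaBitrate(totalBytes, 0, 5))
	fmt.Println("payload only:   ", mediaBitrate(totalBytes, headerBytes, 5))
}
```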
* Add header bytes to ToProto
* protocol pointer
* fix test
With a small window, the quality is volatile even on small disturbances.
For example losing 2 audio packets in a 2 second window could
drop the quality metric.
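To make the window size effect concrete, a short Go sketch follows. The
packet rate of 50 packets per second (20 ms audio packets) and the
function name `lossPercent` are assumptions for illustration.

```go
package main

import "fmt"

// lossPercent returns the percentage of expected packets that were lost.
func lossPercent(lost, expected uint32) float64 {
	if expected == 0 {
		return 0
	}
	return float64(lost) * 100 / float64(expected)
}

func main() {
	const packetsPerSecond = 50 // assumed: 20 ms audio packets
	const lost = 2
	for _, windowSeconds := range []uint32{2, 5} {
		expected := packetsPerSecond * windowSeconds
		fmt.Printf("%ds window: %.1f%% loss\n", windowSeconds, lossPercent(lost, expected))
	}
}
```

Under that assumption, the same 2 lost packets are 2% of a 2 second
window but 0.8% of a 5 second window, so a longer window is less
sensitive to small disturbances.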
* Set DtxDisabled from TrackInfo in score calculation.
Also, fix sending connection quality update on a new subscription.
* comments tweaks
* Move TrackInfo into StreamTrackerManager as this is used by cloud as well
* WIP commit
* WIP commit
* Remove debug
* Revert to reduce diff
* Fix tests
* Determine spatial layer from track info quality if non-simulcast
* Adjust for invalid layer on no rid, previously that function was returning 0 for no rid case
* Fall back to top level width/height if there are no layers
* Use duration from RTPDeltaInfo
* Use rtcscore-go to calculate audio/video score
Signed-off-by: shishir gowda <shishir@livekit.io>
* Get max expected layer and find max actual layer from stream
Signed-off-by: shishir gowda <shishir@livekit.io>
* Cleanup unused methods
Signed-off-by: shishir gowda <shishir@livekit.io>
* Cleanup code - address review comments
Signed-off-by: shishir gowda <shishir@livekit.io>
* get expected layer info instead of just quality
Signed-off-by: shishir gowda <shishir@livekit.io>
* Move SpatialLayerForQuality to utils/helpers
method is required in rtc, sfu and connectionstats pkg
Moved to utils/helpers.go to remove cyclic deps
Signed-off-by: shishir gowda <shishir@livekit.io>
* update tests
Signed-off-by: shishir gowda <shishir@livekit.io>
* Pick stream stats with max layer
Signed-off-by: shishir gowda <shishir@livekit.io>
* Update rtcscore-go pkg to make rtt/jitter optional
when passing 0, rtcscore-go was setting default values
Signed-off-by: shishir gowda <shishir@livekit.io>
* update score to rating
Signed-off-by: shishir gowda <shishir@livekit.io>
* Update rtcscore-go pkg to use simulcast layer info for score
Signed-off-by: shishir gowda <shishir@livekit.io>
* Update score ratings to reflect rtcscore range
Signed-off-by: shishir gowda <shishir@livekit.io>
* update test params for new rtcscore
Signed-off-by: shishir gowda <shishir@livekit.io>
* Delay sending scores to connections only till full data is available
first interval can have partial data leading to lower scores
Signed-off-by: shishir gowda <shishir@livekit.io>
* Check for inf values in quality params
Signed-off-by: shishir gowda <shishir@livekit.io>
* Clean up initial score calculation. Default to 5
Signed-off-by: shishir gowda <shishir@livekit.io>
Co-authored-by: David Zhao <dz@livekit.io>
* Improve frequency of stats update
Prometheus stats are updated as the data becomes available, instead of
aggregated along with telemetry batches. Node availability decisions can
now react much faster to these stats.
* use the same intervals for connection quality updates
* Use delta stats throughout and avoid calculating deltas in telemetry
* Fix a few things after testing
* Remove debug
* Fix tests
* delete instead of setting to nil
* Point to the latest protocol
* Add a resync API to sfu.DownTrack
Also passing in logger with context into sfu package. More to do here
with proper logging context in all modules, but this is a start
* Remove debug code
* fix tests
* audio connection quality mos for publisher stats
Signed-off-by: shishir gowda <shishir@livekit.io>
* Update tests
Signed-off-by: shishir gowda <shishir@livekit.io>
* Change ratings range, increase default rtt to 80
Signed-off-by: shishir gowda <shishir@livekit.io>
* Use stats worker to get total packets to find %lost in window
Signed-off-by: shishir gowda <shishir@livekit.io>
* Update go dep
Signed-off-by: shishir gowda <shishir@livekit.io>
* Increase interval of score cal to 5 seconds
Signed-off-by: shishir gowda <shishir@livekit.io>
* use lastSequenceNumber in reports to find total packets
Signed-off-by: shishir gowda <shishir@livekit.io>
* Account for delay while calculating scores
Signed-off-by: shishir gowda <shishir@livekit.io>
* Fix minor typo
Signed-off-by: shishir gowda <shishir@livekit.io>
* Add connection stats/score to subscribed audio tracks
Signed-off-by: shishir gowda <shishir@livekit.io>
* Cleanup
Signed-off-by: shishir gowda <shishir@livekit.io>
* Ignore duplicate LastSequenceNumbers in rtcp reports
Ignore if sequence number is less than what was received
Signed-off-by: shishir gowda <shishir@livekit.io>
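A rough Go sketch of ignoring receiver reports whose last sequence
number does not advance. The type `seqTracker` and its method are
hypothetical; the sketch only assumes that the report carries the
32-bit extended highest sequence number received.

```go
package main

import "fmt"

// seqTracker remembers the last sequence number seen in RTCP receiver
// reports (hypothetical type, for illustration only).
type seqTracker struct {
	initialized bool
	lastSeq     uint32
}

// update returns the number of packets covered since the previous report
// and whether the report was used. The first report only sets a baseline.
// Reports whose sequence number is a duplicate of, or lower than, the last
// one seen are ignored.
func (t *seqTracker) update(lastSequenceNumber uint32) (uint32, bool) {
	if !t.initialized {
		t.initialized = true
		t.lastSeq = lastSequenceNumber
		return 0, false
	}
	if lastSequenceNumber <= t.lastSeq {
		return 0, false
	}
	delta := lastSequenceNumber - t.lastSeq
	t.lastSeq = lastSequenceNumber
	return delta, true
}

func main() {
	var t seqTracker
	for _, seq := range []uint32{1000, 1250, 1250, 1200, 1500} {
		delta, used := t.update(seq)
		fmt.Printf("seq=%d delta=%d used=%v\n", seq, delta, used)
	}
}
```

The returned delta can serve as the total packet count for the window
when computing the loss percentage.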
* Move video track score calc to media/downtracks
Signed-off-by: shishir gowda <shishir@livekit.io>
* Deprecate SubscribeLossPercentage() as score calc is now handled downstream
Signed-off-by: shishir gowda <shishir@livekit.io>
* Initialize connection score to excellent
score is calc at 5sec interval. Client fetches score before first
score is computed
* Update test cases for connection quality
Signed-off-by: shishir gowda <shishir@livekit.io>