Being trapped in a gap with Big Buck Bunny

Introduction

When I thought about how to introduce this blog post, the first thing that popped into my mind was to start with a warning. Something like “this blog post will be technical and focus on details that are only relevant to player developers”. While the latter is probably still true, I am not sure anymore if I can call it “technical” after watching the Demuxed presentation by Matt Szatmary (Mux).

In any case, just a few days ago I stumbled upon a GitHub issue that was raised in 2016 by David Evans, a long-time contributor to the dash.js project. The issue deals with a limitation of the Media Source Extensions (MSE) that probably every media player developer has at least heard of: “Playing through unbuffered ranges and allowing the application to provide a buffered gap tolerance”. To the few readers who made it to this part of the blog post and have not heard about this limitation: First, thank you for not closing this tab and second, you are in for a real treat. Let’s look at a concrete example of what we mean by “unbuffered ranges” and “gaps”.

Gaps in the media buffer

When using the MSE we need to create SourceBuffers to which we append our media segments. The SourceBuffer object supports two append modes, namely “segments” and “sequence”. Typically, the “segments” mode is used, and the media segments are placed in the media buffer according to their presentation timestamps. While this allows us to append the media segments in an arbitrary order, it can also cause misalignments as illustrated below.

In this example, segment 1 and segment 2 are perfectly aligned, but there is a gap between segment 2 and segment 3. There are various reasons for these kinds of gaps, such as media segments being shifted out of a DASH period (negative @eptDelta), wrong presentation timestamps in the media segments or the total sample duration of a media segment not matching its duration. In “MSE: A story of append windows, presentation timestamps and video buffers” we examined such situations in more detail.
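To make the last of these causes concrete, here is a minimal sketch of the arithmetic. The timescale, the sample duration and the sample count are assumed values chosen for illustration and are not taken from our test content, and `gapFromSampleDurations` is a hypothetical helper.

```javascript
// Hypothetical helper: size of the gap left behind by a segment whose samples
// add up to less than its nominal duration. All values are in timescale
// ticks to avoid floating point rounding.
function gapFromSampleDurations(nominalTicks, sampleCount, sampleTicks) {
    var coveredTicks = sampleCount * sampleTicks;
    return Math.max(0, nominalTicks - coveredTicks);
}

// Assumed example: timescale 12800 and 25 fps, i.e. 512 ticks per frame.
// A nominal 2-second segment that carries only 49 instead of 50 frames.
var TIMESCALE = 12800;
var gapTicks = gapFromSampleDurations(2 * TIMESCALE, 49, 512);
console.log('Gap of ' + gapTicks / TIMESCALE + ' seconds');
```

Under these assumptions a single missing frame is already enough to leave a gap of one frame duration (40 ms at 25 fps) in the buffer.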

Regardless of the reason for the gap, native MSE implementations in all major browsers (Chrome, Firefox, Edge, Safari) stall at such gaps until they are filled with data. The problem is that in most cases there is no data to fill the gaps as they are usually not created intentionally. For that reason, media players such as dash.js and Shaka Player implement logic to detect and handle such gaps to continue the media playback.
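The core of such player-side gap handling can be sketched in a few lines. This is a minimal illustration and not the actual dash.js or Shaka Player code; `findGapJumpTarget` and its `tolerance` parameter are hypothetical names. The sketch only relies on the standard `TimeRanges` interface (`length`, `start(i)`, `end(i)`) that `video.buffered` exposes.

```javascript
// Returns the time to seek to if the playhead is stuck in front of (or inside)
// a gap, or null if the playhead is inside a buffered range or there is no
// later range to jump to.
function findGapJumpTarget(buffered, currentTime, tolerance) {
    for (var i = 0; i < buffered.length; i++) {
        var start = buffered.start(i);
        var end = buffered.end(i);

        // Comfortably inside a buffered range: nothing to do.
        if (currentTime >= start && currentTime < end - tolerance) {
            return null;
        }
        // First range that starts after the playhead: jump to its start.
        if (start > currentTime) {
            return start;
        }
    }
    return null;
}
```

In a player this would typically be called from a `waiting` event handler or a periodic check, followed by `video.currentTime = target` whenever the result is not `null`.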

Native gap handling of the MSE

Coming back to our original GitHub issue, the suggestion that Dave made was to leave the handling of the gaps up to the browser and expose an API to control the behavior. Unfortunately, this is still an issue today and there is no native solution for the problem. Matt Wolenetz (previously Google, W3C Invited Expert) pointed out that the issue was discussed again at FOMS 2023. The second part of Matt’s comment is where it gets interesting:

Meanwhile, there is an ugly hack that might help coalesce unintended gaps in Chromium SourceBuffers: the coalescence heuristic currently used there is based on the largest frame duration buffered so far in that particular track in that particular SourceBuffer. Auto-coalescence of gaps can thus be achieved by manipulation of the inputs to this heuristic:

  1. For each track, first append a simple keyframe with huge duration. Ensure it is buffered. Then remove it from the SourceBuffer (or overlap it in your first appends to that track).
  2. This should trick the heuristic.

A native solution for Chrome and other browsers?

Now we are curious to try this workaround, not only to verify that it really works in Chrome, but also in the slight hope that it might work in other browsers such as Firefox and Safari.

A simple MSE player

The first thing we need is a simple standalone MSE player. We would rather not implement this “hack” in dash.js or Shaka. It simply creates lots of overhead and requires us to change several classes. Our simple MSE player looks like this and supports the basic append and remove operations (Note I am not pasting the whole source code here but just the important parts).

App.prototype.createMediaSource = function () {
    var self = this;

    self.mediaSource = new MediaSource();
    self.video.src = URL.createObjectURL(self.mediaSource);
    self.mediaSource.addEventListener('sourceopen', self.onMediaSourceOpen.bind(self));
};

App.prototype.onMediaSourceOpen = function () {
    var self = this;

    self.mediaSource.duration = (NUM_SEGMENTS - 1) * 2;
    self.sourceBuffer = self.mediaSource.addSourceBuffer('video/mp4; codecs="avc1.4d4028"');
    self.sourceBuffer.addEventListener('updateend', function () {
        self.printBufferRanges();
        // Gap workaround: remove the buffered data once, then continue
        // with the next append on the following 'updateend' event.
        if (ENABLE_GAP_WORKAROUND && self.currentSegment === 2 && !self.hasRemoved) {
            self.remove(0, 2000);
            self.hasRemoved = true;
            return;
        }

        self.append();
    });
};

App.prototype.append = function () {
    var self = this;

    if (self.currentSegment >= content.length - 1) {
        self.finished = true;
    } else {
        self.fetchSegment(content[self.currentSegment], function (arrayBuffer) {
            try {
                self.sourceBuffer.appendBuffer(arrayBuffer);
                self.currentSegment += 1;
            } catch (e) {
                console.log(e);
            }
        });
    }
};

App.prototype.remove = function (start, end) {
    var self = this;
    self.debug('Removing buffer from ' + start + ' to ' + end);
    self.sourceBuffer.remove(start, end);
};
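The `printBufferRanges` helper that is called in the `updateend` handler is not part of the excerpt above. A minimal sketch that would produce log lines in the format used in the test results further down could look like this. `formatBufferRanges` is a hypothetical name and the real implementation may differ.

```javascript
// Hypothetical helper: turns a TimeRanges-like object (length, start(i),
// end(i)) into log lines such as "Range 0 starts at 0 to 2".
function formatBufferRanges(buffered) {
    if (buffered.length === 0) {
        return ['No valid ranges'];
    }
    var lines = [];
    for (var i = 0; i < buffered.length; i++) {
        lines.push('Range ' + i + ' starts at ' + buffered.start(i) + ' to ' + buffered.end(i));
    }
    return lines;
}

// Assumed wiring inside the player (browser only, therefore commented out):
// App.prototype.printBufferRanges = function () {
//     var self = this;
//     formatBufferRanges(self.sourceBuffer.buffered).forEach(function (line) {
//         self.debug(line);
//     });
// };
```

Keeping the formatting separate from the `SourceBuffer` access makes the helper easy to exercise without a browser.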

Test Content

Next, we need to create test content that we can feed into our MSE player. Based on Matt’s comment above, we require two things: First, a media segment with a single IDR frame (IDR-segment) and a “huge duration”. Second, our “default” media content with a gap in the media buffer.

For that reason, we create a media segment with a single IDR frame and a duration of four seconds. That will be our “simple keyframe with a huge duration”. And to keep us motivated throughout the entire process we use our favorite bunny (the name is avoided here to not cause any mental issues with my fellow developer colleagues) as an input: There is no better feeling than replacing the bunny with something different.

The “default” content that replaces the IDR-segment is encoded and packaged with

  • 2 second segments
  • 25 frames per second
  • GoP size of 50 frames

We only need a few seconds to test whether the MSE implementation natively jumps over the gaps, so a total duration of 14 seconds is sufficient. If we append all our segments to the media buffer, it looks like this:

Creating gaps in the media buffer

Now that we have our media player and our test content ready, we need to think about a way to create gaps in the media buffer. (Un)fortunately that is very straightforward: We simply do not append segment 2 and segment 6. That way we create a gap between playback position 2 – 4 and 10 – 12.
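The buffered ranges we expect from skipping segments can be worked out with a few lines of code. This is a small sketch with a hypothetical helper name (`expectedBufferedRanges`), and it assumes seven media segments of two seconds each, which follows from the 14 seconds of total duration mentioned above.

```javascript
// Computes the buffered ranges we expect when some segments are skipped.
// Segment numbers are 1-based, matching the numbering used in the text.
function expectedBufferedRanges(numSegments, segmentDuration, skipped) {
    var ranges = [];
    for (var n = 1; n <= numSegments; n++) {
        if (skipped.indexOf(n) !== -1) {
            continue; // this segment is never appended
        }
        var start = (n - 1) * segmentDuration;
        var end = n * segmentDuration;
        var last = ranges[ranges.length - 1];
        if (last && last.end === start) {
            last.end = end; // contiguous with the previous segment: extend
        } else {
            ranges.push({ start: start, end: end });
        }
    }
    return ranges;
}

// 14 seconds of content in 2-second segments, skipping segments 2 and 6.
console.log(expectedBufferedRanges(7, 2, [2, 6]));
```

For these inputs the helper yields three ranges (0 – 2, 4 – 10 and 12 – 14), which leaves exactly the two gaps described above.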

Test results

Now we have everything in place to run some tests.

No gap workaround

First let’s check what happens if we append our default content with the two gaps illustrated above to the SourceBuffer. We expect our simple MSE player to stall right at the end of segment number 1:

Browser | Version                | Results
Chrome  | 118.0.5993.117         | Range 0 starts at 0 to 2
        |                        | Range 1 starts at 4 to 6
        |                        | Waiting for more data at 1.952094
Firefox | 119.0                  | Range 0 starts at 0 to 2
        |                        | Range 1 starts at 4 to 6
        |                        | Waiting for more data at 1.979588
Safari  | 16.6 (18615.3.12.11.2) | Range 0 starts at 0 to 2
        |                        | Range 1 starts at 4 to 6
        |                        | Waiting for more data at 2.258507333

As expected, Chrome and Firefox stall shortly before reaching the two-second mark. Interestingly, Safari claims to have a buffered range from 0 to 2 but plays until 2.25 seconds. I have no clue where it gets the additional 0.25 seconds from.

Active gap workaround

Now that we have confirmed that all browsers stall when encountering a gap, we can add our IDR frame workaround to the test:

Browser | Version                | Results
Chrome  | 118.0.5993.117         | Range 0 starts at 0 to 4
        |                        | Removing buffer from 0 to 2000
        |                        | No valid ranges
        |                        | Range 0 starts at 0 to 2
        |                        | Range 0 starts at 0 to 6
Firefox | 119.0                  | Range 0 starts at 0 to 4
        |                        | Removing buffer from 0 to 2000
        |                        | No valid ranges
        |                        | Range 0 starts at 0 to 2
        |                        | Range 1 starts at 4 to 6
Safari  | 16.6 (18615.3.12.11.2) | Range 0 starts at 0 to 4
        |                        | Removing buffer from 0 to 2000
        |                        | No valid ranges
        |                        | Range 0 starts at 0 to 2
        |                        | Range 1 starts at 4 to 6

Looking at the logs we can see that the media buffer is filled for a range from 0 to 4 by appending the IDR-segment. The media buffer is then emptied again by removing all the data (which results in “No valid ranges”). Next we append our default content and can identify a gap between 2 – 4 seconds in Firefox and Safari. The Chrome browser has a single range object with no gap between 2 – 4 seconds. This looks promising! Now let’s check what happens if we start the actual playback:

Oops, this is not what we expected. For some reason, we can still see our least favorite bunny for the first four seconds. How is this possible? We removed it from the buffer and the logs clearly stated that there was no data in the buffer before appending the default content. Moreover, even if we don’t remove the IDR-segment before appending our default content, it should still be replaced as we are adding data to the same buffer position (0 – 2 seconds).

IDR-segment only

Let’s take a step back. What happens if we only append the IDR segment and remove it from the buffer? There should be no data in the buffer and no playback possible, right?

Browser | Version                | Results
Chrome  | 118.0.5993.117         | Range 0 starts at 0 to 4
        |                        | Removing buffer from 0 to 2000
        |                        | No valid ranges
        |                        | Current Time:0.255889
        |                        | Current Time:0.519626
        |                        |
        |                        |
        |                        | Waiting for more data at 3.980187
Firefox | 119.0                  | No valid ranges
        |                        | Range 0 starts at 0 to 4
        |                        | Removing buffer from 0 to 2000
        |                        | No valid ranges
        |                        | Current Time:0
        |                        | Waiting for more data at 0
Safari  | 16.6 (18615.3.12.11.2) | No valid ranges
        |                        | Range 0 starts at 0 to 4
        |                        | Removing buffer from 0 to 2000
        |                        | No valid ranges
        |                        | Waiting for more data at 0

Well, what can I say? He is still here!

Although there is no valid range in the Chrome SourceBuffer, the playback still progresses for four seconds (the duration of the IDR-segment). I don’t have a reasonable explanation for this; it is probably a Chrome bug. But what does that mean for our gap workaround? Can we still get this to work?

Adjusting the gap workaround

Turns out we can still make the gap workaround work. The trick is to do a seek:

Browser | Version                | Results
Chrome  | 118.0.5993.117         | Range 0 starts at 0 to 4
        |                        | Removing buffer from 0 to 2000
        |                        | No valid ranges
        |                        | Range 0 starts at 0 to 2
        |                        | Seeking by 0.0001
        |                        | Range 0 starts at 0 to 6
        |                        | Waiting for more data at 17.92605
Firefox | 119.0                  | Range 0 starts at 0 to 4
        |                        | Removing buffer from 0 to 2000
        |                        | No valid ranges
        |                        | Range 0 starts at 0 to 2
        |                        | Range 1 starts at 4 to 6
        |                        | Seeking by 0.0001
        |                        | Waiting for more data at 1.967075
Safari  | 16.6 (18615.3.12.11.2) | Range 0 starts at 0 to 4
        |                        | Removing buffer from 0 to 2000
        |                        | No valid ranges
        |                        | Range 0 starts at 0 to 2
        |                        | Range 1 starts at 4 to 6
        |                        | Seeking by 0.0001
        |                        | Waiting for more data at 2.260509

After appending our first segment of the default content we do a minimal seek. This seems to force some kind of reset of the rendering pipeline. Now Chrome is really jumping over the gaps natively and showing us the right content:

Note that Chrome is playing but not seeking over the gap. This means that we see a still frame for two seconds while the playback still progresses. It would also be possible to simply seek over the gaps in the media buffer. However, this would lead to problems with live content as we would move closer and closer to the live edge with each seek.
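The adjusted sequence (append the IDR-segment, remove it, append the first default segment, seek minimally) can be sketched with promises as follows. This mirrors the order of operations visible in the Chrome log above but is not the code of our test player. `waitForUpdateEnd` and `primeGapHeuristic` are hypothetical names, and the sketch assumes that the initialization data has already been handled and that both segments are available as ArrayBuffers.

```javascript
// Resolves once the SourceBuffer has finished its current append or remove.
function waitForUpdateEnd(sourceBuffer) {
    return new Promise(function (resolve) {
        function onUpdateEnd() {
            sourceBuffer.removeEventListener('updateend', onUpdateEnd);
            resolve();
        }
        sourceBuffer.addEventListener('updateend', onUpdateEnd);
    });
}

// Sketch of the adjusted workaround. idrSegment and firstSegment are assumed
// to be ArrayBuffers that were fetched beforehand.
async function primeGapHeuristic(video, sourceBuffer, idrSegment, firstSegment) {
    var done;

    // 1. Append the single-keyframe segment with the huge duration.
    done = waitForUpdateEnd(sourceBuffer);
    sourceBuffer.appendBuffer(idrSegment);
    await done;

    // 2. Remove it again (same range as in the test player).
    done = waitForUpdateEnd(sourceBuffer);
    sourceBuffer.remove(0, 2000);
    await done;

    // 3. Append the first segment of the default content.
    done = waitForUpdateEnd(sourceBuffer);
    sourceBuffer.appendBuffer(firstSegment);
    await done;

    // 4. Minimal seek; in our tests this made Chrome drop the IDR frame.
    video.currentTime += 0.0001;
}
```

Registering the `updateend` listener before each call to `appendBuffer` or `remove` avoids missing the event. The remaining segments can be appended as usual once the returned promise resolves.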

Unfortunately, the gap “hack” does not work in Firefox and Safari: they still stall at around two seconds.

Conclusion

Native MSE implementations in all major browsers (Chrome, Firefox, Edge, Safari) stall if there are gaps in the media buffer. For that reason, media players such as dash.js and Shaka Player typically implement their own logic to detect and handle gaps and continue the media playback.

Based on a GitHub comment by Matt Wolenetz (previously Google, W3C Invited Expert) we examined a solution to enable native MSE gap handling. The idea was to append a media segment with a single IDR frame and a large sample duration to the media buffer before adding the default content. We found that a programmatic seek is required to remove the content of the IDR-segment from the media buffer and make the gap fix work. While this solution works in Chrome, it is not an option for Firefox and Safari as the playback still stalls in these browsers. Consequently, it is not a promising option to add this workaround to real-world players unless the playback is limited to a Chromium-based engine.

If you have any questions regarding our DASH activities or dash.js in particular, feel free to check out our website and contact us.
