To some degree, we can compare a DASH MPD to a novel. A novel usually has multiple chapters (DASH periods) with exciting parts (high quality segments) and rather boring parts (low quality segments). Sometimes, we even skip some chapters (seek). Or, if you are like me and it is late (high presentation time), you lose concentration (your buffer is full) and fall asleep while reading (the end of the append window is reached). I admit, this was not the best analogy, but I could not think of a better transition to the main topic of this post: append windows, presentation timestamps, IDR frames and their impact on the media buffer.
In this blog post, we will examine how the position of Instant Decoder Refresh (IDR) frames and the earliest presentation time (EPT) of a video segment can interact with the Media Source Extensions (MSE) attributes appendWindowStart, appendWindowEnd and timestampOffset. Some of the fundamentals of this blog post are covered in more detail in our previous blog post “Common pitfalls in MPEG-DASH streaming”. For now, we assume that the reader is familiar with the concept of adaptive streaming and the basic structure of a DASH Media Presentation Description (MPD).
The append window
MSE-based DASH clients like dash.js and Shaka player use the appendWindowStart and appendWindowEnd attributes of the SourceBuffer object to define an append range. Coded frames with presentation timestamps within this append range are allowed to be appended to the SourceBuffer, while coded frames outside this range are filtered out.
A DASH client typically aligns the MPD period boundaries with the append window. The start of the period will match appendWindowStart while the end of a period matches appendWindowEnd. That way, all media segments that are appended to the buffer are guaranteed to be included in a period. Frames that overlap the period boundaries are filtered out.
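The alignment described above can be sketched as a small helper. This is only an illustration, not the code used by dash.js or Shaka Player: the function name applyPeriodAppendWindow and its parameters are hypothetical, and the period boundaries are assumed to be given in seconds. The order of the assignments matters because MSE throws a TypeError if appendWindowStart is ever greater than or equal to appendWindowEnd.

```javascript
// Hypothetical helper: align a SourceBuffer's append window with one DASH period.
// periodStart and periodEnd are assumed to be seconds on the presentation timeline.
function applyPeriodAppendWindow(sourceBuffer, periodStart, periodEnd) {
  if (sourceBuffer.updating) {
    // MSE rejects append window changes while an append or remove is in progress.
    throw new Error("SourceBuffer is still updating; wait for 'updateend' first");
  }
  if (!(periodStart >= 0) || !(periodEnd > periodStart)) {
    throw new RangeError("Expected 0 <= periodStart < periodEnd");
  }
  // Reset the start first, so the new end can never be <= the current start.
  sourceBuffer.appendWindowStart = 0;
  // Infinity is a valid value for an open-ended period.
  sourceBuffer.appendWindowEnd = periodEnd;
  sourceBuffer.appendWindowStart = periodStart;
}
```

Resetting appendWindowStart to 0 before touching appendWindowEnd keeps the helper safe regardless of whether the window moves forward or backward between periods.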
In the basic example depicted below, the presentation timestamps of the first media segment are out of the append window. Consequently, the segment is filtered out.
Now, we can ask ourselves what happens if we only partially exclude a segment. For instance, see what happens if we only want to exclude the first part of the first segment:
The MSE provides us with a specific note for such use cases:
Some implementations may choose to collect a few of these coded frames with presentation timestamp less than appendWindowStart and use them to generate a splice at the first coded frame that has a presentation timestamp greater than or equal to appendWindowStart even if that frame is not a random access point. Supporting this requires multiple decoders or faster than real-time decoding, so for now, this behavior will not be a normative requirement.
The significant part is marked in bold. A video segment usually has an IDR frame right at the start. Although a single segment can contain multiple IDR frames, it is more common to include only one IDR frame per segment. So what actually happens if we try this out?
The test setup
Test content
First, we generate some test content:
Name | Seg length | EPT first segment | GoP size | FPS |
2_sec_gop | 6 sec | 3.000/90.000 | 60 | 30 |
6_sec_gop | 6 sec | 3.000/90.000 | 180 | 30 |
Our test consists of two different configurations for the video segments. Both segment types have a length of six seconds and a framerate of 30 frames per second. We use closed GoPs for both segments. The main difference is the GoP size. The “2_sec_gop” segment has an IDR frame every two seconds. This results in three IDR frames per segment. On the other hand, the “6_sec_gop” segment only has a single IDR frame at the start.
Interestingly enough, the encoder/packager combination we used in this test sets the EPT value of the first segment to 3.000/90.000, that is, 3000 ticks at a timescale of 90000, or roughly 0.0333 seconds (one frame at 30 FPS). For some reason, this was not signaled in the corresponding MPD (no @presentationTimeOffset). To account for that, we set the SourceBuffer@timestampOffset attribute to -3.000/90.000 (about -0.0333 seconds). That way, we compensate for the EPT value in the segments and our buffer starts at 0.
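The arithmetic behind this offset can be written down as a short sketch. The function name timestampOffsetForEpt is hypothetical, and the optional periodStart parameter is an assumption added for illustration (it is 0 in our test, since the content starts at the beginning of the presentation).

```javascript
// Hypothetical helper: compute the timestampOffset (in seconds) that maps a
// segment's earliest presentation time onto the start of its period.
// eptTicks and timescale come from the media segment; periodStart is in seconds.
function timestampOffsetForEpt(eptTicks, timescale, periodStart = 0) {
  if (!(timescale > 0)) {
    throw new RangeError("timescale must be positive");
  }
  return periodStart - eptTicks / timescale;
}

// Values from the test content: EPT of 3000 ticks at a timescale of 90000.
const offset = timestampOffsetForEpt(3000, 90000);
console.log(offset.toFixed(11)); // prints "-0.03333333333"
```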
Test application
In order to check the buffer, we use the buffered attribute of the video element:
self.video.buffered.start(0) + '-' + self.video.buffered.end(0);
The MSE implementation is straightforward and not included in this post. Important changes to MSE attributes are explained in the respective test cases.
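One detail worth keeping in mind when reading the buffered attribute: start(0) and end(0) throw if the buffer is empty, which is exactly the situation some of the tests below run into. A slightly more defensive version of the check could look like the following sketch. The helper name describeBuffered is hypothetical and not part of our test application; it accepts any TimeRanges-like object.

```javascript
// Hypothetical helper: turn a TimeRanges-like object into a readable string.
// Returns "empty" instead of throwing when no ranges are buffered.
function describeBuffered(timeRanges) {
  if (timeRanges.length === 0) {
    return "empty";
  }
  const parts = [];
  for (let i = 0; i < timeRanges.length; i++) {
    parts.push(timeRanges.start(i).toFixed(2) + "-" + timeRanges.end(i).toFixed(2));
  }
  return parts.join(", ");
}
```

In the test application this would be called as describeBuffered(self.video.buffered).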
Test platform
The following two tests were conducted on a MacBook Pro with macOS Catalina 10.15.7 and the following browser versions:
- Google Chrome Version 87.0.4280.141
- Mozilla Firefox 84.0.2 (64-Bit)
- Safari Version 14.0
Test Case 1: No append window
First, we test the behavior without changing the append window. Consequently, the only attribute we need to adjust is the timestampOffset to account for the non-zero EPT value:
self.sourceBuffer.timestampOffset = -0.03333333333;
We don’t observe any surprises here. As expected, all browsers buffer the segment correctly and we are able to play the first six seconds of our content:
Name | Chrome buffer | Firefox buffer | Safari buffer |
2_sec_gop | 0 – 6 | 0 – 6 | 0 – 6 |
6_sec_gop | 0 – 6 | 0 – 6 | 0 – 6 |
Test Case 2: Adjusted append window
Now, it gets interesting: we’ve reached the climax of our story ;). We adjust the append window to cut off the first second of our video segments. This means that we are excluding the first IDR frame of both segments from the buffer:
self.sourceBuffer.appendWindowStart = 1;
self.sourceBuffer.appendWindowEnd = 6;
self.sourceBuffer.timestampOffset = -0.03333333333;
This time, our results look different:
Name | Chrome buffer | Firefox buffer | Safari buffer |
2_sec_gop | 2 – 6 | 2 – 6 | empty |
6_sec_gop | empty | empty | empty |
Chrome and Firefox behave in a similar fashion. Both browsers discard all video frames up to the next IDR frame that lies inside the append window. For the 2_sec_gop content, we end up with a buffer from 2 – 6 seconds. For the 6_sec_gop content, the buffer remains empty because all media samples are discarded. Safari is even stricter: the buffer remains empty for both segments.
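The Chrome and Firefox results match what a simplified reading of the coded frame processing steps in the MSE specification predicts: a frame outside the append window is dropped and sets a "need random access point" flag, and every following frame is dropped until the next random access point. The sketch below models only that part. It is not the code any browser runs, it does not explain the Safari result, and all names (makeSegment, filterFrames, bufferedSeconds) are hypothetical. Times are kept as integer ticks at an assumed timescale of 90000 to avoid rounding issues.

```javascript
// Simplified model of the append window checks in MSE coded frame processing.
const TIMESCALE = 90000;
const FRAME_TICKS = 3000; // one frame at 30 FPS

// Build a 6-second test segment; gopFrames is the distance between IDR frames.
function makeSegment(gopFrames, eptTicks = 3000, frameCount = 180) {
  return Array.from({ length: frameCount }, (_, i) => ({
    pts: eptTicks + i * FRAME_TICKS,
    duration: FRAME_TICKS,
    isIdr: i % gopFrames === 0,
  }));
}

// Apply timestampOffset, then drop frames outside the window and any frames
// that follow a dropped frame until the next IDR frame.
function filterFrames(frames, { appendWindowStart = 0, appendWindowEnd = Infinity, timestampOffset = 0 } = {}) {
  let needRandomAccessPoint = true;
  const kept = [];
  for (const frame of frames) {
    const pts = frame.pts + timestampOffset;
    const end = pts + frame.duration;
    if (pts < appendWindowStart || end > appendWindowEnd) {
      needRandomAccessPoint = true;
      continue;
    }
    if (needRandomAccessPoint) {
      if (!frame.isIdr) continue;
      needRandomAccessPoint = false;
    }
    kept.push({ pts, end });
  }
  return kept;
}

// Buffered range in seconds, assuming the kept frames are contiguous
// (true for the two test segments used here). Returns null if nothing is kept.
function bufferedSeconds(kept) {
  if (kept.length === 0) return null;
  return [kept[0].pts / TIMESCALE, kept[kept.length - 1].end / TIMESCALE];
}

// Test case 2: window of 1 to 6 seconds, offset of -3000 ticks.
const testCase2 = { appendWindowStart: 1 * TIMESCALE, appendWindowEnd: 6 * TIMESCALE, timestampOffset: -3000 };
console.log(bufferedSeconds(filterFrames(makeSegment(60), testCase2)));  // [ 2, 6 ]
console.log(bufferedSeconds(filterFrames(makeSegment(180), testCase2))); // null
```

With the default window (0 to Infinity) and the same offset, the model keeps all frames of both segments, which corresponds to the 0 – 6 result of test case 1.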
Conclusion
What can we conclude from our tests?
First of all, not all browsers behave the same way. While Chrome and Firefox were able to recover from the loss of the first IDR frame by resuming at the next IDR frame in the segment (if there was one), Safari did not keep any data in the buffer.
As a consequence, content authors should pay close attention when creating their media segments and authoring their MPDs. A negative eptDelta (EPT – MPD@presentationTimeOffset) can lead to scenarios in which a media segment is partially shifted out of the start of a period. In the worst case, this leads to a large gap in the video buffer. A DASH player needs to manually jump over this gap, as automatic gap handling is not supported by the MSE. If the player does not implement gap handling, playback will not start. For more information on gap handling, check out our previous blog post.
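To give a rough idea of what such gap handling involves, here is a minimal sketch. It is not the gap handling implemented in dash.js: the function name findGapJumpTarget and the maxGapSeconds parameter are hypothetical, and a real player also has to deal with timing, stalls and seeking state. The function only decides where to seek to, based on a TimeRanges-like object and the current playback position.

```javascript
// Hypothetical helper: if playback sits in a gap before buffered data,
// return the start of the next buffered range, otherwise null.
// Relies on TimeRanges being sorted in ascending order.
function findGapJumpTarget(buffered, currentTime, maxGapSeconds = Infinity) {
  for (let i = 0; i < buffered.length; i++) {
    const start = buffered.start(i);
    const end = buffered.end(i);
    if (currentTime >= start && currentTime < end) {
      return null; // already inside buffered data, nothing to jump over
    }
    if (start > currentTime) {
      // First range ahead of the playhead: jump only if the gap is small enough.
      return start - currentTime <= maxGapSeconds ? start : null;
    }
  }
  return null; // nothing buffered ahead of the playhead
}
```

For the 2_sec_gop result in Chrome and Firefox (buffer from 2 – 6 with playback at 0), the function returns 2, and a player could then set video.currentTime to that value to start playback.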
If you have any additional questions regarding our DASH activities or dash.js in particular, feel free to check out our website.