Networking/Archive/DASH/Implementation
Work in progress
This design page for the DASH implementation in Gecko focuses on the Networking/Necko code to be implemented.
Current Design and Behavior of Non-Adaptive Streams
With the current native video implementation, Necko buffers data until it is safe to start playing. nsMediaChannelStream downloads data via HTTP and stores it in an nsMediaCache. nsMediaCache in turn makes this data available to decoder threads via read() and seek() APIs. The decoders then read the data and enqueue it in A/V queues.
Chris Pearce has a blog post [here] that describes the current architecture.
Figure 1: Current Gecko Video Architecture (taken from http://pearce.org.nz/uploaded_images/video-architecture.svg with additions for Necko code).
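To make the data path above concrete, here is a minimal sketch of a cache that sits between the HTTP channel and the decoder threads. The class and method names (SimpleMediaCache, AppendDownloadedData) are illustrative only, not the real nsMediaChannelStream/nsMediaCache interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified sketch of the data path: the channel appends downloaded
// bytes into a cache, and a decoder pulls them back out via
// Read()/Seek(). Hypothetical names, not actual Gecko classes.
class SimpleMediaCache {
public:
  // Called from the HTTP channel as data arrives.
  void AppendDownloadedData(const uint8_t* aData, size_t aLength) {
    mBuffer.insert(mBuffer.end(), aData, aData + aLength);
  }

  // Called from a decoder thread to position the read cursor.
  bool Seek(size_t aOffset) {
    if (aOffset > mBuffer.size()) return false;
    mReadCursor = aOffset;
    return true;
  }

  // Called from a decoder thread; returns the number of bytes copied.
  size_t Read(uint8_t* aOut, size_t aCount) {
    size_t available = mBuffer.size() - mReadCursor;
    size_t n = aCount < available ? aCount : available;
    for (size_t i = 0; i < n; ++i) aOut[i] = mBuffer[mReadCursor + i];
    mReadCursor += n;
    return n;
  }

private:
  std::vector<uint8_t> mBuffer;  // data downloaded so far
  size_t mReadCursor = 0;        // decoder's current read position
};
```

The real nsMediaCache is shared across media elements and evicts blocks under memory pressure; this sketch omits all of that and shows only the producer/consumer relationship.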
High Level Approaches - Segment Request and Delivery to MediaStream/MediaDecoder
Two initial ideas were suggested by Rob O'Callahan:
- A single nsMediaDASHStream class manages monitoring of local capabilities/load and adapts the download by switching streams as necessary. A single nsMediaCache then provides data to a single nsWebMDecoder.
Diagram needed
Description needed
- One nsMediaDASHStream as before, but with multiple nsWebMDecoders, one for each encoded stream available on the server. Only one nsWebMDecoder would be used at a time.
Diagram needed
Description needed
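The second approach can be sketched as follows. This is a hypothetical outline only (DashStreamManager and StreamDecoder are not existing Gecko classes), assuming the available bitrates are listed in ascending order:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of approach 2: one object owns a decoder per encoded stream on
// the server and activates exactly one at a time. Hypothetical names.
struct StreamDecoder {
  int bitrateKbps;     // bitrate of the encoded stream this decoder handles
  bool active = false;
};

class DashStreamManager {
public:
  // aBitrates: available encodings, assumed sorted ascending.
  explicit DashStreamManager(std::vector<int> aBitrates) {
    for (int b : aBitrates) mDecoders.push_back({b});
    if (!mDecoders.empty()) mDecoders[0].active = true;  // start lowest
  }

  // Activate the highest-bitrate stream not exceeding the measured
  // download rate; only one decoder is ever active at a time.
  size_t AdaptTo(int aMeasuredKbps) {
    size_t chosen = 0;
    for (size_t i = 0; i < mDecoders.size(); ++i) {
      mDecoders[i].active = false;
      if (mDecoders[i].bitrateKbps <= aMeasuredKbps) chosen = i;
    }
    mDecoders[chosen].active = true;
    return chosen;
  }

  const StreamDecoder& Decoder(size_t i) const { return mDecoders[i]; }

private:
  std::vector<StreamDecoder> mDecoders;  // one per available encoding
};
```

The trade-off versus approach 1 is memory and setup cost for the idle decoders against not having to feed one decoder a bitstream that changes underneath it.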
Question: Are these approaches possible? Can a single MediaDecoder handle multiple streams? Can a single MediaStream handle changes in the bitstream coming from DASH?
- VP8/Video
- Chris Pearce: "For VP8 basically yes. In bug 626979 (and a follow-up fix in bug 661456) we implemented support for WebM's track metadata DisplayWidth/DisplayHeight elements. We scale whatever contained video frames we encounter to DisplayWidth x DisplayHeight pixels, so you can change the dimensions of video frames at will while encoding a single track."
- Tim Terriberry: "Resolution is not the same thing as bitrate, but in general yes, you can change both in VP8 without re-initializing the decoder. The one caveat is that if you do want to switch resolution, the first frame has to be a keyframe. You should also start with a keyframe when changing between streams encoded at different bitrates, or you'll get artifacts caused by prediction mis-matches."
Question: Does the DASH Media Segment definition or media encoding process require that each new segment start with a keyframe?
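The keyframe rule Tim describes above is easy to check mechanically: per the VP8 bitstream format (RFC 6386), the low bit of the first byte of a frame is 0 for keyframes. A minimal sketch of gating a pending stream switch on that bit (the helper names are our own, not an existing API):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Per RFC 6386, bit 0 of the first byte of a VP8 frame is the frame
// type: 0 for key frames, 1 for interframes.
inline bool IsVP8Keyframe(const uint8_t* aFrame, size_t aLength) {
  return aLength > 0 && (aFrame[0] & 0x01) == 0;
}

// A switch to a different VP8 stream (or resolution) may only take
// effect on a keyframe; otherwise hold the pending switch.
inline bool CanApplySwitch(bool aSwitchPending,
                           const uint8_t* aFrame, size_t aLength) {
  return aSwitchPending && IsVP8Keyframe(aFrame, aLength);
}
```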
- Vorbis/Audio
- From Chris and Tim, paraphrased: Vorbis is more complicated because of the way it is encoded. It uses two different block sizes, 'long' and 'short'; which one is used depends on what the encoder decides is best. So, if we had two streams encoded at different rates, there is no guarantee that the blocks would line up. This matters because decoding the first half of a block requires the last half of the previous block, and differing block sizes from disparate streams prevent this. If we were to change streams, either a pause in audio would occur (due to the decoder being flushed), or we'd have to use some kind of extrapolation (e.g. LPC extrapolation).
- To avoid this, we could require that each segment include extra packets to allow correct decoding. Alternatively, we could support no audio adaptation to start with (similar to Apple HLS). Rob O'Callahan is working on a Media Streams API that includes cross-fading; this may be a longer-term solution after starting with non-adaptive audio.
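The dependency between adjacent Vorbis blocks can be illustrated numerically. Vorbis reconstructs audio by overlap-add: each output sample is the sum of the windowed tail of the previous block and the windowed head of the current block, so the current block alone is not decodable. This toy model assumes 50% overlap, equal-length halves, and ignores the actual window shapes:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy overlap-add: output in the overlap region is the element-wise sum
// of the previous block's tail and the current block's head. If the tail
// came from a differently-encoded stream (wrong block size, wrong
// content), this sum, and hence the decoded audio, is wrong.
std::vector<double> OverlapAdd(const std::vector<double>& aPrevTail,
                               const std::vector<double>& aCurrHead) {
  // Assumes both halves have the same length (50% overlap).
  std::vector<double> out(aPrevTail.size());
  for (size_t i = 0; i < out.size(); ++i) {
    out[i] = aPrevTail[i] + aCurrHead[i];  // needs BOTH blocks
  }
  return out;
}
```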
MPD
Diagram needed
Description needed
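For reference while the diagram is pending, here is a hypothetical minimal MPD loosely following ISO/IEC 23009-1: one period, one video adaptation set with two WebM representations a client could switch between. All values (ids, bandwidths, durations, file names) are illustrative only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
     type="static"
     mediaPresentationDuration="PT60S"
     minBufferTime="PT2S">
  <Period>
    <AdaptationSet mimeType="video/webm" segmentAlignment="true">
      <SegmentTemplate media="video-$RepresentationID$-$Number$.webm"
                       initialization="init-$RepresentationID$.webm"
                       duration="4" startNumber="1"/>
      <Representation id="low" bandwidth="300000" width="320" height="180"/>
      <Representation id="high" bandwidth="1500000" width="1280" height="720"/>
    </AdaptationSet>
  </Period>
</MPD>
```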
MPD Classes and Objects
Diagram needed
Description needed
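A natural starting point is a set of plain structs mirroring the MPD hierarchy in ISO/IEC 23009-1 (MPD → Period → AdaptationSet → Representation). These are not existing Gecko classes; names and fields are illustrative:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical object model mirroring the MPD element hierarchy.
struct Representation {
  std::string id;          // @id
  int bandwidthBps = 0;    // @bandwidth
  int width = 0;           // @width
  int height = 0;          // @height
};

struct AdaptationSet {
  std::string mimeType;    // e.g. "video/webm"
  std::vector<Representation> representations;
};

struct Period {
  std::vector<AdaptationSet> adaptationSets;
};

struct MPD {
  double mediaPresentationDurationSec = 0;
  std::vector<Period> periods;
};
```

A real implementation would also carry SegmentTemplate/SegmentList data per adaptation set or representation; this sketch only captures the containment structure.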
MPD Parsing and Behavior
Diagram needed
Description needed
Question: What XML parsing capabilities do we have in Gecko?
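Beyond parsing the XML itself, one concrete piece of MPD handling is expanding SegmentTemplate URLs. The $RepresentationID$ and $Number$ identifiers come from the DASH spec; the helper below is our own sketch, not an existing API:

```cpp
#include <cassert>
#include <string>

// Expand a DASH SegmentTemplate string by substituting the
// $RepresentationID$ and $Number$ identifiers (ISO/IEC 23009-1).
std::string ExpandTemplate(std::string aTemplate,
                           const std::string& aRepId,
                           int aNumber) {
  auto replaceAll = [&aTemplate](const std::string& aKey,
                                 const std::string& aValue) {
    size_t pos;
    while ((pos = aTemplate.find(aKey)) != std::string::npos) {
      aTemplate.replace(pos, aKey.size(), aValue);
    }
  };
  replaceAll("$RepresentationID$", aRepId);
  replaceAll("$Number$", std::to_string(aNumber));
  return aTemplate;
}
```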
Capability/Load Monitoring
How will this be provided?
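One possible answer, sketched here as an assumption rather than an existing Gecko facility: estimate available bandwidth with an exponentially weighted moving average (EWMA) of per-segment download throughput, and feed that estimate into the stream-switching logic. The class name and smoothing factor are hypothetical:

```cpp
#include <cassert>

// EWMA throughput estimator: each completed segment download contributes
// one rate sample; newer samples are weighted by alpha.
class BandwidthEstimator {
public:
  explicit BandwidthEstimator(double aAlpha = 0.5) : mAlpha(aAlpha) {}

  // Record one segment download: size in bits over elapsed seconds.
  void AddSample(double aBits, double aSeconds) {
    double rate = aBits / aSeconds;  // bits per second
    mEstimate = mHasSample ? mAlpha * rate + (1.0 - mAlpha) * mEstimate
                           : rate;   // first sample seeds the estimate
    mHasSample = true;
  }

  double EstimateBps() const { return mEstimate; }

private:
  double mAlpha;            // smoothing factor in (0, 1]
  double mEstimate = 0.0;   // current bits-per-second estimate
  bool mHasSample = false;
};
```

Local capability monitoring (CPU load, dropped frames) would need a separate input; throughput alone only answers the network half of the question.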
Media Segments
Question: Does the DASH Media Segments spec require that video encodings start with a keyframe?
Diagram needed
Description needed
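One basic operation on media segments worth pinning down is mapping a presentation time to a segment number, e.g. for seeking. The sketch below assumes fixed-duration segments, as with a SegmentTemplate carrying @duration; DASH also allows explicit segment timelines, which this does not handle:

```cpp
#include <cassert>

// Map a presentation time to a segment number, assuming all segments
// have the same duration and numbering starts at aStartNumber
// (as with SegmentTemplate@duration / @startNumber).
int SegmentNumberForTime(double aSeconds, double aSegmentDurationSec,
                         int aStartNumber) {
  if (aSeconds < 0 || aSegmentDurationSec <= 0) return aStartNumber;
  return aStartNumber + static_cast<int>(aSeconds / aSegmentDurationSec);
}
```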
Media Segment Classes and Objects
Diagram needed
Description needed