MediaStreamAPI

Streams, RTC, audio API and media controllers

Scenarios

These are higher-level than use-cases.

1) Play video with processing effect applied to the audio track

2) Play video with processing effects mixing in out-of-band audio tracks (in sync)

3) Capture microphone input and stream it out to a peer with a processing effect applied to the audio

4) Capture microphone input and visualize it as it is being streamed out to a peer and recorded

5) Capture microphone input, visualize it, mix in another audio track and stream the result to a peer and record

6) Receive audio streams from peers, mix them with spatialization effects, and play

7) Seamlessly chain from the end of one input stream to another

8) Seamlessly switch from one input stream to another, e.g. to implement adaptive streaming

9) Synthesize samples from JS data

10) Trigger a sound sample to be played through the effects graph ASAP but without causing any blocking

11) Synchronized MIDI + Audio capture

12) Synchronized MIDI + Audio playback (Would that just work if streams could contain MIDI data?)

13) Capture video from a camera and analyze it (e.g. face recognition)

14) Capture video, record it to a file and upload the file (e.g. YouTube)

Straw-man Proposal

Streams

The semantics of a stream:

  • A window of timecoded video and audio data.
  • The timecodes are in the stream's own internal timeline. The internal timeline can have any base offset but always advances at the same rate as real time, if it's advancing at all.
  • Not seekable, resettable, etc. The window moves forward automatically in real time (or close to it).
  • A stream can be "blocked". While it's blocked, its timeline and data window do not advance.

Blocked state should be reflected in a new readyState value "BLOCKED". We should have a callback when the stream blocks and unblocks, too.
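
A minimal sketch of how a script might observe this, assuming a Stream obtained elsewhere and hypothetical onblocked/onunblocked handler names (this proposal only calls for "a callback" without naming it):

 // stream is a Stream obtained from a media element or getUserMedia
 stream.onblocked = function() {
   // readyState would report the proposed "BLOCKED" value here
   console.log("blocked, readyState = " + stream.readyState);
 };
 stream.onunblocked = function() {
   console.log("unblocked");
 };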

We do not allow streams to have independent timelines (e.g. no adjustable playback rate or seeking within an arbitrary Stream), because that leads to a single Stream being consumed at multiple different offsets at the same time, which requires either unbounded buffering or multiple internal decoders and streams for a single Stream. It seems simpler and more predictable in performance to require authors to create multiple streams (if necessary) and change the playback rate in the original stream sources.

  • Streams can end. The end state is reflected in the Stream readyState. A stream can never resume after it has ended.

Hard case:

  • Mix http://slow with http://fast, and mix http://fast with http://fast2; does the http://fast stream have to provide data at two different offsets?
  • Solution: if a (non-live) stream feeds into a blocking mixer, then it itself gets blocked. This has the same effect as the entire graph of (non-live) connected streams blocking as a unit.
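
A sketch of the two-mixer case above, using only the API defined below and assuming three media elements with hypothetical ids "slow", "fast" and "fast2"; if mixerA blocks waiting on the slow stream, the shared fast stream is blocked as well, which in turn blocks mixerB:

 var slow = document.getElementById("slow").getStream();
 var fast = document.getElementById("fast").getStream();
 var fast2 = document.getElementById("fast2").getStream();
 var mixerA = slow.createProcessor();
 mixerA.addStream(fast);      // mixes slow + fast
 var mixerB = fast.createProcessor();
 mixerB.addStream(fast2);     // mixes fast + fast2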

Media elements

interface HTMLMediaElement {
 // Returns a new stream of "what the element is playing" ---
 // whatever the element is currently playing, after its
 // volume and playbackRate are taken into account.
 // While the element is not playing (e.g. because it's paused
 // or buffering), the stream is blocked. This stream never
 // ends; if the element ends playback, the stream just blocks
 // and can resume if the element starts playing again.
 // When something else causes this stream to be blocked,
 // we block the output of the media element.
 Stream getStream();

 // Like getStream(), but also sets the streamaudio attribute.
 Stream captureStream();

 // When set, do not produce direct audio output. Audio output
 // is still produced when getStream() or captureStream() is called.
 attribute boolean streamaudio;

 // Can be set to a Stream. Blocked streams play silence and show the last video frame.
 attribute any src;
};
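
A brief usage sketch, assuming a hypothetical <video id="v"> element; per the definitions above, captureStream() behaves like getStream() but also sets streamaudio, so the element stops producing direct audio output while the returned stream still carries it:

 var v = document.getElementById("v");
 var s1 = v.getStream();      // element keeps playing its own audio
 var s2 = v.captureStream();  // also sets v.streamaudio = true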

Stream extensions

Streams can have attributes that transform their output:

interface Stream {
  attribute double volume;

  // When set, destinations treat the stream as not blocking. While the stream is
  // blocked, its data are replaced with silence.
  attribute boolean live;

  // Current time on the stream's own internal timeline
  readonly attribute double currentTime;

  // Create a new StreamProcessor with this Stream as the input.
  StreamProcessor createProcessor();
  // Create a new StreamProcessor with this Stream as the input,
  // initializing worker.
  StreamProcessor createProcessor(Worker worker);
};
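
A short sketch of these extensions in use, assuming a hypothetical <audio id="a"> element as the source:

 var stream = document.getElementById("a").getStream();
 stream.volume = 0.5;   // attenuate this stream's contribution downstream
 stream.live = true;    // destinations treat it as non-blocking; silence while blocked
 console.log(stream.currentTime);  // time on the stream's own timeline
 var processed = stream.createProcessor(new Worker("effect.js"));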

Stream mixing and processing

[Constructor]
interface StreamProcessor : Stream {
 readonly attribute Stream[] inputs;
 void addStream(Stream input);
 void setInputParams(Stream input, any params);
 void removeStream(Stream input);

 // Causes this stream to enter the ended state.
 // No more worker callbacks will be issued.
 void end(double delay);

 attribute Worker worker;
};

This object combines multiple streams with synchronization to create a new stream. While any input stream is blocked and not live, the StreamProcessor is blocked. While the StreamProcessor is blocked, all its input streams are forced to be blocked. (Note that this can cause other StreamProcessors using the same input stream(s) to block, etc.)

The offset from the timeline of an input to the timeline of the StreamProcessor is set automatically when the stream is added to the StreamProcessor.

While 'worker' is null, the output is produced simply by adding the streams together. Video frames are composited with the last-added stream on top, everything letterboxed to the size of the last-added stream that has video. While there is no input stream, the StreamProcessor produces silence and no video.

While 'worker' is non-null, the results of mixing (or the default silence) are fed into the worker by dispatching onstream callbacks. Each onstream callback takes a StreamEvent as a parameter. A StreamEvent provides audio sample buffers and a list of video frames for each input stream; the event callback can write audio output buffers and a list of output video frames. If the callback does not output audio, default audio output is automatically generated as above; ditto for video. Each StreamEvent contains the inputParams for each input stream contributing to the StreamEvent.

An ended stream is treated as producing silence and no video. (Alternative: automatically remove the stream as an input. But this might confuse scripts.)

// XXX need to figure out the actual StreamEvent API: channel formats, etc.
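
Purely to make the worker model concrete, a sketch of what a processing worker might look like; the StreamEvent members used here (audioInputs, writeAudio) are assumptions, since the actual StreamEvent API is still undecided per the note above:

 // effect.js (hypothetical)
 onstream = function(event) {
   var input = event.audioInputs[0];           // assumed: samples for the first input stream
   var output = new Float32Array(input.length);
   for (var i = 0; i < input.length; ++i) {
     output[i] = input[i] * 0.5;               // simple gain effect
   }
   event.writeAudio(output);                   // assumed: write the audio output buffer
 };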

Graph cycles

If a cycle is formed in the graph, the streams involved block until the cycle is removed.
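
For example (a sketch assuming a pre-existing Stream named source), connecting two processors to each other forms a cycle, so both block until one edge is removed:

 var a = source.createProcessor();
 var b = a.createProcessor();   // b consumes a
 a.addStream(b);                // a also consumes b: a cycle, so both block
 // later:
 a.removeStream(b);             // cycle removed, the streams can unblock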

Dynamic graph changes

Dynamic graph changes performed by a script take effect atomically after the script has run to completion. Effectively we post a task to the HTML event loop that makes all the pending changes. The exact timing is up to the implementation but the implementation should try to minimize the latency of changes.
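
A sketch of the intended semantics, assuming mixer, stream1 and stream2 already exist; all three changes below are applied as one atomic update after the function returns, so the graph is never observed with only some of them in effect:

 function swapInputs() {
   mixer.removeStream(stream1);
   mixer.addStream(stream2);
   mixer.setInputParams(stream2, {gain: 1.0});  // hypothetical params object
 }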

Examples

1) Play video with processing effect applied to the audio track

<video src="foo.webm" id="v" controls streamaudio></video>
<audio id="out" autoplay></audio>
<script>
 document.getElementById("out").src =
   document.getElementById("v").getStream().createProcessor(new Worker("effect.js"));
</script>

2) Play video with processing effects mixing in out-of-band audio tracks (in sync)

<video src="foo.webm" id="v" streamaudio></video>
<audio src="back.webm" id="back"></audio>
<audio id="out" autoplay></audio>
<script>
 var mixer = document.getElementById("v").getStream().createProcessor(new Worker("audio-ducking.js"));
 mixer.addStream(document.getElementById("back").getStream());
 document.getElementById("out").src = mixer;
 function startPlaying() {
   document.getElementById("v").play();
   document.getElementById("back").play();
 }
 // We probably need additional API to more conveniently tie together
 // the controls for multiple media elements.
</script>

3) Capture microphone input and stream it out to a peer with a processing effect applied to the audio

<script>
 navigator.getUserMedia('audio', gotAudio);
 function gotAudio(stream) {
   peerConnection.addStream(stream.createProcessor(new Worker("effect.js")));
 }
</script>

4) Capture microphone input and visualize it as it is being streamed out to a peer and recorded

<canvas id="c"></canvas>
<script>
 navigator.getUserMedia('audio', gotAudio);
 var streamRecorder;
 function gotAudio(stream) {
   var worker = new Worker("visualizer.js");
   var processed = stream.createProcessor(worker);
   worker.onmessage = function(event) {
     drawSpectrumToCanvas(event.data, document.getElementById("c"));
   }
   streamRecorder = processed.record();
   peerConnection.addStream(processed);
 }
</script>

5) Capture microphone input, visualize it, mix in another audio track and stream the result to a peer and record

<canvas id="c"></canvas>
<mediaresource src="back.webm" id="back"></mediaresource>
<script>
 navigator.getUserMedia('audio', gotAudio);
 var streamRecorder;
 function gotAudio(stream) {
   var worker = new Worker("visualizer.js");
   var processed = stream.createProcessor(worker);
   worker.onmessage = function(event) {
     drawSpectrumToCanvas(event.data, document.getElementById("c"));
   }
   var mixer = processed.createProcessor();
   mixer.addStream(document.getElementById("back").startStream());
   streamRecorder = mixer.record();
   peerConnection.addStream(mixer);
 }
</script>

6) Receive audio streams from peers, mix them with spatialization effects, and play

<audio id="out" autoplay></audio>
<script>
 var worker = new Worker("spatializer.js");
 var spatialized = new StreamProcessor(worker);
 peerConnection.onaddstream = function (event) {
   spatialized.addStream(event.stream);
   spatialized.setInputParams(event.stream, {x:..., y:..., z:...});
 };
 document.getElementById("out").src = spatialized;   
</script>

7) Seamlessly chain from the end of one input stream to another

<mediaresource src="in1.webm" id="in1" preload></mediaresource>
<mediaresource src="in2.webm" id="in2"></mediaresource>
<audio id="out" autoplay></audio>
<script>
 var in1 = document.getElementById("in1");
 in1.onloadeddata = function() {
   var mixer = in1.startStream().createProcessor();
   var in2 = document.getElementById("in2");
   in2.delay = in1.duration;
   mixer.addStream(in2.startStream());
   document.getElementById("out").src = mixer;
 }
</script>

8) Seamlessly switch from one input stream to another, e.g. to implement adaptive streaming

<mediaresource src="in1.webm" id="in1" preload></mediaresource>
<mediaresource src="in2.webm" id="in2"></mediaresource>
<audio id="out" autoplay></audio>
<script>
 var stream1 = document.getElementById("in1").startStream();
 var mixer = stream1.createProcessor();
 document.getElementById("out").src = mixer;
 function switchStreams() {
   var in2 = document.getElementById("in2");
   in2.currentTime = stream1.currentTime;
   var stream2 = in2.startStream();
   stream2.volume = 0;
   stream2.live = true; // don't block while this stream is playing
   mixer.addStream(stream2);
   stream2.onplaying = function() {
     if (mixer.inputs[0] == stream1) {
       stream2.volume = 1.0;
       stream2.live = false; // allow output to block while this stream is playing
       mixer.removeStream(stream1);
     }
   }
 }
</script>

9) Synthesize samples from JS data

<audio id="out" autoplay></audio>
<script>
 document.getElementById("out").src =
   new StreamProcessor(new Worker("synthesizer.js"));
</script>

10) Trigger a sound sample to be played through the effects graph ASAP but without causing any blocking

<script>
 var effectsMixer = ...;
 function playSound(src) {
   var audio = new Audio(src);
   audio.oncanplaythrough = function() {
     var stream = audio.getStream();
     stream.live = true;
     stream.onended = function() { effectsMixer.removeStream(stream); }
     effectsMixer.addStream(stream);
   }
 }
</script>

13) Capture video from a camera and analyze it (e.g. face recognition)

<script>
 navigator.getUserMedia('video', gotVideo);
 function gotVideo(stream) {
   stream.createProcessor(new Worker("face-recognizer.js"));
 }
</script>

14) Capture video, record it to a file and upload the file (e.g. YouTube)

<script>
 navigator.getUserMedia('video', gotVideo);
 var streamRecorder;
 function gotVideo(stream) {
   streamRecorder = stream.record();
 }
 function stopRecording() {
   streamRecorder.getRecordedData(gotData);
 }
 function gotData(blob) {
   var x = new XMLHttpRequest();
   x.open('POST', 'uploadMessage');
   x.send(blob);
 }
</script>

Related Proposals

W3C-RTC charter (Harald et al.): RTCStreamAPI

WhatWG proposal (Ian et al.): [1]

Chrome audio API: [2]