MediaStreamAPI

While 'worker' is null, the output is produced simply by adding the streams together. Video frames are composited with the last-added stream on top, everything letterboxed to the size of the last-added stream that has video. While there is no input stream, the StreamProcessor produces silence and no video.


While 'worker' is non-null, the results of mixing (or the default silence) are fed into the worker by dispatching onprocessstream callbacks. Each callback takes a StreamEvent as a parameter. A StreamEvent provides audio sample buffers for each input stream; the event callback can write audio output buffers and a list of output video frames. If the callback does not output audio, default audio output is automatically generated as above. Each StreamEvent contains the parameters associated with each input stream contributing to the StreamEvent.
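The shape of such a handler might look like the following sketch. 'makeGainFilter' is an illustrative helper, not part of the draft; a plain object stands in for the UA-supplied StreamEvent so the mixing logic can run anywhere, and the field names follow the StreamEvent/StreamBuffer interfaces described below.

```javascript
// Sketch of a worker-side onprocessstream handler implementing a simple
// gain filter over the mixed inputs (illustrative, not normative).
function makeGainFilter(gain) {
  return function onprocessstream(event) {
    const inputs = event.inputs;
    const length = inputs[0].audioSamples.length;
    const out = new Float32Array(length);
    // Default-style mix: add the streams together sample by sample...
    for (const input of inputs) {
      for (let i = 0; i < length; i++) out[i] += input.audioSamples[i];
    }
    // ...then apply the gain before writing the output buffer.
    for (let i = 0; i < length; i++) out[i] *= gain;
    event.writeAudio(inputs[0].audioSampleRate, inputs[0].audioChannels, out);
  };
}
```

In a worker this would be assigned to the 'onprocessstream' hook (exact wiring per the draft), e.g. `onprocessstream = makeGainFilter(0.5);`.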


Note that 'worker' cannot be a SharedWorker. This ensures that the worker can run in the same process as the page in multiprocess browsers, so media streams can be confined to a single process.
   attribute Function onprocessstream;
   attribute float streamRewindMax;
   attribute boolean variableAudioFormats;
  };


   
   
  interface StreamEvent {
   readonly attribute float rewind;

   readonly attribute StreamBuffer inputs[];
   void writeAudio(long sampleRate, short channels, Float32Array data);
  };


To support graph changes with low latency, we might need to throw out processed samples that have already been buffered and reprocess them. The 'rewind' attribute indicates how far back in the stream's history we have moved before the current inputs start. It is a non-negative value less than or equal to the value of streamRewindMax on entry to the event handler. The default value of streamRewindMax is zero so by default 'rewind' is always zero; filters that support rewinding need to opt into it.
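A filter that opts into rewinding must be able to discard work it has already emitted. A minimal sketch, assuming the 'rewind' semantics above; 'makeRecorder' is a hypothetical helper, not part of the draft:

```javascript
// Hypothetical recording sink that honors 'rewind': when the UA re-fires
// the handler with rewind > 0, the already-captured tail covering that
// interval is dropped before the reprocessed samples are appended.
function makeRecorder(sampleRate, channels) {
  let captured = [];
  return {
    onprocessstream(event) {
      if (event.rewind > 0) {
        // 'rewind' is in seconds; convert to interleaved sample count.
        const drop = Math.round(event.rewind * sampleRate) * channels;
        captured = captured.slice(0, Math.max(0, captured.length - drop));
      }
      for (const s of event.inputs[0].audioSamples) captured.push(s);
    },
    samples() { return captured; },
  };
}
```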


'inputs' provides access to a StreamBuffer for each input stream, representing the data that stream produced.


interface StreamBuffer {
  readonly attribute any parameters;
  readonly attribute long audioSampleRate;
  readonly attribute short audioChannels;
  readonly attribute Float32Array audioSamples;
  // TODO something for video frames.
};


'parameters' returns a structured clone of the latest parameters set for the corresponding input stream.
 
'audioSampleRate' and 'audioChannels' represent the format of the samples. 'audioSampleRate' is the number of samples per second. 'audioChannels' is the number of channels; the channel mapping is as defined in the Vorbis specification.
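Together these two fields determine the duration of a buffer of interleaved samples; for example (the helper name is illustrative, not part of the draft):

```javascript
// Duration in seconds of a StreamBuffer's audio: frames divided by rate,
// where one frame is 'audioChannels' interleaved samples.
function bufferDuration(buffer) {
  return buffer.audioSamples.length /
         (buffer.audioChannels * buffer.audioSampleRate);
}
```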
 
If 'variableAudioFormats' is false (the default) when the event handler fires, the UA converts all the input audio to a single common format before presenting it to the event handler. Typically the UA would choose the highest-fidelity input format, to avoid lossy conversion. If 'variableAudioFormats' was also false for the previous invocation of the event handler, the UA additionally keeps the format the same as the format used by that previous invocation.
 
'audioSamples' gives access to the audio samples for the corresponding input stream. The length of the sample buffer will be a multiple of 'audioChannels'. The samples are floats ranging from -1 to 1. The durations of the buffers presented for the different input streams will be equal (or as equal as possible given varying sample rates).
 
Streams not containing audio will have 'audioChannels' set to zero and an empty 'audioSamples' array, unless 'variableAudioFormats' is false and some input stream has audio, in which case such streams produce buffers of silence in the common format.
 
'writeAudio' writes audio data to the stream output. If 'writeAudio' is not called before the event handler returns, the inputs are automatically mixed and written to the output. The 'data' array length must be a multiple of 'channels'. 'writeAudio' can be called more than once during an event handler; the data will be appended to the output stream.
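The append semantics can be illustrated with a stand-in event object; both helpers below are hypothetical, not part of the draft:

```javascript
// Stand-in event modeling writeAudio's append behavior: each call
// concatenates onto the pending output segment.
function makeAppendingEvent() {
  const chunks = [];
  return {
    writeAudio(sampleRate, channels, data) { chunks.push(Array.from(data)); },
    output() { return chunks.flat(); },
  };
}

// A handler may emit its output in pieces; here a 'frames'-long ramp is
// written in two chunks and ends up as one contiguous output stream.
function emitRamp(event, frames) {
  const half = Math.ceil(frames / 2);
  event.writeAudio(44100, 1, Float32Array.from({ length: half }, (_, i) => i));
  event.writeAudio(44100, 1,
    Float32Array.from({ length: frames - half }, (_, i) => half + i));
}
```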


There is no requirement that the amount of data output match the input buffer length. A filter with a delay will output less data than the size of the input buffer, at least during the first event; the UA will compensate by trying to buffer up more input data and firing the event again to get more output. A synthesizer with no inputs can output as much data as it wants; the UA will buffer data and fire events as necessary. Filters that misbehave, e.g. by continuously writing zero-length buffers, will cause the stream to block.
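The filter-with-a-delay case can be sketched as a lookahead passthrough that holds back a fixed tail of input, so its first event emits fewer samples than it received (names are illustrative, not part of the draft):

```javascript
// Lookahead passthrough with 'delayFrames' frames of pipeline latency:
// each event holds back the last delayFrames frames, so the first event
// outputs less data than the input buffer and the UA must buffer more
// input and fire the event again.
function makeDelayFilter(delayFrames, channels) {
  let held = new Float32Array(0); // tail retained from earlier events
  return function onprocessstream(event) {
    const input = event.inputs[0];
    const all = new Float32Array(held.length + input.audioSamples.length);
    all.set(held);
    all.set(input.audioSamples, held.length);
    const keep = Math.min(delayFrames * channels, all.length);
    event.writeAudio(input.audioSampleRate, channels,
                     all.subarray(0, all.length - keep));
    held = all.slice(all.length - keep);
  };
}
```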


==== Graph cycles  ====