User:Corban/AudioAPI

Defining an Enhanced API for Audio (Draft Recommendation)

Abstract

The HTML5 specification introduces the audio and video media elements, and with them the opportunity to dramatically change the way we integrate media on the web. The current API provides ways to play media and to get limited information about audio and video, but gives no way to programmatically access or create such media. We present a new API for these media elements which allows web developers to read and write raw audio data.

Authors
  • David Humphrey
  • Corban Brook
  • Al MacDonald
  • Thomas Saunders
Status

This is a work in progress. This document reflects the current thinking of its authors, and is not an official specification. The goal of this specification is to experiment with audio data on the way to creating a more stable recommendation. It is hoped that this work, and the ideas it generates, will eventually find their way into Mozilla and other HTML5-compatible browsers.

The continuing work on this specification and API can be tracked here, and in Mozilla bug 490705. Comments, feedback, and collaboration are welcome.

Current API

We have developed a proof-of-concept, experimental build of Firefox which extends HTMLMediaElement (and therefore affects both <video> and <audio>) and implements the following basic API for reading and writing raw audio data:

Reading Audio

Audio data is made available in real-time via an event-based API. As the audio is played, and therefore decoded, each frame is passed to content scripts for processing before being written to the audio layer. Playing, pausing, and stopping the audio all affect the streaming of this raw audio data as well.

onaudiowritten="callback(event);"

<audio src="song.ogg" onaudiowritten="audioWritten(event);"></audio>

mozFrameBuffer

var samples;

function audioWritten(event) {
  samples = event.mozFrameBuffer;
  // sample data is obtained using samples.item(n)
}
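
As a rough illustration of processing a frame inside the handler, the sketch below computes the root-mean-square level of one frame using only length and item(n). The names frameRms and makeFakeAudioData are hypothetical and not part of the API; the fake object merely stands in for event.mozFrameBuffer so the code can run outside the experimental build.

```javascript
// Hypothetical helper (not part of the API): root-mean-square level of a frame.
// `frame` is any object exposing `length` and `item(n)`, as mozFrameBuffer does.
function frameRms(frame) {
  if (frame.length === 0) {
    return 0;
  }
  var sumOfSquares = 0;
  for (var i = 0; i < frame.length; i++) {
    var sample = frame.item(i);
    sumOfSquares += sample * sample;
  }
  return Math.sqrt(sumOfSquares / frame.length);
}

// Hypothetical stand-in for event.mozFrameBuffer, used only for this sketch.
function makeFakeAudioData(values) {
  return {
    length: values.length,
    item: function (n) { return values[n]; }
  };
}

var level = frameRms(makeFakeAudioData([0.5, -0.5, 0.5, -0.5]));
```

In a real handler the call would look like frameRms(event.mozFrameBuffer). Because short frames are zero-padded, the level reported for a padded frame will be lower than that of the audio it actually contains.
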
Getting FFT Spectrum

Most data visualizations or other uses of raw audio data begin by calculating an FFT. A pre-calculated FFT is available for each frame of audio decoded.

mozSpectrum

var spectrum;

function audioWritten(event) {
  spectrum = event.mozSpectrum;
  // spectrum data is obtained using spectrum.item(n)
}
Writing Audio

It is also possible to set up an audio element for raw writing from script (i.e., without a src attribute). Content scripts can specify the audio stream's characteristics, then write audio frames using the following methods.

mozSetup(channels, sampleRate, volume)

var audioOutput = new Audio();
audioOutput.mozSetup(2, 44100, 1);

mozWriteAudio(length, buffer)

var samples = [0.242, 0.127, 0.0, -0.058, -0.242, ...];
audioOutput.mozWriteAudio(samples.length, samples);
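
To make the writing path more concrete, here is a minimal sketch that generates a buffer of sine-wave samples suitable for passing to the write method. The function makeSineFrame is hypothetical, and the sketch assumes that written data uses the same interleaved channel layout described for reading, which this document does not state explicitly. The write call itself is shown only in a comment, using the mozWriteAudio name from the interface definition, since it requires the experimental build.

```javascript
// Hypothetical helper (not part of the API): build interleaved samples for a sine tone.
// Assumption: writes use the same [left, right, left, right, ...] layout as reads.
function makeSineFrame(frequency, sampleRate, channels, sampleCount) {
  var samples = new Array(sampleCount * channels);
  for (var i = 0; i < sampleCount; i++) {
    var value = Math.sin(2 * Math.PI * frequency * i / sampleRate);
    // Copy the same value into every channel of this sample position.
    for (var c = 0; c < channels; c++) {
      samples[i * channels + c] = value;
    }
  }
  return samples;
}

// 2048 sample positions across 2 channels gives 4096 floats in total.
var tone = makeSineFrame(440, 44100, 2, 2048);

// In the experimental build, the buffer might then be written like this:
//   var audioOutput = new Audio();
//   audioOutput.mozSetup(2, 44100, 1);
//   audioOutput.mozWriteAudio(tone.length, tone);
```

The channel count and sample rate passed to the generator should match the values given to mozSetup(), since those choices determine the frame size the element expects.
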

DOM Implementation

nsIDOMAudioData

Audio data (raw and spectrum) is currently returned in a pseudo-array named nsIDOMAudioData. In the future this will be changed to use the much faster native WebGL Array.

interface nsIDOMAudioData : nsISupports
{
  readonly attribute unsigned long length;
  float              item(in unsigned long index);
};

The length attribute indicates the number of elements of data returned.

The item() method provides a getter for individual data elements (sample or spectrum values), addressed by index.
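
Since nsIDOMAudioData is a pseudo-array rather than a true JavaScript array, scripts that want array methods need to copy the values out first. A minimal sketch of such a copy follows; audioDataToArray and the fake object are hypothetical and exist only to illustrate the length/item() contract.

```javascript
// Hypothetical helper (not part of the API): copy an nsIDOMAudioData-style
// object into a plain JavaScript Array using only `length` and `item(n)`.
function audioDataToArray(data) {
  var out = new Array(data.length);
  for (var i = 0; i < data.length; i++) {
    out[i] = data.item(i);
  }
  return out;
}

// Hypothetical stand-in for an nsIDOMAudioData instance.
var fakeValues = [0.1, 0.2, 0.3];
var fake = {
  length: fakeValues.length,
  item: function (n) { return fakeValues[n]; }
};

var copied = audioDataToArray(fake);
```

Copying every frame this way costs one item() call per element, so it is best reserved for cases where a real array is genuinely needed.
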

nsIDOMNotifyAudioWrittenEvent

Audio data is made available via the following event:

  • Event: AudioWrittenEvent
  • Event handler: onaudiowritten

The AudioWrittenEvent is defined as follows:

interface nsIDOMNotifyAudioWrittenEvent : nsIDOMEvent
{
  readonly attribute nsIDOMAudioData mozFrameBuffer;
  readonly attribute nsIDOMAudioData mozSpectrum;
};

The mozFrameBuffer attribute contains the raw audio data (float values) obtained from decoding a single frame of audio. This is of the form [left, right, left, right, ...]. All audio frames are normalized to a length of 4096 or greater, where shorter frames are padded with 0 (zero).
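
For stereo audio, the interleaved layout described above can be separated into per-channel arrays. The following is a minimal sketch, assuming exactly two channels; splitStereo is a hypothetical name, and the fake frame stands in for mozFrameBuffer.

```javascript
// Hypothetical helper (not part of the API): split an interleaved stereo frame
// of the form [left, right, left, right, ...] into two channel arrays.
// Assumption: the frame holds exactly two channels.
function splitStereo(frame) {
  var left = [];
  var right = [];
  // Step through complete pairs only; a trailing unpaired value is ignored.
  for (var i = 0; i + 1 < frame.length; i += 2) {
    left.push(frame.item(i));
    right.push(frame.item(i + 1));
  }
  return { left: left, right: right };
}

// Hypothetical stand-in for event.mozFrameBuffer.
var interleaved = [0.1, -0.1, 0.2, -0.2, 0.3, -0.3];
var fakeFrame = {
  length: interleaved.length,
  item: function (n) { return interleaved[n]; }
};

var channelData = splitStereo(fakeFrame);
```
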

The mozSpectrum attribute contains a pre-calculated FFT for this frame of audio data. It is calculated using only the first 4096 float values in the current audio frame, which may include zeros used to pad the buffer. It is always 2048 elements in length.
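
A common first step with spectrum data is locating the strongest bin. The sketch below does this with length and item(n) alone. It assumes the spectrum values are magnitudes, so that a larger value means more energy, which this document does not confirm; peakBin is a hypothetical name. Mapping a bin index to a frequency is left out on purpose, because the scaling of the FFT is not defined here.

```javascript
// Hypothetical helper (not part of the API): index of the largest spectrum value.
// Assumption: values are magnitudes, so "largest" means "most energy".
function peakBin(spectrum) {
  var bestIndex = -1;
  var bestValue = -Infinity;
  for (var i = 0; i < spectrum.length; i++) {
    var value = spectrum.item(i);
    if (value > bestValue) {
      bestValue = value;
      bestIndex = i;
    }
  }
  return bestIndex; // -1 when the spectrum is empty
}

// Hypothetical stand-in for event.mozSpectrum: 2048 bins with one clear peak.
var bins = [];
for (var b = 0; b < 2048; b++) {
  bins.push(b === 100 ? 1.0 : 0.01);
}
var fakeSpectrum = {
  length: bins.length,
  item: function (n) { return bins[n]; }
};

var strongest = peakBin(fakeSpectrum);
```
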

nsIDOMHTMLMediaElement additions
void mozSetup(in PRUint32 channels, in PRUint32 rate, in float volume);

void mozWriteAudio(in PRUint32 count, [array, size_is(count)] in float valueArray);

The mozSetup() method allows an <audio> or <video> element to be set up for writing from script. This method must be called before mozWriteAudio() can be called, since an audio stream has to be created for the media element. It takes three arguments:

  1. channels - the number of audio channels (e.g., 2)
  2. rate - the audio's sample rate (e.g., 44100 samples per second)
  3. volume - the initial volume to use (e.g., 1.0)

The choices made for channels and rate are significant, because they determine the frame size you must use when passing data to mozWriteAudio().

The mozWriteAudio() method can be called after mozSetup(). It allows a frame of audio (or multiple frames, but whole frames) to be written directly from script. It takes two arguments:

  1. count - the number of elements in this frame (e.g., 4096)
  2. valueArray - an array of floats, which represent a complete frame of audio (or multiple frames, but whole frames).

Both mozWriteAudio() and mozSetup() will throw exceptions if called out of order, or if audio frame sizes do not match.
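
Because these methods throw rather than return an error code, calling code may want to guard its writes. The following is a minimal sketch of such a guard; tryWriteAudio and makeFakeElement are hypothetical, and the fake element only imitates the "setup before write" rule so the sketch can run outside the experimental build. The type of exception actually thrown is not specified in this document, so the guard catches anything.

```javascript
// Hypothetical helper (not part of the API): attempt a write and report
// failure as a boolean instead of letting the exception propagate.
function tryWriteAudio(element, samples) {
  try {
    element.mozWriteAudio(samples.length, samples);
    return true;
  } catch (e) {
    // The exception type is unspecified, so every failure is treated alike.
    return false;
  }
}

// Hypothetical stand-in for a media element; it only mimics the ordering rule.
function makeFakeElement() {
  var ready = false;
  return {
    mozSetup: function (channels, rate, volume) { ready = true; },
    mozWriteAudio: function (count, valueArray) {
      if (!ready) {
        throw new Error("mozSetup() has not been called");
      }
    }
  };
}

var element = makeFakeElement();
var beforeSetup = tryWriteAudio(element, [0, 0]); // write attempted too early
element.mozSetup(2, 44100, 1.0);
var afterSetup = tryWriteAudio(element, [0, 0]);  // write after setup
```
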

Additional Resources

A series of blog posts document the evolution and implementation of this API: http://vocamus.net/dave/?cat=25

A patch for Firefox 3.7 is available if you would like to experiment with this API. Mac builds can be downloaded here (10.5) and here (10.6).

A number of working demos have been created, including: