User:Corban/AudioAPI
Defining an Enhanced API for Audio (Draft Recommendation)
Abstract
The HTML5 specification introduces the audio and video media elements, and with them the opportunity to dramatically change the way we integrate media on the web. The current API provides ways to play and get limited information about audio and video, but gives no way to programmatically access or create such media. We present a new API for these media elements which allows web developers to read and write raw audio data.
Authors
- David Humphrey
- Corban Brook
- Al MacDonald
- Thomas Saunders
Status
This is a work in progress. This document reflects the current thinking of its authors, and is not an official specification. The goal of this specification is to experiment with audio data on the way to creating a more stable recommendation. It is hoped that this work, and the ideas it generates, will eventually find their way into Mozilla and other HTML5-compatible browsers.
The continuing work on this specification and API can be tracked here, and in Mozilla bug 490705. Comments, feedback, and collaboration are welcome.
Current API
We have developed a proof-of-concept, experimental build of Firefox which extends HTMLMediaElement (and therefore affects both <video> and <audio>) and implements the following basic API for reading and writing raw audio data:
Reading Audio
Audio data is made available in real-time via an event-based API. As the audio is played, and therefore decoded, each frame is passed to content scripts for processing before being written to the audio layer. Playing, pausing, and stopping the audio all affect the streaming of this raw audio data as well.
onaudiowritten="callback(event);"
<audio src="song.ogg" onaudiowritten="audioWritten(event);"></audio>
mozFrameBuffer
var samples;
function audioWritten(event) {
  samples = event.mozFrameBuffer;
  // sample data is obtained using samples.item(n)
}
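As a minimal sketch of consuming mozFrameBuffer, the helpers below copy the pseudo-array into a plain JavaScript array using only the length and item(n) members described in this document, then compute an RMS level for the frame. The names toArray, rmsLevel and fakeBuffer are hypothetical and not part of the API; fakeBuffer stands in for event.mozFrameBuffer so that the snippet runs outside the experimental build.

```javascript
// Copy a pseudo-array exposing length and item(n) into a plain array.
function toArray(audioData) {
  var out = new Array(audioData.length);
  for (var i = 0; i < audioData.length; i++) {
    out[i] = audioData.item(i);
  }
  return out;
}

// Root-mean-square level of a frame; returns 0 for an empty frame.
function rmsLevel(samples) {
  if (samples.length === 0) return 0;
  var sum = 0;
  for (var i = 0; i < samples.length; i++) {
    sum += samples[i] * samples[i];
  }
  return Math.sqrt(sum / samples.length);
}

// Hypothetical stand-in for event.mozFrameBuffer (assumption, for testing only).
function fakeBuffer(values) {
  return {
    length: values.length,
    item: function (n) { return values[n]; }
  };
}

var level = rmsLevel(toArray(fakeBuffer([0.5, -0.5, 0.5, -0.5])));
```

Inside audioWritten(event), the equivalent call would be rmsLevel(toArray(event.mozFrameBuffer)).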
Getting FFT Spectrum
Most data visualizations or other uses of raw audio data begin by calculating an FFT. A pre-calculated FFT is available for each frame of audio decoded.
mozSpectrum
var spectrum;
function audioWritten(event) {
  spectrum = event.mozSpectrum;
  // spectrum data is obtained using spectrum.item(n)
}
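For a visualization, the spectrum is often reduced to a handful of bars. The sketch below averages the bins of a spectrum pseudo-array into equal-width bands. The function name bandAverages and the fakeSpectrum object are hypothetical, and the code assumes only the length and item(n) members; it takes the absolute value of each bin because this document does not state whether the values are already magnitudes.

```javascript
// Average the bins of a spectrum pseudo-array into bandCount equal-width bands.
// Any bins left over when length is not divisible by bandCount are ignored.
function bandAverages(spectrum, bandCount) {
  var bands = [];
  var binsPerBand = Math.floor(spectrum.length / bandCount);
  for (var b = 0; b < bandCount; b++) {
    var sum = 0;
    for (var i = 0; i < binsPerBand; i++) {
      sum += Math.abs(spectrum.item(b * binsPerBand + i));
    }
    bands.push(binsPerBand > 0 ? sum / binsPerBand : 0);
  }
  return bands;
}

// Hypothetical stand-in for event.mozSpectrum (assumption, for testing only).
var fakeSpectrumValues = [1, 3, 2, 2, 0, 4, 5, 5];
var fakeSpectrum = {
  length: fakeSpectrumValues.length,
  item: function (n) { return fakeSpectrumValues[n]; }
};

var bars = bandAverages(fakeSpectrum, 4);
```

With a 2048-element spectrum, a band count that is a power of two (for example 32) divides the bins evenly.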
Writing Audio
It is also possible to set up an audio element for raw writing from script (i.e., without a src attribute). Content scripts can specify the audio stream's characteristics, then write audio frames using the following methods.
mozSetup(channels, sampleRate, volume)
var audioOutput = new Audio();
audioOutput.mozSetup(2, 44100, 1);
mozWriteAudio(length, buffer)
var samples = [0.242, 0.127, 0.0, -0.058, -0.242, ...];
audioOutput.mozWriteAudio(samples.length, samples);
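To illustrate how a buffer for writing might be produced, here is a small sketch that generates a sine tone as an array of floats. The function name makeSineSamples is hypothetical, and the interleaved channel layout for output is an assumption carried over from the layout this document describes for mozFrameBuffer; the draft does not state the layout expected when writing.

```javascript
// Build frameCount sample frames of a sine tone, interleaved across channels
// (assumed layout: [ch0, ch1, ch0, ch1, ...]). startFrame lets successive
// calls continue the waveform without a phase jump.
function makeSineSamples(frequency, sampleRate, channels, frameCount, startFrame) {
  var samples = new Array(frameCount * channels);
  for (var i = 0; i < frameCount; i++) {
    var t = (startFrame + i) / sampleRate;
    var value = Math.sin(2 * Math.PI * frequency * t);
    for (var c = 0; c < channels; c++) {
      samples[i * channels + c] = value;
    }
  }
  return samples;
}

// 2048 stereo sample frames of a 440 Hz tone at 44100 Hz: 4096 floats in total.
var tone = makeSineSamples(440, 44100, 2, 2048, 0);
```

In the experimental build, after mozSetup(2, 44100, 1), the array would be handed to the write method defined in the DOM Implementation section, passing tone.length as the count.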
DOM Implementation
nsIDOMAudioData
Audio data (raw and spectrum) is currently returned in a pseudo-array named nsIDOMAudioData. In future this will be changed to use the much faster native WebGL Array.
interface nsIDOMAudioData : nsISupports {
  readonly attribute unsigned long length;
  float item(in unsigned long index);
};
The length attribute indicates the number of elements of data returned.
The item() method returns the float value stored at the given index.
nsIDOMNotifyAudioWrittenEvent
Audio data is made available via the following event:
- Event: AudioWrittenEvent
- Event handler: onaudiowritten
The AudioWrittenEvent is defined as follows:
interface nsIDOMNotifyAudioWrittenEvent : nsIDOMEvent {
  readonly attribute nsIDOMAudioData mozFrameBuffer;
  readonly attribute nsIDOMAudioData mozSpectrum;
};
The mozFrameBuffer attribute contains the raw audio data (float values) obtained from decoding a single frame of audio. This is of the form [left, right, left, right, ...]. All audio frames are normalized to a length of 4096 or greater, where shorter frames are padded with 0 (zero).
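The interleaved layout and zero padding described above can be sketched as follows. The names deinterleave and padFrame are hypothetical; the channel count is passed in by the caller because the event interface shown here does not expose it, and the fake frame object stands in for event.mozFrameBuffer.

```javascript
// Split an interleaved pseudo-array [left, right, left, right, ...]
// into one plain array per channel.
function deinterleave(frameBuffer, channels) {
  var result = [];
  for (var c = 0; c < channels; c++) {
    result.push([]);
  }
  for (var i = 0; i < frameBuffer.length; i++) {
    result[i % channels].push(frameBuffer.item(i));
  }
  return result;
}

// Return a copy of samples padded with zeros up to minLength
// (longer inputs are returned unchanged in length).
function padFrame(samples, minLength) {
  var padded = samples.slice();
  while (padded.length < minLength) {
    padded.push(0);
  }
  return padded;
}

// Hypothetical stand-in for event.mozFrameBuffer (assumption, for testing only).
var fakeFrameValues = [0.1, -0.1, 0.2, -0.2];
var fakeFrame = {
  length: fakeFrameValues.length,
  item: function (n) { return fakeFrameValues[n]; }
};

var split = deinterleave(fakeFrame, 2); // [[0.1, 0.2], [-0.1, -0.2]]
var paddedFrame = padFrame([1, 2], 4);  // [1, 2, 0, 0]
```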
The mozSpectrum attribute contains a pre-calculated FFT for this frame of audio data. It is calculated using only the first 4096 float values in the current audio frame, which may include zeros used to pad the buffer. It is always 2048 elements in length.
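To show why 4096 input values yield 2048 spectrum elements, here is a sketch that computes a magnitude spectrum with a naive discrete Fourier transform. This is not the browser's implementation: the function name magnitudeSpectrum is hypothetical, a slow DFT is used in place of an FFT for brevity, and the scaling or windowing applied by the experimental build is not specified in this document.

```javascript
// Naive DFT magnitude spectrum of the first size samples.
// Returns size / 2 bins; input shorter than size is treated as zero-padded.
function magnitudeSpectrum(samples, size) {
  var bins = new Array(size / 2);
  for (var k = 0; k < size / 2; k++) {
    var re = 0;
    var im = 0;
    for (var n = 0; n < size; n++) {
      var x = n < samples.length ? samples[n] : 0;
      var angle = 2 * Math.PI * k * n / size;
      re += x * Math.cos(angle);
      im -= x * Math.sin(angle);
    }
    bins[k] = Math.sqrt(re * re + im * im);
  }
  return bins;
}

// A cosine completing exactly 2 cycles in 16 samples should peak in bin 2.
var input = [];
for (var n = 0; n < 16; n++) {
  input.push(Math.cos(2 * Math.PI * 2 * n / 16));
}
var bins = magnitudeSpectrum(input, 16);
var peak = bins.indexOf(Math.max.apply(null, bins));
```

Called with a size of 4096, the same function returns 2048 bins, matching the length stated for mozSpectrum.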
nsIDOMHTMLMediaElement additions
void mozSetup(in PRUint32 channels, in PRUint32 rate, in float volume);
void mozWriteAudio(in PRUint32 count, [array, size_is(count)] in float valueArray);
The mozSetup() method allows an <audio> or <video> element to be set up for writing from script. This method must be called before mozWriteAudio() can be called, since an audio stream has to be created for the media element. It takes three arguments:
- channels - the number of audio channels (e.g., 2)
- rate - the audio's sample rate (e.g., 44100 samples per second)
- volume - the initial volume to use (e.g., 1.0)
The choices made for channels and rate are significant, because they determine the frame size you must use when passing data to mozWriteAudio().
The mozWriteAudio() method can be called after mozSetup(). It allows a frame of audio (or multiple frames, but whole frames) to be written directly from script. It takes two arguments:
- count - the number of elements in this frame (e.g., 4096)
- valueArray - an array of floats, which represent a complete frame of audio (or multiple frames, but whole frames).
Both mozWriteAudio() and mozSetup() will throw exceptions if called out of order, or if audio frame sizes do not match.
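Scripts may prefer to catch ordering and size mistakes before they reach the element. The wrapper below is one possible sketch: createGuardedWriter and mockOutput are hypothetical names, and the size check assumes that "whole frames" means a non-zero multiple of the channel count, which this document does not define precisely. The mock replaces a real media element so the snippet runs outside the experimental build.

```javascript
// Wrap an object exposing mozSetup and mozWriteAudio so that misuse
// is reported with a descriptive error before the call is made.
function createGuardedWriter(output) {
  var channels = 0; // 0 means setup() has not been called yet
  return {
    setup: function (channelCount, sampleRate, volume) {
      output.mozSetup(channelCount, sampleRate, volume);
      channels = channelCount;
    },
    write: function (samples) {
      if (channels === 0) {
        throw new Error("setup() must be called before write()");
      }
      // Assumption: a valid buffer holds a whole number of sample frames.
      if (samples.length === 0 || samples.length % channels !== 0) {
        throw new Error("sample count must be a non-zero multiple of " + channels);
      }
      output.mozWriteAudio(samples.length, samples);
    }
  };
}

// Hypothetical mock standing in for an <audio> element (assumption).
var written = 0;
var mockOutput = {
  mozSetup: function () {},
  mozWriteAudio: function (count) { written += count; }
};

var writer = createGuardedWriter(mockOutput);
var outOfOrder = false;
try {
  writer.write([0, 0]); // called before setup(): rejected
} catch (e) {
  outOfOrder = true;
}
writer.setup(2, 44100, 1);
writer.write([0.1, 0.1, -0.1, -0.1]); // two stereo sample frames
```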
Additional Resources
A series of blog posts document the evolution and implementation of this API: http://vocamus.net/dave/?cat=25
A patch for Firefox 3.7 is available if you would like to experiment with this API. Mac builds can be downloaded here (10.5) and here (10.6).
A number of working demos have been created, including:
- http://weare.buildingsky.net/processing/dft.js/audio.new.html (video here)
- http://bocoup.com/core/code/firefox-fft/audio-f1lt3r.html (video here)
- http://bocoup.com/core/code/firefox-audio/whale-fft2/whale-fft.html (video here)
- http://weare.buildingsky.net/processing/beat_detektor/beat_detektor.html
- http://bocoup.com/core/code/firefox-audio/html-sings/audio-out-music-gen-f1lt3r.html
- http://weare.buildingsky.net/processing/pjsaudio/examples/sequencer.html (video here)
- http://weare.buildingsky.net/processing/pjsaudio/examples/osc4.html
- http://weare.buildingsky.net/processing/pjsaudio/examples/osc5.html
- http://weare.buildingsky.net/processing/pjsaudio/examples/osc6.html
- http://mavra.perilith.com/~luser/test3.html