Audio Data API

== Defining an Enhanced API for Audio (Draft Recommendation) ==
'''Note''': this API has been ''deprecated'' in favor of the [https://developer.mozilla.org/en-US/docs/Web_Audio_API Web Audio API] chosen by the W3C.


===== Abstract =====


The HTML5 specification introduces the <audio> and <video> media elements, and with them the opportunity to dramatically change the way we integrate media on the web. The current HTML5 media API provides ways to play and get limited information about audio and video, but gives no way to programmatically access or create such media.  We present a new Mozilla extension to this API, which allows web developers to read and write raw audio data.


===== Authors =====
* David Humphrey ([http://twitter.com/humphd @humphd])
* Corban Brook ([http://twitter.com/corban @corban])
* Al MacDonald ([http://twitter.com/f1lt3r @F1LT3R])
* Yury Delendik
* Ricard Marxer ([http://twitter.com/ricardmp @ricardmp])
* Charles Cliffe ([http://twitter.com/ccliffe @ccliffe])


===== Other Contributors =====
* Thomas Saunders
* Ted Mielczarek

== Standardization Note ==


Please note that this document describes a non-standard experimental API.  This API is considered deprecated and may not be supported in future releases.  The World Wide Web Consortium (W3C) has chartered the [http://www.w3.org/2011/audio/ Audio Working Group] to develop standardized audio API specifications, including [[Web Audio API]].  Please refer to the Audio Working Group website for further details.


== API Tutorial ==


This API extends the HTMLMediaElement and HTMLAudioElement (e.g., affecting <video> and <audio>), and implements the following basic API for reading and writing raw audio data:


===== Reading Audio =====


Audio data is made available via an event-based API.  As the audio is played, and therefore decoded, sample data is passed to content scripts in a framebuffer for processing after becoming available to the audio layer--hence the name, '''MozAudioAvailable'''.  These samples may or may not have been played yet at the time of the event.  The audio samples returned in the event are raw, and have not been adjusted for mute/volume settings on the media element.  Playing, pausing, and seeking the audio also affect the streaming of this raw audio data.


Users of this API can register two callbacks on the &lt;audio&gt; or &lt;video&gt; element in order to consume this data:


<pre>
&lt;audio id="audio" src="song.ogg"&gt;&lt;/audio&gt;
&lt;script&gt;
  var audio = document.getElementById("audio");
  audio.addEventListener('MozAudioAvailable', audioAvailableFunction, false);
  audio.addEventListener('loadedmetadata', loadedMetadataFunction, false);
&lt;/script&gt;
</pre>


The '''loadedmetadata''' event is a standard part of HTML5.  It now indicates that a media element (audio or video) has useful metadata loaded, which can be accessed using three new attributes:
 
* mozChannels
* mozSampleRate
* mozFrameBufferLength
 
Prior to the '''loadedmetadata''' event, accessing these attributes will cause an exception to be thrown, indicating that they are not known, or there is no audio.  These attributes indicate the '''number of channels''', audio '''sample rate per second''', and the '''default size of the framebuffer''' that will be used in '''MozAudioAvailable''' events.  This event is fired once as the media resource is first loaded, and is useful for interpreting or writing the audio data.
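
For example, a minimal sketch of a '''loadedmetadata''' handler that reads these attributes (the element id and handler shape here are only illustrative):

<pre>
var audio = document.getElementById("audio");

audio.addEventListener('loadedmetadata', function() {
  // Safe to read now; accessing these before loadedmetadata throws.
  var channels          = audio.mozChannels;           // e.g., 2
  var rate              = audio.mozSampleRate;         // e.g., 44100
  var frameBufferLength = audio.mozFrameBufferLength;  // e.g., 2048
}, false);
</pre>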
 
The '''MozAudioAvailable''' event provides two pieces of data.  The first is a framebuffer (i.e., an array) containing decoded audio sample data (i.e., floats).  The second is the time for these samples measured from the start in seconds.  Web developers consume this event by registering an event listener in script like so:


<pre>
&lt;audio id="audio" src="song.ogg"&gt;&lt;/audio&gt;
&lt;script&gt;
  var audio = document.getElementById("audio");
  audio.addEventListener('MozAudioAvailable', someFunction, false);
&lt;/script&gt;
</pre>


An audio or video element can also be created with script outside the DOM:

<pre>
var audio = new Audio();
audio.src = "song.ogg";
audio.addEventListener('MozAudioAvailable', someFunction, false);
audio.play();
</pre>


The following is an example of how both events might be used:


<pre>
var channels,
    rate,
    frameBufferLength,
    samples;

function audioInfo() {
  var audio = document.getElementById('audio');

  // After loadedmetadata event, following media element attributes are known:
  channels          = audio.mozChannels;
  rate              = audio.mozSampleRate;
  frameBufferLength = audio.mozFrameBufferLength;
}

function audioAvailable(event) {
  var samples = event.frameBuffer;
  var time    = event.time;

  for (var i = 0; i < frameBufferLength; i++) {
    // Do something with the audio data as it is played.
    processSample(samples[i], channels, rate);
  }
}
</pre>


===== Complete Example: Visualizing Audio Spectrum =====


This example calculates and displays FFT spectrum data for the playing audio:


[[File:fft.png]]

<!DOCTYPE html>
<html>
  <head>
    <title>JavaScript Spectrum Example</title>
  </head>
  <body>
    <audio id="audio-element"
           src="song.ogg"
           controls="true"
           style="width: 512px;">
    </audio>
    <div><canvas id="fft" width="512" height="200"></canvas></div>

    <script>
      var canvas = document.getElementById('fft'),
          ctx = canvas.getContext('2d'),
          channels,
          rate,
          frameBufferLength,
          fft;

      function loadedMetadata() {
        channels          = audio.mozChannels;
        rate              = audio.mozSampleRate;
        frameBufferLength = audio.mozFrameBufferLength;

        fft = new FFT(frameBufferLength / channels, rate);
      }

      function audioAvailable(event) {
        var fb = event.frameBuffer,
            t  = event.time, /* unused, but it's there */
            signal = new Float32Array(fb.length / channels),
            magnitude;

        for (var i = 0, fbl = frameBufferLength / 2; i < fbl; i++ ) {
          // Assuming interlaced stereo channels,
          // need to split and merge into a stereo-mix mono signal
          signal[i] = (fb[2*i] + fb[2*i+1]) / 2;
        }

        fft.forward(signal);

        // Clear the canvas before drawing spectrum
        ctx.clearRect(0,0, canvas.width, canvas.height);

        for (var i = 0; i < fft.spectrum.length; i++ ) {
          // multiply spectrum by a zoom value
          magnitude = fft.spectrum[i] * 4000;

          // Draw rectangle bars for each frequency bin
          ctx.fillRect(i * 4, canvas.height, 3, -magnitude);
        }
      }

      var audio = document.getElementById('audio-element');
      audio.addEventListener('MozAudioAvailable', audioAvailable, false);
      audio.addEventListener('loadedmetadata', loadedMetadata, false);

      // FFT from dsp.js, see below
      var FFT = function(bufferSize, sampleRate) {
        this.bufferSize   = bufferSize;
        this.sampleRate   = sampleRate;
        this.spectrum     = new Float32Array(bufferSize/2);
        this.real         = new Float32Array(bufferSize);
        this.imag         = new Float32Array(bufferSize);
        this.reverseTable = new Uint32Array(bufferSize);
        this.sinTable     = new Float32Array(bufferSize);
        this.cosTable     = new Float32Array(bufferSize);

        var limit = 1,
            bit = bufferSize >> 1;

        while ( limit < bufferSize ) {
          for ( var i = 0; i < limit; i++ ) {
            this.reverseTable[i + limit] = this.reverseTable[i] + bit;
          }

          limit = limit << 1;
          bit = bit >> 1;
        }

        for ( var i = 0; i < bufferSize; i++ ) {
          this.sinTable[i] = Math.sin(-Math.PI/i);
          this.cosTable[i] = Math.cos(-Math.PI/i);
        }
      };

      FFT.prototype.forward = function(buffer) {
        var bufferSize   = this.bufferSize,
            cosTable     = this.cosTable,
            sinTable     = this.sinTable,
            reverseTable = this.reverseTable,
            real         = this.real,
            imag         = this.imag,
            spectrum     = this.spectrum;

        if ( bufferSize !== buffer.length ) {
          throw "Supplied buffer is not the same size as defined FFT. FFT Size: " + bufferSize + " Buffer Size: " + buffer.length;
        }

        for ( var i = 0; i < bufferSize; i++ ) {
          real[i] = buffer[reverseTable[i]];
          imag[i] = 0;
        }

        var halfSize = 1,
            phaseShiftStepReal,
            phaseShiftStepImag,
            currentPhaseShiftReal,
            currentPhaseShiftImag,
            off,
            tr,
            ti,
            tmpReal,
            i;

        while ( halfSize < bufferSize ) {
          phaseShiftStepReal = cosTable[halfSize];
          phaseShiftStepImag = sinTable[halfSize];
          currentPhaseShiftReal = 1.0;
          currentPhaseShiftImag = 0.0;

          for ( var fftStep = 0; fftStep < halfSize; fftStep++ ) {
            i = fftStep;

            while ( i < bufferSize ) {
              off = i + halfSize;
              tr = (currentPhaseShiftReal * real[off]) - (currentPhaseShiftImag * imag[off]);
              ti = (currentPhaseShiftReal * imag[off]) + (currentPhaseShiftImag * real[off]);

              real[off] = real[i] - tr;
              imag[off] = imag[i] - ti;
              real[i] += tr;
              imag[i] += ti;

              i += halfSize << 1;
            }

            tmpReal = currentPhaseShiftReal;
            currentPhaseShiftReal = (tmpReal * phaseShiftStepReal) - (currentPhaseShiftImag * phaseShiftStepImag);
            currentPhaseShiftImag = (tmpReal * phaseShiftStepImag) + (currentPhaseShiftImag * phaseShiftStepReal);
          }

          halfSize = halfSize << 1;
        }

        i = bufferSize/2;
        while(i--) {
          spectrum[i] = 2 * Math.sqrt(real[i] * real[i] + imag[i] * imag[i]) / bufferSize;
        }
      };
    </script>
  </body>
</html>
===== Writing Audio =====


It is also possible to setup an &lt;audio&gt; element for raw writing from script (i.e., without a ''src'' attribute).  Content scripts can specify the audio stream's characteristics, then write audio samples using the following methods:


<code>mozSetup(channels, sampleRate)</code>


<pre>
// Create a new audio element
var audioOutput = new Audio();
// Set up audio element with 2 channel, 44.1KHz audio stream.
audioOutput.mozSetup(2, 44100);
</pre>


<code>mozWriteAudio(buffer)</code>


<pre>
// Write samples using a JS Array
var samples = [0.242, 0.127, 0.0, -0.058, -0.242, ...];
var numberSamplesWritten = audioOutput.mozWriteAudio(samples);
 
// Write samples using a Typed Array
var samples = new Float32Array([0.242, 0.127, 0.0, -0.058, -0.242, ...]);
var numberSamplesWritten = audioOutput.mozWriteAudio(samples);
</pre>
 
<code>mozCurrentSampleOffset()</code>
 
<pre>
// Get current audible position of the underlying audio stream, measured in samples.
var currentSampleOffset = audioOutput.mozCurrentSampleOffset();
</pre>


Since the '''MozAudioAvailable''' event and the '''mozWriteAudio()''' method both use '''Float32Array''', it is possible to take the output of one audio stream and pass it directly (or process first and then pass) to a second:


<pre>
<audio id="a1"
      src="song.ogg"
      controls>
</audio>
<script>
var a1 = document.getElementById('a1'),
    a2 = new Audio(),
    buffers = [];


function loadedMetadata() {
  // Mute a1 audio.
  a1.volume = 0;
  // Setup a2 to be identical to a1, and play through there.
  a2.mozSetup(a1.mozChannels, a1.mozSampleRate);
}


function audioAvailable(event) {
  // Write the current framebuffer
   var frameBuffer = event.frameBuffer; // frameBuffer is Float32Array
  writeAudio(frameBuffer);
}


a1.addEventListener('MozAudioAvailable', audioAvailable, false);
a1.addEventListener('loadedmetadata', loadedMetadata, false);


function writeAudio(audioBuffer) {
  // audioBuffer is Float32Array
  buffers.push({buffer: audioBuffer, position: 0});


  // If there's buffered data, write that
  while(buffers.length > 0) {
    var buffer = buffers[0].buffer;
    var position = buffers[0].position;
    var written = a2.mozWriteAudio(buffer.subarray(position));
    // If all data wasn't written, keep it in the buffers:
    if(position + written < buffer.length) {
      buffers[0].position = position + written;
      break;
    }
    buffers.shift();
  }
}
</script>
</pre>


Audio data written using the '''mozWriteAudio()''' method needs to be written at a regular interval in equal portions, in order to keep a little ahead of the current sample offset (the sample offset that is currently being played by the hardware can be obtained with '''mozCurrentSampleOffset()'''), where "a little" means something on the order of 500ms of samples.  For example, when working with 2 channels at 44100 samples per second, a writing interval of 100ms, and a pre-buffer equal to 500ms, one would write an array of (2 * 44100 / 10) = 8820 samples per interval, and keep the total number of samples written within (currentSampleOffset + 2 * 44100 / 2).
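
As a rough sketch of that arithmetic (assuming ''audioOutput'' was set up with '''mozSetup(2, 44100)''' as above; the silent buffer written here is just a placeholder for real sample data):

<pre>
var channels     = 2,
    rate         = 44100,
    interval     = 100,                                // ms between writes
    prebuffer    = channels * rate / 2,                // 500ms worth of samples
    chunkSize    = channels * rate * interval / 1000,  // 8820 samples per write
    totalWritten = 0;

setInterval(function() {
  // Stay roughly 500ms ahead of what the hardware has played so far.
  if (totalWritten < audioOutput.mozCurrentSampleOffset() + prebuffer) {
    totalWritten += audioOutput.mozWriteAudio(new Float32Array(chunkSize));
  }
}, interval);
</pre>
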
It's also possible to auto-detect the minimal duration of the pre-buffer, such that the sound is played without interruptions and the lag between writing and playback is minimal.  To do this, start writing the data in small portions and wait for the value returned by '''mozCurrentSampleOffset()''' to become greater than 0.
<pre>
var prebufferSize = sampleRate * 0.020; // Initial buffer is 20 ms
var autoLatency = true, started = new Date().valueOf();
...
// Auto latency detection
if (autoLatency) {
  prebufferSize = Math.floor(sampleRate * (new Date().valueOf() - started) / 1000);
  if (audio.mozCurrentSampleOffset()) { // Play position moved?
    autoLatency = false;
  }
}
</pre>

===== Complete Example: Creating a Web Based Tone Generator =====

This example creates a simple tone generator, and plays the resulting tone.

<!DOCTYPE html>
<html>
  <head>
    <title>JavaScript Audio Write Example</title>
  </head>
  <body>
     <input type="text" size="4" id="freq" value="440"><label for="hz">Hz</label>
     <input type="text" size="4" id="freq" value="440"><label for="hz">Hz</label>
    <button onclick="generateWaveform()">set</button>
     <button onclick="start()">play</button>
     <button onclick="start()">play</button>
     <button onclick="stop()">stop</button>
     <button onclick="stop()">stop</button>


     <script type="text/javascript">
     <script type="text/javascript">    
       var sampledata = [];
       function AudioDataDestination(sampleRate, readFn) {
      var freq = 440;
        // Initialize the audio output.
      var interval = -1;
        var audio = new Audio();
      var audio;
        audio.mozSetup(1, sampleRate);
 
        var currentWritePosition = 0;
        var prebufferSize = sampleRate / 2; // buffer 500ms
        var tail = null, tailPosition;
 
        // The function called with regular interval to populate
        // the audio output buffer.
        setInterval(function() {
          var written;
          // Check if some data was not written in previous attempts.
          if(tail) {
            written = audio.mozWriteAudio(tail.subarray(tailPosition));
            currentWritePosition += written;
            tailPosition += written;
            if(tailPosition < tail.length) {
              // Not all the data was written, saving the tail...
              return; // ... and exit the function.
            }
            tail = null;
          }
 
          // Check if we need add some data to the audio output.
          var currentPosition = audio.mozCurrentSampleOffset();
          var available = currentPosition + prebufferSize - currentWritePosition;
          if(available > 0) {
            // Request some sound data from the callback function.
            var soundData = new Float32Array(available);
            readFn(soundData);


      function writeData() {
            // Writting the data.
        var n = Math.ceil(freq / 100);
            written = audio.mozWriteAudio(soundData);
        for(var i=0;i<n;i++)
            if(written < soundData.length) {
           audio.mozWriteAudio(sampledata.length, sampledata);
              // Not all the data was written, saving the tail.
              tail = soundData;
              tailPosition = written;
            }
            currentWritePosition += written;
           }
        }, 100);
       }
       }


       function start() {
       // Control and generate the sound.
        audio = new Audio();
 
        audio.mozSetup(1, 44100, 1);
      var frequency = 0, currentSoundSample;
        interval = setInterval(writeData, 10);
      var sampleRate = 44100;
      }


       function stop() {
       function requestSoundData(soundData) {
         if (interval != -1) {
         if (!frequency) {  
           clearInterval(interval);
           return; // no sound selected
          interval = -1;
         }
         }
        var k = 2* Math.PI * frequency / sampleRate;
        for (var i=0, size=soundData.length; i<size; i++) {
          soundData[i] = Math.sin(k * currentSoundSample++);
        }       
       }
       }


       function generateWaveform() {
      var audioDestination = new AudioDataDestination(sampleRate, requestSoundData);
         freq = parseFloat(document.getElementById("freq").value);
 
        // we're playing at 44.1kHz, so figure out how many samples
       function start() {
        // will give us one full period
         currentSoundSample = 0;
        var samples = 44100 / freq;
        frequency = parseFloat(document.getElementById("freq").value);
        sampledata = Array(Math.round(samples));
        for (var i=0; i<sampledata.length; i++) {
          sampledata[i] = Math.sin(2*Math.PI * (i / sampledata.length));
        }
       }
       }


       generateWaveform();
       function stop() {
        frequency = 0;
      }
   </script>
   </script>
   </body>
   </body>
Line 210: Line 454:
== DOM Implementation ==


===== nsIDOMNotifyAudioAvailableEvent =====
 
Audio data is made available via the following event:
 
* '''Event''': AudioAvailableEvent
* '''Event handler''': onmozaudioavailable


The '''AudioAvailableEvent''' is defined as follows:


<pre>
interface nsIDOMNotifyAudioAvailableEvent : nsIDOMEvent
{
   // frameBuffer is really a Float32Array
   readonly attribute jsval  frameBuffer;
   readonly attribute float time;
};
</pre>


The '''frameBuffer''' attribute contains a typed array ('''Float32Array''') with the raw audio data (32-bit float values) obtained from decoding the audio (e.g., the raw data being sent to the audio hardware vs. encoded audio).  This is of the form <nowiki>[channel1, channel2, ..., channelN, channel1, channel2, ..., channelN, ...]</nowiki>.  All audio frames are normalized to a length of channels * 1024 by default, but could be any length between 512 and 16384 if the user has set a different length using the '''mozFrameBufferLength''' attribute.


The '''time''' attribute contains a float representing the time in seconds of the first sample in the '''frameBuffer''' array since the start of the audio track.
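
For example, a small sketch that de-interleaves a framebuffer into one array per channel (the function and variable names are only illustrative; ''channels'' is the value read from '''mozChannels'''):

<pre>
function splitChannels(frameBuffer, channels) {
  var frames = frameBuffer.length / channels,
      result = [];

  for (var c = 0; c < channels; c++) {
    var channelData = new Float32Array(frames);
    for (var i = 0; i < frames; i++) {
      // Samples are interleaved: [ch1, ch2, ..., chN, ch1, ch2, ...]
      channelData[i] = frameBuffer[i * channels + c];
    }
    result.push(channelData);
  }
  return result;
}
</pre>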


===== nsIDOMHTMLMediaElement additions =====
 
Audio metadata is made available via three new attributes on the HTMLMediaElement.  By default these attributes throw if accessed before the '''loadedmetadata''' event occurs.  Users who need this info before the audio starts playing should not use '''autoplay''', since the audio might start before a loadedmetadata handler has run.


The three new attributes are defined as follows:


<pre>
   readonly attribute unsigned long mozChannels;
   readonly attribute unsigned long mozSampleRate;
            attribute unsigned long mozFrameBufferLength;
</pre>
</pre>


The '''mozChannels''' attribute contains the number of channels in the audio resource (e.g., 2).  The '''mozSampleRate''' attribute contains the number of samples per second that will be played, for example 44100. Both are read-only.
 
The '''mozFrameBufferLength''' attribute indicates the number of samples that will be returned in the framebuffer of each '''MozAudioAvailable''' eventThis number is a total for all channels, and by default is set to be the number of channels * 1024 (e.g., 2 channels * 1024 samples = 2048 total).


The '''mozFrameBufferLength''' attribute can also be set to a new value, if users want lower latency, or larger amounts of data, etc.  The size given '''must''' be a number between 512 and 16384.  Using any other size will result in an exception being thrown.  The best time to set a new length is after the '''loadedmetadata''' event fires, when the audio info is known, but before the audio has started or '''MozAudioAvailable''' events have begun firing.
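
A minimal sketch of changing the framebuffer size at the right moment (2048 here is just an example value; it must stay within the 512-16384 range):

<pre>
var audio = document.getElementById("audio");

audio.addEventListener('loadedmetadata', function() {
  // Smaller framebuffers mean more frequent MozAudioAvailable events (lower
  // latency); larger ones mean fewer events with bigger chunks of samples.
  audio.mozFrameBufferLength = 2048;
}, false);
</pre>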


===== nsIDOMHTMLAudioElement additions =====


The HTMLAudioElement has also been extended to allow write access.  Audio writing is achieved by adding three new methods:


<pre>
  void mozSetup(in long channels, in long rate);

  unsigned long mozWriteAudio(array); // array is Array() or Float32Array()

  unsigned long long mozCurrentSampleOffset();
</pre>


The '''mozSetup()''' method allows an &lt;audio&gt; element to be setup for writing from script.  This method '''must''' be called before '''mozWriteAudio''' or '''mozCurrentSampleOffset''' can be called, since an audio stream has to be created for the media element.  It takes two arguments:


# '''channels''' - the number of audio channels (e.g., 2)
# '''rate''' - the audio's sample rate (e.g., 44100 samples per second)


The choices made for '''channel''' and '''rate''' are significant, because they determine the amount of data you must pass to '''mozWriteAudio()'''.  That is, you must pass an array with enough data for each channel specified in '''mozSetup()'''.
 
The '''mozSetup()''' method, if called more than once, will recreate a new audio stream (destroying an existing one if present) with each call.  Thus it is safe to call this more than once, but unnecessary.
 
The '''mozWriteAudio()''' method can be called after '''mozSetup()'''.  It allows audio data to be written directly from script.  It takes one argument, '''array'''.  This is a JS Array (i.e., new Array()) or a typed float array (i.e., new Float32Array()) containing the audio data (floats) you wish to write.  It must be 0 or N elements in length, where N % channels == 0, otherwise an exception is thrown.  


The '''mozWriteAudio()''' method returns the number of samples that were just written, which may or may not be the same as the number in '''array'''.  Only the number of samples that can be written without blocking the audio hardware will be written.  It is the responsibility of the caller to deal with any samples that don't get written in the first pass (e.g., buffer and write in the next call).
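
For instance, a sketch of handling a partial write by keeping the unwritten tail for a later attempt (assuming ''audioOutput'' has been set up with '''mozSetup()''' and ''samples'' is a '''Float32Array'''; the variable names are only illustrative):

<pre>
var pending = null; // unwritten samples from the previous attempt

function writeSamples(samples) {
  var written = audioOutput.mozWriteAudio(samples);
  if (written < samples.length) {
    // Only part of the data fit without blocking; keep the rest and retry later.
    pending = samples.subarray(written);
  } else {
    pending = null;
  }
}
</pre>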


The '''mozCurrentSampleOffset()''' method can be called after '''mozSetup()'''.  It returns the current position (measured in samples) of the audio stream.  This is useful when determining how much data to write with '''mozWriteAudio()'''.


All of '''mozWriteAudio()''', '''mozCurrentSampleOffset()''', and '''mozSetup()''' will throw exceptions if called out of order.  '''mozSetup()''' will also throw if a ''src'' attribute has previously been set on the audio element (i.e., you can't do both at the same time).
 
===== Security =====
 
Similar to the &lt;canvas&gt; element and its '''getImageData''' method, the '''MozAudioAvailable''' event's '''frameBuffer''' attribute protects against information leakage between origins.
 
The '''MozAudioAvailable''' event's '''frameBuffer''' attribute will throw if the origin of the audio resource does not match the document's origin.  NOTE: this will affect users who have the security.fileuri.strict_origin_policy preference set, and are working locally with file:/// URIs.
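
A sketch of guarding against that case (same-origin media is unaffected; the catch branch only runs when access to the data is denied):

<pre>
audio.addEventListener('MozAudioAvailable', function(event) {
  var samples;
  try {
    samples = event.frameBuffer;
  } catch (e) {
    // Cross-origin (or strict file:// origin) audio: the sample data is not readable.
    return;
  }
  // ... process samples ...
}, false);
</pre>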
 
===== Compatibility with Audio Backends =====
 
The current MozAudioAvailable implementation integrates with Mozilla's decoder abstract base classes, and therefore, any audio decoder which uses these base classes automatically dispatches MozAudioAvailable events.  At the time of writing, this includes the Ogg, WebM, and Wave decoders.


== Additional Resources ==

A series of blog posts document the evolution and implementation of this API: http://vocamus.net/dave/?cat=25.  Another overview by Al MacDonald is available [http://weblog.bocoup.com/web-audio-all-aboard here].


Al has also written 2 very good tutorials and video demos of [http://weblog.bocoup.com/read-html5-audio-data-with-firefox-4 reading] and [http://weblog.bocoup.com/generate-sound-with-javascript-in-firefox-4 writing] audio with the API.
 
The BBC Research and Development Blog has also done an excellent overview of the API [http://www.bbc.co.uk/blogs/researchanddevelopment/2010/11/mozilla-audio-data-api.shtml here].


=== Bug ===


The work on this API is available in Mozilla [https://bugzilla.mozilla.org/show_bug.cgi?id=490705 bug 490705].


=== Obtaining Builds ===


[http://nightly.mozilla.org/ Firefox trunk nightlies] include the Audio Data API (starting with 2010-08-26 builds).


=== JavaScript Audio Libraries ===


* We have started work on a JavaScript library to make building audio web apps easier.  Details are [[Audio Data API JS Library|here]] and at https://github.com/corbanbrook/dsp.js.
* [https://github.com/corbanbrook/audionode.js audionode.js] acts as a JavaScript bridge between the Web Audio API and the Audio Data API, allowing the examples at http://weare.buildingsky.net/processing/audionode.js/examples/index.html to run.
* [https://github.com/notmasteryet/audiodata Audio Data API Objects] - A high-level abstraction (and a usage example) of the Audio Data API.
* [https://github.com/bfirsh/dynamicaudio.js dynamicaudio.js] - An interface for writing audio with a Flash fall back for older browsers.
* [https://beatdetektor.svn.sourceforge.net/svnroot/beatdetektor/trunk/core/js/beatdetektor.js Beat Detektor] by Charles Cliffe, uses dsp.js to add beat detection.
* [https://github.com/jussi-kalliokoski/audiolib.js audiolib.js] by Jussi Kalliokoski, a powerful audio tools library for JavaScript, compatible with the Audio Data API and Chrome's Audio API.
* [https://github.com/oampo/Audiolet Audiolet] - A JavaScript library for real-time audio synthesis and composition from within the browser
* [https://github.com/grantgalitz/XAudioJS XAudioJS] - A JavaScript library that provides raw audio sample writing access to the Mozilla Audio Data and Web Audio APIs. It provides a basic write and callback system so the developer can be assured of gapless audio for these two APIs, and also provides a fallback WAV PCM data URI generator that is not guaranteed to be gapless.
* [http://www.gregjopa.com/2011/05/calculate-note-frequencies-in-javascript-with-music-js/ Music.js, library containing functions and data sets to generate notes, intervals, chords, and scales]
* [https://github.com/BillyWM/jsmodplayer Javascript .MOD and .XM music player]


=== Working Audio Data Demos ===
'''NOTE: we recently took down two servers that were hosting many of these demos.  We are working to find a new home for them.'''


A number of working demos have been created, including:


* [http://videos.mozilla.org/serv/blizzard/audio-slideshow Overview slideshow demo of various features] ([http://www.youtube.com/watch?v=oJ1UsLoPX3E video here])

* Audio Visualizations
** http://tllabs.io/audiopaper/ paper.js audio visualization
** http://traction.untergrund.net/slamdown/
** http://www.nihilogic.dk/labs/pocket_full_of_html5/ (Demo by Jacob Seidelin)
** http://weare.buildingsky.net/processing/dsp.js/examples/fft.html
** http://weare.buildingsky.net/processing/dft.js/audio.new.html (video [http://vimeo.com/8525101 here])
** http://www.storiesinflight.com/jsfft/visualizer/index.html (Demo by Thomas Sturm)
** http://www.grantgalitz.org/sound_test/ WAV Decoder & Visualizer (Pre-loaded)
** http://www.grantgalitz.org/wav_player/ WAV Decoder & Visualizer (Load in your own .wav)

* Applying Realtime Audio Effects
** Volume, pitch, etc. UI for audio - https://developer.mozilla.org/en-US/demos/detail/voron (homepage: http://kievII.net)
** JS IIR Filter http://weare.buildingsky.net/processing/dsp.js/examples/filter.html (video [http://vimeo.com/11335434 here])
** Vocodes a formant with a carrier wave http://weare.buildingsky.net/processing/dsp.js/examples/vocoder.html
** Biquad Filter example http://weare.buildingsky.net/processing/dsp.js/examples/biquad.html
** Graphic EQ example http://weare.buildingsky.net/processing/dsp.js/examples/grapheq.html
** Delay effect http://code.almeros.com/code-examples/delay-firefox-audio-api/ (video of older version [http://vimeo.com/11780707 here])
** Reverb effect http://code.almeros.com/code-examples/reverb-firefox-audio-api/ (video [http://vimeo.com/13386796 here])

* Generating and Playing Audio
** http://bitterspring.net/blog/2012/01/25/morning-star-synth-0-1-released/
** http://onlinetonegenerator.com/
** [http://jsmad.org/ mp3 decoder in js]
** [http://cosinusoidally.github.com/mp2dec.js/ mp2 decoder in js]
** [http://www.oampo.co.uk/2011/05/technocrat/ Ambient techno machine]
** [http://www.gregjopa.com/2011/05/calculate-note-frequencies-in-javascript-with-music-js/ Music.js, library containing functions and data sets to generate notes, intervals, chords, and scales]
** [https://hacks.mozilla.org/2011/01/html5guitar/ HTML5 Guitar Tab Player]
** [http://automata.cc/src/vivace/experiments/matrix.html Tone matrix using Audiolet.js]
** [http://www.oampo.co.uk/labs/audiolet-demo/ Generating music in JS via audiolet.js], [http://www.oampo.co.uk/labs/breakbeat/ breakbeat demo]
** [http://humphd.github.com/sfxr.js/ sfxr.js] - sound effect generator/editor for video games.
** [http://jonbro.tk/blog/2010/09/19/html_5_chip_tracker.html JavaScript Chip Tracker app] (demo by Jonathan Brodsky)
** JavaScript Audio Sampler http://weare.buildingsky.net/processing/dsp.js/examples/sampler.html
** SamplePlayer, SampleLoader, Sequencer and Keyboard http://code.almeros.com/code-examples/sampler-firefox-audio-api/ (video [http://vimeo.com/13805893 here])
** Square Wave Generation http://weare.buildingsky.net/processing/dsp.js/examples/squarewave.html
** Random Noise Generation http://weare.buildingsky.net/processing/dsp.js/examples/nowave.html
** JS Multi-Oscillator Synthesizer http://weare.buildingsky.net/processing/dsp.js/examples/synthesizer.html (video [http://vimeo.com/11411533 here])
** Bloop http://async5.org/audiodata/examples/bloop-ea/bloop-audiodata.html
** JavaScript Text to Speech engine http://async5.org/audiodata/tts/index.html
** Toy Piano http://async5.org/audiodata/examples/piano.html (and the sample-based piano http://async5.org/audiodata/examples/piano-s/piano2.html)
** Csound Shaker Instrument http://async5.org/audiodata/csound/shaker.htm and Bar Instrument http://async5.org/audiodata/csound/bar.htm
** Canon Theremin Piano http://mtg.upf.edu/static/media/canon-theremin-piano.html (by Zacharias Vamvakousis zackbam@gmail.com)
** Manipulate music example using mouse and accelerometer http://blog.dt.in.th/stuff/audiodata/ (Thai Pangsakulyanont)
** Tuning exploration, Wicki keyboard and Karplus-Strong synthesizer http://www.toverlamp.org/static/wickisynth/wickisynth.html (Piers Titus van der Torren)
** Modular Synthesizer with MIDI, control and audio ports http://www.niiden.com/jstmodular/ (Jussi Kalliokoski)
** Dual-axis Theremin controlling pitch and volume with cursor position http://stu.ie/?p=2599 (Stuart Gilbert)
** JavaScript "Image to Sound" generator http://zhangjw.bai-hua.org/audio_test6.html (ZhangJW)
** XAudioJS library test page http://www.grantgalitz.org/sound_test/

* Beat Detection (also showing use of WebGL for 3D visualizations)
** http://people.mozilla.com/~prouget/demos/boomboom/index.html
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor1HD.html (video [http://vimeo.com/11345262 here])
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor2HD.html (video of older version [http://vimeo.com/11345685 here])
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor3HD.html (video [http://www.youtube.com/watch?v=OxoFcyKYwr0&fmt=22 here])
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor3HDFX.html (same, but with more effects)
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor4HD.html (video [http://www.youtube.com/watch?v=dym4DqpJuDk&fmt=22 here])
** http://cubicvr.org/CubicVR.js/bd_fluid_sim/BD_GPUFluid.html

* Writing Audio from JavaScript, Digital Signal Processing
** API Example: [http://audioscene.org/?p=171 Inverted Waveform Cancellation]
** API Example: [http://audioscene.org/?p=255 Stereo Splitting and Panning]
** API Example: [http://audioscene.org/?p=267/ Mid-Side Microphone Decoder]
** API Example: [http://audioscene.org/?p=279 Ambient Extraction Mixer]
** API Example: [http://audioscene.org/?p=302 Worker Thread Audio Processing]

* Audio Games
** http://www.oampo.co.uk/labs/fireflies/
** http://www.oampo.co.uk/labs/siren-song/

=== Demos Needing to be Updated to New API ===

* FFT visualization (calculated with js)
** Experimental JavaScript port of Pure Data http://mccormick.cx/dev/webpd/ with demo http://mccormick.cx/dev/webpd/demos/processingjs-basic-example-with-audio/index.html
** http://ondras.zarovi.cz/demos/audio/
** http://code.bocoup.com/processing-js/3d-fft/viz.xhtml
** http://bocoup.com/core/code/firefox-audio/whale-fft2/whale-fft.html (video [http://vimeo.com/8872808 here])

<!-- ** Simple Tone Generator http://mavra.perilith.com/~luser/test3.html
** Playing Scales http://bocoup.com/core/code/firefox-audio/html-sings/audio-out-music-gen-f1lt3r.html (video [http://www.youtube.com/watch?v=HLkOgy1yO14&feature=player_embedded here])
** Interactive Audio Application, Bloom http://code.bocoup.com/bloop/color/bloop.html (video [http://vimeo.com/11346141 here] and [http://vimeo.com/11345133 here]) -->

=== Third Party Discussions ===

A number of people have written about our work, including:

* http://ajaxian.com/archives/amazing-audio-sampling-in-javascript-with-firefox
* http://createdigitalmusic.com/2010/05/03/real-sound-synthesis-now-an-open-standard-in-the-browser/
* http://www.webmonkey.com/2010/05/new-html5-tools-make-your-browser-sing-and-dance/
* http://www.wired.co.uk/news/archive/2010-05/04/new-html5-tools-give-adobe-flash-the-finger
* http://hacks.mozilla.org/2010/04/beyond-html5-experiments-with-interactive-audio/
* http://schepers.cc/?p=212
* http://createdigitalmusic.com/2010/05/27/browser-madness-3d-music-mountainscapes-web-based-pd-patching/
* http://news.slashdot.org/story/10/05/26/1936224/Breakthroughs-In-HTML-Audio-Via-Manipulation-With-JavaScript
* http://ajaxian.com/archives/amazing-audio-api-javascript-demos
* http://www.webmonkey.com/2010/08/sampleplayer-makes-your-browser-sing-sans-flash/

Latest revision as of 07:22, 18 May 2013

Defining an Enhanced API for Audio (Draft Recommendation)

Note: this API has been deprecated in favor of the Web Audio API chosen by the W3C.

Abstract

The HTML5 specification introduces the <audio> and <video> media elements, and with them the opportunity to dramatically change the way we integrate media on the web. The current HTML5 media API provides ways to play and get limited information about audio and video, but gives no way to programatically access or create such media. We present a new Mozilla extension to this API, which allows web developers to read and write raw audio data.

Authors
Other Contributors
  • Thomas Saunders
  • Ted Mielczarek

Standardization Note

Please note that this document describes a non-standard experimental API. This API is considered deprecated and may not be supported in future releases. The World Wide Web Consortium (W3C) has chartered the Audio Working Group to develop standardized audio API specifications, including Web Audio API. Please refer to the Audio Working Group website for further details.

API Tutorial

This API extends the HTMLMediaElement and HTMLAudioElement (e.g., affecting <video> and <audio>), and implements the following basic API for reading and writing raw audio data:

Reading Audio

Audio data is made available via an event-based API. As the audio is played, and therefore decoded, sample data is passed to content scripts in a framebuffer for processing after becoming available to the audio layer--hence the name, MozAudioAvailable. These samples may or may not have been played yet at the time of the event. The audio samples returned in the event are raw, and have not been adjusted for mute/volume settings on the media element. Playing, pausing, and seeking the audio also affect the streaming of this raw audio data.

Users of this API can register two callbacks on the <audio> or <video> element in order to consume this data:

<audio id="audio" src="song.ogg"></audio>
<script>
  var audio = document.getElementById("audio");
  audio.addEventListener('MozAudioAvailable', audioAvailableFunction, false);
  audio.addEventListener('loadedmetadata', loadedMetadataFunction, false);
</script>

The loadedmetadata event is a standard part of HTML5. It now indicates that a media element (audio or video) has useful metadata loaded, which can be accessed using three new attributes:

  • mozChannels
  • mozSampleRate
  • mozFrameBufferLength

Prior to the loadedmetadata event, accessing these attributes will cause an exception to be thrown, indicating that they are not known, or there is no audio. These attributes indicate the number of channels, audio sample rate per second, and the default size of the framebuffer that will be used in MozAudioAvailable events. This event is fired once as the media resource is first loaded, and is useful for interpreting or writing the audio data.

The MozAudioAvailable event provides two pieces of data. The first is a framebuffer (i.e., an array) containing decoded audio sample data (i.e., floats). The second is the time for these samples measured from the start in seconds. Web developers consume this event by registering an event listener in script like so:

<audio id="audio" src="song.ogg"></audio>
<script>
  var audio = document.getElementById("audio");
  audio.addEventListener('MozAudioAvailable', someFunction, false);
</script>

An audio or video element can also be created with script outside the DOM:

var audio = new Audio();
audio.src = "song.ogg";
audio.addEventListener('MozAudioAvailable', someFunction, false);
audio.play();

The following is an example of how both events might be used:

var channels,
    rate,
    frameBufferLength,
    samples;

function audioInfo() {
  var audio = document.getElementById('audio');

  // After loadedmetadata event, following media element attributes are known:
  channels          = audio.mozChannels;
  rate              = audio.mozSampleRate;
  frameBufferLength = audio.mozFrameBufferLength;
}

function audioAvailable(event) {
  var samples = event.frameBuffer;
  var time    = event.time;

  for (var i = 0; i < frameBufferLength; i++) {
    // Do something with the audio data as it is played.
    processSample(samples[i], channels, rate);
  }
}
Complete Example: Visualizing Audio Spectrum

This example calculates and displays FFT spectrum data for the playing audio:

Fft.png

<!DOCTYPE html>
<html>
  <head>
    <title>JavaScript Spectrum Example</title>
  </head>
  <body>
    <audio id="audio-element"
           src="song.ogg"
           controls="true"
           style="width: 512px;">
    </audio>
    <div><canvas id="fft" width="512" height="200"></canvas></div>

    <script>
      var canvas = document.getElementById('fft'),
          ctx = canvas.getContext('2d'),
          channels,
          rate,
          frameBufferLength,
          fft;

      function loadedMetadata() {
        channels          = audio.mozChannels;
        rate              = audio.mozSampleRate;
        frameBufferLength = audio.mozFrameBufferLength;
         
        fft = new FFT(frameBufferLength / channels, rate);
      }

      function audioAvailable(event) {
        var fb = event.frameBuffer,
            t  = event.time, /* unused, but it's there */
            signal = new Float32Array(fb.length / channels),
            magnitude;

        for (var i = 0, fbl = frameBufferLength / 2; i < fbl; i++ ) {
          // Assuming interlaced stereo channels,
          // need to split and merge into a stero-mix mono signal
          signal[i] = (fb[2*i] + fb[2*i+1]) / 2;
        }

        fft.forward(signal);

        // Clear the canvas before drawing spectrum
        ctx.clearRect(0,0, canvas.width, canvas.height);

        for (var i = 0; i < fft.spectrum.length; i++ ) {
          // multiply spectrum by a zoom value
          magnitude = fft.spectrum[i] * 4000;

          // Draw rectangle bars for each frequency bin
          ctx.fillRect(i * 4, canvas.height, 3, -magnitude);
        }
      }

      var audio = document.getElementById('audio-element');
      audio.addEventListener('MozAudioAvailable', audioAvailable, false);
      audio.addEventListener('loadedmetadata', loadedMetadata, false);

      // FFT from dsp.js, see below
      var FFT = function(bufferSize, sampleRate) {
        this.bufferSize   = bufferSize;
        this.sampleRate   = sampleRate;
        this.spectrum     = new Float32Array(bufferSize/2);
        this.real         = new Float32Array(bufferSize);
        this.imag         = new Float32Array(bufferSize);
        this.reverseTable = new Uint32Array(bufferSize);
        this.sinTable     = new Float32Array(bufferSize);
        this.cosTable     = new Float32Array(bufferSize);

        var limit = 1,
            bit = bufferSize >> 1;

        while ( limit < bufferSize ) {
          for ( var i = 0; i < limit; i++ ) {
            this.reverseTable[i + limit] = this.reverseTable[i] + bit;
          }

          limit = limit << 1;
          bit = bit >> 1;
        }

        for ( var i = 0; i < bufferSize; i++ ) {
          this.sinTable[i] = Math.sin(-Math.PI/i);
          this.cosTable[i] = Math.cos(-Math.PI/i);
        }
      };

      FFT.prototype.forward = function(buffer) {
        var bufferSize   = this.bufferSize,
            cosTable     = this.cosTable,
            sinTable     = this.sinTable,
            reverseTable = this.reverseTable,
            real         = this.real,
            imag         = this.imag,
            spectrum     = this.spectrum;

        if ( bufferSize !== buffer.length ) {
          throw "Supplied buffer is not the same size as defined FFT. FFT Size: " + bufferSize + " Buffer Size: " + buffer.length;
        }

        for ( var i = 0; i < bufferSize; i++ ) {
          real[i] = buffer[reverseTable[i]];
          imag[i] = 0;
        }

        var halfSize = 1,
            phaseShiftStepReal,	
            phaseShiftStepImag,
            currentPhaseShiftReal,
            currentPhaseShiftImag,
            off,
            tr,
            ti,
            tmpReal,	
            i;

        while ( halfSize < bufferSize ) {
          phaseShiftStepReal = cosTable[halfSize];
          phaseShiftStepImag = sinTable[halfSize];
          currentPhaseShiftReal = 1.0;
          currentPhaseShiftImag = 0.0;

          for ( var fftStep = 0; fftStep < halfSize; fftStep++ ) {
            i = fftStep;

            while ( i < bufferSize ) {
              off = i + halfSize;
              tr = (currentPhaseShiftReal * real[off]) - (currentPhaseShiftImag * imag[off]);
              ti = (currentPhaseShiftReal * imag[off]) + (currentPhaseShiftImag * real[off]);

              real[off] = real[i] - tr;
              imag[off] = imag[i] - ti;
              real[i] += tr;
              imag[i] += ti;

              i += halfSize << 1;
            }

            tmpReal = currentPhaseShiftReal;
            currentPhaseShiftReal = (tmpReal * phaseShiftStepReal) - (currentPhaseShiftImag * phaseShiftStepImag);
            currentPhaseShiftImag = (tmpReal * phaseShiftStepImag) + (currentPhaseShiftImag * phaseShiftStepReal);
          }

          halfSize = halfSize << 1;
	}

        i = bufferSize/2;
        while(i--) {
          spectrum[i] = 2 * Math.sqrt(real[i] * real[i] + imag[i] * imag[i]) / bufferSize;
	}
      };
    </script>
  </body>
</html>
Writing Audio

It is also possible to setup an <audio> element for raw writing from script (i.e., without a src attribute). Content scripts can specify the audio stream's characteristics, then write audio samples using the following methods:

mozSetup(channels, sampleRate)

// Create a new audio element
var audioOutput = new Audio();
// Set up audio element with 2 channel, 44.1KHz audio stream.
audioOutput.mozSetup(2, 44100);

mozWriteAudio(buffer)

// Write samples using a JS Array
var samples = [0.242, 0.127, 0.0, -0.058, -0.242, ...];
var numberSamplesWritten = audioOutput.mozWriteAudio(samples);

// Write samples using a Typed Array
var samples = new Float32Array([0.242, 0.127, 0.0, -0.058, -0.242, ...]);
var numberSamplesWritten = audioOutput.mozWriteAudio(samples);

mozCurrentSampleOffset()

// Get current audible position of the underlying audio stream, measured in samples.
var currentSampleOffset = audioOutput.mozCurrentSampleOffset();

Since the MozAudioAvailable event and the mozWriteAudio() method both use Float32Array, it is possible to take the output of one audio stream and pass it (either directly or after processing) to a second:

<audio id="a1" 
       src="song.ogg"
       controls>
</audio>
<script>
var a1 = document.getElementById('a1'),
    a2 = new Audio(),
    buffers = [];

function loadedMetadata() {
  // Mute a1 audio.
  a1.volume = 0;
  // Set up a2 to be identical to a1, and play the audio through it.
  a2.mozSetup(a1.mozChannels, a1.mozSampleRate);
}

function audioAvailable(event) {
  // Write the current framebuffer
  var frameBuffer = event.frameBuffer; // frameBuffer is Float32Array
  writeAudio(frameBuffer);
}

a1.addEventListener('MozAudioAvailable', audioAvailable, false);
a1.addEventListener('loadedmetadata', loadedMetadata, false);

function writeAudio(audioBuffer) {
  // audioBuffer is Float32Array
  buffers.push({buffer: audioBuffer, position: 0});

  // If there's buffered data, write that
  while(buffers.length > 0) {
    var buffer = buffers[0].buffer;
    var position = buffers[0].position;
    var written = a2.mozWriteAudio(buffer.subarray(position));
    // If not all of the data was written, keep the rest in the buffers:
    if(position + written < buffer.length) {
      buffers[0].position = position + written;
      break;
    }
    buffers.shift();
  }
}
</script>

Audio data written using the mozWriteAudio() method needs to be written at regular intervals, in equal portions, in order to stay a little ahead of the current sample offset (the sample offset currently being played by the hardware can be obtained with mozCurrentSampleOffset()), where "a little" means something on the order of 500ms of samples. For example, when working with 2 channels at 44100 samples per second, a write interval of 100ms, and a pre-buffer equal to 500ms, one would write an array of (2 * 44100 / 10) = 8820 samples per interval, and stop writing once the total number of samples written reaches (currentSampleOffset + 2 * 44100 / 2), i.e., 500ms ahead of the playback position.
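
As a rough sketch of this arithmetic (the produceSamples() helper is hypothetical, and a real implementation would also keep any unwritten tail for the next pass, as in the examples above and below):

var channels = 2, sampleRate = 44100;
var chunkSize = channels * sampleRate / 10;    // 100ms of samples = 8820
var prebufferSize = channels * sampleRate / 2; // 500ms of samples = 44100

var audio = new Audio();
audio.mozSetup(channels, sampleRate);

// Hypothetical helper: produce the next chunk of interleaved samples
// (silence here, purely as a placeholder).
function produceSamples(n) {
  return new Float32Array(n);
}

var totalWritten = 0;
setInterval(function() {
  // Only write while we are less than ~500ms ahead of the playback position.
  if (totalWritten < audio.mozCurrentSampleOffset() + prebufferSize) {
    totalWritten += audio.mozWriteAudio(produceSamples(chunkSize));
  }
}, 100);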

It is also possible to auto-detect the minimal pre-buffer duration, such that the sound plays without interruptions and the lag between writing and playback is minimal. To do this, start writing the data in small portions and wait for the value returned by mozCurrentSampleOffset() to become greater than 0.

var prebufferSize = sampleRate * 0.020; // Initial buffer is 20 ms
var autoLatency = true, started = new Date().valueOf();
...
// Auto latency detection
if (autoLatency) {
  prebufferSize = Math.floor(sampleRate * (new Date().valueOf() - started) / 1000);
  if (audio.mozCurrentSampleOffset()) { // Play position moved?
    autoLatency = false;
  }
}
===== Complete Example: Creating a Web Based Tone Generator =====

This example creates a simple tone generator, and plays the resulting tone.

<!DOCTYPE html>
<html>
  <head>
    <title>JavaScript Audio Write Example</title>
  </head>
  <body>
    <input type="text" size="4" id="freq" value="440"><label for="freq">Hz</label>
    <button onclick="start()">play</button>
    <button onclick="stop()">stop</button>

    <script type="text/javascript">      
      function AudioDataDestination(sampleRate, readFn) {
        // Initialize the audio output.
        var audio = new Audio();
        audio.mozSetup(1, sampleRate);

        var currentWritePosition = 0;
        var prebufferSize = sampleRate / 2; // buffer 500ms
        var tail = null, tailPosition;

        // The function called at a regular interval to populate
        // the audio output buffer.
        setInterval(function() {
          var written;
          // Check if some data was not written in previous attempts.
          if(tail) {
            written = audio.mozWriteAudio(tail.subarray(tailPosition));
            currentWritePosition += written;
            tailPosition += written;
            if(tailPosition < tail.length) {
              // Not all the data was written, saving the tail...
              return; // ... and exit the function.
            }
            tail = null;
          }

          // Check if we need to add some data to the audio output.
          var currentPosition = audio.mozCurrentSampleOffset();
          var available = currentPosition + prebufferSize - currentWritePosition;
          if(available > 0) {
            // Request some sound data from the callback function.
            var soundData = new Float32Array(available);
            readFn(soundData);

            // Write the data.
            written = audio.mozWriteAudio(soundData);
            if(written < soundData.length) {
              // Not all the data was written, saving the tail.
              tail = soundData;
              tailPosition = written;
            }
            currentWritePosition += written;
          }
        }, 100);
      }

      // Control and generate the sound.

      var frequency = 0, currentSoundSample;
      var sampleRate = 44100;

      function requestSoundData(soundData) {
        if (!frequency) { 
          return; // no sound selected
        }

        var k = 2 * Math.PI * frequency / sampleRate;
        for (var i=0, size=soundData.length; i<size; i++) {
          soundData[i] = Math.sin(k * currentSoundSample++);
        }        
      }

      var audioDestination = new AudioDataDestination(sampleRate, requestSoundData);

      function start() {
        currentSoundSample = 0;
        frequency = parseFloat(document.getElementById("freq").value);
      }

      function stop() {
        frequency = 0;
      }
    </script>
  </body>
</html>

== DOM Implementation ==

===== nsIDOMNotifyAudioAvailableEvent =====

Audio data is made available via the following event:

* Event: MozAudioAvailable
* Event handler: onmozaudioavailable

The MozAudioAvailable event is defined as follows:

interface nsIDOMNotifyAudioAvailableEvent : nsIDOMEvent
{
  // frameBuffer is really a Float32Array
  readonly attribute jsval  frameBuffer;
  readonly attribute float  time;
};

The frameBuffer attribute contains a typed array (Float32Array) with the raw audio data (32-bit float values) obtained from decoding the audio (e.g., the raw data being sent to the audio hardware vs. encoded audio). This is of the form [channel1, channel2, ..., channelN, channel1, channel2, ..., channelN, ...]. All audio frames are normalized to a length of channels * 1024 by default, but could be any length between 512 and 16384 if the user has set a different length using the mozFrameBufferLength attribute.

The time attribute contains a float representing the time in seconds of the first sample in the frameBuffer array since the start of the audio track.
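
Because the samples are interleaved, a handler that needs per-channel data has to de-interleave them. The following is a minimal sketch, assuming a stereo resource and an existing <audio> element (the id "a1" is only an illustration):

var audio = document.getElementById('a1'); // assumes an existing <audio> element
audio.addEventListener('MozAudioAvailable', function(event) {
  var samples = event.frameBuffer;          // interleaved Float32Array
  var channels = audio.mozChannels;         // e.g., 2 for a stereo resource
  var frames = samples.length / channels;   // number of sample frames
  var left = new Float32Array(frames),
      right = new Float32Array(frames);
  for (var i = 0; i < frames; i++) {
    left[i]  = samples[i * channels];       // channel 1
    right[i] = samples[i * channels + 1];   // channel 2
  }
  // event.time is the time (in seconds) of the first sample in this buffer.
}, false);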

===== nsIDOMHTMLMediaElement additions =====

Audio metadata is made available via three new attributes on the HTMLMediaElement. By default these attributes throw an exception if accessed before the loadedmetadata event occurs. Users who need this information before the audio starts playing should not use autoplay, since the audio might start before a loadedmetadata handler has run.

The three new attributes are defined as follows:

  readonly attribute unsigned long mozChannels;
  readonly attribute unsigned long mozSampleRate;
           attribute unsigned long mozFrameBufferLength;

The mozChannels attribute contains the number of channels in the audio resource (e.g., 2). The mozSampleRate attribute contains the number of samples per second that will be played, for example 44100. Both are read-only.

The mozFrameBufferLength attribute indicates the number of samples that will be returned in the framebuffer of each MozAudioAvailable event. This number is a total for all channels, and by default is set to be the number of channels * 1024 (e.g., 2 channels * 1024 samples = 2048 total).

The mozFrameBufferLength attribute can also be set to a new value, if users want lower latency or larger amounts of data. The size given must be a number between 512 and 16384; using any other size will result in an exception being thrown. The best time to set a new length is after the loadedmetadata event fires, when the audio info is known, but before the audio has started playing or MozAudioAvailable events have begun firing.
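
For example, a smaller framebuffer could be requested from a loadedmetadata handler. A minimal sketch (the value 512 and the id "a1" are only illustrations):

var audio = document.getElementById('a1'); // assumes an existing <audio> element
audio.addEventListener('loadedmetadata', function() {
  // Must be between 512 and 16384, and is best set before playback starts
  // and before any MozAudioAvailable events fire.
  audio.mozFrameBufferLength = 512;
}, false);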

===== nsIDOMHTMLAudioElement additions =====

The HTMLAudioElement has also been extended to allow write access. Audio writing is achieved by adding three new methods:

  void mozSetup(in long channels, in long rate);
  unsigned long mozWriteAudio(array); // array is Array() or Float32Array()
  unsigned long long mozCurrentSampleOffset();

The mozSetup() method allows an <audio> element to be set up for writing from script. This method must be called before mozWriteAudio or mozCurrentSampleOffset can be called, since an audio stream has to be created for the media element. It takes two arguments:

  1. channels - the number of audio channels (e.g., 2)
  2. rate - the audio's sample rate (e.g., 44100 samples per second)

The choices made for channels and rate are significant, because they determine the amount of data you must pass to mozWriteAudio(). That is, you must pass an array with enough data for each channel specified in mozSetup().

The mozSetup() method, if called more than once, will create a new audio stream (destroying the existing one, if present) with each call. Thus it is safe, though unnecessary, to call it more than once.

The mozWriteAudio() method can be called after mozSetup(). It allows audio data to be written directly from script. It takes one argument, array. This is a JS Array (i.e., new Array()) or a typed float array (i.e., new Float32Array()) containing the audio data (floats) you wish to write. It must be 0 or N elements in length, where N % channels == 0, otherwise an exception is thrown.

The mozWriteAudio() method returns the number of samples that were just written, which may or may not be the same as the number in array. Only the number of samples that can be written without blocking the audio hardware will be written. It is the responsibility of the caller to deal with any samples that don't get written in the first pass (e.g., buffer and write in the next call).
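
For instance, with a two-channel stream the array interleaves the channels, so its length must be a multiple of 2. A minimal sketch (the 440 Hz tone and buffer size are only illustrations):

var audio = new Audio();
audio.mozSetup(2, 44100);                  // 2 channels, 44100 samples per second

// One frame is one sample per channel, so the array length
// must satisfy N % channels == 0 (here, a multiple of 2).
var frames = 1024;
var samples = new Float32Array(frames * 2);
for (var i = 0; i < frames; i++) {
  var value = Math.sin(2 * Math.PI * 440 * i / 44100); // 440 Hz tone
  samples[i * 2]     = value;              // left channel
  samples[i * 2 + 1] = value;              // right channel
}

var written = audio.mozWriteAudio(samples);
// written may be less than samples.length; the remainder should be
// buffered and written on a later call, as in the examples above.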

The mozCurrentSampleOffset() method can be called after mozSetup(). It returns the current position (measured in samples) of the audio stream. This is useful when determining how much data to write with mozWriteAudio().

All of mozWriteAudio(), mozCurrentSampleOffset(), and mozSetup() will throw exceptions if called out of order. mozSetup() will also throw if a src attribute has previously been set on the audio element (i.e., you can't do both at the same time).

== Security ==

Similar to the <canvas> element and its getImageData method, the MozAudioAvailable event's frameBuffer attribute protects against information leakage between origins.

The MozAudioAvailable event's frameBuffer attribute will throw if the origin of the audio resource does not match the document's origin. NOTE: this will affect users who have the security.fileuri.strict_origin_policy preference set and are working locally with file:/// URIs.
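
Because reading the frameBuffer attribute can throw for a cross-origin resource, a handler that might receive such a resource can guard the access. A minimal sketch (the recovery strategy is only an illustration):

function audioAvailable(event) {
  var frameBuffer;
  try {
    frameBuffer = event.frameBuffer; // throws if the audio's origin differs from the document's
  } catch (e) {
    return; // no sample data is available for cross-origin media
  }
  // ... process frameBuffer as usual ...
}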

== Compatibility with Audio Backends ==

The current MozAudioAvailable implementation integrates with Mozilla's abstract decoder base classes; therefore, any audio decoder that uses these base classes automatically dispatches MozAudioAvailable events. At the time of writing, this includes the Ogg, WebM, and Wave decoders.

== Additional Resources ==

A series of blog posts document the evolution and implementation of this API: http://vocamus.net/dave/?cat=25. Another overview by Al MacDonald is available here.

Al has also written two very good tutorials and video demos of reading and writing audio with the API.

The BBC Research and Development Blog has also done an excellent overview of the API here.

== Bug ==

The work on this API is available in Mozilla bug 490705.

== Obtaining Builds ==

Firefox trunk nightlies include the Audio Data API (starting with 2010-08-26 builds).

== JavaScript Audio Libraries ==

== Working Audio Data Demos ==

NOTE: we recently took down two servers that were hosting many of these demos. We are working to find a new home for them.

A number of working demos have been created, including:

== Demos Needing to be Updated to New API ==


== Third Party Discussions ==

A number of people have written about our work, including: