User:David.humphrey/Audio Data API 2

 
* Corban Brook ([http://twitter.com/corban @corban])
* Al MacDonald ([http://twitter.com/f1lt3r @F1LT3R])
* Yury Delendik
* Ricard Marxer ([http://twitter.com/ricardmp @ricardmp])

===== Other Contributors =====
* Ted Mielczarek
* Felipe Gomes

===== Status =====

'''This is a work in progress.'''  This document reflects the current thinking of its authors, and is not an official specification.  The original goal of this specification was to experiment with web audio data on the way to creating a more stable recommendation.  The authors hoped that this work, and the ideas it generated, would eventually find their way into Mozilla and other HTML5 compatible browsers.  Both of these goals are within reach now, with work ramping up in [https://bugzilla.mozilla.org/show_bug.cgi?id=490705 this Mozilla bug], and the announcement of an official [http://www.w3.org/2005/Incubator/audio/ W3C Audio Incubator Group] chaired by one of the authors.

The continuing work on this specification and API can be tracked here, and in [https://bugzilla.mozilla.org/show_bug.cgi?id=490705 the bug].  Comments, feedback, and collaboration are all welcome.  You can reach the authors on irc in the [irc://irc.mozilla.org/audio #audio channel] on irc.mozilla.org.

===== Version =====

This is the second major version of this API (referred to by the developers as audio13)--the previous version is available here.  The primary improvements and changes are:

* Removal of '''mozSpectrum''' (i.e., native FFT calculation) -- will be done in JS now.
* Native array interfaces instead of using accessors and IDL array arguments.
* No zero padding of audio data occurs anymore.  All frames are exactly 4096 elements in length.
* Added '''mozCurrentSampleOffset()'''
* Removed undocumented position/buffer methods on audio element.
* Added '''mozChannels''', '''mozRate''', '''mozFrameBufferLength''' to the '''loadedmetadata''' event.

Demos written for the previous version are '''not''' compatible, though they can be updated quite easily.  See details below.

Audio data is made available via an event-based API.  As the audio is played, and therefore decoded, each frame is passed to content scripts for processing after being written to the audio layer--hence the name, '''AudioWritten'''.  Playing and pausing the audio both affect the streaming of this raw audio data as well.

Consumers of this raw audio data register two callbacks on the <audio> or <video> element in order to consume this data:

<pre>
<audio src="song.ogg"
       onloadedmetadata="audioInfo(event);"
       onaudiowritten="audioWritten(event);">
</audio>
</pre>

The '''LoadedMetadata''' event is a standard part of HTML5, and has been extended to provide more detailed information about the audio stream.  Specifically, developers can obtain the number of channels and sample rate per second of the audio.  This event is fired once as the media resource is first loaded, and is useful for interpreting or writing the audio data.
 
The '''AudioWritten''' event provides two pieces of data.  The first is a framebuffer (i.e., an array) containing sample data for the current frame.  The second is the time (e.g., milliseconds) for the start of this frame.
 
The following is an example of how both events might be used:

<pre>
var channels,
    rate,
    frameBufferLength,
    samples;
 
function audioInfo(event) {
  channels          = event.mozChannels;
  rate              = event.mozRate;
  frameBufferLength = event.mozFrameBufferLength;
}

function audioWritten(event) {
  samples = event.mozFrameBuffer;

  for (var i = 0, slen = samples.length; i < slen; i++) {
    // Do something with the audio data as it is played.
    processSample(samples[i], channels, rate);
  }
}
</pre>

===== Complete Example: Visualizing Audio Spectrum =====

This example calculates and displays FFT spectrum data for the playing audio:

[[File:fft.png]]

<pre>
<!DOCTYPE html>
<html>
   <head>
     <title>JavaScript Spectrum Example</title>
   </head>
   <body>
     <audio src="song.ogg"
            controls="true"
            onloadedmetadata="loadedMetadata(event);"
            onaudiowritten="audioWritten(event);"
            style="width: 512px;">
     </audio>

     <div><canvas id="fft" width="512" height="200"></canvas></div>

     <script>
       var canvas = document.getElementById('fft'),
           ctx = canvas.getContext('2d'),
           channels,
           rate,
           frameBufferLength,
           fft;

       function loadedMetadata(event) {
         channels          = event.mozChannels;
         rate              = event.mozRate;
         frameBufferLength = event.mozFrameBufferLength;

         fft = new FFT(frameBufferLength / channels, rate);
       }

       function audioWritten(event) {
         var fb = event.mozFrameBuffer,
             signal = new Float32Array(fb.length / channels),
             magnitude;

         for (var i = 0, fbl = fb.length / 2; i < fbl; i++) {
           // Assuming interlaced stereo channels,
           // need to split and merge into a stereo-mix mono signal
           signal[i] = (fb[2*i] + fb[2*i+1]) / 2;
         }

         fft.forward(signal);

         // Clear the canvas before drawing spectrum
         ctx.clearRect(0,0, canvas.width, canvas.height);

         for (var i = 0; i < fft.spectrum.length; i++) {
           // multiply spectrum by a zoom value
           magnitude = fft.spectrum[i] * 4000;

           // Draw rectangle bars for each frequency bin
           ctx.fillRect(i * 4, canvas.height, 3, -magnitude);
         }
       }
      // FFT from dsp.js, see below
      var FFT = function(bufferSize, sampleRate) {
        this.bufferSize  = bufferSize;
        this.sampleRate  = sampleRate;
        this.spectrum    = new Float32Array(bufferSize/2);
        this.real        = new Float32Array(bufferSize);
        this.imag        = new Float32Array(bufferSize);
        this.reverseTable = new Uint32Array(bufferSize);
        this.sinTable    = new Float32Array(bufferSize);
        this.cosTable    = new Float32Array(bufferSize);
        var limit = 1,
            bit = bufferSize >> 1;
        while ( limit < bufferSize ) {
          for ( var i = 0; i < limit; i++ ) {
            this.reverseTable[i + limit] = this.reverseTable[i] + bit;
          }
          limit = limit << 1;
          bit = bit >> 1;
        }
        for ( var i = 0; i < bufferSize; i++ ) {
          this.sinTable[i] = Math.sin(-Math.PI/i);
          this.cosTable[i] = Math.cos(-Math.PI/i);
        }
      };
      FFT.prototype.forward = function(buffer) {
        var bufferSize  = this.bufferSize,
            cosTable    = this.cosTable,
            sinTable    = this.sinTable,
            reverseTable = this.reverseTable,
            real        = this.real,
            imag        = this.imag,
            spectrum    = this.spectrum;
        if ( bufferSize !== buffer.length ) {
          throw "Supplied buffer is not the same size as defined FFT. FFT Size: " +
                bufferSize + " Buffer Size: " + buffer.length;
        }
        for ( var i = 0; i < bufferSize; i++ ) {
          real[i] = buffer[reverseTable[i]];
          imag[i] = 0;
        }
        var halfSize = 1,
            phaseShiftStepReal,
            phaseShiftStepImag,
            currentPhaseShiftReal,
            currentPhaseShiftImag,
            off,
            tr,
            ti,
            tmpReal,
            i;
        while ( halfSize < bufferSize ) {
          phaseShiftStepReal = cosTable[halfSize];
          phaseShiftStepImag = sinTable[halfSize];
          currentPhaseShiftReal = 1.0;
          currentPhaseShiftImag = 0.0;
          for ( var fftStep = 0; fftStep < halfSize; fftStep++ ) {
            i = fftStep;
            while ( i < bufferSize ) {
              off = i + halfSize;
              tr = (currentPhaseShiftReal * real[off]) - (currentPhaseShiftImag * imag[off]);
              ti = (currentPhaseShiftReal * imag[off]) + (currentPhaseShiftImag * real[off]);
              real[off] = real[i] - tr;
              imag[off] = imag[i] - ti;
              real[i] += tr;
              imag[i] += ti;
              i += halfSize << 1;
            }
            tmpReal = currentPhaseShiftReal;
            currentPhaseShiftReal = (tmpReal * phaseShiftStepReal) - (currentPhaseShiftImag * phaseShiftStepImag);
            currentPhaseShiftImag = (tmpReal * phaseShiftStepImag) + (currentPhaseShiftImag * phaseShiftStepReal);
          }
          halfSize = halfSize << 1;
        }
        i = bufferSize/2;
        while(i--) {
          spectrum[i] = 2 * Math.sqrt(real[i] * real[i] + imag[i] * imag[i]) / bufferSize;
        }
      };
     </script>
   </body>
</html>
</pre>

<code>mozCurrentSampleOffset()</code>

<pre>
// Get current position of the underlying audio stream, measured in samples written.
var currentSampleOffset = audioOutput.mozCurrentSampleOffset();
</pre>

Since the '''AudioWritten''' event and the '''mozWriteAudio()''' method both use '''Float32Array''', it is possible to take the output of one audio stream and pass it directly (or process it first and then pass it) to a second:

<pre>
<audio id="a1"
       src="song.ogg"
       onloadedmetadata="loadedMetadata(event);"
       onaudiowritten="audioWritten(event);"
       controls="controls">
</audio>
<script>
var a1 = document.getElementById('a1'),
    a2 = new Audio();

function loadedMetadata(event) {
  // Mute a1 audio.
  a1.volume = 0;
  // Setup a2 to be identical to a1, and play through there.
  a2.mozSetup(event.mozChannels, event.mozRate, 1);
}

function audioWritten(event) {
  // Write the current frame to a2
  a2.mozWriteAudio(event.mozFrameBuffer);
}
</script>
</pre>
Audio data written using the '''mozWriteAudio()''' method needs to be written at a regular interval in equal portions, in order to keep a little ahead of the current sample offset (the current sample offset of the hardware can be obtained with '''mozCurrentSampleOffset()'''), where "a little" means something on the order of 500ms of samples.  For example, when working with 2 channels at 44100 samples per second, with a writing interval of 100ms and a pre-buffer of 500ms, one would write an array of (2 * 44100 / 10) = 8820 samples per interval, and keep writing until the total number of samples written reaches (currentSampleOffset + 2 * 44100 / 2), i.e., 500ms ahead of the current sample offset.
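
The sketch below illustrates this arithmetic (the <code>audioOutput</code> element and the <code>makeSamples()</code> helper are illustrative assumptions, not part of the API); the complete tone generator example that follows uses the same pattern:

<pre>
// Sketch only: keep roughly 500ms of samples buffered ahead of playback.
// Assumes audioOutput.mozSetup(channels, rate, 1) has already been called.
var channels       = 2,
    rate           = 44100,
    portionSize    = channels * rate / 10, // 100ms of samples = 8820
    prebufferSize  = channels * rate / 2,  // 500ms of samples = 44100
    samplesWritten = 0;

setInterval(function() {
  // Top up the stream until we are prebufferSize samples ahead of playback.
  while (samplesWritten < audioOutput.mozCurrentSampleOffset() + prebufferSize) {
    audioOutput.mozWriteAudio(makeSamples(portionSize)); // makeSamples() is hypothetical
    samplesWritten += portionSize;
  }
}, 100);
</pre>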


===== Complete Example: Creating a Web Based Tone Generator =====

<pre>
<!DOCTYPE html>
<html>
   <head>
     <title>JavaScript Audio Write Example</title>
   </head>
   <body>
     <input type="text" size="4" id="freq" value="440"><label for="freq">Hz</label>
     <button onclick="start()">play</button>
     <button onclick="stop()">stop</button>

     <script type="text/javascript">
       var sampleRate = 44100,
           portionSize = sampleRate / 10,
           prebufferSize = sampleRate / 2,
           freq = undefined; // no sound

       var audio = new Audio();
       audio.mozSetup(1, sampleRate, 1);
       var currentWritePosition = 0;

       function start() {
       function getSoundData(t, size) {
         audio = new Audio();
         var soundData = new Float32Array(size);
         audio.mozSetup(1, 44100, 1);
         if (freq) {
        interval = setInterval(writeData, 10);
          var k = 2* Math.PI * freq / sampleRate;
          for (var i=0; i<size; i++) {
            soundData[i] = Math.sin(k * (i + t));
          }
        }
        return soundData;
       }
       }


       function writeData() {
         while (audio.mozCurrentSampleOffset() + prebufferSize >= currentWritePosition) {
           var soundData = getSoundData(currentWritePosition, portionSize);
           audio.mozWriteAudio(soundData);
           currentWritePosition += portionSize;
         }
       }

       // initial write
       writeData();
       var writeInterval = Math.floor(1000 * portionSize / sampleRate);
       setInterval(writeData, writeInterval);

       function start() {
         freq = parseFloat(document.getElementById("freq").value);
       }

       function stop() {
         freq = undefined;
       }
     </script>
   </body>
</html>
</pre>

== DOM Implementation ==
===== nsIDOMNotifyAudioMetadataEvent =====
Audio metadata is provided via custom properties of the media element's '''loadedmetadata''' event.  This event occurs once when the browser first acquires information about the media resource.  The event details are as follows:
* '''Event''': LoadedMetadata
* '''Event handler''': onloadedmetadata
The '''LoadedMetadataEvent''' is defined as follows:
<pre>
interface nsIDOMNotifyAudioMetadataEvent : nsIDOMEvent
{
  readonly attribute unsigned long mozChannels;
  readonly attribute unsigned long mozRate;
  readonly attribute unsigned long mozFrameBufferLength;
};
</pre>
The '''mozChannels''' attribute contains the number of channels in this audio resource (e.g., 2).  The '''mozRate''' attribute contains the number of samples per second that will be played, for example 44100.  The '''mozFrameBufferLength''' attribute contains the number of samples that will be returned in each '''AudioWritten''' event.  This number is a total for all channels (e.g., 2 channels * 2048 samples = 4096 total).
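
For example, a script can derive per-channel values from these attributes (a minimal sketch; the variable names are illustrative):

<pre>
function loadedMetadata(event) {
  var channels          = event.mozChannels,            // e.g., 2
      rate              = event.mozRate,                // e.g., 44100
      frameBufferLength = event.mozFrameBufferLength,   // e.g., 4096 (total across channels)
      samplesPerChannel = frameBufferLength / channels, // e.g., 2048
      frameDuration     = samplesPerChannel / rate;     // seconds of audio per AudioWritten event
}
</pre>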


===== nsIDOMNotifyAudioWrittenEvent =====
The '''mozWriteAudio()''' method can be called after '''mozSetup()'''.  It allows audio data to be written directly from script.  It takes one argument:

# '''array''' - this is a JS Array (i.e., new Array()) or a typed float array (i.e., new Float32Array()) containing the audio data (floats) you wish to write.  It must be 0 or N (where N % channels == 0) elements in length, otherwise a DOM error occurs.

The '''mozCurrentSampleOffset()''' method can be called after '''mozSetup()'''.  It returns the current position (measured in samples) of the audio stream.  This is useful when determining how much data to write with '''mozWriteAudio()'''.
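
A minimal sketch of writing data under these constraints (the <code>audioOutput</code> name is illustrative):

<pre>
// Illustrative sketch: write 100ms of interleaved stereo silence.
var audioOutput = new Audio();
audioOutput.mozSetup(2, 44100, 1);          // 2 channels, 44100 samples per second

var samples = new Float32Array(2 * 4410);   // length % channels == 0
audioOutput.mozWriteAudio(samples);

// Current position of the underlying audio stream, measured in samples.
var offset = audioOutput.mozCurrentSampleOffset();
</pre>
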
=== JavaScript Audio Libraries ===

* We have started work on a JavaScript library to make building audio web apps easier.  Details are [[Audio Data API JS Library|here]].
* [http://github.com/bfirsh/dynamicaudio.js dynamicaudio.js] - An interface for writing audio with a Flash fall back for older browsers.

=== Working Audio Data Demos ===

==== Demos Working on Current API ====

* FFT visualization (calculated with js)
** http://weare.buildingsky.net/processing/dsp.js/examples/fft.html
 
* Beat Detection (also showing use of WebGL for 3D visualizations)
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor1HD-13a.html (video [http://vimeo.com/11345262 here])
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor2HD-13a.html (video of older version [http://vimeo.com/11345685 here])
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor3HD-13a.html (video [http://www.youtube.com/watch?v=OxoFcyKYwr0&fmt=22 here])
 
* Writing Audio from JavaScript, Digital Signal Processing
** Csound shaker instrument ported to JavaScript via Processing.js http://scotland.proximity.on.ca/dxr/tmp/audio/shaker/


==== Demos Needing to be Updated to New API ====

* FFT visualization (calculated with js)
** http://weare.buildingsky.net/processing/dft.js/audio.new.html (video [http://vimeo.com/8525101 here])


** http://ondras.zarovi.cz/demos/audio/

* Beat Detection (also showing use of WebGL for 3D visualizations)
** http://weare.buildingsky.net/processing/beat_detektor/beat_detektor.html
** http://code.bocoup.com/processing-js/3d-fft/viz.xhtml
** JS Multi-Oscillator Synthesizer http://weare.buildingsky.net/processing/dsp.js/examples/synthesizer.html (video [http://vimeo.com/11411533 here])
** JS IIR Filter http://weare.buildingsky.net/processing/dsp.js/examples/filter.html (video [http://vimeo.com/11335434 here])
** API Example: [http://code.bocoup.com/audio-data-api/examples/inverted-waveform-cancellation Inverted Waveform Cancellation]
** API Example: [http://code.bocoup.com/audio-data-api/examples/stereo-splitting-and-panning Stereo Splitting and Panning]