Audio Data API Review Version: Difference between revisions

From MozillaWiki
(Redirected page to Audio Data API)
 
(20 intermediate revisions by 2 users not shown)
Line 1: Line 1:
#REDIRECT [[Audio Data API]]
'''NOTE''': this page is outdated, please see [[Audio Data API]] for the latest documentation.
== Defining an Enhanced API for Audio (Draft Recommendation) ==
== Defining an Enhanced API for Audio (Draft Recommendation) ==


===== Abstract =====
===== Abstract =====


The HTML5 specification introduces the <audio> and <video> media elements, and with them the opportunity to dramatically change the way we integrate media on the web. The current HTML5 media API provides ways to play and get limited information about audio and video, but gives no way to programmatically access or create such media.  We present a new extension to this API, which allows web developers to read and write raw audio data.
The HTML5 specification introduces the <audio> and <video> media elements, and with them the opportunity to dramatically change the way we integrate media on the web. The current HTML5 media API provides ways to play and get limited information about audio and video, but gives no way to programmatically access or create such media.  We present a new Mozilla extension to this API, which allows web developers to read and write raw audio data.


===== Authors =====
===== Authors =====
Line 18: Line 23:
* Thomas Saunders
* Thomas Saunders
* Ted Mielczarek
* Ted Mielczarek
* Felipe Gomes ([http://twitter.com/felipc @felipc])
===== Status =====
'''This is a work in progress.'''  This document reflects the current thinking of its authors, and is not an official specification.  The original goal of this specification was to experiment with web audio data on the way to creating a more stable recommendation.  The authors hoped that this work, and the ideas it generated, would eventually find their way into Mozilla and other HTML5 compatible browsers.  Both of these goals are within reach now, with work ramping up in [https://bugzilla.mozilla.org/show_bug.cgi?id=490705 this Mozilla bug], and the announcement of an official [http://www.w3.org/2005/Incubator/audio/ W3C Audio Incubator Group] chaired by one of the authors.
The continuing work on this specification and API can be tracked here, and in [https://bugzilla.mozilla.org/show_bug.cgi?id=490705 the bug].  Comments, feedback, and collaboration are all welcome.  You can reach the authors on IRC in the [irc://irc.mozilla.org/audio #audio channel] on irc.mozilla.org.
===== Version =====
'''NOTE:''' ''This is a working/review version of the documentation, and will change more frequently in response to discussions in the bug.  A more stable version is [[Audio Data API|available here]].''
The history of this page provides the most complete view of what has changed since the review process began.  However, here are some notable changes since audio13h:
* Removed nsIDOMNotifyAudioMetadataEvent
* Added mozChannels, mozSampleRate, mozFrameBufferLength to nsHTMLMediaElement
* Removed mozSetFrameBufferLength from nsHTMLAudioElement
* Converted mozTime (in AudioAvailable event) to float value (seconds instead of ms)
Demos written for the previous version are '''not''' compatible, though they can be updated quite easily.  See details below.


== API Tutorial ==
== API Tutorial ==


We have developed a proof of concept, experimental build of Firefox ([[#Obtaining_Code_and_Builds|builds provided below]]) which extends the HTMLMediaElement (e.g., affecting <video> and <audio>) and HTMLAudioElement, and implements the following basic API for reading and writing raw audio data:
This API extends the HTMLMediaElement and HTMLAudioElement (e.g., affecting <video> and <audio>), and implements the following basic API for reading and writing raw audio data:


===== Reading Audio =====
===== Reading Audio =====


Audio data is made available via an event-based API.  As the audio is played, and therefore decoded, sample data is passed to content scripts in a framebuffer for processing after becoming available to the audio layer--hence the name, '''AudioAvailable'''.  These samples may or may not have been played yet at the time of the event.  The audio samples returned in the event are raw, and have not been adjusted for mute/volume settings on the media element.  Playing and pausing the audio also affect the streaming of this raw audio data.
Audio data is made available via an event-based API.  As the audio is played, and therefore decoded, sample data is passed to content scripts in a framebuffer for processing after becoming available to the audio layer--hence the name, '''AudioAvailable'''.  These samples may or may not have been played yet at the time of the event.  The audio samples returned in the event are raw, and have not been adjusted for mute/volume settings on the media element.  Playing, pausing, and seeking the audio also affect the streaming of this raw audio data.


Users of this API can register two callbacks on the <audio> or <video> element in order to consume this data:
Users of this API can register two callbacks on the <audio> or <video> element in order to consume this data:
Line 51: Line 36:
<pre>
<pre>
<audio src="song.ogg"
<audio src="song.ogg"
       onloadedmetadata="audioInfo();"
       onloadedmetadata="audioInfo();"&gt;
      onaudioavailable="audioAvailable(event);">
</audio>
</audio>
</pre>
</pre>
Line 62: Line 46:
* mozFrameBufferLength
* mozFrameBufferLength


Prior to the '''LoadedMetadata''' event, these attributes will return 0 (zero), indicating that they are not known, or there is no audio.  These attributes indicate the '''number of channels''', audio '''sample rate per second''', and the '''default size of the framebuffer''' that will be used in '''AudioAvailable''' events.  This event is fired once as the media resource is first loaded, and is useful for interpreting or writing the audio data.
Prior to the '''LoadedMetadata''' event, accessing these attributes will cause an exception to be thrown, indicating that they are not known, or there is no audio.  These attributes indicate the '''number of channels''', audio '''sample rate per second''', and the '''default size of the framebuffer''' that will be used in '''MozAudioAvailable''' events.  This event is fired once as the media resource is first loaded, and is useful for interpreting or writing the audio data.
 
The '''MozAudioAvailable''' event provides two pieces of data.  The first is a framebuffer (i.e., an array) containing decoded audio sample data (i.e., floats).  The second is the time for these samples measured from the start in seconds.  Web developers consume this event by registering an event listener in script like so:
 
<pre>
&lt;audio id="audio" src="song.ogg"&gt;&lt;/audio&gt;
&lt;script&gt;
  var audio = document.getElementById("audio");
  audio.addEventListener('MozAudioAvailable', someFunction, false);
&lt;/script&gt;
</pre>
 
An audio or video element can also be created with script outside the DOM:


The '''AudioAvailable''' event provides two pieces of data. The first is a framebuffer (i.e., an array) containing decoded audio sample data (i.e., floats).  The second is the time for these samples measured from the start in seconds.
<pre>
var audio = new Audio();
audio.src = "song.ogg";
audio.addEventListener('MozAudioAvailable', someFunction, false);
audio.play();
</pre>


The following is an example of how both events might be used:
The following is an example of how both events might be used:
Line 84: Line 85:


function audioAvailable(event) {
function audioAvailable(event) {
   var samples = event.mozFrameBuffer;
   var samples = event.frameBuffer;
   var time    = event.mozTime;
   var time    = event.time;


   for (var i = 0; i < frameBufferLength; i++) {
   for (var i = 0; i < frameBufferLength; i++) {
Line 111: Line 112:
           controls="true"
           controls="true"
           onloadedmetadata="loadedMetadata();"
           onloadedmetadata="loadedMetadata();"
          onaudioavailable="audioAvailable(event);"
           style="width: 512px;">
           style="width: 512px;">
     </audio>
     </audio>
Line 119: Line 119:
       var canvas = document.getElementById('fft'),
       var canvas = document.getElementById('fft'),
           ctx = canvas.getContext('2d'),
           ctx = canvas.getContext('2d'),
          channels,
          rate,
          frameBufferLength,
           fft;
           fft;


       function loadedMetadata() {
       function loadedMetadata() {
         var audio = document.getElementById('audio-element');
         channels          = audio.mozChannels;
        var channels          = audio.mozChannels,
        rate              = audio.mozSampleRate;
            rate              = audio.mozSampleRate,
        frameBufferLength = audio.mozFrameBufferLength;
            frameBufferLength = audio.mozFrameBufferLength;
          
          
         fft = new FFT(frameBufferLength / channels, rate),
         fft = new FFT(frameBufferLength / channels, rate);
       }
       }


       function audioAvailable(event) {
       function audioAvailable(event) {
         var fb = event.mozFrameBuffer,
         var fb = event.frameBuffer,
            t  = event.time, /* unused, but it's there */
             signal = new Float32Array(fb.length / channels),
             signal = new Float32Array(fb.length / channels),
             magnitude;
             magnitude;
Line 154: Line 157:
         }
         }
       }
       }
      var audio = document.getElementById('audio-element');
      audio.addEventListener('MozAudioAvailable', audioAvailable, false);


       // FFT from dsp.js, see below
       // FFT from dsp.js, see below
Line 194: Line 200:


         if ( bufferSize !== buffer.length ) {
         if ( bufferSize !== buffer.length ) {
           throw "Supplied buffer is not the same size as defined FFT. FFT Size: " +
           throw "Supplied buffer is not the same size as defined FFT. FFT Size: " + bufferSize + " Buffer Size: " + buffer.length;
                bufferSize + " Buffer Size: " + buffer.length;
         }
         }


Line 286: Line 291:
</pre>
</pre>


Since the '''AudioAvailable''' event and the '''mozWriteAudio()''' method both use '''Float32Array''', it is possible to take the output of one audio stream and pass it directly (or process first and then pass) to a second:
Since the '''MozAudioAvailable''' event and the '''mozWriteAudio()''' method both use '''Float32Array''', it is possible to take the output of one audio stream and pass it directly (or process first and then pass) to a second:


<pre>
<pre>
Line 292: Line 297:
       src="song.ogg"  
       src="song.ogg"  
       onloadedmetadata="loadedMetadata();"
       onloadedmetadata="loadedMetadata();"
      onaudioavailable="audioAvailable(event);"
       controls>
       controls="controls">
</audio>
</audio>
<script>
<script>
Line 312: Line 316:
   writeAudio(frameBuffer);
   writeAudio(frameBuffer);
}
}
a1.addEventListener('MozAudioAvailable', audioAvailable, false);


function writeAudio(audio) {
function writeAudio(audio) {
Line 330: Line 335:


Audio data written using the '''mozWriteAudio()''' method needs to be written at a regular interval in equal portions, in order to keep a little ahead of the current sample offset (current sample offset of hardware can be obtained with '''mozCurrentSampleOffset()'''), where a little means something on the order of 500ms of samples.  For example, if working with 2 channels at 44100 samples per second, a writing interval of 100ms, and a pre-buffer equal to 500ms, one would write an array of (2 * 44100 / 10) = 8820 samples, and a total of (currentSampleOffset + 2 * 44100 / 2).
Audio data written using the '''mozWriteAudio()''' method needs to be written at a regular interval in equal portions, in order to keep a little ahead of the current sample offset (current sample offset of hardware can be obtained with '''mozCurrentSampleOffset()'''), where a little means something on the order of 500ms of samples.  For example, if working with 2 channels at 44100 samples per second, a writing interval of 100ms, and a pre-buffer equal to 500ms, one would write an array of (2 * 44100 / 10) = 8820 samples, and a total of (currentSampleOffset + 2 * 44100 / 2).
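
The arithmetic above can be sketched as a small helper.  This is only an illustration, not part of the API: '''writePlan''' and its parameter names are hypothetical, and the function does nothing more than reproduce the calculation of the per-write array size and the pre-buffer size.

```javascript
// Hypothetical helper (not part of the API): reproduces the arithmetic above.
function writePlan(channels, sampleRate, intervalMs, prebufferMs) {
  // Interleaved samples consumed per second across all channels.
  var samplesPerSecond = channels * sampleRate;
  return {
    // Size of the array to pass to each write call.
    samplesPerWrite: Math.round(samplesPerSecond * intervalMs / 1000),
    // How far ahead of the current sample offset the writer tries to stay.
    prebufferSamples: Math.round(samplesPerSecond * prebufferMs / 1000)
  };
}

var plan = writePlan(2, 44100, 100, 500);
// plan.samplesPerWrite is 8820 and plan.prebufferSamples is 44100
```

A writer built on this would aim to keep the total number of samples written close to the current sample offset plus '''plan.prebufferSamples'''.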
===== Other Approaches to Writing Audio =====
1) Connect Decode Thread and Worker
'''Idea:'''
Pass audio data from the decoding thread to a worker for processing.
Create a worker thread, and pass it to the HTMLMediaElement.  The decode thread gets the worker via the HTMLMediaElement, and passes messages to the worker as audio is decoded, bypassing the main thread.  The worker processes the audio, and returns, allowing audio to be modified before it is played.
'''TODO:'''
* Modify HTMLMediaElement so you can pass it a Worker as the destination for audio data.
* Allow the decoder to get a thread-safe reference to this worker, so it can be used by the decoder thread.  The worker would be handed off to the audio decoding thread as an nsIWorker interface pointer
* Have the decode thread pass a message to the worker (ideally synchronously) so that the worker can return modified audio data, which then gets played as normal.  Perhaps pass some kind of event object to the worker.  When the worker is done modifying the data, it can call a method on the event object to say "hey, put this data back in the audio decoder thread's buffers now and let the audio decoder thread proceed."
'''Issues:'''
* The worker code  was written with the assumption that the code is either running on the main thread or on the worker thread.  Teaching it about a third thread would be necessary.
* May need to add a C++ PostMessage call to the nsIWorker interface, as the current one assumes it's being called through XPConnect.
* How do we get the data back from the worker to the decode thread?
* Keep a reference to the worker in the HTMLMediaElement, and use cycle collection to handle lifetimes
'''Notes:'''
* DecodeAudioData function in the webm backend is what you need to decode vorbis
* nsOggReader::DecodeAudioData pushes audio decoded data onto the audio queue.  Instead of this, push to worker.  Maybe shift that out of the nsOggReader (the adding to the audio queue) and into the nsBuiltin code.
2) Introduce a Callback to Decode Thread
'''Idea:'''
Allow JavaScript to push data to the Audio Queue, whether modified data from the decoder, or generated data.  Have a callback that gets the data as soon as the decode thread is done with it, and the callback modifies it synchronously before it goes into the queue to be played.  This callback could then postMessage to a worker, which processes it on another thread, and then puts it back into the audio queue (audio thread).  The audio thread no longer queues things directly.
'''Notes:'''
* This could replace the current mozWriteAudio() method.


===== Complete Example: Creating a Web Based Tone Generator =====
===== Complete Example: Creating a Web Based Tone Generator =====
Line 444: Line 404:


* '''Event''': AudioAvailableEvent
* '''Event''': AudioAvailableEvent
* '''Event handler''': onaudioavailable
* '''Event handler''': onmozaudioavailable


The '''AudioAvailableEvent''' is defined as follows:
The '''AudioAvailableEvent''' is defined as follows:
Line 451: Line 411:
interface nsIDOMNotifyAudioAvailableEvent : nsIDOMEvent
interface nsIDOMNotifyAudioAvailableEvent : nsIDOMEvent
{
{
   // mozFrameBuffer is really a Float32Array, via dom_quickstubs
   // frameBuffer is really a Float32Array
   readonly attribute nsIVariant mozFrameBuffer;
   readonly attribute jsval  frameBuffer;
   readonly attribute float     mozTime;
   readonly attribute float time;
};
};
</pre>
</pre>


The '''mozFrameBuffer''' attribute contains a typed array ('''Float32Array''') with the raw audio data (32-bit float values) obtained from decoding the audio (e.g., the raw data being sent to the audio hardware vs. encoded audio).  This is of the form <nowiki>[channel1, channel2, ..., channelN, channel1, channel2, ..., channelN, ...]</nowiki>.  All audio frames are normalized to a length of channels * 1024 by default, but could be any power of 2 between 512 and 32768 if the user has set a different length using the '''mozFrameBufferLength''' attribute.
The '''frameBuffer''' attribute contains a typed array ('''Float32Array''') with the raw audio data (32-bit float values) obtained from decoding the audio (e.g., the raw data being sent to the audio hardware vs. encoded audio).  This is of the form <nowiki>[channel1, channel2, ..., channelN, channel1, channel2, ..., channelN, ...]</nowiki>.  All audio frames are normalized to a length of channels * 1024 by default, but could be any power of 2 between 512 and 32768 if the user has set a different length using the '''mozFrameBufferLength''' attribute.
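
The interleaved layout described above can be split into one array per channel.  The following is a minimal sketch under that layout assumption; '''deinterleave''' is a hypothetical helper name, not something the API provides.

```javascript
// Hypothetical helper (not part of the API): splits an interleaved frame
// buffer [ch1, ch2, ..., chN, ch1, ch2, ...] into one Float32Array per channel.
function deinterleave(frameBuffer, channels) {
  var framesPerChannel = Math.floor(frameBuffer.length / channels);
  var result = [];
  for (var c = 0; c < channels; c++) {
    var channelData = new Float32Array(framesPerChannel);
    for (var i = 0; i < framesPerChannel; i++) {
      // Sample i of channel c sits at offset i * channels + c.
      channelData[i] = frameBuffer[i * channels + c];
    }
    result.push(channelData);
  }
  return result;
}
```

Inside an event handler this might be called as deinterleave(event.frameBuffer, channels), where channels was read from '''mozChannels''' after the metadata loaded.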


The '''mozTime''' attribute contains a float representing the time in seconds since the start.
The '''time''' attribute contains a float representing the time in seconds since the start.


===== nsIDOMHTMLMediaElement additions =====
===== nsIDOMHTMLMediaElement additions =====


Audio metadata is made available via three new attributes on the HTMLMediaElement.  By default these attributes have a value of 0 (zero), until the '''LoadedMetadata''' occurs.  Users who need this info before the audio starts playing should not use '''autoplay''', since the audio might start before a loadedmetadata handler has run.
Audio metadata is made available via three new attributes on the HTMLMediaElement.  By default these attributes throw if accessed before the '''LoadedMetadata''' event occurs.  Users who need this info before the audio starts playing should not use '''autoplay''', since the audio might start before a loadedmetadata handler has run.


The three new attributes are defined as follows:
The three new attributes are defined as follows:
Line 473: Line 433:
</pre>
</pre>


The '''mozChannels''' attribute contains the number of channels in the audio resource (e.g., 2).  The '''mozSampleRate''' attribute contains the number of samples per second that will be played, for example 44100.  Both are readonly.
The '''mozChannels''' attribute contains the number of channels in the audio resource (e.g., 2).  The '''mozSampleRate''' attribute contains the number of samples per second that will be played, for example 44100.  Both are read-only.


The '''mozFrameBufferLength''' attribute indicates the number of samples that will be returned in the framebuffer of each '''AudioAvailable''' event.  This number is a total for all channels, and by default is set to be the number of channels * 1024 (e.g., 2 channels * 1024 samples = 2048 total).
The '''mozFrameBufferLength''' attribute indicates the number of samples that will be returned in the framebuffer of each '''MozAudioAvailable''' event.  This number is a total for all channels, and by default is set to be the number of channels * 1024 (e.g., 2 channels * 1024 samples = 2048 total).


The '''mozFrameBufferLength''' attribute can also be set to a new value, if users want lower latency, or larger amounts of data, etc.  The size you give '''must''' be a power of 2 between 512 and 32768.  The following are all valid lengths:
The '''mozFrameBufferLength''' attribute can also be set to a new value, if users want lower latency, or larger amounts of data, etc.  The size given '''must''' be a power of 2 between 512 and 32768.  The following are all valid lengths:


* 512
* 512
Line 487: Line 447:
* 32768
* 32768
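
The length rule above (a power of 2 between 512 and 32768) can be checked before assigning to '''mozFrameBufferLength'''.  This is a sketch only; '''isValidFrameBufferLength''' is a hypothetical helper, and the browser's own validation remains authoritative.

```javascript
// Hypothetical helper (not part of the API): checks the documented rule that
// a frame buffer length must be a power of 2 between 512 and 32768.
function isValidFrameBufferLength(length) {
  var isInteger = typeof length === 'number' && Math.floor(length) === length;
  var inRange = length >= 512 && length <= 32768;
  // A power of two has exactly one bit set, so n & (n - 1) is zero.
  var isPowerOfTwo = (length & (length - 1)) === 0;
  return isInteger && inRange && isPowerOfTwo;
}
```

Checking a candidate value first avoids relying on the exception as control flow.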


Using any other size will result in an exception being thrown.  The best time to set a new length is after the '''loadedmetadata''' event fires, when the audio info is known, but before the audio has started or '''AudioAvailable''' events have begun firing.
Using any other size will result in an exception being thrown.  The best time to set a new length is after the '''loadedmetadata''' event fires, when the audio info is known, but before the audio has started or '''MozAudioAvailable''' events have begun firing.


===== nsIDOMHTMLAudioElement additions =====
===== nsIDOMHTMLAudioElement additions =====
Line 499: Line 459:
</pre>
</pre>


The '''mozSetup()''' method allows an &lt;audio&gt; element to be set up for writing from script.  This method '''must''' be called before '''mozWriteAudio''' can be called, since an audio stream has to be created for the media element.  It takes three arguments:
The '''mozSetup()''' method allows an &lt;audio&gt; element to be set up for writing from script.  This method '''must''' be called before '''mozWriteAudio''' or '''mozCurrentSampleOffset''' can be called, since an audio stream has to be created for the media element.  It takes two arguments:


# '''channels''' - the number of audio channels (e.g., 2)
# '''channels''' - the number of audio channels (e.g., 2)
Line 508: Line 468:
The '''mozSetup()''' method, if called more than once, will recreate a new audio stream (destroying an existing one if present) with each call.  Thus it is safe to call this more than once, but unnecessary.  
The '''mozSetup()''' method, if called more than once, will recreate a new audio stream (destroying an existing one if present) with each call.  Thus it is safe to call this more than once, but unnecessary.  


The '''mozWriteAudio()''' method can be called after '''mozSetup()'''.  It allows audio data to be written directly from script.  It takes one argument, '''array'''.  This is a JS Array (i.e., new Array()) or a typed float array (i.e., new Float32Array()) containing the audio data (floats) you wish to write.  It must be 0 or N elements in length, where N % channels == 0, otherwise a DOM error occurs.  
The '''mozWriteAudio()''' method can be called after '''mozSetup()'''.  It allows audio data to be written directly from script.  It takes one argument, '''array'''.  This is a JS Array (i.e., new Array()) or a typed float array (i.e., new Float32Array()) containing the audio data (floats) you wish to write.  It must be 0 or N elements in length, where N % channels == 0, otherwise an exception is thrown.  


The '''mozWriteAudio()''' method returns the number of samples that were just written, which may or may not be the same as the number in '''array'''.  Only the number of samples that can be written without blocking the audio hardware will be written.  It is the responsibility of the caller to deal with any samples that don't get written in the first pass (e.g., buffer and write in the next call).
The '''mozWriteAudio()''' method returns the number of samples that were just written, which may or may not be the same as the number in '''array'''.  Only the number of samples that can be written without blocking the audio hardware will be written.  It is the responsibility of the caller to deal with any samples that don't get written in the first pass (e.g., buffer and write in the next call).
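
One way to handle samples that were not written is to keep the unwritten tail and prepend it to the next write.  The following is a minimal sketch, assuming only that '''mozWriteAudio()''' returns the count of samples it accepted, as described above; '''createBufferedWriter''' is a hypothetical helper, not part of the API.

```javascript
// Hypothetical helper (not part of the API): wraps an object exposing
// mozWriteAudio() and carries unwritten samples over to the next call.
function createBufferedWriter(output) {
  var pending = new Float32Array(0); // samples not yet accepted

  return function write(samples) {
    // Prepend whatever was left over from the previous call.
    var buffer = new Float32Array(pending.length + samples.length);
    buffer.set(pending, 0);
    buffer.set(samples, pending.length);

    // Only part of the buffer may be accepted without blocking.
    var written = output.mozWriteAudio(buffer);
    pending = buffer.subarray(written);
    return pending.length; // number of samples still waiting
  };
}
```

The returned count lets the caller slow down when the backlog grows; this sketch does not cap the backlog itself.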
Line 515: Line 475:


All of '''mozWriteAudio()''', '''mozCurrentSampleOffset()''', and '''mozSetup()''' will throw exceptions if called out of order.  '''mozSetup()''' will also throw if a ''src'' attribute has previously been set on the audio element (i.e., you can't do both at the same time).
All of '''mozWriteAudio()''', '''mozCurrentSampleOffset()''', and '''mozSetup()''' will throw exceptions if called out of order.  '''mozSetup()''' will also throw if a ''src'' attribute has previously been set on the audio element (i.e., you can't do both at the same time).
===== Security =====
Similar to the &lt;canvas&gt; element and its '''getImageData''' method, the '''MozAudioAvailable''' event's '''frameBuffer''' attribute protects against information leakage between origins.
The '''MozAudioAvailable''' event's '''frameBuffer''' attribute will throw if the origin of the audio resource does not match the document's origin.  NOTE: this will affect users who have the security.fileuri.strict_origin_policy preference set, and are working locally with file:/// URIs.
===== Compatibility with Audio Backends =====
The current MozAudioAvailable implementation integrates with Mozilla's decoder abstract base classes, and therefore, any audio decoder which uses these base classes automatically dispatches MozAudioAvailable events.  At the time of writing, this includes the Ogg and WebM decoders but '''not''' the Wave decoder.


== Additional Resources ==
== Additional Resources ==


A series of blog posts documents the evolution and implementation of this API: http://vocamus.net/dave/?cat=25.  Another overview by Al MacDonald is available [http://weblog.bocoup.com/web-audio-all-aboard here].
A series of blog posts documents the evolution and implementation of this API: http://vocamus.net/dave/?cat=25.  Another overview by Al MacDonald is available [http://weblog.bocoup.com/web-audio-all-aboard here].
=== Bug ===
The work on this API is available in Mozilla [https://bugzilla.mozilla.org/show_bug.cgi?id=490705 bug 490705].


=== Obtaining Code and Builds ===
=== Obtaining Code and Builds ===
Line 524: Line 498:
'''Latest Try Server Builds:'''
'''Latest Try Server Builds:'''


http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/david.humphrey@senecac.on.ca-d886b29dcd0b/
http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/david.humphrey@senecac.on.ca-ecf5c7f4e806/
 
A patch is available in the [https://bugzilla.mozilla.org/show_bug.cgi?id=490705 bug], if you would like to experiment with this API. No builds of this patch have been created.
 
'''Win7 Build with Audio-Data-API + Multi-Touch-API:'''
 
This build combines David Humphrey's Audio Data API with [http://felipe.wordpress.com/2009/08/21/sneak-peak-on-multitouch-events/ Felipe Gomes's Multi-Touch API] and has been tested on the HP-TX2 and the HP-TM2.
 
http://code.bocoup.com/firefox-3.7a1pre.felipe.multitouch.win7.zip


=== JavaScript Audio Libraries ===
=== JavaScript Audio Libraries ===


* We have started work on a JavaScript library to make building audio web apps easier.  Details are [[Audio Data API JS Library|here]].
* We have started work on a JavaScript library to make building audio web apps easier.  Details are [[Audio Data API JS Library|here]].
* [http://github.com/bfirsh/dynamicaudio.js dynamicaudio.js] - An interface for writing audio with a Flash fall back for older browsers.
* [http://github.com/bfirsh/dynamicaudio.js dynamicaudio.js] - An interface for writing audio with a Flash fall back for older browsers.  ''NOTE:'' not necessarily up-to-date with this version of the API.


=== Working Audio Data Demos ===
=== Working Audio Data Demos ===
Line 549: Line 515:
** API Example: [http://code.bocoup.com/audio-data-api/examples/ambient-extraction-mixer/ Ambient Extraction Mixer]
** API Example: [http://code.bocoup.com/audio-data-api/examples/ambient-extraction-mixer/ Ambient Extraction Mixer]
** API Example: [http://code.bocoup.com/audio-data-api/examples/worker-thread-audio-processing/ Worker Thread Audio Processing]
** API Example: [http://code.bocoup.com/audio-data-api/examples/worker-thread-audio-processing/ Worker Thread Audio Processing]
* Beat Detection (also showing use of WebGL for 3D visualizations)
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor1HD.html (video [http://vimeo.com/11345262 here])
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor2HD.html (video of older version [http://vimeo.com/11345685 here])
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor3HD.html (video [http://www.youtube.com/watch?v=OxoFcyKYwr0&fmt=22 here])
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor4HD.html (video [http://www.youtube.com/watch?v=dym4DqpJuDk&fmt=22 here])


'''NOTE:''' ''If you try to run demos created with the original API using a build that implements the new API, you may encounter [https://bugzilla.mozilla.org/show_bug.cgi?id=560212 bug 560212].  We are aware of this, as is Mozilla, and it is being investigated.''
'''NOTE:''' ''If you try to run demos created with the original API using a build that implements the new API, you may encounter [https://bugzilla.mozilla.org/show_bug.cgi?id=560212 bug 560212].  We are aware of this, as is Mozilla, and it is being investigated.''


==== Demos Needing to be Updated to New API ====
=== Demos Needing to be Updated to New API ===


* FFT visualization (calculated with js)
* FFT visualization (calculated with js)
** http://weare.buildingsky.net/processing/dsp.js/examples/fft.html
** http://weare.buildingsky.net/processing/dsp.js/examples/fft.html
* Beat Detection (also showing use of WebGL for 3D visualizations)
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor1HD-13a.html (video [http://vimeo.com/11345262 here])
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor2HD-13a.html (video of older version [http://vimeo.com/11345685 here])
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor3HD-13a.html (video [http://www.youtube.com/watch?v=OxoFcyKYwr0&fmt=22 here])
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor4HD.html (video [http://www.youtube.com/watch?v=dym4DqpJuDk&fmt=22 here])


* Writing Audio from JavaScript, Digital Signal Processing
* Writing Audio from JavaScript, Digital Signal Processing
Line 608: Line 574:
* http://news.slashdot.org/story/10/05/26/1936224/Breakthroughs-In-HTML-Audio-Via-Manipulation-With-JavaScript
* http://news.slashdot.org/story/10/05/26/1936224/Breakthroughs-In-HTML-Audio-Via-Manipulation-With-JavaScript
* http://ajaxian.com/archives/amazing-audio-api-javascript-demos
* http://ajaxian.com/archives/amazing-audio-api-javascript-demos
* http://www.webmonkey.com/2010/08/sampleplayer-makes-your-browser-sing-sans-flash/

Latest revision as of 00:27, 17 August 2010

Redirect to: Audio Data API

NOTE: this page is outdated, please see Audio Data API for the latest documentation.


Defining an Enhanced API for Audio (Draft Recommendation)

Abstract

The HTML5 specification introduces the <audio> and <video> media elements, and with them the opportunity to dramatically change the way we integrate media on the web. The current HTML5 media API provides ways to play and get limited information about audio and video, but gives no way to programmatically access or create such media. We present a new Mozilla extension to this API, which allows web developers to read and write raw audio data.

Authors
Other Contributors
  • Thomas Saunders
  • Ted Mielczarek

API Tutorial

This API extends the HTMLMediaElement and HTMLAudioElement (e.g., affecting <video> and <audio>), and implements the following basic API for reading and writing raw audio data:

Reading Audio

Audio data is made available via an event-based API. As the audio is played, and therefore decoded, sample data is passed to content scripts in a framebuffer for processing after becoming available to the audio layer--hence the name, AudioAvailable. These samples may or may not have been played yet at the time of the event. The audio samples returned in the event are raw, and have not been adjusted for mute/volume settings on the media element. Playing, pausing, and seeking the audio also affect the streaming of this raw audio data.

Users of this API can register two callbacks on the <audio> or <video> element in order to consume this data:

<audio src="song.ogg"
       onloadedmetadata="audioInfo();">
</audio>

The LoadedMetadata event is a standard part of HTML5. It now indicates that a media element (audio or video) has useful metadata loaded, which can be accessed using three new attributes:

  • mozChannels
  • mozSampleRate
  • mozFrameBufferLength

Prior to the LoadedMetadata event, accessing these attributes will cause an exception to be thrown, indicating that they are not known, or there is no audio. These attributes indicate the number of channels, audio sample rate per second, and the default size of the framebuffer that will be used in MozAudioAvailable events. This event is fired once as the media resource is first loaded, and is useful for interpreting or writing the audio data.
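
Because these attributes throw before the metadata is known, code that may run early can guard the access. The following is a minimal sketch; readAudioInfo is a hypothetical helper, not part of the API, and treating a channel count of 0 as "not known" is an added assumption.

```javascript
// Hypothetical helper (not part of the API): returns the audio metadata,
// or null when it is not yet available.
function readAudioInfo(media) {
  try {
    var info = {
      channels: media.mozChannels,
      rate: media.mozSampleRate,
      frameBufferLength: media.mozFrameBufferLength
    };
    // Assumption: a channel count of 0 also means "not known / no audio".
    return info.channels > 0 ? info : null;
  } catch (e) {
    // Accessed before the loadedmetadata event fired.
    return null;
  }
}
```

A caller would typically still wait for the loadedmetadata event and use this guard only as a safety net.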

The MozAudioAvailable event provides two pieces of data. The first is a framebuffer (i.e., an array) containing decoded audio sample data (i.e., floats). The second is the time for these samples measured from the start in seconds. Web developers consume this event by registering an event listener in script like so:

<audio id="audio" src="song.ogg"></audio>
<script>
  var audio = document.getElementById("audio");
  audio.addEventListener('MozAudioAvailable', someFunction, false);
</script>

An audio or video element can also be created with script outside the DOM:

var audio = new Audio();
audio.src = "song.ogg";
audio.addEventListener('MozAudioAvailable', someFunction, false);
audio.play();

The following is an example of how both events might be used:

var channels,
    rate,
    frameBufferLength,
    samples;

function audioInfo() {
  var audio = document.getElementById('audio');

  // After loadedmetadata event, following media element attributes are known:
  channels          = audio.mozChannels;
  rate              = audio.mozSampleRate;
  frameBufferLength = audio.mozFrameBufferLength;
}

function audioAvailable(event) {
  var samples = event.frameBuffer;
  var time    = event.time;

  for (var i = 0; i < frameBufferLength; i++) {
    // Do something with the audio data as it is played.
    processSample(samples[i], channels, rate);
  }
}
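
The processSample() call above is a placeholder that the example leaves undefined. As one hypothetical use of the framebuffer, the sketch below computes the RMS (root-mean-square) level of the samples in a single event. The name rmsLevel is an assumption made for illustration and is not part of the API.

```javascript
// Hypothetical helper: root-mean-square level of one framebuffer.
// Accepts a Float32Array (as delivered in event.frameBuffer) or a plain Array.
function rmsLevel(samples) {
  if (samples.length === 0) {
    return 0;
  }
  var sumOfSquares = 0;
  for (var i = 0; i < samples.length; i++) {
    sumOfSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumOfSquares / samples.length);
}
```

Inside an audioAvailable handler this could be called as rmsLevel(event.frameBuffer), for example to drive a simple level meter.
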

Complete Example: Visualizing Audio Spectrum

This example calculates and displays FFT spectrum data for the playing audio:

Fft.png

<!DOCTYPE html>
<html>
  <head>
    <title>JavaScript Spectrum Example</title>
  </head>
  <body>
    <audio id="audio-element"
           src="song.ogg"
           controls="true"
           onloadedmetadata="loadedMetadata();"
           style="width: 512px;">
    </audio>
    <div><canvas id="fft" width="512" height="200"></canvas></div>

    <script>
      var canvas = document.getElementById('fft'),
          ctx = canvas.getContext('2d'),
          channels,
          rate,
          frameBufferLength,
          fft;

      function loadedMetadata() {
        channels          = audio.mozChannels;
        rate              = audio.mozSampleRate;
        frameBufferLength = audio.mozFrameBufferLength;
         
        fft = new FFT(frameBufferLength / channels, rate);
      }

      function audioAvailable(event) {
        var fb = event.frameBuffer,
            t  = event.time, /* unused, but it's there */
            signal = new Float32Array(fb.length / channels),
            magnitude;

        for (var i = 0, fbl = frameBufferLength / 2; i < fbl; i++ ) {
          // Assuming interleaved stereo channels,
          // need to split and merge into a stereo-mix mono signal
          signal[i] = (fb[2*i] + fb[2*i+1]) / 2;
        }

        fft.forward(signal);

        // Clear the canvas before drawing spectrum
        ctx.clearRect(0,0, canvas.width, canvas.height);

        for (var i = 0; i < fft.spectrum.length; i++ ) {
          // multiply spectrum by a zoom value
          magnitude = fft.spectrum[i] * 4000;

          // Draw rectangle bars for each frequency bin
          ctx.fillRect(i * 4, canvas.height, 3, -magnitude);
        }
      }

      var audio = document.getElementById('audio-element');
      audio.addEventListener('MozAudioAvailable', audioAvailable, false);

      // FFT from dsp.js, see below
      var FFT = function(bufferSize, sampleRate) {
        this.bufferSize   = bufferSize;
        this.sampleRate   = sampleRate;
        this.spectrum     = new Float32Array(bufferSize/2);
        this.real         = new Float32Array(bufferSize);
        this.imag         = new Float32Array(bufferSize);
        this.reverseTable = new Uint32Array(bufferSize);
        this.sinTable     = new Float32Array(bufferSize);
        this.cosTable     = new Float32Array(bufferSize);

        var limit = 1,
            bit = bufferSize >> 1;

        while ( limit < bufferSize ) {
          for ( var i = 0; i < limit; i++ ) {
            this.reverseTable[i + limit] = this.reverseTable[i] + bit;
          }

          limit = limit << 1;
          bit = bit >> 1;
        }

        for ( var i = 0; i < bufferSize; i++ ) {
          this.sinTable[i] = Math.sin(-Math.PI/i);
          this.cosTable[i] = Math.cos(-Math.PI/i);
        }
      };

      FFT.prototype.forward = function(buffer) {
        var bufferSize   = this.bufferSize,
            cosTable     = this.cosTable,
            sinTable     = this.sinTable,
            reverseTable = this.reverseTable,
            real         = this.real,
            imag         = this.imag,
            spectrum     = this.spectrum;

        if ( bufferSize !== buffer.length ) {
          throw "Supplied buffer is not the same size as defined FFT. FFT Size: " + bufferSize + " Buffer Size: " + buffer.length;
        }

        for ( var i = 0; i < bufferSize; i++ ) {
          real[i] = buffer[reverseTable[i]];
          imag[i] = 0;
        }

        var halfSize = 1,
            phaseShiftStepReal,
            phaseShiftStepImag,
            currentPhaseShiftReal,
            currentPhaseShiftImag,
            off,
            tr,
            ti,
            tmpReal,
            i;

        while ( halfSize < bufferSize ) {
          phaseShiftStepReal = cosTable[halfSize];
          phaseShiftStepImag = sinTable[halfSize];
          currentPhaseShiftReal = 1.0;
          currentPhaseShiftImag = 0.0;

          for ( var fftStep = 0; fftStep < halfSize; fftStep++ ) {
            i = fftStep;

            while ( i < bufferSize ) {
              off = i + halfSize;
              tr = (currentPhaseShiftReal * real[off]) - (currentPhaseShiftImag * imag[off]);
              ti = (currentPhaseShiftReal * imag[off]) + (currentPhaseShiftImag * real[off]);

              real[off] = real[i] - tr;
              imag[off] = imag[i] - ti;
              real[i] += tr;
              imag[i] += ti;

              i += halfSize << 1;
            }

            tmpReal = currentPhaseShiftReal;
            currentPhaseShiftReal = (tmpReal * phaseShiftStepReal) - (currentPhaseShiftImag * phaseShiftStepImag);
            currentPhaseShiftImag = (tmpReal * phaseShiftStepImag) + (currentPhaseShiftImag * phaseShiftStepReal);
          }

          halfSize = halfSize << 1;
        }

        i = bufferSize/2;
        while(i--) {
          spectrum[i] = 2 * Math.sqrt(real[i] * real[i] + imag[i] * imag[i]) / bufferSize;
        }
      };
    </script>
  </body>
</html>
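
Two steps in the example above can be looked at in isolation: the stereo-to-mono mix performed in audioAvailable(), and the meaning of each entry in fft.spectrum. The sketch below restates the mix as a standalone function and adds the standard mapping from a spectrum bin index to a frequency in Hz. The function names mixStereoToMono and binFrequency are assumptions for illustration, and the mix assumes an even-length, interleaved stereo framebuffer.

```javascript
// Average interleaved stereo samples [L0, R0, L1, R1, ...] into a mono signal.
// Assumes frameBuffer.length is even (two channels).
function mixStereoToMono(frameBuffer) {
  var mono = new Float32Array(frameBuffer.length / 2);
  for (var i = 0; i < mono.length; i++) {
    mono[i] = (frameBuffer[2 * i] + frameBuffer[2 * i + 1]) / 2;
  }
  return mono;
}

// Frequency in Hz represented by spectrum bin `index`, for an FFT of
// `bufferSize` points over audio sampled at `sampleRate` samples per second.
function binFrequency(index, sampleRate, bufferSize) {
  return index * sampleRate / bufferSize;
}
```

With a 1024-point FFT at 44100 samples per second, each bar drawn on the canvas therefore covers a band of roughly 43 Hz.
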

Writing Audio

It is also possible to set up an <audio> element for raw writing from script (i.e., without a src attribute). Content scripts can specify the audio stream's characteristics, then write audio samples using the following methods:

mozSetup(channels, sampleRate)

// Create a new audio element
var audioOutput = new Audio();
// Set up audio element with a 2-channel, 44.1 kHz audio stream.
audioOutput.mozSetup(2, 44100);

mozWriteAudio(buffer)

// Write samples using a JS Array
var samples = [0.242, 0.127, 0.0, -0.058, -0.242, ...];
var numberSamplesWritten = audioOutput.mozWriteAudio(samples);

// Write samples using a Typed Array
var samples = new Float32Array([0.242, 0.127, 0.0, -0.058, -0.242, ...]);
var numberSamplesWritten = audioOutput.mozWriteAudio(samples);

mozCurrentSampleOffset()

// Get current position of the underlying audio stream, measured in samples available.
var currentSampleOffset = audioOutput.mozCurrentSampleOffset();

Since the MozAudioAvailable event and the mozWriteAudio() method both use Float32Array, it is possible to take the output of one audio stream and pass it directly (or process first and then pass) to a second:

<audio id="a1" 
       src="song.ogg" 
       onloadedmetadata="loadedMetadata();"
       controls>
</audio>
<script>
var a1 = document.getElementById('a1'),
    a2 = new Audio(),
    buffer = [];

function loadedMetadata() {
  // Mute a1 audio.
  a1.volume = 0;
  // Set up a2 to be identical to a1, and play through there.
  a2.mozSetup(a1.mozChannels, a1.mozSampleRate);
}

function audioAvailable(event) {
  // Write the current framebuffer
  var frameBuffer = event.frameBuffer;
  writeAudio(frameBuffer);
}
a1.addEventListener('MozAudioAvailable', audioAvailable, false);

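function writeAudio(audio) {
  // Queue the new samples behind any data left over from earlier calls.
  // A typed array is built explicitly, because the framebuffer is a
  // Float32Array and cannot be joined with Array.prototype.concat().
  var pending = new Float32Array(buffer.length + audio.length);
  pending.set(buffer, 0);
  pending.set(audio, buffer.length);

  var written = a2.mozWriteAudio(pending);
  // If all data wasn't written, keep the remainder for the next call
  // (this is an empty array when everything was written):
  buffer = pending.subarray(written);
}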
</script>

Audio data written using the mozWriteAudio() method needs to be written at a regular interval in equal portions, in order to keep a little ahead of the current sample offset (the current sample offset of the hardware can be obtained with mozCurrentSampleOffset()), where "a little" means something on the order of 500 ms of samples. For example, when working with 2 channels at 44100 samples per second, a writing interval of 100 ms, and a pre-buffer equal to 500 ms, one would write an array of (2 * 44100 / 10) = 8820 samples at each interval, keeping the total written at about (currentSampleOffset + 2 * 44100 / 2) = currentSampleOffset + 44100 samples.
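
The arithmetic above can be written as a small helper. This is a minimal sketch; the name samplesForDuration is an assumption for illustration, and it counts samples across all channels combined, as the paragraph above does.

```javascript
// Number of samples (all channels combined) that cover `milliseconds`
// of audio at the given channel count and sample rate.
function samplesForDuration(channels, sampleRate, milliseconds) {
  return Math.round(channels * sampleRate * milliseconds / 1000);
}

// The figures used in the paragraph above: 2 channels at 44100 Hz.
var examplePortionSize   = samplesForDuration(2, 44100, 100); // per 100 ms write
var examplePrebufferSize = samplesForDuration(2, 44100, 500); // 500 ms lead
```
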

Complete Example: Creating a Web Based Tone Generator

This example creates a simple tone generator, and plays the resulting tone.

<!DOCTYPE html>
<html>
  <head>
    <title>JavaScript Audio Write Example</title>
  </head>
  <body>
    <input type="text" size="4" id="freq" value="440"><label for="freq">Hz</label>
    <button onclick="start()">play</button>
    <button onclick="stop()">stop</button>

    <script type="text/javascript">
      var sampleRate = 44100,
          portionSize = sampleRate / 10, 
          prebufferSize = sampleRate / 2,
          freq = undefined; // no sound

      var audio = new Audio();
      audio.mozSetup(1, sampleRate);
      var currentWritePosition = 0;

      function getSoundData(t, size) {
        var soundData = new Float32Array(size);
        if (freq) {
          var k = 2* Math.PI * freq / sampleRate;
          for (var i=0; i<size; i++) {
            soundData[i] = Math.sin(k * (i + t));
          }
        }
        return soundData;
      }

      function writeData() {
        while(audio.mozCurrentSampleOffset() + prebufferSize >= currentWritePosition) {
          var soundData = getSoundData(currentWritePosition, portionSize);
          audio.mozWriteAudio(soundData);
          currentWritePosition += portionSize;
        }
      }

      // initial write
      writeData(); 
      var writeInterval = Math.floor(1000 * portionSize / sampleRate);
      setInterval(writeData, writeInterval);

      function start() {
        freq = parseFloat(document.getElementById("freq").value);
      }

      function stop() {
        freq = undefined;
      }
    </script>
  </body>
</html>
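
In the example above, getSoundData() adds the current write position t to the sample index, so that each portion continues the sine wave at the phase where the previous portion ended. Without this offset every portion would restart at phase zero, which is typically audible as a click at each boundary. The sketch below restates the same formula as a standalone function; the name sineSamples and its explicit freq and sampleRate parameters are assumptions made so that it can be run on its own.

```javascript
// Same formula as getSoundData() above, with freq and sampleRate passed in
// explicitly. `startSample` is the position of the first sample in the
// overall stream (currentWritePosition in the example).
function sineSamples(startSample, size, freq, sampleRate) {
  var data = new Float32Array(size);
  var k = 2 * Math.PI * freq / sampleRate;
  for (var i = 0; i < size; i++) {
    data[i] = Math.sin(k * (i + startSample));
  }
  return data;
}
```

Two consecutive portions generated this way, for instance sineSamples(0, 4410, 440, 44100) followed by sineSamples(4410, 4410, 440, 44100), contain the same values as a single call for 8820 samples.
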

DOM Implementation

nsIDOMNotifyAudioAvailableEvent

Audio data is made available via the following event:

  • Event: AudioAvailableEvent
  • Event handler: onmozaudioavailable

The AudioAvailableEvent is defined as follows:

interface nsIDOMNotifyAudioAvailableEvent : nsIDOMEvent
{
  // frameBuffer is really a Float32Array
  readonly attribute jsval  frameBuffer;
  readonly attribute float  time;
};

The frameBuffer attribute contains a typed array (Float32Array) with the raw audio data (32-bit float values) obtained from decoding the audio (i.e., the raw data being sent to the audio hardware, as opposed to the encoded audio). The samples are interleaved, in the form [channel1, channel2, ..., channelN, channel1, channel2, ..., channelN, ...]. All audio frames are normalized to a length of channels * 1024 by default, but the length can be any power of 2 between 512 and 32768 if the user has set a different value using the mozFrameBufferLength attribute.
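
Because the samples for all channels share one array, code that wants to treat channels separately has to split them first. The following is a minimal sketch of that step; the function name deinterleave is an assumption for illustration, and it assumes the framebuffer length is a multiple of the channel count.

```javascript
// Split an interleaved framebuffer [c1, c2, ..., cN, c1, c2, ..., cN, ...]
// into an array holding one Float32Array per channel.
function deinterleave(frameBuffer, channels) {
  var framesPerChannel = frameBuffer.length / channels;
  var result = [];
  for (var c = 0; c < channels; c++) {
    var channelData = new Float32Array(framesPerChannel);
    for (var i = 0; i < framesPerChannel; i++) {
      channelData[i] = frameBuffer[i * channels + c];
    }
    result.push(channelData);
  }
  return result;
}
```

In an event handler this might be called as deinterleave(event.frameBuffer, audio.mozChannels), giving the left channel at index 0 and the right channel at index 1 for stereo audio.
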

The time attribute contains a float representing the time in seconds since the start.

nsIDOMHTMLMediaElement additions

Audio metadata is made available via three new attributes on the HTMLMediaElement. By default these attributes throw if accessed before the LoadedMetadata event occurs. Users who need this info before the audio starts playing should not use autoplay, since the audio might start before a loadedmetadata handler has run.

The three new attributes are defined as follows:

  readonly attribute unsigned long mozChannels;
  readonly attribute unsigned long mozSampleRate;
           attribute unsigned long mozFrameBufferLength;

The mozChannels attribute contains the number of channels in the audio resource (e.g., 2). The mozSampleRate attribute contains the number of samples per second that will be played, for example 44100. Both are read-only.

The mozFrameBufferLength attribute indicates the number of samples that will be returned in the framebuffer of each MozAudioAvailable event. This number is a total for all channels, and by default is set to be the number of channels * 1024 (e.g., 2 channels * 1024 samples = 2048 total).
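
Since the length is a total across channels, the amount of playback time covered by one framebuffer follows from the length, the channel count, and the sample rate. The sketch below shows that calculation; both function names are assumptions for illustration.

```javascript
// Default framebuffer length described above: 1024 samples per channel.
function defaultFrameBufferLength(channels) {
  return channels * 1024;
}

// Seconds of audio carried by one framebuffer of the given total length.
function frameBufferDuration(frameBufferLength, channels, sampleRate) {
  return frameBufferLength / channels / sampleRate;
}
```

For stereo audio at 44100 samples per second, the default length of 2048 corresponds to about 23 ms of audio per MozAudioAvailable event.
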

The mozFrameBufferLength attribute can also be set to a new value, if users want lower latency, or larger amounts of data, etc. The size given must be a power of 2 between 512 and 32768. The following are all valid lengths:

  • 512
  • 1024
  • 2048
  • 4096
  • 8192
  • 16384
  • 32768

Using any other size will result in an exception being thrown. The best time to set a new length is after the loadedmetadata event fires, when the audio info is known, but before the audio has started or MozAudioAvailable events have begun firing.
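
A script that computes a length at run time may want to check it before assigning it, rather than catching the exception. The following is a minimal sketch of such a check; the function name isValidFrameBufferLength is an assumption and is not part of the API.

```javascript
// True if `length` is a whole power of 2 between 512 and 32768 inclusive,
// i.e., one of the values accepted by mozFrameBufferLength.
function isValidFrameBufferLength(length) {
  return typeof length === 'number' &&
         Math.floor(length) === length &&
         length >= 512 &&
         length <= 32768 &&
         (length & (length - 1)) === 0; // a power of 2 has a single bit set
}
```
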

nsIDOMHTMLAudioElement additions

The HTMLAudioElement has also been extended to allow write access. Audio writing is achieved by adding three new methods:

  void mozSetup(in long channels, in long rate);
  unsigned long mozWriteAudio(array); // array is Array() or Float32Array()
  unsigned long long mozCurrentSampleOffset();

The mozSetup() method allows an <audio> element to be set up for writing from script. This method must be called before mozWriteAudio or mozCurrentSampleOffset can be called, since an audio stream has to be created for the media element. It takes two arguments:

  1. channels - the number of audio channels (e.g., 2)
  2. rate - the audio's sample rate (e.g., 44100 samples per second)

The choices made for channels and rate are significant, because they determine the amount of data you must pass to mozWriteAudio(). That is, you must pass either an array with 0 elements (similar to flushing the audio stream) or enough data for each channel specified in mozSetup().

The mozSetup() method, if called more than once, will create a new audio stream (destroying an existing one if present) with each call. Thus it is safe to call this more than once, but unnecessary.

The mozWriteAudio() method can be called after mozSetup(). It allows audio data to be written directly from script. It takes one argument, array. This is a JS Array (i.e., new Array()) or a typed float array (i.e., new Float32Array()) containing the audio data (floats) you wish to write. It must be 0 or N elements in length, where N % channels == 0, otherwise an exception is thrown.

The mozWriteAudio() method returns the number of samples that were just written, which may or may not be the same as the number in array. Only the number of samples that can be written without blocking the audio hardware will be written. It is the responsibility of the caller to deal with any samples that don't get written in the first pass (e.g., buffer and write in the next call).
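
One way to handle samples that are not accepted is to keep them in a queue and offer them again on the next call. The sketch below illustrates this under stated assumptions: BufferedWriter is a hypothetical wrapper, and FakeOutput is a stand-in for an <audio> element prepared with mozSetup(), included only so the sketch can run outside a browser that implements this API.

```javascript
// Hypothetical wrapper around any object exposing mozWriteAudio(array).
function BufferedWriter(output) {
  this.output  = output;
  this.pending = new Float32Array(0);
}

BufferedWriter.prototype.write = function (samples) {
  // Queue the new samples behind anything left over from earlier calls.
  var combined = new Float32Array(this.pending.length + samples.length);
  combined.set(this.pending, 0);
  combined.set(samples, this.pending.length);

  var written = this.output.mozWriteAudio(combined);
  // Keep the unwritten tail for the next call.
  this.pending = combined.slice(written);
  return written;
};

// Stand-in for an audio element: accepts at most `capacity` samples per
// call and records what it was given. Purely illustrative.
function FakeOutput(capacity) {
  this.capacity = capacity;
  this.received = [];
}

FakeOutput.prototype.mozWriteAudio = function (array) {
  var accepted = Math.min(this.capacity, array.length);
  for (var i = 0; i < accepted; i++) {
    this.received.push(array[i]);
  }
  return accepted;
};
```

With a real element, the same wrapper would be constructed around the object on which mozSetup() was called, and write() would be invoked from a timer or from a MozAudioAvailable handler.
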

The mozCurrentSampleOffset() method can be called after mozSetup(). It returns the current position (measured in samples) of the audio stream. This is useful when determining how much data to write with mozWriteAudio().
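
As a rough illustration of that use, the sketch below computes how many samples are needed to stay a given pre-buffer ahead of the stream position. The function and parameter names are assumptions for illustration; writePosition stands for the running total of samples the script has written so far.

```javascript
// Number of samples still needed so that the total written stays
// `prebufferSize` samples ahead of the current stream position.
// Returns 0 when the script is already far enough ahead.
function samplesNeeded(currentSampleOffset, writePosition, prebufferSize) {
  var target = currentSampleOffset + prebufferSize;
  return Math.max(0, target - writePosition);
}
```

A caller would pass the value returned by mozCurrentSampleOffset() as the first argument and write that many samples, or nothing when the result is 0.
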

All of mozWriteAudio(), mozCurrentSampleOffset(), and mozSetup() will throw exceptions if called out of order. mozSetup() will also throw if a src attribute has previously been set on the audio element (i.e., you can't do both at the same time).

Security

Similar to the <canvas> element and its getImageData method, the MozAudioAvailable event's frameBuffer attribute protects against information leakage between origins.

The MozAudioAvailable event's frameBuffer attribute will throw if the origin of the audio resource does not match the document's origin. NOTE: this will affect users who have the security.fileuri.strict_origin_policy preference set, and are working locally with file:/// URIs.

Compatibility with Audio Backends

The current MozAudioAvailable implementation integrates with Mozilla's decoder abstract base classes, and therefore, any audio decoder which uses these base classes automatically dispatches MozAudioAvailable events. At the time of writing, this includes the Ogg and WebM decoders but not the Wave decoder.

Additional Resources

A series of blog posts document the evolution and implementation of this API: http://vocamus.net/dave/?cat=25. Another overview by Al MacDonald is available here.

Bug

The work on this API is available in Mozilla bug 490705.

Obtaining Code and Builds

Latest Try Server Builds:

http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/david.humphrey@senecac.on.ca-ecf5c7f4e806/

JavaScript Audio Libraries

  • We have started work on a JavaScript library to make building audio web apps easier. Details are here.
  • dynamicaudio.js - An interface for writing audio with a Flash fall back for older browsers. NOTE: not necessarily up-to-date with this version of the API.

Working Audio Data Demos

A number of working demos have been created, including:

NOTE: If you try to run demos created with the original API using a build that implements the new API, you may encounter bug 560212. We are aware of this, as is Mozilla, and it is being investigated.

Demos Needing to be Updated to New API

Third Party Discussions

A number of people have written about our work, including: