(31 intermediate revisions by the same user not shown) | |||
Line 10: | Line 10: | ||
* Corban Brook ([http://twitter.com/corban @corban]) | * Corban Brook ([http://twitter.com/corban @corban]) | ||
* Al MacDonald ([http://twitter.com/f1lt3r @F1LT3R]) | * Al MacDonald ([http://twitter.com/f1lt3r @F1LT3R]) | ||
* Yury Delendik ([http://twitter.com/ | * Yury Delendik | ||
* Ricard Marxer ([http://twitter.com/ricardmp @ricardmp]) | |||
===== Other Contributors ===== | ===== Other Contributors ===== | ||
Line 17: | Line 18: | ||
* Ted Mielczarek | * Ted Mielczarek | ||
* Felipe Gomes | * Felipe Gomes | ||
===== Status ===== | ===== Status ===== | ||
Line 23: | Line 23: | ||
'''This is a work in progress.''' This document reflects the current thinking of its authors, and is not an official specification. The original goal of this specification was to experiment with web audio data on the way to creating a more stable recommendation. The authors hoped that this work, and the ideas it generated, would eventually find their way into Mozilla and other HTML5 compatible browsers. Both of these goals are within reach now, with work ramping up in [https://bugzilla.mozilla.org/show_bug.cgi?id=490705 this Mozilla bug], and the announcement of an official [http://www.w3.org/2005/Incubator/audio/ W3C Audio Incubator Group] chaired by one of the authors. | '''This is a work in progress.''' This document reflects the current thinking of its authors, and is not an official specification. The original goal of this specification was to experiment with web audio data on the way to creating a more stable recommendation. The authors hoped that this work, and the ideas it generated, would eventually find their way into Mozilla and other HTML5 compatible browsers. Both of these goals are within reach now, with work ramping up in [https://bugzilla.mozilla.org/show_bug.cgi?id=490705 this Mozilla bug], and the announcement of an official [http://www.w3.org/2005/Incubator/audio/ W3C Audio Incubator Group] chaired by one of the authors. | ||
The continuing work on this specification and API can be tracked here, and in | The continuing work on this specification and API can be tracked here, and in [https://bugzilla.mozilla.org/show_bug.cgi?id=490705 the bug]. Comments, feedback, and collaboration are all welcome. You can reach the authors on irc in the [irc://irc.mozilla.org/audio #audio channel] on irc.mozilla.org. | ||
===== Version ===== | ===== Version ===== | ||
This is the second major version of this API (referred to by the developers as | This is the second major version of this API (referred to by the developers as audio13)--the previous version is available here. The primary improvements and changes are: | ||
* Removal of '''mozSpectrum''' (i.e., native FFT calculation) -- will be done in JS now. | * Removal of '''mozSpectrum''' (i.e., native FFT calculation) -- will be done in JS now. | ||
Line 33: | Line 33: | ||
* Native array interfaces instead of using accessors and IDL array arguments. | * Native array interfaces instead of using accessors and IDL array arguments. | ||
* No zero padding of audio data occurs anymore. All frames are exactly 4096 elements in length. | * No zero padding of audio data occurs anymore. All frames are exactly 4096 elements in length. | ||
* Added ''' | * Added '''mozCurrentSampleOffset()''' | ||
* Removed undocumented position/buffer methods on audio element. | * Removed undocumented position/buffer methods on audio element. | ||
* Added '''mozChannels''', '''mozRate''', '''mozFrameBufferLength''' to '''loadedmetadata''' event. | |||
Demos written for the previous version are '''not''' compatible, though they can be made compatible quite easily. See details below. | Demos written for the previous version are '''not''' compatible, though they can be made compatible quite easily. See details below. | |
Line 46: | Line 47: | ||
Audio data is made available via an event-based API. As the audio is played, and therefore decoded, each frame is passed to content scripts for processing after being written to the audio layer--hence the name, '''AudioWritten'''. Playing and pausing the audio both affect the streaming of this raw audio data as well. | Audio data is made available via an event-based API. As the audio is played, and therefore decoded, each frame is passed to content scripts for processing after being written to the audio layer--hence the name, '''AudioWritten'''. Playing and pausing the audio both affect the streaming of this raw audio data as well. | |
Consumers of this raw audio data register | Consumers of this raw audio data register two callbacks on the <audio> or <video> element in order to consume this data: | |
<pre> | <pre> | ||
<audio src="song.ogg" onaudiowritten="audioWritten(event);"></audio> | <audio src="song.ogg" | ||
onloadedmetadata="audioInfo(event);" | |||
onaudiowritten="audioWritten(event);"> | |||
</audio> | |||
</pre> | </pre> | ||
The AudioWritten event provides two pieces of data. The first is a framebuffer (i.e., an array) containing sample data for the current frame. The second is the time (e.g., milliseconds) for the start of this frame. | The '''LoadedMetadata''' event is a standard part of HTML5, and has been extended to provide more detailed information about the audio stream. Specifically, developers can obtain the number of channels and the sample rate (samples per second) of the audio. This event is fired once as the media resource is first loaded, and is useful for interpreting or writing the audio data. | |
The '''AudioWritten''' event provides two pieces of data. The first is a framebuffer (i.e., an array) containing sample data for the current frame. The second is the time (e.g., milliseconds) for the start of this frame. | |||
The following is an example of how both events might be used: | |||
<pre> | <pre> | ||
var samples; | var channels, | ||
rate, | |||
frameBufferLength, | |||
samples; | |||
function audioInfo(event) { | |||
channels = event.mozChannels; | |||
rate = event.mozRate; | |||
frameBufferLength = event.mozFrameBufferLength; | |||
} | |||
function audioWritten(event) { | function audioWritten(event) { | ||
Line 62: | Line 79: | ||
for (var i=0, slen=samples.length; i<slen; i++) { | for (var i=0, slen=samples.length; i<slen; i++) { | ||
processSample(samples[i]); | // Do something with the audio data as it is played. | ||
processSample(samples[i], channels, rate); | |||
} | } | ||
} | } | ||
Line 69: | Line 87: | ||
===== Complete Example: Visualizing Audio Spectrum ===== | ===== Complete Example: Visualizing Audio Spectrum ===== | ||
This example | This example calculates and displays FFT spectrum data for the playing audio: | ||
[[File:fft.png]] | [[File:fft.png]] | ||
Line 76: | Line 94: | ||
<!DOCTYPE html> | <!DOCTYPE html> | ||
<html> | <html> | ||
<head> | <head> | ||
<title>JavaScript Spectrum Example</title> | <title>JavaScript Spectrum Example</title> | ||
</head> | </head> | ||
<body> | <body> | ||
<audio src="song.ogg" | <audio src="song.ogg" | ||
controls="true" | controls="true" | ||
onloadedmetadata="loadedMetadata(event);" | |||
onaudiowritten="audioWritten(event);" | onaudiowritten="audioWritten(event);" | ||
style="width: 512px;"> | style="width: 512px;"> | ||
</audio> | </audio> | ||
<div><canvas id="fft" width="512" height="200"></canvas></div> | <div><canvas id="fft" width="512" height="200"></canvas></div> | ||
<script> | <script> | ||
var canvas = document.getElementById('fft'), | |||
ctx = canvas.getContext('2d'), | |||
channels, | |||
var canvas = document.getElementById('fft') | fft; | |
function loadedMetadata(event) { | |||
// channels is declared at script scope because audioWritten() also reads it. | |||
var rate = event.mozRate, | |||
frameBufferLength = event.mozFrameBufferLength; | |||
channels = event.mozChannels; | |||
fft = new FFT(frameBufferLength / channels, rate); | |||
} | |||
function audioWritten(event) { | function audioWritten(event) { | ||
var fb = event.mozFrameBuffer, | |||
signal = new Float32Array(fb.length / channels), | |||
var | magnitude; | ||
for (var i = 0, fbl = fb.length / 2; i < fbl; i++ ) { | |||
// Assuming interleaved stereo channels, | |||
// need to split and merge into a stereo-mix mono signal | |||
signal[i] = (fb[2*i] + fb[2*i+1]) / 2; | |||
} | |||
fft.forward(signal); | |||
// Clear the canvas before drawing spectrum | // Clear the canvas before drawing spectrum | ||
ctx.clearRect(0,0, canvas.width, canvas.height); | ctx.clearRect(0,0, canvas.width, canvas.height); | ||
for ( var i = 0; i < | for (var i = 0; i < fft.spectrum.length; i++ ) { | ||
// multiply spectrum by a zoom value | |||
magnitude = fft.spectrum[i] * 4000; | |||
// Draw rectangle bars for each frequency bin | // Draw rectangle bars for each frequency bin | ||
ctx.fillRect(i * 4, canvas.height, 3, -magnitude); | ctx.fillRect(i * 4, canvas.height, 3, -magnitude); | ||
} | } | ||
} | } | ||
// FFT from dsp.js, see below | |||
var FFT = function(bufferSize, sampleRate) { | |||
this.bufferSize = bufferSize; | |||
this.sampleRate = sampleRate; | |||
this.spectrum = new Float32Array(bufferSize/2); | |||
this.real = new Float32Array(bufferSize); | |||
this.imag = new Float32Array(bufferSize); | |||
this.reverseTable = new Uint32Array(bufferSize); | |||
this.sinTable = new Float32Array(bufferSize); | |||
this.cosTable = new Float32Array(bufferSize); | |||
var limit = 1, | |||
bit = bufferSize >> 1; | |||
while ( limit < bufferSize ) { | |||
for ( var i = 0; i < limit; i++ ) { | |||
this.reverseTable[i + limit] = this.reverseTable[i] + bit; | |||
} | |||
limit = limit << 1; | |||
bit = bit >> 1; | |||
} | |||
for ( var i = 0; i < bufferSize; i++ ) { | |||
this.sinTable[i] = Math.sin(-Math.PI/i); | |||
this.cosTable[i] = Math.cos(-Math.PI/i); | |||
} | |||
}; | |||
FFT.prototype.forward = function(buffer) { | |||
var bufferSize = this.bufferSize, | |||
cosTable = this.cosTable, | |||
sinTable = this.sinTable, | |||
reverseTable = this.reverseTable, | |||
real = this.real, | |||
imag = this.imag, | |||
spectrum = this.spectrum; | |||
if ( bufferSize !== buffer.length ) { | |||
throw "Supplied buffer is not the same size as defined FFT. FFT Size: " + | |||
bufferSize + " Buffer Size: " + buffer.length; | |||
} | |||
for ( var i = 0; i < bufferSize; i++ ) { | |||
real[i] = buffer[reverseTable[i]]; | |||
imag[i] = 0; | |||
} | |||
var halfSize = 1, | |||
phaseShiftStepReal, | |||
phaseShiftStepImag, | |||
currentPhaseShiftReal, | |||
currentPhaseShiftImag, | |||
off, | |||
tr, | |||
ti, | |||
tmpReal, | |||
i; | |||
while ( halfSize < bufferSize ) { | |||
phaseShiftStepReal = cosTable[halfSize]; | |||
phaseShiftStepImag = sinTable[halfSize]; | |||
currentPhaseShiftReal = 1.0; | |||
currentPhaseShiftImag = 0.0; | |||
for ( var fftStep = 0; fftStep < halfSize; fftStep++ ) { | |||
i = fftStep; | |||
while ( i < bufferSize ) { | |||
off = i + halfSize; | |||
tr = (currentPhaseShiftReal * real[off]) - (currentPhaseShiftImag * imag[off]); | |||
ti = (currentPhaseShiftReal * imag[off]) + (currentPhaseShiftImag * real[off]); | |||
real[off] = real[i] - tr; | |||
imag[off] = imag[i] - ti; | |||
real[i] += tr; | |||
imag[i] += ti; | |||
i += halfSize << 1; | |||
} | |||
tmpReal = currentPhaseShiftReal; | |||
currentPhaseShiftReal = (tmpReal * phaseShiftStepReal) - (currentPhaseShiftImag * phaseShiftStepImag); | |||
currentPhaseShiftImag = (tmpReal * phaseShiftStepImag) + (currentPhaseShiftImag * phaseShiftStepReal); | |||
} | |||
halfSize = halfSize << 1; | |||
} | |||
i = bufferSize/2; | |||
while(i--) { | |||
spectrum[i] = 2 * Math.sqrt(real[i] * real[i] + imag[i] * imag[i]) / bufferSize; | |||
} | |||
}; | |||
</script> | </script> | ||
</body> | </body> | ||
Line 139: | Line 268: | ||
</pre> | </pre> | ||
<code> | <code>mozCurrentSampleOffset()</code> | ||
<pre> | <pre> | ||
// Get current position of the underlying audio stream, measured in samples written. | // Get current position of the underlying audio stream, measured in samples written. | ||
var currentSampleOffset = audioOutput. | var currentSampleOffset = audioOutput.mozCurrentSampleOffset(); | ||
</pre> | </pre> | ||
Line 149: | Line 278: | ||
<pre> | <pre> | ||
<audio id="a1" | |||
src="song.ogg" | |||
onloadedmetadata="loadedMetadata(event);" | |||
onaudiowritten="audioWritten(event);" | |||
controls="controls"> | |||
</audio> | |||
<script> | |||
var a1 = document.getElementById('a1'), | |||
a2 = new Audio(); | |||
function | function loadedMetadata(event) { | ||
// Mute a1 audio. | |||
a1.volume = 0; | |||
// Setup a2 to be identical to a1, and play through there. | |||
a2.mozSetup(event.mozChannels, event.mozRate, 1); | |||
} | |||
function audioWritten(event) { | |||
// Write the current frame to a2 | |||
a2.mozWriteAudio(event.mozFrameBuffer); | |||
} | } | ||
</script> | |||
</pre> | </pre> | ||
Audio data written using the '''mozWriteAudio()''' method needs to be written at a regular interval in equal portions, in order to keep a little ahead of the current sample offset (the hardware's current sample offset can be obtained with '''mozCurrentSampleOffset()'''), where "a little" means something on the order of 500ms of samples. For example, when working with 2 channels at 44100 samples per second, a writing interval of 100ms, and a pre-buffer of 500ms, one would write an array of (2 * 44100 / 10) = 8820 samples on each interval, keeping the total written at or ahead of (currentSampleOffset + 2 * 44100 / 2). | |||
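The arithmetic above can be sketched as a pair of small helpers. This is only an illustration: '''writePlan''' and '''shouldWrite''' are hypothetical names, not part of the API, and the sketch assumes that both the write position and the value returned by '''mozCurrentSampleOffset()''' count samples across all channels.

```javascript
// Hypothetical helpers for illustration only; not part of the Audio Data API.

// Compute how many samples (across all channels) to write per interval,
// and how many samples to stay ahead of the current sample offset.
function writePlan(channels, sampleRate, intervalMs, prebufferMs) {
  var portionSize = channels * sampleRate * intervalMs / 1000;
  var prebufferSize = channels * sampleRate * prebufferMs / 1000;
  return { portionSize: portionSize, prebufferSize: prebufferSize };
}

// Decide whether another portion is due: true when the write position is
// no further ahead of the current sample offset than the pre-buffer.
function shouldWrite(currentSampleOffset, currentWritePosition, prebufferSize) {
  return currentSampleOffset + prebufferSize >= currentWritePosition;
}

// 2 channels at 44100 samples per second, 100ms interval, 500ms pre-buffer.
var plan = writePlan(2, 44100, 100, 500);
// plan.portionSize is 8820 and plan.prebufferSize is 44100
```

In a page, the first argument to '''shouldWrite''' would come from '''mozCurrentSampleOffset()''', and each portion would be passed to '''mozWriteAudio()'''.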
===== Complete Example: Creating a Web Based Tone Generator ===== | ===== Complete Example: Creating a Web Based Tone Generator ===== | ||
Line 177: | Line 316: | ||
<body> | <body> | ||
<input type="text" size="4" id="freq" value="440"><label for="freq">Hz</label> | <input type="text" size="4" id="freq" value="440"><label for="freq">Hz</label> | |
<button onclick="start()">play</button> | <button onclick="start()">play</button> | ||
<button onclick="stop()">stop</button> | <button onclick="stop()">stop</button> | ||
<script type="text/javascript"> | <script type="text/javascript"> | ||
var | var sampleRate = 44100, | ||
portionSize = sampleRate / 10, | |||
prebufferSize = sampleRate / 2, | |||
freq = undefined; // no sound | |||
var audio = new Audio(); | |||
audio.mozSetup(1, sampleRate, 1); | |||
var currentWritePosition = 0; | |||
function | function getSoundData(t, size) { | ||
var soundData = new Float32Array(size); | |||
if (freq) { | |||
var k = 2* Math.PI * freq / sampleRate; | |||
for (var i=0; i<size; i++) { | |||
soundData[i] = Math.sin(k * (i + t)); | |||
} | |||
} | |||
return soundData; | |||
} | } | ||
function | function writeData() { | ||
while(audio.mozCurrentSampleOffset() + prebufferSize >= currentWritePosition) { | |||
var soundData = getSoundData(currentWritePosition, portionSize); | |||
audio.mozWriteAudio(soundData); | |||
currentWritePosition += portionSize; | |||
} | } | ||
} | } | ||
function | // initial write | ||
writeData(); | |||
var writeInterval = Math.floor(1000 * portionSize / sampleRate); | |||
setInterval(writeData, writeInterval); | |||
function start() { | |||
freq = parseFloat(document.getElementById("freq").value); | freq = parseFloat(document.getElementById("freq").value); | ||
} | } | ||
function stop() { | |||
freq = undefined; | |||
} | |||
</script> | </script> | ||
</body> | </body> | ||
Line 224: | Line 366: | ||
== DOM Implementation == | == DOM Implementation == | ||
===== nsIDOMNotifyAudioMetadataEvent ===== | |||
Audio metadata is provided via custom properties of the media element's '''loadedmetadata''' event. This event occurs once when the browser first acquires information about the media resource. The event details are as follows: | |||
* '''Event''': LoadedMetadata | |||
* '''Event handler''': onloadedmetadata | |||
The '''LoadedMetadataEvent''' is defined as follows: | |||
<pre> | |||
interface nsIDOMNotifyAudioMetadataEvent : nsIDOMEvent | |||
{ | |||
readonly attribute unsigned long mozChannels; | |||
readonly attribute unsigned long mozRate; | |||
readonly attribute unsigned long mozFrameBufferLength; | |||
}; | |||
</pre> | |||
The '''mozChannels''' attribute contains the number of channels in this audio resource (e.g., 2). The '''mozRate''' attribute contains the number of samples per second that will be played, for example 44100. The '''mozFrameBufferLength''' attribute contains the number of samples that will be returned in each '''AudioWritten''' event. This number is a total for all channels (e.g., 2 channels * 2048 samples = 4096 total). | |||
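Because '''mozFrameBufferLength''' is a total for all channels, a consumer usually has to separate the channels before processing them. The following is a minimal sketch of that step. It assumes the samples are interleaved by channel (channel 0, channel 1, channel 0, ...), which this section does not state explicitly, and '''splitChannels''' is a hypothetical helper name.

```javascript
// Hypothetical helper for illustration only; not part of the Audio Data API.
// Assumes the frame buffer holds interleaved samples: c0, c1, c0, c1, ...
function splitChannels(frameBuffer, channels) {
  var perChannel = frameBuffer.length / channels,
      out = [],
      c, i;

  // One typed array per channel, each holding length / channels samples.
  for (c = 0; c < channels; c++) {
    out.push(new Float32Array(perChannel));
  }

  // Walk the buffer one sample frame at a time and distribute the samples.
  for (i = 0; i < perChannel; i++) {
    for (c = 0; c < channels; c++) {
      out[c][i] = frameBuffer[i * channels + c];
    }
  }
  return out;
}

// A 4096-sample stereo frame yields two arrays of 2048 samples each.
var parts = splitChannels(new Float32Array(4096), 2);
```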
===== nsIDOMNotifyAudioWrittenEvent ===== | ===== nsIDOMNotifyAudioWrittenEvent ===== | ||
Line 271: | Line 433: | ||
The '''mozWriteAudio()''' method can be called after '''mozSetup()'''. It allows audio data to be written directly from script. It takes one argument: | The '''mozWriteAudio()''' method can be called after '''mozSetup()'''. It allows audio data to be written directly from script. It takes one argument: | ||
# '''array''' - this is a JS Array (e | # '''array''' - this is a JS Array (i.e., new Array()) or a typed float array (i.e., new Float32Array()) containing the audio data (floats) you wish to write. It must be 0 or N (where N % channels == 0) elements in length, otherwise a DOM error occurs. | ||
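The length rule above can be checked before writing. The following is a small sketch under the stated rule only. '''isValidWriteLength''' and '''trimToWholeFrames''' are hypothetical helper names, and dropping trailing samples is just one possible way to handle a partial frame.

```javascript
// Hypothetical helpers for illustration only; not part of the Audio Data API.

// True when the length is 0 or a whole multiple of the channel count.
function isValidWriteLength(length, channels) {
  return length % channels === 0;
}

// Drop any trailing samples that do not make up a complete frame
// (one sample per channel). Expects a typed array such as Float32Array.
function trimToWholeFrames(samples, channels) {
  var usable = samples.length - (samples.length % channels);
  return samples.subarray(0, usable);
}

// Five samples cannot be split evenly across two channels, so one is dropped.
var trimmed = trimToWholeFrames(new Float32Array(5), 2);
```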
The '''mozCurrentSampleOffset()''' method can be called after '''mozSetup()'''. It returns the current position (measured in samples) of the audio stream. This is useful when determining how much data to write with '''mozWriteAudio()'''. | The '''mozCurrentSampleOffset()''' method can be called after '''mozSetup()'''. It returns the current position (measured in samples) of the audio stream. This is useful when determining how much data to write with '''mozWriteAudio()'''. | ||
Line 298: | Line 460: | ||
=== JavaScript Audio Libraries === | === JavaScript Audio Libraries === | ||
We have started work on a JavaScript library to make building audio web apps easier. Details are [[Audio Data API JS Library|here]]. | * We have started work on a JavaScript library to make building audio web apps easier. Details are [[Audio Data API JS Library|here]]. | ||
* [http://github.com/bfirsh/dynamicaudio.js dynamicaudio.js] - An interface for writing audio with a Flash fall back for older browsers. | |||
=== Working Audio Data Demos === | === Working Audio Data Demos === | ||
Line 308: | Line 471: | ||
==== Demos Working on Current API ==== | ==== Demos Working on Current API ==== | ||
* ... | * FFT visualization (calculated with js) | ||
** http://weare.buildingsky.net/processing/dsp.js/examples/fft.html | |||
* Beat Detection (also showing use of WebGL for 3D visualizations) | |||
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor1HD-13a.html (video [http://vimeo.com/11345262 here]) | |||
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor2HD-13a.html (video of older version [http://vimeo.com/11345685 here]) | |||
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor3HD-13a.html (video [http://www.youtube.com/watch?v=OxoFcyKYwr0&fmt=22 here]) | |||
* Writing Audio from JavaScript, Digital Signal Processing | |||
** Csound shaker instrument ported to JavaScript via Processing.js http://scotland.proximity.on.ca/dxr/tmp/audio/shaker/ | |||
==== Demos Needing to be Updated to New API ==== | ==== Demos Needing to be Updated to New API ==== | ||
** http://weare.buildingsky.net/processing/dft.js/audio.new.html (video [http://vimeo.com/8525101 here]) | ** http://weare.buildingsky.net/processing/dft.js/audio.new.html (video [http://vimeo.com/8525101 here]) | ||
Line 322: | Line 492: | ||
** http://ondras.zarovi.cz/demos/audio/ | ** http://ondras.zarovi.cz/demos/audio/ | ||
** http://weare.buildingsky.net/processing/beat_detektor/beat_detektor.html | ** http://weare.buildingsky.net/processing/beat_detektor/beat_detektor.html | ||
** http://code.bocoup.com/processing-js/3d-fft/viz.xhtml | ** http://code.bocoup.com/processing-js/3d-fft/viz.xhtml | ||
Line 338: | Line 505: | ||
** JS Multi-Oscillator Synthesizer http://weare.buildingsky.net/processing/dsp.js/examples/synthesizer.html (video [http://vimeo.com/11411533 here]) | ** JS Multi-Oscillator Synthesizer http://weare.buildingsky.net/processing/dsp.js/examples/synthesizer.html (video [http://vimeo.com/11411533 here]) | ||
** JS IIR Filter http://weare.buildingsky.net/processing/dsp.js/examples/filter.html (video [http://vimeo.com/11335434 here]) | ** JS IIR Filter http://weare.buildingsky.net/processing/dsp.js/examples/filter.html (video [http://vimeo.com/11335434 here]) | ||
** API Example: [http://code.bocoup.com/audio-data-api/examples/inverted-waveform-cancellation Inverted Waveform Cancellation] | ** API Example: [http://code.bocoup.com/audio-data-api/examples/inverted-waveform-cancellation Inverted Waveform Cancellation] | ||
** API Example: [http://code.bocoup.com/audio-data-api/examples/stereo-splitting-and-panning Stereo Splitting and Panning] | ** API Example: [http://code.bocoup.com/audio-data-api/examples/stereo-splitting-and-panning Stereo Splitting and Panning] |