The way in which multichannel data is laid out within the sample buffer does not seem to be clearly specified, and there's lots of room for error there. Furthermore, if what we go with is interleaving samples into a single buffer (ABCDABCDABCD), I think we're leaving a lot of potential performance wins on the floor. I can think of use cases where, if each channel were specified as its own Float32Array, it would be possible to efficiently turn two monaural streams into a stereo audio mix without having to manually copy samples with JavaScript. Likewise, if we allow a 'stride' parameter for each of those channel arrays, interleaved source data still ends up costing nothing, which is the best of both worlds. This is sort of analogous to the way binding arrays works in classic OpenGL, and I feel like it's a sane model.
''roc: I think I'll just go with non-interleaved for now. That's what Chrome's Web Audio API does. We want to restrict the input format as much as possible to make it easier to write processing code, e.g. we don't want author processing code to have to deal with arbitrary strides.''
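For illustration only, a rough sketch in plain JavaScript of the copy cost the comment above describes: with an interleaved layout, combining two mono buffers into a stereo buffer means touching every sample in script, whereas a planar (per-channel Float32Array) layout could in principle hand the two arrays over directly. The function name is hypothetical and not part of any proposal.

<pre>
// Sketch only, not part of the proposal: shows the per-sample copy that
// author code has to do when the buffer layout is interleaved (LRLRLR...).
function interleaveStereo(left, right) {
  var frames = Math.min(left.length, right.length);
  var out = new Float32Array(frames * 2);
  for (var i = 0; i < frames; i++) {
    out[i * 2]     = left[i];   // channel 0
    out[i * 2 + 1] = right[i];  // channel 1
  }
  return out;
}
// With a planar layout (one Float32Array per channel), or a per-channel
// 'stride' parameter, the two mono arrays could be passed through as the
// stereo stream's channels without this copy.
</pre>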
I don't like seeing relative times in APIs when we're talking about trying to do precisely timed mixing and streaming. StreamProcessor::end should accept a timestamp at which processing should cease, instead of a delay relative to the current time. Likewise, I think Stream::live should be a readonly attribute, and Stream should expose a setLive method that takes the new liveness state as an argument, along with a timestamp at which the liveness state should change. It'd also be nice if volume changes could work the same way, but that might be a hard sell. Explicit timing also has the large benefit that it saves us from having to 'batch' audio changes based on some threshold - we can simply accept a bunch of audio change events and apply them at the desired timestamp (give or take the skew that results from whatever interval we mix at internally). This is much less 'magical' than the batching we suggest, and it's also less likely to break mysteriously if some other browser vendor does their batching differently.
''roc: I'll change it to use absolute times. I don't think liveness needs a time parameter since it's not something you're likely to change dynamically. The batching will still be needed though; it's not "magical", the idea that HTML5 tasks are atomic is normal for the Web platform.''
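A small sketch of the difference being argued about, using the proposal's StreamProcessor::end as a placeholder; the clock used as the base is deliberately left abstract here, and none of these names are a shipped API.

<pre>
// Sketch only: 'processor' and 'stream' are placeholders for the proposal's
// objects; no shipped API is implied.

// Relative form: the actual stop point depends on when this call gets to run.
processor.end(0.5);                     // "stop 0.5 s after whenever 'now' is sampled"

// Absolute form: compute the target once against a known clock; the stop point
// then stays fixed no matter how late the call itself is scheduled.
var stopAt = stream.currentTime + 0.5;  // base taken from whatever shared clock exists
processor.end(stopAt);                  // "stop at exactly this stream time"
</pre>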
It would be nice if we could expose the current playback position as an attribute on a Stream, distinct from the number of samples buffered. The 'currentTime' attribute is ambiguous as to which of the two 'current times' it actually represents, so it would be nice to either make it clear (preferably with a more precise name, but at least with documentation) or, even better, expose both values.
''roc: Actually I don't want to expose either! We should have a global "media time" instead that people can use as a base for their time calculations. Maybe this should be the same as the animation time...''
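To make the "global media time" idea concrete, a hypothetical sketch; mediaTime() is invented here purely for illustration, and nothing with that name is actually specified.

<pre>
// Sketch only: mediaTime() stands in for the suggested global media clock.
var base = mediaTime();        // one shared clock, read once
processorA.end(base + 2.0);    // both processors are scheduled against the same
processorB.end(base + 2.0);    // base, so their stop points agree exactly
</pre>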
In many of the examples provided, we're capturing the stream of an <audio> tag, and then our worker is presumed to spit it right out to the soundcard. I would feel happier about this API if the <audio> tag represented two components - an audio input stream (the content used as the src of the <audio>) and an audio playback stream (the samples being sent to the soundcard when the <audio> element is 'playing'). This would make it much clearer when the captured stream is actually going out to the soundcard, and when it's being consumed silently and used for some other purpose. This also makes it clearer whether the user's volume setting (exposed in the <audio> UI) will affect the audio data - the samples themselves will be unscaled by volume, but anything written to the <audio> tag's associated 'output stream' will be scaled by that tag's current volume.
[[User:KevinGadd|kael]]
''roc: All the examples actually play their output through an <audio> element. I think the proposal pretty much already works the way you want it to!''
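A hedged sketch of the two-component picture being asked for, roughly matching how the examples are said to work: capture the decoded (unscaled) samples from a source <audio> element, process them, and play the result through a second <audio> element whose volume setting scales only the output path. The capture, processing, and playback calls below are invented placeholders, not a fixed API.

<pre>
// Sketch only: captureStreamFrom(), processWithWorker() and playStreamThrough()
// are invented placeholders for the proposal's capture/processing mechanism.
var source = document.getElementById("music");   // <audio src="...">: the input side
var captured = captureStreamFrom(source);        // decoded samples, unscaled by volume
var processed = processWithWorker(captured);     // e.g. a worker-based filter

var sink = document.getElementById("out");       // a second <audio>: the output side
playStreamThrough(sink, processed);              // route processed samples to playback
sink.volume = 0.5;  // scales only what this element sends to the soundcard;
                    // the captured samples themselves remain unscaled
</pre>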