Platform/GFX/Gralloc: Difference between revisions
Revision as of 20:19, 16 May 2013
Everything that we know, and everything that we'd like to know, about Gralloc.
What is Gralloc?
Gralloc is a type of shared memory that is also shared with the GPU. A Gralloc buffer can be written to directly by regular CPU code, but can also be used as an OpenGL texture.
Gralloc is part of Android, and is also part of B2G.
This is similar to the functionality provided by the EGL_lock_surface extension, but EGL_lock_surface is not widely supported on Android/B2G.
Gralloc buffers are represented by objects of the class android::GraphicBuffer. See ui/GraphicBuffer.h.
We only use Gralloc buffers on B2G at the moment, because the locking semantics of Gralloc buffers tend to vary a lot between GPU vendors, and on B2G we can currently at least assume that we only have to deal with Qualcomm drivers. However, this got standardized in Android 4.2. See below.
Allocation and lifetime of Gralloc buffers
How Gralloc buffers are created and refcounted (non Mozilla-specific)
The android::GraphicBuffer class is refcounted and the underlying buffer handle is refcounted, too. It is meant to be used with Android Strong Pointers (android::sp). That's why you'll see a lot of
android::sp<android::GraphicBuffer>.
That's the right way to hold on to a gralloc buffer in a given process. But since gralloc buffers are shared across multiple processes, and GraphicBuffer objects only exist in one process, a different type of object has to be actually shared and reference-counted across processes. That is the notion of a gralloc buffer handle.
So when a gralloc buffer is shared between two processes, each process has its own GraphicBuffer object with its own refcount; these are sharing the same underlying gralloc buffer handle. The sharing happens by calling GraphicBuffer::flatten to serialize and GraphicBuffer::unflatten to deserialize it. GraphicBuffer::unflatten will call mBufferMapper.registerBuffer to ensure that the underlying buffer handle is refcounted correctly.
When a GraphicBuffer's refcount goes to zero, the destructor will call free_handle, which calls mBufferMapper.unregisterBuffer, which will decrement the refcount of the gralloc buffer handle.
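The two levels of reference counting described above can be sketched with a small model. This is not Android code: HandleRegistry and MockGraphicBuffer are hypothetical stand-ins, and std::shared_ptr plays the role of android::sp. The only point is to show that each process-local object keeps one reference on the shared handle for as long as it is alive.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical stand-in for whatever tracks handle refcounts across
// processes. Keys are handle ids, values are the number of live
// GraphicBuffer-like objects that registered the handle.
struct HandleRegistry {
    std::map<int, int> refs;

    void registerBuffer(int handle) { ++refs[handle]; }

    void unregisterBuffer(int handle) {
        auto it = refs.find(handle);
        assert(it != refs.end() && it->second > 0);
        if (--it->second == 0) {
            refs.erase(it);  // last user is gone: the buffer could be freed here
        }
    }

    int count(int handle) const {
        auto it = refs.find(handle);
        return it == refs.end() ? 0 : it->second;
    }
};

// Hypothetical per-process wrapper, playing the role of GraphicBuffer.
// Construction models unflatten (register), destruction models free_handle.
class MockGraphicBuffer {
public:
    MockGraphicBuffer(HandleRegistry& reg, int handle)
        : mReg(reg), mHandle(handle) {
        mReg.registerBuffer(mHandle);
    }
    ~MockGraphicBuffer() { mReg.unregisterBuffer(mHandle); }

    MockGraphicBuffer(const MockGraphicBuffer&) = delete;
    MockGraphicBuffer& operator=(const MockGraphicBuffer&) = delete;

    int handle() const { return mHandle; }

private:
    HandleRegistry& mReg;
    int mHandle;
};
```

Copying the smart pointer inside one process only bumps the object's own refcount; the handle's refcount changes only when a wrapper object is created or destroyed.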
The GraphicBuffer constructors take a "usage" bitfield. We should always pass HW_TEXTURE there, as we always want to use gralloc buffers as the backing surface of OpenGL textures. We also want to pass the right SW_READ_ and SW_WRITE_ flags.
The usage flags are hints for performance optimization. Passing SW flags may disable all the optimizations that would otherwise be possible. Because the CPU caches data, locking the buffer for software read/write requires cache maintenance to keep the data correct. Other hardware that can use a GraphicBuffer on Android (e.g. codec, camera, GPU) does not cache data in this way, so it can lock/unlock the buffer faster.
It may help performance if we use the usage flags correctly to describe what we intend to do with the buffer.
How we create Gralloc buffers
Most of our GraphicBuffers are constructed by GrallocBufferActor::Create. This unconditionally uses SW_READ_OFTEN and SW_WRITE_OFTEN, which is probably bad at least for some use cases.
Our protocol to create GraphicBuffers is as follows. It's generally the content side that wants to create a new GraphicBuffer to draw to. It sends a message to the compositor side, which creates the Gralloc buffer and returns a serialized handle to it; then back on the content side, we receive the serialized handle and construct our own Gralloc buffer instance from it.
In more detail (this is from Vlad's wiki page):
Content side:
- Entry point: PLayerTransactionChild::SendPGrallocBufferConstructor (generally called by ISurfaceAllocator::AllocGrallocBuffer).
- This sends a synchronous IPC message to the compositor side.
Over to the compositor side:
- The message is received and comes in as a call to PLayerTransactionParent::AllocPGrallocBuffer, implemented in LayerTransactionParent.cpp.
- This calls GrallocBufferActor::Create(...), which actually creates the GraphicBuffer* and a GrallocBufferActor* (The GrallocBufferActor contains a sp<GraphicBuffer> that references the newly-created GraphicBuffer*).
- GrallocBufferActor::Create returns the GrallocBufferActor as a PGrallocBufferParent*, and the GraphicBuffer* as a MaybeMagicGrallocBufferHandle.
- The GrallocBufferActor/PGrallocBufferParent* is added to the LayerTransactionParent's managed list.
- The MaybeMagicGrallocBufferHandle is serialized for reply (sending back the fd that represents the GraphicBuffer) -- using code in ShadowLayerUtilsGralloc ParamTraits<MGBH>::Write.
Back to the content side:
- After the sync IPC call, the child receives the MaybeMagicGrallocBufferHandle, using ShadowLayerUtilsGralloc.cpp's ParamTraits<MGBH>::Read.
- Allocates an empty GrallocBufferActor() to use as a PGrallocBufferChild.
- Sets the previously created GrallocBufferActor/PGrallocBufferChild's mGraphicBuffer to the newly-received sp<GraphicBuffer>.
- The GrallocBufferActor/PGrallocBufferChild is added to the LayerTransactionChild's managed list.
- A SurfaceDescriptorGralloc is created using the PGrallocBufferChild, and returned to the caller.
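The round trip above can be modeled in a few lines. Everything in this sketch is hypothetical (MockBuffer, SerializedHandle, MockActor, CompositorSide, ContentSide); the synchronous IPC message is reduced to a plain function call and the file descriptor to an integer. It only illustrates who creates what, and which side's managed list each actor ends up on.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct MockBuffer { int fd; int width; int height; };        // stands in for GraphicBuffer
struct SerializedHandle { int fd; int width; int height; };  // stands in for MaybeMagicGrallocBufferHandle
struct MockActor { std::shared_ptr<MockBuffer> buffer; };    // stands in for GrallocBufferActor

class CompositorSide {
public:
    // Models AllocPGrallocBuffer: create the buffer and its actor, keep the
    // actor on the managed list, reply with a serialized handle.
    SerializedHandle allocBuffer(int w, int h) {
        auto actor = std::make_unique<MockActor>();
        actor->buffer = std::make_shared<MockBuffer>(MockBuffer{mNextFd++, w, h});
        SerializedHandle reply{actor->buffer->fd, w, h};
        mManaged.push_back(std::move(actor));
        return reply;
    }
    std::size_t managedCount() const { return mManaged.size(); }

private:
    int mNextFd = 3;  // arbitrary starting value for the fake fd
    std::vector<std::unique_ptr<MockActor>> mManaged;
};

class ContentSide {
public:
    explicit ContentSide(CompositorSide& compositor) : mCompositor(compositor) {}

    // Models the synchronous call: blocks (here, a direct call) until the
    // compositor replies, then builds the local actor from the reply.
    const MockActor& allocGrallocBuffer(int w, int h) {
        SerializedHandle reply = mCompositor.allocBuffer(w, h);
        auto actor = std::make_unique<MockActor>();  // empty child-side actor
        actor->buffer = std::make_shared<MockBuffer>(
            MockBuffer{reply.fd, reply.width, reply.height});
        mManaged.push_back(std::move(actor));
        return *mManaged.back();
    }
    std::size_t managedCount() const { return mManaged.size(); }

private:
    CompositorSide& mCompositor;
    std::vector<std::unique_ptr<MockActor>> mManaged;
};
```

Note that in this model each side owns its own buffer object that refers to the same fake fd, which mirrors the "one GraphicBuffer object per process, one shared handle" arrangement described earlier.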
How we manage the lifetime of Gralloc buffers
As said above, what effectively controls the lifetime of gralloc buffers is reference counting, by means of android::sp pointers.
Most of our gralloc buffers are owned in this way by GrallocBufferActors. The question then becomes, what controls the lifetime of GrallocBufferActors?
GrallocBufferActors are "managed" by IPDL-generated code. When they are created by the above-described protocol, as said above, they are added to the "managee lists" of the LayerTransactionParent on the compositor side, and of the LayerTransactionChild on the content side.
QUESTION: What then decides when GrallocBufferActors are destroyed and removed from the managee lists?
Unresolved problems
We don't have a good way of passing the appropriate USAGE flags when creating gralloc buffers. In most cases, we shouldn't pass SW_READ_OFTEN. If the SyncFrontBufferToBackBuffer mechanism requires that, that's sad and we should try to fix it (by doing this copy on the GPU). In many cases, it also doesn't make sense to pass SW_WRITE_OFTEN --- that basically only makes sense for Thebes layers, and Canvas2D if not using SkiaGL, but that doesn't make any sense for WebGL, SkiaGL canvas, or video.
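As a rough illustration of what per-use-case usage flags could look like, here is a sketch. The Producer enum and usageFor function are hypothetical and do not exist in our tree. The numeric values are meant to mirror the GRALLOC_USAGE_* constants in Android's hardware/gralloc.h as we recall them, so check them against the actual headers before relying on them.

```cpp
#include <cassert>
#include <cstdint>

// Assumed to match GRALLOC_USAGE_SW_READ_OFTEN, GRALLOC_USAGE_SW_WRITE_OFTEN
// and GRALLOC_USAGE_HW_TEXTURE; verify against hardware/gralloc.h.
enum Usage : uint32_t {
    SW_READ_OFTEN  = 0x00000003,
    SW_WRITE_OFTEN = 0x00000030,
    HW_TEXTURE     = 0x00000100,
};

// Hypothetical classification of who draws into the buffer.
enum class Producer { ThebesLayer, Canvas2DSoftware, WebGL, SkiaGLCanvas, Video };

// Every buffer backs a GL texture, so HW_TEXTURE is always set.
// SW_WRITE_OFTEN is only added for producers that draw in software.
// SW_READ_OFTEN is never added here, on the assumption that the
// front-to-back copy no longer needs it.
inline uint32_t usageFor(Producer p) {
    uint32_t usage = HW_TEXTURE;
    switch (p) {
        case Producer::ThebesLayer:
        case Producer::Canvas2DSoftware:
            usage |= SW_WRITE_OFTEN;
            break;
        case Producer::WebGL:
        case Producer::SkiaGLCanvas:
        case Producer::Video:
            break;  // written by hardware only
    }
    return usage;
}
```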
It sucks that when content wants a new gralloc buffer to draw to, it has to wait for all the synchronous IPC work described above. Could we get async gralloc buffer creation?
Gralloc buffers locking
Gralloc buffers need to be locked before they can be accessed for either read or write. This applies both to software accesses (where we directly address gralloc buffers) and to hardware accesses made from the GL.
The lock mechanisms used by Gralloc buffers (non Mozilla-specific)
How gralloc buffer locking works varies greatly between drivers. While we only directly deal with the gralloc API, which is the same on all Android devices (android::GraphicBuffer::lock and unlock), the precise lock semantics vary between different vendor-specific lock mechanisms, so we need to pay specific attention to them.
- On Android >= 4.2, a standardized fence mechanism is used, that should work uniformly across all drivers. We do not yet support it. B2G does not yet use Android 4.2.
- On Qualcomm hardware pre-Android-4.2, a Qualcomm-specific mechanism, named Genlock, is used. We explicitly support it. More on this below.
- On non-Qualcomm, pre-Android-4.2 hardware, other vendor-specific mechanisms are used, which we do not support (see e.g. bug 871624).
Genlock
Official genlock documentation can be found in Qualcomm kernel sources: genlock.txt.
In a nutshell, with genlock,
- Read locks are non-exclusive, reference-counted, and recursive. This means that a single caller may issue N read locks on a single gralloc buffer handle, and then issue N unlocks to release the lock.
- Write locks are completely exclusive, both with any other write lock and also with any read lock.
- The following is somewhat speculative, not firmly established (QUESTION: so is it correct?). If a buffer is already locked (for read or write) and an attempt is made to get a write lock on it, then:
- If the new write lock attempt is using the same handle to the gralloc buffer that is already locked, this will fail. This typically gives a message like "trying to upgrade a read lock to a write lock".
- If the new write lock attempt is using a different handle than the one already locked, then this locking operation will wait until the existing lock is released.
Genlock is implemented in the kernel. The kernel GL driver is able to lock and unlock directly. Typically, it will place a read lock on any gralloc buffer that's bound to a texture it's sampling from, and unlock when it's done with that texture.
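To make these rules easier to reason about, here is a toy single-threaded model of them. It is not genlock code: GenlockModel, LockResult and the integer handle ids are all hypothetical, and "waiting" is represented by a WouldBlock return value instead of actually blocking. The same-handle/different-handle behavior encodes the speculative rules above, and the behavior of a read lock attempted while a write lock is held is our own assumption.

```cpp
#include <cassert>
#include <map>

enum class LockResult { Acquired, Failed, WouldBlock };

constexpr int kNoHandle = -1;

class GenlockModel {
public:
    // Read locks are shared and counted per handle.
    LockResult lockRead(int handle) {
        if (mWriter != kNoHandle) {
            return LockResult::WouldBlock;  // assumption: readers wait for the writer
        }
        ++mReaders[handle];
        return LockResult::Acquired;
    }

    // Write locks are exclusive with every other lock.
    LockResult lockWrite(int handle) {
        if (mWriter == kNoHandle && mReaders.empty()) {
            mWriter = handle;
            return LockResult::Acquired;
        }
        // Speculative rule: the same handle fails ("upgrade a read lock to a
        // write lock"), a different handle waits for the lock to be released.
        bool sameHandle = (mWriter == handle) || mReaders.count(handle) > 0;
        return sameHandle ? LockResult::Failed : LockResult::WouldBlock;
    }

    // Releases the write lock, or one level of a recursive read lock.
    void unlock(int handle) {
        if (mWriter == handle) {
            mWriter = kNoHandle;
            return;
        }
        auto it = mReaders.find(handle);
        if (it != mReaders.end() && --it->second == 0) {
            mReaders.erase(it);
        }
    }

private:
    int mWriter = kNoHandle;
    std::map<int, int> mReaders;  // handle -> recursive read lock count
};
```

With this model, two read locks on handle 1 need two unlocks before a write lock through handle 2 can succeed, while a write lock through handle 1 itself fails immediately.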
How we lock/unlock Gralloc buffers
Drawing to Gralloc buffers
When (on the content side) we want to draw in software to a gralloc buffer, we call ShadowLayerForwarder::OpenDescriptor() in ShadowLayerUtilsGralloc.cpp. This calls android::GraphicBuffer::lock(). When we're done, we call ShadowLayerForwarder::CloseDescriptor() in the same file, which calls android::GraphicBuffer::unlock().
This is generally done by TextureClientShmem.
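The open/draw/close pattern above amounts to a scoped lock. The sketch below shows the idea with a hypothetical MockBuffer and ScopedSoftwareLock; neither exists in our tree, and the mock's lock() signature is simplified compared to the real android::GraphicBuffer::lock().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical buffer with the same lock/unlock shape as a gralloc buffer.
class MockBuffer {
public:
    explicit MockBuffer(std::size_t bytes) : mPixels(bytes, 0) {}

    // Returns the CPU-visible address; only valid until unlock().
    uint8_t* lock() {
        assert(!mLocked);
        mLocked = true;
        return mPixels.data();
    }
    void unlock() {
        assert(mLocked);
        mLocked = false;
    }
    bool locked() const { return mLocked; }
    uint8_t at(std::size_t i) const { return mPixels[i]; }

private:
    std::vector<uint8_t> mPixels;
    bool mLocked = false;
};

// Locks in the constructor (like OpenDescriptor) and unlocks in the
// destructor (like CloseDescriptor), so the lock cannot be leaked on an
// early return.
class ScopedSoftwareLock {
public:
    explicit ScopedSoftwareLock(MockBuffer& buffer)
        : mBuffer(buffer), mData(buffer.lock()) {}
    ~ScopedSoftwareLock() { mBuffer.unlock(); }

    ScopedSoftwareLock(const ScopedSoftwareLock&) = delete;
    ScopedSoftwareLock& operator=(const ScopedSoftwareLock&) = delete;

    uint8_t* data() const { return mData; }

private:
    MockBuffer& mBuffer;
    uint8_t* mData;
};

// Example software "drawing": fill the first `bytes` bytes with one value.
inline void fill(MockBuffer& buffer, std::size_t bytes, uint8_t value) {
    ScopedSoftwareLock lock(buffer);
    for (std::size_t i = 0; i < bytes; ++i) {
        lock.data()[i] = value;
    }
}
```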
Drawing from Gralloc buffers (binding to GL textures)
When (on the compositor side) we want to draw the contents of a gralloc buffer, we have to create an EGLImage with it (see GLContextEGL::CreateEGLImageForNativeBuffer), and create a GL texture object wrapping that EGLImage (see GLContextEGL::fEGLImageTargetTexture2D).
This is generally done by GrallocTextureHostOGL.
It is worth noting that there are two levels of locking involved here.
As GrallocTextureHostOGL::Lock is called, it calls fEGLImageTargetTexture2D (as explained above), which immediately results in placing a read lock on the gralloc buffer. When a subsequent GL drawing operation occurs, sampling from that texture, it will then also place a read lock on the gralloc buffer, this time from the GL kernel driver.
It is vital that these two read locks get released as soon as possible, as we won't be able to draw again into the gralloc buffer (which requires a write lock) until then.
The read lock placed directly by fEGLImageTargetTexture2D is unlocked in GrallocTextureHostOGL::Unlock. However, we don't have a very good way to do that; see below ("Unresolved problems").
The read lock placed internally by the GL kernel driver gets released at some point after it's finished drawing; we don't know very precisely when. The following is somewhat speculative, not firmly established (QUESTION: so is it correct?): as the GL kernel driver uses a different handle than we do, its read lock doesn't cause failure of our subsequent attempts to lock the gralloc buffer for write; instead, it just causes it to wait until it's released.
Unresolved problems
We don't have a great way of un-attaching a gralloc buffer from a GL texture. What we currently do (see GrallocTextureHostOGL::Unlock) is that we issue another fEGLImageTargetTexture2D call to overwrite the attachment by a dummy "null" EGLImage. That is however known to cause performance issues at least on Peak (see bug 869696). Another approach (that happens to perform better on Peak) is to just destroy the GL texture object (and recreate it every time). But really, we should have a reliable and fast way of releasing the read locks that we are placing on gralloc buffers when we attach them to GL texture objects. QUESTION: We should understand why attaching the dummy "null" EGLImage is slow on the Peak device.
How Android is using Gralloc
QUESTION: We should understand how Android is using Gralloc, apparently behind an abstraction named SurfaceTexture. How is this designed, and how does this offer a good abstraction of gralloc that works well with vendor-specific lock semantics such as genlock? Do we already have similar abstractions (maybe GonkNativeWindow) ?
SurfaceTexture is a client-server architecture. The SurfaceTextureClient implements EGLNativeWindow, which the application renders into. When the Android hardware UI is used, the SurfaceTextureClient is bound to an EGLSurface, and when the application wants to present the buffer it calls eglSwapBuffers, which calls EGLNativeWindow::queue and EGLNativeWindow::dequeue. EGLNativeWindow::queue returns the GraphicBuffer to the SurfaceTexture side (the server side), and EGLNativeWindow::dequeue returns a new back buffer to draw into.
For performance, the client and server sides do not actually pass a GraphicBuffer each time queue/dequeue is called. They refer to the same buffer queue and communicate with each other by buffer index.
In Android's SurfaceFlinger, a GraphicBuffer bound to the GPU is not explicitly unlocked. Since SurfaceTexture runs in sync mode for SurfaceFlinger, buffers wait in a queue until they are rendered. When rendering, SurfaceTexture::updateTexImage is called to update the GraphicBuffer underlying the SurfaceTexture. After rendering, the buffer is not used again until the next call to SurfaceTexture::updateTexImage. This means Android does not force the GPU to unlock the buffer; it just locks another buffer and returns the old buffer only once the new buffer arrives. (It uses a fence object to make sure the GPU is done with the buffer, and just calls glFinish after creating the fence object.)
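The index-based exchange and the "keep the old buffer until a new one arrives" behavior can be sketched with a toy model. SlotQueueModel and its method names are hypothetical and much simpler than Android's real buffer queue; there are no fences and no threads, only the bookkeeping of slot indices.

```cpp
#include <cassert>
#include <deque>

constexpr int kSlotCount = 3;  // arbitrary number of shared buffers
constexpr int kNoSlot = -1;

class SlotQueueModel {
public:
    SlotQueueModel() {
        for (int i = 0; i < kSlotCount; ++i) mFree.push_back(i);
    }

    // Client side: get the index of a free slot to draw into.
    int dequeue() {
        if (mFree.empty()) return kNoSlot;
        int slot = mFree.front();
        mFree.pop_front();
        return slot;
    }

    // Client side: hand a finished slot over by index only.
    void queue(int slot) { mQueued.push_back(slot); }

    // Server side (models updateTexImage): switch to the oldest queued slot.
    // The previously displayed slot is released only now, when its
    // replacement has arrived. With nothing queued, keep the current slot.
    int acquire() {
        if (mQueued.empty()) return mCurrent;
        if (mCurrent != kNoSlot) mFree.push_back(mCurrent);
        mCurrent = mQueued.front();
        mQueued.pop_front();
        return mCurrent;
    }

private:
    std::deque<int> mFree;    // slots the client may draw into
    std::deque<int> mQueued;  // slots waiting to be displayed
    int mCurrent = kNoSlot;   // slot currently bound on the server side
};
```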