Platform/GFX/Gralloc
Everything that we know, and everything that we'd like to know, about Gralloc.
What is Gralloc?
Gralloc is a type of shared memory that is also shared with the GPU. A Gralloc buffer can be written to directly by regular CPU code, but can also be used as an OpenGL texture.
Gralloc is part of Android, and is also part of B2G.
This is similar to the functionality provided by the EGL_lock_surface extension, but EGL_lock_surface is not widely supported on Android/B2G.
Gralloc buffers are represented by objects of the class android::GraphicBuffer. See ui/GraphicBuffer.h.
We only use Gralloc buffers on B2G at the moment, because the locking semantics of Gralloc buffers tend to vary a lot between GPU vendors, and on B2G we can currently at least assume that we only have to deal with Qualcomm drivers. However, locking was standardized in Android 4.2. See below.
Allocation and lifetime of Gralloc buffers
How Gralloc buffers are created and refcounted (non Mozilla-specific)
The android::GraphicBuffer class is refcounted and the underlying gralloc buffer is refcounted, too. It is meant to be used with Android Strong Pointers (android::sp). That's why you'll see a lot of
android::sp<android::GraphicBuffer>.
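A minimal sketch of this ownership pattern (illustrative only; holdBuffer is a hypothetical function, not Mozilla code):

    #include <ui/GraphicBuffer.h>
    #include <utils/StrongPointer.h>

    // Each sp<> adds a reference; the GraphicBuffer object is destroyed when
    // the last sp<> pointing at it goes away.
    void holdBuffer(const android::sp<android::GraphicBuffer>& incoming) {
        android::sp<android::GraphicBuffer> mine = incoming; // adds a reference
        // ... use mine as long as needed ...
    } // 'mine' goes out of scope here, dropping its reference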
Holding an android::sp<android::GraphicBuffer> is the right way to hold on to a gralloc buffer in a given process. But since gralloc buffers are shared across multiple processes, and GraphicBuffer objects only exist in one process, a different kind of object has to be actually shared and reference-counted across processes: the underlying gralloc buffer itself, which each process references through a file descriptor.
Think of the gralloc buffer as a file, and multiple file descriptors exist that refer to the same file. Just like what happens with normal files, the kernel keeps track of open file descriptors to it. To transfer a gralloc buffer across processes, you send a file descriptor over a socket using standard kernel functionality to do so.
So when a gralloc buffer is shared between two processes, each process has its own GraphicBuffer object with its own refcount; these share the same underlying gralloc buffer (but have different file descriptors opened for it). The sharing happens by calling GraphicBuffer::flatten to serialize it and GraphicBuffer::unflatten to deserialize it. GraphicBuffer::unflatten calls mBufferMapper.registerBuffer to ensure that the underlying buffer handle is refcounted correctly.
When a GraphicBuffer's refcount goes to zero, the destructor calls free_handle, which calls mBufferMapper.unregisterBuffer, which closes the file descriptor, thus decrementing the refcount of the gralloc buffer.
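As a rough sketch of the serialization round trip (the Flattenable signatures shown are the Jelly-Bean-era ones and have changed across Android releases, so treat this as illustrative only):

    #include <ui/GraphicBuffer.h>

    // Sender side: flatten the buffer's metadata into 'bytes' and its file
    // descriptors into 'fds'; the sizes come from getFlattenedSize() and
    // getFdCount().
    void shareBuffer(const android::sp<android::GraphicBuffer>& gb,
                     void* bytes, size_t byteCount, int* fds, size_t fdCount) {
        gb->flatten(bytes, byteCount, fds, fdCount);
        // 'bytes' goes over the IPC channel; the fds travel via the kernel's
        // fd-passing mechanism (SCM_RIGHTS over a socket).
    }

    // Receiver side: unflatten() rebuilds a GraphicBuffer around the received
    // handle and calls registerBuffer(), so the gralloc buffer's refcount
    // accounts for this process.
    android::sp<android::GraphicBuffer> receiveBuffer(
            const void* bytes, size_t byteCount, int* fds, size_t fdCount) {
        android::sp<android::GraphicBuffer> gb = new android::GraphicBuffer();
        gb->unflatten(bytes, byteCount, fds, fdCount);
        return gb;
    }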
The GraphicBuffer constructors take a "usage" bitfield. We should always pass HW_TEXTURE there, as we always want to use gralloc buffers as the backing surface of OpenGL textures. We also want to pass the right SW_READ_ and SW_WRITE_ flags.
The usage flags are a hint for performance optimization; passing SW flags may disable some of the optimizations that would otherwise be possible. Because the CPU caches data, locking a buffer for software read/write requires cache maintenance to guarantee that correct data is seen. Other hardware that can use GraphicBuffers on Android (e.g. codecs, the camera, the GPU) does not go through the CPU cache, so it can lock/unlock buffers more cheaply.
It definitely helps performance to use the usage flags to accurately describe what we will do with the buffer. In particular, if the SW_READ/SW_WRITE usage flags are set, the GL driver and others will make sure to flush caches after any rendering operation so that the memory is ready for software reading or writing. Only specify the flags that you need.
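For example (a sketch, not actual Mozilla code; the helper names are made up): a buffer that only the GPU ever touches should not request any SW flags at all, while a software-rasterized buffer needs SW_WRITE_OFTEN:

    #include <ui/GraphicBuffer.h>
    #include <ui/PixelFormat.h>

    using android::GraphicBuffer;
    using android::sp;

    // GPU-only content (e.g. WebGL output): no SW flags, so the driver never
    // needs CPU cache maintenance for this buffer.
    sp<GraphicBuffer> makeGpuOnlyBuffer(uint32_t w, uint32_t h) {
        return new GraphicBuffer(w, h, android::PIXEL_FORMAT_RGBA_8888,
                                 GraphicBuffer::USAGE_HW_TEXTURE |
                                 GraphicBuffer::USAGE_HW_RENDER);
    }

    // Software-rasterized content (e.g. a Thebes layer): the CPU writes
    // pixels and the GPU samples them.
    sp<GraphicBuffer> makeSwDrawnBuffer(uint32_t w, uint32_t h) {
        return new GraphicBuffer(w, h, android::PIXEL_FORMAT_RGBA_8888,
                                 GraphicBuffer::USAGE_HW_TEXTURE |
                                 GraphicBuffer::USAGE_SW_WRITE_OFTEN);
    }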
How we allocate Gralloc buffers
Most of our GraphicBuffers are constructed by GrallocBufferActor::Create. It unconditionally uses SW_READ_OFTEN and SW_WRITE_OFTEN, which is probably bad for at least some use cases.
Our protocol for creating GraphicBuffers is as follows. It's generally the content side that wants a new GraphicBuffer to draw to. It sends a message to the compositor side, which creates the gralloc buffer and returns a serialized handle to it; back on the content side, we receive the serialized handle and construct our own GraphicBuffer instance from it.
In more detail (this is from Vlad's wiki page; a condensed schematic follows the list):
Content side:
- Entry point: PLayerTransactionChild::SendPGrallocBufferConstructor (generally called by ISurfaceAllocator::AllocGrallocBuffer).
- This sends a synchronous IPC message to the compositor side.
Over to the compositor side:
- The message is received and comes in as a call to PLayerTransactionParent::AllocPGrallocBuffer, implemented in LayerTransactionParent.cpp.
- This calls GrallocBufferActor::Create(...), which actually creates the GraphicBuffer* and a GrallocBufferActor* (The GrallocBufferActor contains a sp<GraphicBuffer> that references the newly-created GraphicBuffer*).
- GrallocBufferActor::Create returns the GrallocBufferActor as a PGrallocBufferParent*, and the GraphicBuffer* as a MaybeMagicGrallocBufferHandle.
- The GrallocBufferActor/PGrallocBufferParent* is added to the LayerTransactionParent's managed list.
- The MaybeMagicGrallocBufferHandle is serialized for reply (sending back the fd that represents the GraphicBuffer) -- using code in ShadowLayerUtilsGralloc.cpp's ParamTraits<MGBH>::Write.
Back to the content side:
- After the sync IPC call, the child receives the MaybeMagicGrallocBufferHandle, using ShadowLayerUtilsGralloc.cpp's ParamTraits<MGBH>::Read.
- Allocates an empty GrallocBufferActor to use as the PGrallocBufferChild.
- The mGraphicBuffer of this GrallocBufferActor/PGrallocBufferChild is set to the newly-received sp<GraphicBuffer>.
- The GrallocBufferActor/PGrallocBufferChild is added to the LayerTransactionChild's managed list.
- A SurfaceDescriptorGralloc is created using the PGrallocBufferChild, and returned to the caller.
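Condensed into a schematic (same names as in the list above):

    content process                          compositor process
    ---------------                          ------------------
    ISurfaceAllocator::AllocGrallocBuffer
      -> SendPGrallocBufferConstructor ---- sync IPC ---->
                                             AllocPGrallocBuffer
                                               -> GrallocBufferActor::Create
                                                  (allocates the GraphicBuffer,
                                                   wraps it in a new actor)
      <--- reply: MaybeMagicGrallocBufferHandle (carries the buffer's fd) ---
      ParamTraits<MGBH>::Read
        -> rebuilds a GraphicBuffer from the received handle
        -> stores it in the child-side GrallocBufferActor
        -> returns a SurfaceDescriptorGralloc to the caller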
How we manage the lifetime of Gralloc buffers
As said above, what effectively controls the lifetime of gralloc buffers is reference counting, by means of android::sp pointers.
Most of our gralloc buffers are owned in this way by GrallocBufferActors. The question then becomes: what controls the lifetime of GrallocBufferActors?
GrallocBufferActors are "managed" by IPDL-generated code. When they are created by the above-described protocol, they are added to the "managee lists" of the LayerTransactionParent on the compositor side and of the LayerTransactionChild on the content side.
GrallocBufferActors are destroyed when either a "delete" IPC message is sent (see: Send__delete__) or the top-level IPDL manager goes away.
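Schematically, the content-side teardown is just the IPDL-generated delete (a sketch; error handling omitted):

    // Destroys the actor pair on both sides. Each side's GrallocBufferActor
    // then drops its sp<GraphicBuffer>, and once the last reference is gone
    // the underlying gralloc buffer is released.
    PGrallocBufferChild::Send__delete__(actor);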
Unresolved problems
We don't have a good way of passing the appropriate USAGE flags when creating gralloc buffers. In most cases, we shouldn't pass SW_READ_OFTEN. If the SyncFrontBufferToBackBuffer mechanism requires it, that's sad and we should try to fix it (by doing this copy on the GPU). In many cases, it also doesn't make sense to pass SW_WRITE_OFTEN: that basically only makes sense for Thebes layers, and for Canvas2D when not using SkiaGL, but it doesn't make any sense for WebGL, SkiaGL canvas, or video. This is getting resolved by the patch in bug 843599 (vlad)
It sucks that when content wants a new gralloc buffer to draw to, it has to wait for all the synchronous IPC work described above. Could we get async gralloc buffer creation?
Gralloc buffers locking
Gralloc buffers need to be locked before they can be accessed for either read or write. This applies both to software accesses (where we directly address gralloc buffers) and to hardware accesses made from the GL.
The lock mechanisms used by Gralloc buffers (non Mozilla-specific)
How gralloc buffer locking works varies greatly between drivers. While we only directly deal with the gralloc API, which is the same on all Android devices (android::GraphicBuffer::lock and unlock), the precise lock semantics vary between different vendor-specific lock mechanisms, so we need to pay specific attention to them.
- On Android >= 4.2, a standardized fence mechanism is used, which should work uniformly across all drivers. We do not yet support it. B2G does not yet use Android 4.2.
- On Qualcomm hardware pre-Android-4.2, a Qualcomm-specific mechanism, named Genlock, is used. We explicitly support it. More on this below.
- On non-Qualcomm, pre-Android-4.2 hardware, other vendor-specific mechanisms are used, which we do not support (see e.g. bug 871624).
Genlock
Official genlock documentation can be found in Qualcomm kernel sources: genlock.txt.
In a nutshell, with genlock,
- Read locks are non-exclusive, reference-counted, and recursive. This means that a single caller may issue N read locks on a single gralloc buffer handle, and then issue N unlocks to release the lock.
- Write locks are completely exclusive, both with any other write lock and also with any read lock.
- The following is somewhat speculative, not firmly established (QUESTION: so is it correct?). If a buffer is already locked (for read or write) and an attempt is made to get a write lock on it, then:
- If the new write lock attempt is using the same handle to the gralloc buffer that is already locked, this will fail. This typically gives a message like "trying to upgrade a read lock to a write lock".
- If the new write lock attempt is using a different handle than the one already locked, then this locking operation will wait until the existing lock is released.
Genlock is implemented in the kernel. The kernel GL driver is able to lock and unlock directly. Typically, it will place a read lock on any gralloc buffer that's bound to a texture it's sampling from, and unlock when it's done with that texture.
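Expressed against the GraphicBuffer API, the read-lock counting described above looks like this (a sketch illustrating the claimed semantics, which are themselves partly speculative):

    #include <ui/GraphicBuffer.h>

    void nestedReadLocks(const android::sp<android::GraphicBuffer>& gb) {
        void* vaddr = nullptr;
        // Two read locks through the same handle: genlock just counts them.
        gb->lock(android::GraphicBuffer::USAGE_SW_READ_OFTEN, &vaddr);
        gb->lock(android::GraphicBuffer::USAGE_SW_READ_OFTEN, &vaddr);
        // Both must be released before any write lock can succeed.
        gb->unlock();
        gb->unlock();
    }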
How we lock/unlock Gralloc buffers
Drawing to Gralloc buffers
When (on the content side) we want to draw in software to a gralloc buffer, we call ShadowLayerForwarder::OpenDescriptor() in ShadowLayerUtilsGralloc.cpp. This calls android::GraphicBuffer::lock(). When we're done, we call ShadowLayerForwarder::CloseDescriptor() in the same file, which calls android::GraphicBuffer::unlock().
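Underneath those wrappers, the pattern is the usual lock/draw/unlock (a sketch of the non-Mozilla-specific part; error handling mostly omitted):

    #include <ui/GraphicBuffer.h>

    // lock() gives us a CPU-addressable pointer (taking a genlock write lock
    // under the hood on Qualcomm); unlock() releases it so the compositor can
    // take its read locks.
    void drawInSoftware(const android::sp<android::GraphicBuffer>& gb) {
        void* vaddr = nullptr;
        if (gb->lock(android::GraphicBuffer::USAGE_SW_WRITE_OFTEN, &vaddr) != android::OK)
            return;
        // ... rasterize into vaddr; note that the row stride is
        // gb->getStride() pixels, which may be larger than the width ...
        gb->unlock();
    }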
This locking and drawing is generally done by TextureClientShmem. Indeed, there is no need for a gralloc-specific TextureClient: on B2G, TextureClientShmem will implicitly try to get gralloc memory, and will silently fall back to generic non-gralloc shared memory if gralloc fails. The knowledge of whether the TextureClientShmem is actually gralloc or not is stored in the underlying SurfaceDescriptor. The compositor, upon receiving it, examines the type of the SurfaceDescriptor, and if it is a SurfaceDescriptorGralloc, it creates a GrallocTextureHostOGL (see below).
Thus, there is no Gralloc-specific TextureClient class --- but there is a Gralloc-specific TextureHost class.
Drawing from Gralloc buffers (binding to GL textures)
When (on the compositor side) we want to draw the contents of a gralloc buffer, we have to create an EGLImage with it (see GLContextEGL::CreateEGLImageForNativeBuffer), and create a GL texture object wrapping that EGLImage (see GLContextEGL::fEGLImageTargetTexture2D).
This is generally done by GrallocTextureHostOGL.
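Stripped of Mozilla's GLContext wrappers, the underlying calls look roughly like this (a sketch; on a real build these extension entry points are looked up via eglGetProcAddress):

    #include <EGL/egl.h>
    #include <EGL/eglext.h>
    #include <GLES2/gl2.h>
    #include <GLES2/gl2ext.h>
    #include <ui/GraphicBuffer.h>

    // Wrap a gralloc buffer in an EGLImage and attach it to a GL texture.
    GLuint wrapGrallocBuffer(EGLDisplay dpy,
                             const android::sp<android::GraphicBuffer>& gb) {
        const EGLint attrs[] = { EGL_IMAGE_PRESERVED_KHR, EGL_TRUE, EGL_NONE };
        EGLImageKHR image = eglCreateImageKHR(
            dpy, EGL_NO_CONTEXT, EGL_NATIVE_BUFFER_ANDROID,
            (EGLClientBuffer)gb->getNativeBuffer(), attrs);

        GLuint tex = 0;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        // On Qualcomm, this is the call that takes a genlock read lock.
        glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, (GLeglImageOES)image);
        return tex;
    }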
It is worth noting that there are two levels of locking involved here.
When GrallocTextureHostOGL::Lock is called, it calls fEGLImageTargetTexture2D (as explained above), which immediately places a read lock on the gralloc buffer. When a subsequent GL drawing operation samples from that texture, that also places a read lock on the gralloc buffer, this time from the GL kernel driver.
It is vital that these two read locks get released as soon as possible, as we won't be able to draw again into the gralloc buffer (which requires a write lock) until then.
The read lock placed directly by fEGLImageTargetTexture2D is unlocked in GrallocTextureHostOGL::Unlock. However, we don't have a very good way to do that; see below ("Unresolved problems").
The read lock placed internally by the GL kernel driver gets released at some point after it's finished drawing; we don't know very precisely when. The following is somewhat speculative, not firmly established (QUESTION: so is it correct?): as the GL kernel driver uses a different handle than we do, its read lock doesn't cause failure of our subsequent attempts to lock the gralloc buffer for write; instead, it just causes it to wait until it's released.
Unresolved problems
We don't have a great way of un-attaching a gralloc buffer from a GL texture. What we currently do (see GrallocTextureHostOGL::Unlock) is that we issue another fEGLImageTargetTexture2D call to overwrite the attachment by a dummy "null" EGLImage. That is however known to cause performance issues at least on Peak (see bug 869696). Another approach (that happens to perform better on Peak) is to just destroy the GL texture object (and recreate it every time). But really, we should have a reliable and fast way of releasing the read locks that we are placing on gralloc buffers when we attach them to GL texture objects. QUESTION: We should understand why attaching the dummy "null" EGLImage is slow on the Peak device.
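The two approaches mentioned look roughly like this (a sketch; nullImage stands for a dummy EGLImage created once from a placeholder buffer):

    #include <GLES2/gl2.h>
    #include <GLES2/gl2ext.h>

    // Approach 1: rebind a dummy EGLImage, detaching the real gralloc buffer
    // and releasing our read lock. Known to be slow on Peak (bug 869696).
    void detachViaNullImage(GLuint tex, GLeglImageOES nullImage) {
        glBindTexture(GL_TEXTURE_2D, tex);
        glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, nullImage);
    }

    // Approach 2: destroy the texture object outright and recreate it next
    // time. Happens to perform better on Peak, but is wasteful in principle.
    void detachViaDelete(GLuint* tex) {
        glDeleteTextures(1, tex);
        *tex = 0;
    }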
How Android is using Gralloc
SurfaceTexture
SurfaceTexture is built around a client-server architecture. The SurfaceTextureClient implements EGLNativeWindow, into which the application renders. When the Android hardware UI is used, the SurfaceTextureClient is bound into an EGLSurface, and when the application wants to present the buffer, it calls eglSwapBuffers, which in turn calls EGLNativeWindow::queue and EGLNativeWindow::dequeue.
SurfaceTexture is the server-side class; SurfaceTextureClient is the client-side class. Both store an array of GraphicBuffers, called mSlots. The GraphicBuffer objects are separate instances in SurfaceTexture and in SurfaceTextureClient, but the underlying gralloc buffer handles are the same. The mechanism is that the client side issues a SurfaceTextureClient::dequeueBuffer call to get a new buffer to paint to. If there are not already enough buffers, and the number of buffers is not over the limit (32), it sends an IPC message that results in a call to SurfaceTexture::dequeueBuffer, which allocates the GraphicBuffer. After buffer allocation, the client side sends an IPC message that results in a call to SurfaceTexture::requestBuffer, gets the gralloc buffer handle serialized over IPC back to it, constructs a GraphicBuffer around this handle, and caches it in its own mSlots. The mSlots arrays on both sides mirror each other, so that the two sides can refer to GraphicBuffers by index.
When the client side calls EGLNativeWindow::queue to present a frame (from an eglSwapBuffers call), SurfaceTextureClient::queue sends the index of the rendered buffer to the server-side SurfaceTexture::queue, which enqueues that index into SurfaceTexture::mQueue for rendering. SurfaceTexture::mQueue is a wait queue of frames waiting to be rendered. In sync mode, frames are shown one after another; in async mode, frames may be dropped.
In SurfaceFlinger, each call to SurfaceTexture::queue kicks off the next render loop. In each render loop, SurfaceFlinger calls SurfaceTexture::updateTexImage to dequeue a frame from SurfaceTexture::mQueue and bind its GraphicBuffer to a texture.
As a result, Android does not explicitly unlock the GraphicBuffer the way B2G currently does: it simply locks the next buffer each time a new one comes in, and it uses eglCreateSyncKHR/eglClientWaitSyncKHR to make sure a buffer is no longer locked before returning it to the buffer pool.
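From the client side, the producer loop described above boils down to the standard ANativeWindow pattern (a sketch with Jelly-Bean-era signatures, which later grew fence parameters; normally EGL drives these calls from eglSwapBuffers rather than the application calling them directly):

    #include <system/window.h>

    // One frame of the producer loop against a SurfaceTextureClient, which
    // implements the ANativeWindow interface.
    void produceFrame(ANativeWindow* win) {
        ANativeWindowBuffer* buf = nullptr;
        win->dequeueBuffer(win, &buf); // may allocate via SurfaceTexture::dequeueBuffer
        win->lockBuffer(win, buf);     // wait until the consumer is done with this buffer
        // ... render into the buffer ...
        win->queueBuffer(win, buf);    // its index lands in SurfaceTexture::mQueue
    }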