Platform/GFX/Gralloc: Difference between revisions

 
(28 intermediate revisions by 5 users not shown)
Line 17: Line 17:
== How Gralloc buffers are created and refcounted (non Mozilla-specific) ==
== How Gralloc buffers are created and refcounted (non Mozilla-specific) ==


The android::GraphicBuffer class is refcounted and the undering buffer handle is refcounted, too. It is meant to be used with Android Strong Pointers (android::sp). That's why you'll see a lot of  
The android::GraphicBuffer class is refcounted and the underlying gralloc buffer is refcounted, too. It is meant to be used with Android Strong Pointers (android::sp). That's why you'll see a lot of  


   android::sp<android::GraphicBuffer>.
   android::sp<android::GraphicBuffer>.


That's the right way to hold on to a gralloc buffer in a given process. But since gralloc buffers are shared across multiple processes, and GraphicBuffer objects only exist in one process, a different type of object has to be actually shared and reference-counted across processes. That is the notion of a gralloc buffer ''handle''.
That's the right way to hold on to a gralloc buffer in a given process. But since gralloc buffers are shared across multiple processes, and GraphicBuffer objects only exist in one process, a different type of object has to be actually shared and reference-counted across processes. That is the notion of a gralloc buffer, which is referenced by a file descriptor.


So when a gralloc buffer is shared between two processes, each process has its own GraphicBuffer object with its own refcount; these are sharing the same underlying gralloc buffer ''handle''. The sharing happens by calling GraphicBuffer::flatten to serialize and GraphicBuffer::unflatten to deserialize it. GraphicBuffer::unflatten will call mBufferMapper.registerBuffer to ensure that the underlying buffer handle is refcounted correctly.
Think of the gralloc buffer as a file, and multiple file descriptors exist that refer to the same file. Just like what happens with normal files, the kernel keeps track of open file descriptors to it. To transfer a gralloc buffer across processes, you send a file descriptor over a socket using standard kernel functionality to do so.


When a GraphicBuffer's refcount goes to zero, the destructor will call free_handle which call mBufferMapper.unregisterBuffer, which will decrement the refcount of the gralloc buffer ''handle''.
So when a gralloc buffer is shared between two processes, each process has its own GraphicBuffer object with its own refcount; these share the same underlying gralloc buffer (but have different file descriptors opened for it). The sharing happens by calling GraphicBuffer::flatten to serialize and GraphicBuffer::unflatten to deserialize it. GraphicBuffer::unflatten will call mBufferMapper.registerBuffer to ensure that the underlying buffer handle is refcounted correctly.
 
When a GraphicBuffer's refcount goes to zero, the destructor will call free_handle, which calls mBufferMapper.unregisterBuffer, which will close the file descriptor, thus decrementing the refcount of the gralloc buffer.
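
The file-descriptor transfer described above can be sketched with plain POSIX calls. This is only an illustration of the kernel mechanism (SCM_RIGHTS ancillary data on a Unix domain socket), not the transport code that GraphicBuffer::flatten and GraphicBuffer::unflatten actually go through; the helper names sendFd and recvFd are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

// Hypothetical helper: send one open file descriptor over a Unix domain socket.
bool sendFd(int sock, int fd) {
    assert(fd >= 0);
    char payload = 'F';  // one byte of ordinary data travels with the descriptor
    iovec iov{};
    iov.iov_base = &payload;
    iov.iov_len = 1;

    alignas(cmsghdr) char control[CMSG_SPACE(sizeof(int))];
    std::memset(control, 0, sizeof(control));

    msghdr msg{};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control;
    msg.msg_controllen = sizeof(control);

    // The descriptor itself goes into the ancillary (control) data.
    cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    std::memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1;
}

// Hypothetical helper: receive a descriptor sent by sendFd.
// Returns a new descriptor owned by the caller, or -1 on failure.
int recvFd(int sock) {
    char payload = 0;
    iovec iov{};
    iov.iov_base = &payload;
    iov.iov_len = 1;

    alignas(cmsghdr) char control[CMSG_SPACE(sizeof(int))];
    std::memset(control, 0, sizeof(control));

    msghdr msg{};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control;
    msg.msg_controllen = sizeof(control);

    if (recvmsg(sock, &msg, 0) != 1) return -1;

    cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS) {
        return -1;
    }
    int fd = -1;
    std::memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets its own descriptor number for the same underlying kernel object, so the sender can close its copy and the object stays alive until the last descriptor is closed. That is the refcounting behaviour the text attributes to gralloc buffers.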


The GraphicBuffer constructors take a "usage" bitfield. We should always pass HW_TEXTURE there, as we always want to use gralloc buffers as the backing surface of OpenGL textures. We also want to pass the right SW_READ_ and SW_WRITE_ flags.
The GraphicBuffer constructors take a "usage" bitfield. We should always pass HW_TEXTURE there, as we always want to use gralloc buffers as the backing surface of OpenGL textures. We also want to pass the right SW_READ_ and SW_WRITE_ flags.


The usage flag is some kind of hint for performance optimization. When you use SW flags, it may just disable all possible optimizations there. Since CPU usually cache data into registers, when we want to lock the buffer for read/write, it have to maintain the cache for correct data. However, other hardware that can use GraphicBuffer on Android e.g. Codec, Camera, GPU do not cache data. It locks/unlocks the buffer in a faster fashion.
The usage flag is a hint for performance optimization. Setting the SW flags may disable optimizations that would otherwise be possible. Because the CPU usually caches data, locking the buffer for software read/write requires cache maintenance to keep the data correct. Other hardware that can use a GraphicBuffer on Android (e.g. codec, camera, GPU) does not cache data in this way, so it can lock and unlock the buffer faster.


It may helps on perforamce if we can use the usage flag correctly to describe our purpose about the buffer.
It definitely helps performance if we use the usage flags correctly to describe how we intend to use the buffer. In particular, if the SW_READ/SW_WRITE usage flags are set, the GL driver and others will make sure to flush the cache after any rendering operation so that the memory is ready for software reading or writing. Only specify the flags that you need.
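
As a rough sketch of "only specify the flags that you need", the helper below builds a usage bitfield from what the CPU will actually do with the buffer. The constants are local stand-ins whose values are assumed to mirror the GRALLOC_USAGE_* definitions in hardware/gralloc.h; real code should use the constants from the Android headers. The function names usageFor and needsCpuCacheMaintenance are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins; values assumed to match GRALLOC_USAGE_* in hardware/gralloc.h.
namespace usage {
constexpr uint32_t SW_READ_OFTEN  = 0x00000003;
constexpr uint32_t SW_READ_MASK   = 0x0000000F;
constexpr uint32_t SW_WRITE_OFTEN = 0x00000030;
constexpr uint32_t SW_WRITE_MASK  = 0x000000F0;
constexpr uint32_t HW_TEXTURE     = 0x00000100;
}  // namespace usage

// Always request HW_TEXTURE (the buffer backs a GL texture); add SW flags
// only when the CPU really reads from or writes to the buffer.
uint32_t usageFor(bool cpuReads, bool cpuWrites) {
    uint32_t u = usage::HW_TEXTURE;
    if (cpuReads)  u |= usage::SW_READ_OFTEN;
    if (cpuWrites) u |= usage::SW_WRITE_OFTEN;
    assert((u & usage::HW_TEXTURE) != 0);
    return u;
}

// True when any SW flag is set, i.e. when drivers must keep CPU caches coherent.
bool needsCpuCacheMaintenance(uint32_t u) {
    return (u & (usage::SW_READ_MASK | usage::SW_WRITE_MASK)) != 0;
}
```

A buffer that is only ever rendered by the GPU would use usageFor(false, false), which leaves the SW bits clear, so no cache flushing is requested for it.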


== How we allocate Gralloc buffers ==
== How we allocate Gralloc buffers ==
Line 42: Line 44:


Content side:
Content side:
* Entry point: PLayersTransactionChild::SendPGrallocBufferConstructor (generally called by ISurfaceAllocator::AllocGrallocBuffer).
* Entry point: PLayerTransactionChild::SendPGrallocBufferConstructor (generally called by ISurfaceAllocator::AllocGrallocBuffer).
* This sends a synchronous IPC message to the compositor side.
* This sends a synchronous IPC message to the compositor side.




Over to the compositor side:
Over to the compositor side:
* The message is received and this comes in as a call to PLayerTransactionParent::AllocPGrallocBuffer, implemented in LayersTransactionParent.cpp.
* The message is received and this comes in as a call to PLayerTransactionParent::AllocPGrallocBuffer, implemented in LayerTransactionParent.cpp.
* This calls GrallocBufferActor::Create(...), which actually creates the GraphicBuffer* and a GrallocBufferActor* (The GrallocBufferActor contains a sp<GraphicBuffer> that references the newly-created GraphicBuffer*).
* This calls GrallocBufferActor::Create(...), which actually creates the GraphicBuffer* and a GrallocBufferActor* (The GrallocBufferActor contains a sp<GraphicBuffer> that references the newly-created GraphicBuffer*).
* GrallocBufferActor::Create returns the GrallocBufferActor as a PGrallocBufferParent*, and the GraphicBuffer* as a MaybeMagicGrallocBufferHandle.
* GrallocBufferActor::Create returns the GrallocBufferActor as a PGrallocBufferParent*, and the GraphicBuffer* as a MaybeMagicGrallocBufferHandle.
Line 67: Line 69:
Most of our gralloc buffers are owned in this way by GrallocBufferActor's. The question then becomes, what controls the lifetime of GrallocBufferActors?
Most of our gralloc buffers are owned in this way by GrallocBufferActor's. The question then becomes, what controls the lifetime of GrallocBufferActors?


GrallocBufferActors are "managed" by IPDL-generated code. When they are created by the above-described protocol, as said above, they are added to the "managee lists" of the LayerTransactionParent on the compositor side, and of the LayerTransactionParent on the content side.
GrallocBufferActors are "managed" by IPDL-generated code. When they are created by the above-described protocol, as said above, they are added to the "managee lists" of the LayerTransactionParent on the compositor side, and of the LayerTransactionChild on the content side.


GrallocBufferActors are destroyed when either a "delete" IPC message is sent (see: Send__delete__) or the top-level IPDL manager goes away.
GrallocBufferActors are destroyed when either a "delete" IPC message is sent (see: Send__delete__) or the top-level IPDL manager goes away.
Line 84: Line 86:


How gralloc buffer locking works varies greatly between drivers. While we only directly deal with the gralloc API, which is the same on all Android devices (android::GraphicBuffer::lock and unlock), the precise lock semantics vary between different vendor-specific lock mechanisms, so we need to pay specific attention to them.
How gralloc buffer locking works varies greatly between drivers. While we only directly deal with the gralloc API, which is the same on all Android devices (android::GraphicBuffer::lock and unlock), the precise lock semantics vary between different vendor-specific lock mechanisms, so we need to pay specific attention to them.
* On Android >= 4.2, a standardized fence mechanism is used, that should work uniformly across all drivers. We do not yet support it. B2G does not yet use Android 4.2.
* On Android >= 4.2, a standardized fence mechanism is used that should work uniformly across all drivers. We do not yet support it, and B2G does not yet use Android 4.2. These fences are called sync points and are discussed at [http://source.android.com/devices/graphics.html] and [https://android.googlesource.com/kernel/common/+/android-3.4/Documentation/sync.txt]. They currently live in the kernel staging tree [https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/sync.c], and a similar non-Android Linux concept, dma-buf fences, is being worked on.
* On Qualcomm hardware pre-Android-4.2, a Qualcomm-specific mechanism, named Genlock, is used. We explicitly support it. More on this below.
* On Qualcomm hardware pre-Android-4.2, a Qualcomm-specific mechanism, named Genlock, is used. We explicitly support it. More on this below.
* On non-Qualcomm, pre-Android-4.2 hardware, other vendor-specific mechanisms are used, which we do not support (see e.g. {{bug|871624}}).
* On non-Qualcomm, pre-Android-4.2 hardware, other vendor-specific mechanisms are used, which we do not support (see e.g. {{bug|871624}}).
Line 98: Line 100:
** If the new write lock attempt is using the same handle to the gralloc buffer that is already locked, this will fail. This typically gives a message like "trying to upgrade a read lock to a write lock".
** If the new write lock attempt is using the same handle to the gralloc buffer that is already locked, this will fail. This typically gives a message like "trying to upgrade a read lock to a write lock".
** If the new write lock attempt is using a different handle than the one already locked, then this locking operation will wait until the existing lock is released.
** If the new write lock attempt is using a different handle than the one already locked, then this locking operation will wait until the existing lock is released.
* A write lock can be converted into a read lock.
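
The write-lock rules listed above can be summarised in a small state model. This is a toy sketch, not the kernel implementation: the class GenlockModel and its method names are hypothetical, handles are plain integers, and the cases the bullets do not mention (an unlocked buffer, an existing write lock) are filled in by assumption and marked as such in the comments.

```cpp
#include <cassert>
#include <set>

enum class Outcome { Granted, Failed, MustWait };

// Toy model of the lock state of ONE gralloc buffer.
class GenlockModel {
public:
    // Test setup helper: record a read lock held through `handle`.
    void addReadLock(int handle) {
        assert(writer_ == kNone);  // assumption: no readers while write-locked
        readers_.insert(handle);
    }

    Outcome tryWriteLock(int handle) {
        // Same handle already holds a read lock: upgrading is refused.
        if (readers_.count(handle) != 0) return Outcome::Failed;
        // Locked through a different handle: the caller waits for release.
        // (Treating an existing write lock the same way is an assumption.)
        if (!readers_.empty() || writer_ != kNone) return Outcome::MustWait;
        // Assumption: an unlocked buffer grants the write lock immediately.
        writer_ = handle;
        return Outcome::Granted;
    }

    // A write lock can be converted into a read lock.
    bool convertWriteToRead(int handle) {
        if (writer_ != handle) return false;
        writer_ = kNone;
        readers_.insert(handle);
        return true;
    }

    void unlock(int handle) {
        readers_.erase(handle);
        if (writer_ == handle) writer_ = kNone;
    }

private:
    static constexpr int kNone = -1;
    std::set<int> readers_;
    int writer_ = kNone;
};
```

In this model, a handle that has converted its write lock to a read lock cannot get the write lock back without unlocking first, which matches the "trying to upgrade a read lock to a write lock" failure described above.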




Genlock is implemented in the kernel. The kernel GL driver is able to lock and unlock directly. Typically, it will place a read lock on any gralloc buffer that's bound to a texture it's sampling from, and unlock when it's done with that texture.
Genlock is implemented in the kernel. The kernel GL driver is able to lock and unlock directly. Typically, it will place a read lock on any gralloc buffer that's bound to a texture it's sampling from, and unlock when it's done with that texture.
A logging patch for genlock that lets you see locking across processes is here: [http://people.mozilla.com/~jmuizelaar/genlock-logging-patch]


== How we lock/unlock Gralloc buffers ==
== How we lock/unlock Gralloc buffers ==
Line 108: Line 113:
When (on the content side) we want to draw in software to a gralloc buffer, we call ShadowLayerForwarder::OpenDescriptor() in ShadowLayerUtilsGralloc.cpp. This calls android::GraphicBuffer::lock(). When we're done, we call ShadowLayerForwarder::CloseDescriptor() in the same file, which calls android::GraphicBuffer::unlock().
When (on the content side) we want to draw in software to a gralloc buffer, we call ShadowLayerForwarder::OpenDescriptor() in ShadowLayerUtilsGralloc.cpp. This calls android::GraphicBuffer::lock(). When we're done, we call ShadowLayerForwarder::CloseDescriptor() in the same file, which calls android::GraphicBuffer::unlock().


This is generally done by TextureClientShmem.
This is generally done by TextureClientShmem. Indeed, there is no need for a gralloc-specific TextureClient, as on B2G, TextureClientShmem will implicitly try to get gralloc memory, and will silently fall back to generic non-gralloc shared memory if gralloc fails. Whether the TextureClientShmem is actually backed by gralloc is recorded in the underlying SurfaceDescriptor. The compositor, upon receiving it, will examine the type of the SurfaceDescriptor, and if it is a SurfaceDescriptorGralloc, it will create a GrallocTextureHostOGL (see below).
 
Thus, there is no Gralloc-specific TextureClient class --- but there is a Gralloc-specific TextureHost class.


=== Drawing from Gralloc buffers (binding to GL textures) ===
=== Drawing from Gralloc buffers (binding to GL textures) ===
Line 132: Line 139:
= How Android is using Gralloc =
= How Android is using Gralloc =


QUESTION: We should understand how Android is using Gralloc, apparently behind an abstraction named SurfaceTexture. How is this designed, and how does this offer a good abstraction of gralloc that works well with vendor-specific lock semantics such as genlock? Do we already have similar abstractions (maybe GonkNativeWindow) ?
== Some terminology: EGLSurface, ANativeWindow, etc. ==
 
'''EGLSurface''' is a portable EGL abstraction for a possibly multi-buffered render target.
 
'''ANativeWindow''' is the Android-specific abstraction for a possibly multi-buffered render target. The eglCreateWindowSurface function creates an EGLSurface from an ANativeWindow. There are two concrete implementations of ANativeWindow in Android: '''FramebufferNativeWindow''' and '''SurfaceTextureClient'''.
 
                        EGLSurface
                            ^                      EGL world
                            |                    opaque handles
                            |
----------------------------+---------------------------------------
                            |
                            |                      Android world
                                                    C++ classes
                        ANativeWindow
                        =============
                      abstract base class
                      /                \
                    /                  \
                    /                    \
  FramebufferNativeWindow              SurfaceTextureClient
  =======================              ====================
Directly linked to fbdev                What everybody uses
Only 1 instance system-wide
 
While ANativeWindow abstracts a possibly multi-buffered render target, the individual buffers managed by ANativeWindow are instances of '''ANativeWindowBuffer'''.
 
The concrete implementation of ANativeWindowBuffer is '''GraphicBuffer''', the class discussed above in this document.
 
== SurfaceTexture ==
 
'''SurfaceTexture''' is the server side of a client-server system, whose client side is '''SurfaceTextureClient'''. As explained above, SurfaceTextureClient is a concrete implementation of ANativeWindow.
 
The reason to use a client-server system like this is to allow producing and compositing a surface in two different processes.
 
Let us introduce two important functions that a client needs to call on its ANativeWindow, dequeueBuffer and queueBuffer:
* '''dequeueBuffer''' acquires a new buffer for the client to draw to;
* '''queueBuffer''' lets the client indicate that it has finished drawing to a buffer, and queues it (e.g. for compositing).
 
 
Since eglSwapBuffers internally calls dequeueBuffer and queueBuffer, this system removes the need for manual management of GraphicBuffer's as we are currently doing in our B2G code.
 
The other benefit of this system is that most [http://en.wikipedia.org/wiki/Board_support_package BSP] vendors provide graphics profilers (e.g. Adreno Profiler from QCOM, PerfHUD ES from nVidia) which recognize the eglSwapBuffers calls as frame boundaries to collect frame-based GL information from the driver to help development and performance tuning.
 
In Android 2, there were many buffer management systems. In Android 4, all of this is unified under SurfaceTexture. This is made possible by the great flexibility of SurfaceTexture:
* SurfaceTexture supports both synchronous and asynchronous modes.
* SurfaceTexture supports generic multi-buffering: it can have any number of buffers between 1 and 32.
 
 
Examples:
* The Codec/Camera code configures it to have several buffers (depending on hardware, for instance 9 buffers for camera preview on Unagi) and run asynchronously
* SurfaceFlinger (the Android compositor) configures it to have 2--3 buffers (depending on [http://en.wikipedia.org/wiki/Board_support_package BSP]) and run synchronously.
* Google Miracast uses it to encode OpenGL-rendered surfaces on-the-fly.
 
 
Let us now describe how the client-server SurfaceTexture system allocates GraphicBuffer's, and how both the client and server sides keep track of these shared buffer handles. Again, SurfaceTexture is the server-side class, while SurfaceTextureClient is the client-side class. Each of them stores an array of GraphicBuffers, which is called mSlots in both classes. The GraphicBuffer objects are separate instances in SurfaceTexture and in SurfaceTextureClient, but the underlying buffer handles are the same.

The mechanism here is as follows. The client side issues a SurfaceTextureClient::dequeueBuffer call to get a new buffer to paint to. If there is not already a free buffer in mSlots, and the number of buffers is under the limit (e.g. 3 for triple buffering), it sends an IPC message that results in a call to SurfaceTexture::dequeueBuffer which allocates a GraphicBuffer. After this transaction, still inside of SurfaceTextureClient::dequeueBuffer, another IPC message is sent, that results in a call to SurfaceTexture::requestBuffer to get the GraphicBuffer serialized over IPC back to it, using GraphicBuffer::flatten and GraphicBuffer::unflatten, and cache it into its own mSlots.

The mSlots arrays on both sides mirror each other, so that the two sides can refer to GraphicBuffer's by index. This allows the client and server side to communicate with each other by passing only indices, without flattening/unflattening GraphicBuffers again and again.
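
The mirrored mSlots arrays can be illustrated with a single-process toy model. Everything here is a stand-in: Buffer plays the role of GraphicBuffer (its integer handle represents the underlying gralloc buffer), Server and Client play the roles of SurfaceTexture and SurfaceTextureClient, and direct method calls replace the IPC messages. Making a queued slot immediately reusable is a simplification of this sketch, not how compositing really releases buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for GraphicBuffer; `handle` identifies the underlying gralloc buffer.
struct Buffer {
    explicit Buffer(int h) : handle(h) {}
    int handle;
};

// Stand-in for SurfaceTexture (server side).
class Server {
public:
    explicit Server(std::size_t maxBuffers)
        : slots_(maxBuffers), free_(maxBuffers, true) {}

    // Returns a slot index, allocating the buffer lazily, or -1 if none is free.
    int dequeueBuffer() {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (!free_[i]) continue;
            if (!slots_[i]) slots_[i].reset(new Buffer(nextHandle_++));
            free_[i] = false;
            return static_cast<int>(i);
        }
        return -1;
    }

    // Stands in for flatten/unflatten: hands the handle of a slot to the client.
    int requestBuffer(int slot) {
        assert(slot >= 0 && static_cast<std::size_t>(slot) < slots_.size());
        ++requestCount;
        return slots_[slot]->handle;
    }

    void queueBuffer(int slot) { free_[slot] = true; }  // simplification
    const Buffer* bufferAt(int slot) const { return slots_[slot].get(); }

    int requestCount = 0;  // how many times a buffer was "serialized"

private:
    std::vector<std::unique_ptr<Buffer>> slots_;
    std::vector<bool> free_;
    int nextHandle_ = 100;
};

// Stand-in for SurfaceTextureClient (client side).
class Client {
public:
    Client(Server& server, std::size_t maxBuffers)
        : server_(server), slots_(maxBuffers) {}

    int dequeueBuffer() {
        int slot = server_.dequeueBuffer();
        if (slot < 0) return -1;
        // Only the first use of a slot needs the buffer sent across.
        if (!slots_[slot]) slots_[slot].reset(new Buffer(server_.requestBuffer(slot)));
        return slot;
    }

    void queueBuffer(int slot) { server_.queueBuffer(slot); }
    const Buffer* bufferAt(int slot) const { return slots_[slot].get(); }

private:
    Server& server_;
    std::vector<std::unique_ptr<Buffer>> slots_;
};
```

After a slot has been populated on both sides, dequeueBuffer and queueBuffer exchange only the slot index; requestCount stops growing, which is the saving the paragraph above describes.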
 
Let us now describe what happens in SurfaceTextureClient::dequeueBuffer when there is no free buffer and the number of buffers has already met the limit (e.g. 3 for triple buffering). In this case, the server side (SurfaceTexture::dequeueBuffer) will wait for a buffer to be queued, and the client side waits for that, so that SurfaceTextureClient::dequeueBuffer will not return until a buffer has actually been queued on the server side. This is what allows SurfaceTexture to support both synchronous and asynchronous modes with the same API.


SurfaceTexture is a client-server architecture. The SurfaceTextureClient implement EGLNativeWindow, where application render into. When Android hardware UI are used, the SurfaceTextureClient is bound into a EGLSurface, and when applications want to present the buffer it incurs eglSwapBuffers which calls to EGLNativeWindow::queue and EGLNativeWindow::dequeue. Where EGLNativeWindow::queue cause the GraphicBuffer returned to SurfaceTexture side (server side). And EGLNativeWindow::dequeue return a new back buffer for drawing into.
Let us now explain how synchronous mode works. In synchronous mode, on the client side, inside of eglSwapBuffers, when ANativeWindow::queueBuffer is called to present the frame, it sends the index of the rendered buffer to the server side. This causes the index to be queued into SurfaceTexture::mQueue for rendering. SurfaceTexture::mQueue is a wait queue for frames that want to be rendered. In synchronous mode, all the frames are shown one after another, whereas in asynchronous mode frames may be dropped.
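
The difference between the two modes can be sketched as a queue of slot indices. This is a simplified model, not the SurfaceTexture code: the class FrameQueue and its methods are hypothetical, and the asynchronous policy shown (a newly queued frame replaces a frame that has not been consumed yet) is an assumption about how frames get dropped.

```cpp
#include <cassert>
#include <deque>

// Toy stand-in for SurfaceTexture::mQueue; stores slot indices only.
class FrameQueue {
public:
    explicit FrameQueue(bool synchronous) : synchronous_(synchronous) {}

    // Called when the client presents a frame.
    void queueBuffer(int slot) {
        assert(slot >= 0);
        if (synchronous_ || pending_.empty()) {
            pending_.push_back(slot);  // synchronous: every frame is kept, in order
        } else {
            pending_.front() = slot;   // asynchronous: the stale frame is dropped
            ++dropped_;
        }
    }

    // Called by the consumer; returns the next slot to show, or -1 if none.
    int nextFrame() {
        if (pending_.empty()) return -1;
        int slot = pending_.front();
        pending_.pop_front();
        return slot;
    }

    int dropped() const { return dropped_; }

private:
    bool synchronous_;
    std::deque<int> pending_;
    int dropped_ = 0;
};
```

With three frames queued before the consumer runs, the synchronous queue hands back all three in order, while the asynchronous one hands back only the most recent.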


For performance, the client and server side do not pass GraphicBuffer each time queue/dequeue incurred in fact. They refer to the same buffer queue, and just communicate with each other by buffer index.
Let us now explain how SurfaceFlinger (the Android compositor) uses this system to get images onto the screen. Each time SurfaceTexture::queueBuffer is called, it causes SurfaceFlinger to start the next iteration of the render loop. In each iteration, SurfaceFlinger calls SurfaceTexture::updateTexImage to dequeue a frame from SurfaceTexture::mQueue and bind that GraphicBuffer into a texture, just like we do in GrallocTextureHostOGL::Lock.


In Android SurfaceFlinger, the GraphicBuffer bind into GPU is not unlocked explicitly in fact. Since SurfaceTexture runs in sync mode for SurfaceFlinger, the buffer are queued into a queue until it got rendered. When rendering, the SurfaceTexture::updateTexImage are called to update the GraphicBuffer undering the SurfaceTexture. After rendering, the buffer are not used again until next SurfaceTexture::updateTexImage called. Which means Android do not force GPU unlock the buffer, just lock another buffer and return the old buffer until the new buffer comes. (It use fence object to make sure GPU are done with the buffer, and just call glFinish after fence object creation)
The magic part is that SurfaceFlinger does not need to do the equivalent of GrallocTextureHostOGL::Unlock. In our case, we have a separate OpenGL texture object for each TextureHost, which typically (at least in the case of a ContentHost) represent one buffer each (so a double-buffered ContentHost has two TextureHost's). So we have to unbind the GraphicBuffer from the OpenGL texture before we can hand it back to the content side --- otherwise it would remain locked for read and couldn't be locked for write for content drawing. By contrast, SurfaceFlinger does not need to worry about this because it uses only one OpenGL texture, so that when it binds a new GraphicBuffer to it for compositing, that automatically unbinds the previous one!
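
The contrast between the two designs can be shown with a toy model in which binding a buffer to a texture places a read lock on it, and the lock goes away when the texture stops referencing the buffer. Buffer and Texture here are hypothetical stand-ins, not GL or gralloc types.

```cpp
#include <cassert>

// Stand-in for a gralloc buffer; only tracks whether a texture holds it for reading.
struct Buffer {
    bool readLocked = false;
};

// Stand-in for an OpenGL texture object that samples from at most one buffer.
class Texture {
public:
    // Binding a new buffer implicitly releases whatever was bound before.
    void bind(Buffer* buffer) {
        if (bound_) bound_->readLocked = false;
        bound_ = buffer;
        if (bound_) bound_->readLocked = true;
    }

    // Explicit release, the equivalent of an Unlock step.
    void unbind() {
        assert(bound_ != nullptr);
        bind(nullptr);
    }

private:
    Buffer* bound_ = nullptr;
};

// Content can only take a write lock on a buffer no texture is reading from.
bool canWriteLock(const Buffer& buffer) { return !buffer.readLocked; }
```

With one shared Texture, binding the second buffer frees the first for content drawing with no extra step. With one Texture per buffer, the first buffer stays read-locked until unbind() is called on its texture, which is why an explicit unlock is needed in that design.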