MMgc is the Tamarin (née Macromedia) Garbage Collector, a garbage collection library that has been built as part of the AVM2/Tamarin effort. It is a static library that is linked into the Flash Player but kept separate, and can be incorporated into other products. By keeping it separate, it could also theoretically be replaced in the future with a different garbage collector... we see value in making garbage collection "pluggable."
= Using MMgc =

== Managed vs. Unmanaged Memory ==
MMgc is not only a garbage collector, but a general-purpose memory manager. In the Flash Player, we use it for nearly all memory allocations.
* Managed memory is C++ operator new, with optional delete
== MMgc namespace ==
The MMgc library is in the C++ namespace <tt>MMgc</tt>.

 GCObject* gcObject;
== GC class ==
The class MMgc::GC is the main class of the GC. It represents a full, self-contained instance of the garbage collector.

There are a few methods that you may need to call directly, such as <tt>Alloc</tt> and <tt>Free</tt>.
=== GC::Alloc, GC::Free ===
The <tt>Alloc</tt> and <tt>Free</tt> methods are garbage-collected analogs for <tt>malloc</tt> and <tt>free</tt>. Memory allocated with <tt>Alloc</tt> doesn't need to be explicitly freed, although it can be freed with <tt>Free</tt> if it is known that there are no other references to it.
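As a sketch of how <tt>Alloc</tt> and <tt>Free</tt> are used: the <tt>GC</tt> class below is a minimal stand-in stub, not the real MMgc implementation (the real <tt>Alloc</tt> also accepts allocation flags), and <tt>AllocExample</tt> is an invented helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Stand-in stub for MMgc::GC, for illustration only -- the real class
// lives in the MMgc library and its Alloc also takes allocation flags.
namespace MMgcSketch {
    class GC {
    public:
        // Real MMgc::GC::Alloc returns zeroed, collector-managed memory;
        // we fake that here with calloc.
        void* Alloc(size_t size) { return std::calloc(1, size); }
        // Free is optional in real MMgc (the collector reclaims
        // unreachable memory); calling it eagerly is an optimization.
        void Free(void* ptr) { std::free(ptr); }
    };
}

int* AllocExample(MMgcSketch::GC* gc) {
    // Analogous to malloc, but the matching Free call could be omitted.
    int* data = static_cast<int*>(gc->Alloc(16 * sizeof(int)));
    data[0] = 42;
    return data;
}
```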
<tt>kFinalize</tt> and <tt>kRCObject</tt> are used internally by the GC; you should not need to set them in your user code.
=== GC::GetGC ===
Given the pointer to any GCObject, it is possible to get a pointer to the GC object that allocated it.

This practice should be used sparingly, but is sometimes useful.
== Base Classes ==

=== GCObject ===
A basic garbage collected object.

 };
==== GCObject::GetWeakRef ====
 GCWeakRef *GetWeakRef() const;

The <tt>GetWeakRef</tt> method returns a weak reference to the object. Normally, a pointer to the object is considered a hard reference -- any such reference will prevent the object from being destroyed. Sometimes, it is desirable to hold a pointer to a GCObject, but to let the object be destroyed if there are no other references. GCWeakRef can be used for this purpose. It has a <tt>get</tt> method which returns the pointer to the original object, or <tt>NULL</tt> if that object has already been destroyed.
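The contract can be illustrated with hand-rolled stand-ins. All names below are invented for illustration; in real MMgc the collector clears the weak reference when the target is reclaimed, rather than a C++ destructor doing it.

```cpp
#include <cassert>
#include <cstddef>

// Invented stand-in for GCWeakRef: holds a pointer that is cleared
// when the target dies, so get() answers "is the object still alive?"
struct WeakRefSketch {
    void* target = nullptr;
    void* get() const { return target; }
};

// Invented stand-in for a GCObject subclass with GetWeakRef support.
struct ObjectSketch {
    WeakRefSketch* weak = nullptr;   // lazily created, outlives the object

    WeakRefSketch* GetWeakRef() {
        if (!weak) { weak = new WeakRefSketch; weak->target = this; }
        return weak;
    }

    // In real MMgc the collector clears the weak ref; here the
    // destructor simulates that step.
    ~ObjectSketch() { if (weak) weak->target = nullptr; }
};
```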
=== GCFinalizedObject ===

Base class: GCObject

 };
=== RCObject ===

Base class: GCFinalizedObject
=== GCRoot ===

If you have a pointer to a GCObject from an object in unmanaged memory, the unmanaged object must be a subclass of GCRoot.

 GCRoot(GC *gc, const void *object, size_t size);
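The pattern looks roughly like this. The <tt>GCRoot</tt> below is a stub that merely records the range it would register with the collector (the constructor signature matches the one shown above); <tt>MyUnmanagedObject</tt> is an invented example class.

```cpp
#include <cassert>
#include <cstddef>

namespace Sketch {
    class GC {};

    // Stub: real MMgc::GCRoot registers [object, object+size) with the
    // collector so the GC conservatively scans it for pointers to
    // managed objects. Here we only record the range.
    class GCRoot {
    public:
        GCRoot(GC* /*gc*/, const void* object, size_t size)
            : start(object), length(size) {}
        const void* start;
        size_t length;
    };
}

// An unmanaged object that holds a pointer into managed memory, so it
// subclasses GCRoot and registers its own footprint.
class MyUnmanagedObject : public Sketch::GCRoot {
public:
    explicit MyUnmanagedObject(Sketch::GC* gc)
        : Sketch::GCRoot(gc, this, sizeof(MyUnmanagedObject)),
          managedPtr(nullptr) {}
    void* managedPtr;   // the GC would see this pointer when scanning
};
```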
== Allocating objects ==

Allocating unmanaged objects is as simple as using global operator new/delete, the same way you always have.

 MyObject* myObject = new (gc) MyObject();
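One plausible way <tt>new (gc)</tt> can be wired up is a placement <tt>operator new</tt> that forwards to <tt>GC::Alloc</tt>. The classes below are stubs for illustration, not MMgc's actual declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

namespace Sketch {
    class GC {
    public:
        // Stand-in for MMgc::GC::Alloc: zeroed memory, as the real GC returns.
        void* Alloc(size_t size) { return std::calloc(1, size); }
    };

    class GCObject {
    public:
        // This overload is what makes 'new (gc) MyObject()' route the
        // allocation through the collector.
        static void* operator new(size_t size, GC* gc) { return gc->Alloc(size); }
        // Matching placement delete (called if the constructor throws).
        static void operator delete(void*, GC*) {}
        // Regular delete is a no-op in this sketch: the GC reclaims memory.
        static void operator delete(void*) {}
    };
}

class MyObject : public Sketch::GCObject {
public:
    int value = 7;
};
```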
== DWB/DRC/DRCWB ==

There are several smart pointer templates which must be used in your C++ code to work properly with MMgc.

=== DWB ===

DWB stands for Declared Write Barrier.

 };
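The idea behind a write-barrier smart pointer like DWB can be sketched with a toy template whose "barrier" merely counts writes. The real DWB template invokes the collector's write barrier on assignment; everything here is invented for illustration.

```cpp
#include <cassert>

// Toy stand-in for DWB<T>: assignment goes through a "barrier" hook.
template <class T>
class DWBSketch {
public:
    DWBSketch() : ptr(nullptr) {}

    DWBSketch& operator=(T* p) {
        ++barrierCalls;   // real code would notify the GC of the store here
        ptr = p;
        return *this;
    }

    T* operator->() const { return ptr; }
    operator T*() const { return ptr; }

    static int barrierCalls;   // visible count of barrier invocations
private:
    T* ptr;
};
template <class T> int DWBSketch<T>::barrierCalls = 0;

struct Node { int value = 0; };
```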
=== DRC ===

DRC stands for Deferred Reference Counted.

 };
=== DRCWB ===

DRCWB stands for Deferred Reference Counted, with Write Barrier.

 };
=== When are the macros not needed? ===

Write barriers are not needed for stack-based local variables, regardless of whether the object pointed to is GCObject, GCFinalizedObject, RCObject or RCFinalizedObject. The GC marks the entire stack during collection, and not incrementally, so write barriers aren't needed.

Write barriers are not needed for C++ objects that exist purely on the stack, and never in the heap. The Flash Player class "NativeInfo" is a good example. Such objects are essentially the same as stack-based local variables.
== Zeroing RCObjects ==

All RCObjects (including all subclasses) must zero themselves out completely upon destruction. This has been done, and asserts enforce it. The reason is that our collector zeroes memory upon free, and this was hurting us performance-wise; since we need to traverse objects upon destruction anyway to decrement reference counts properly, the destructors now do the zeroing too. This mostly applies to non-pointer fields, as the write barrier smart pointers do this for you.
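A sketch of what the rule means in practice; the class and field names below are invented for illustration.

```cpp
#include <cassert>

// Invented example of an RCObject-style class obeying the zeroing rule.
class MyRCThing {
public:
    int count = 3;
    double ratio = 1.5;

    ~MyRCThing() {
        // Non-pointer fields must be zeroed by hand in the destructor;
        // the write-barrier smart pointers zero the pointer fields for you.
        count = 0;
        ratio = 0;
    }
};
```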
== Poisoned Memory ==

In DEBUG builds, MMgc writes "poison" into deallocated memory as a debugging aid. Here's what the different poison values mean:

|}
== Finalizers ==
If your C++ class is a subclass of GCFinalizedObject or RCFinalizedObject, it has finalizer support.
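A sketch of the pattern: in MMgc, the C++ destructor of a GCFinalizedObject subclass is what runs as the finalizer when the collector destroys the object. The base class and <tt>FileWrapper</tt> below are stand-ins invented for illustration.

```cpp
#include <cassert>

namespace Sketch {
    // Stand-in for MMgc::GCFinalizedObject: the virtual destructor is
    // what gives subclasses finalizer support.
    class GCFinalizedObject {
    public:
        virtual ~GCFinalizedObject() {}
    };
}

static int finalizeCount = 0;

// Invented example: a class that must release a non-memory resource
// (file handle, socket, ...) when the collector destroys it.
class FileWrapper : public Sketch::GCFinalizedObject {
public:
    ~FileWrapper() override {
        // Resource cleanup would go here; we just record that it ran.
        ++finalizeCount;
    }
};
```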
It's best to avoid finalizers if you can, since finalization behavior can be unpredictable and nondeterministic, and also slows down the GC since the finalizers need to be invoked.
== Threading ==

The GC routines are not currently thread safe; we're operating under the assumption that none of the player-spawned threads create GC'd objects. If this isn't true, we hope to stop other threads from doing so, and if we can't do that we will be forced to make our GC thread safe, although we hope we don't have to.
Threading gets more complicated because it makes sense to re-write ChunkMalloc and ChunkAllocBase to get their blocks from the GCHeap. They can also take advantage of the 4K boundary to eliminate the 4 byte per allocation penalty.
= Troubleshooting MMgc =

== Dealing with bugs ==

GC bugs are hard.
=== Forgetting a write barrier ===

If you forget to put a write barrier on a pointer, the incremental mark process might miss the pointer being changed. The result will be an object that your code has a pointer to, but which the GC thinks is unreachable. The GC will destroy the object, and later you will crash with a dangling pointer.

When you crash with what looks like a dangling pointer to a GC object, check for missing write barriers in the vicinity.
=== Forgetting a DRC ===

If you forget to put a DRC macro on a pointer to an RCObject from unmanaged memory, you can get a dangling pointer. The reference count of the object may go to zero, and the object will be placed in the ZCT. Later, the ZCT will be reaped and the object will be destroyed. But you still have a pointer to it. When you dereference the pointer later, you'll crash with a dangling pointer.

When you crash with what looks like a dangling pointer to an RC object, look at who refers to the object. See if there are missing DRC macros that need to be put in.
=== Wrong macro ===

If you put DWB instead of DRCWB, you'll avoid dangling pointer issues from a missing write barrier, but you might hit dangling pointer issues from a zero reference count.

If you crash with a dangling pointer to an RC object, check for DWB macros that need to be DRCWB.
=== Unmarked unmanaged memory ===

If you have pointers to GC objects in your unmanaged memory objects, the unmanaged objects need to be GCRoots.

If you get a crash dereferencing a pointer to a GC object, and the pointer was a member variable in an unmanaged (non-GC) object, check whether the unmanaged object is a GCRoot. If it isn't, maybe it needs to be.
=== Finding missing write barriers ===

There are some automatic aids in the MMgc library which can help you find missing write barriers. Look in MMgc/GC.cpp.

Sometimes, this missing write barrier detection will turn up a false positive. If you can't find anything wrong with the code, it might just be a false positive.
== Debugging Aids ==

MMgc has several debugging aids that can be useful in your development work.
=== Underwrite/overwrite detection ===

MMgc can often detect when you write outside the boundaries of an object, and will throw an assert in debugging builds when this happens.
=== Leak detection (for unmanaged memory) ===

When the application is exiting, MMgc will detect memory leaks in its unmanaged memory allocators and print out the addresses and sizes of the leaked objects, and stack traces if stack traces are enabled.

 const bool enableTraces = false;
=== Deleted object poisoning and write detection ===

MMgc will "poison" memory for deleted objects, and will detect if the poison has been written over by the application, which would indicate a write to a deleted object.
=== Stack traces (walk stack frame and lookup IPs) ===

When <tt>#define MEMORY_INFO</tt> is on, MMgc will capture a stack trace for every object allocation. This slows the application down but can be invaluable when debugging. Memory leaks will be displayed with their stack trace of origin.

 xmlclass.cpp:391 toplevel.cpp:164 toplevel.cpp:507 interpreter.cpp:1098 interpreter.cpp:20 methodenv.cpp:47
=== Allocation traces, deletion traces, etc. ===

If you're trying to see why memory is not getting reclaimed, GC::WhosPointingAtMe() can be called from the MSVC debugger and will spit out the objects that are pointing to the given memory location.
=== Memory Profiler ===

MMgc's memory profiler can display the state of your application's heap, showing the different classes of object in memory, along with object counts, byte counts, and percentage of total memory. It can also display stack traces for where every object was allocated. The report can be output to the console or to a file, and can be configured to be displayed pre/post sweep or via API call.
 6.5% - 103 kb - 3311 items - avmcore.cpp:2300 abcparser.cpp:1077 …
= How MMgc works =

== Mark/Sweep ==

The MMgc garbage collector uses a mark/sweep algorithm. This is one of the most common garbage collection algorithms.
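The core of the algorithm can be shown with a toy collector over a simple object graph: mark everything reachable from the roots, then sweep whatever wasn't marked. This is purely illustrative (all names invented); MMgc's real collector is conservative and works on 4KB pages rather than an explicit graph.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy heap object: explicit references and a mark bit.
struct Obj {
    std::vector<Obj*> refs;
    bool marked = false;
};

struct Heap {
    std::set<Obj*> objects;

    Obj* alloc() { Obj* o = new Obj(); objects.insert(o); return o; }

    // Mark phase: depth-first traversal from a root.
    void mark(Obj* o) {
        if (o->marked) return;
        o->marked = true;
        for (Obj* child : o->refs) mark(child);
    }

    // One full collection: ClearMarks, Mark, Sweep.
    // Returns the number of objects reclaimed.
    int collect(const std::vector<Obj*>& roots) {
        for (Obj* o : objects) o->marked = false;     // ClearMarks
        for (Obj* r : roots) mark(r);                 // Mark
        int swept = 0;                                // Sweep
        for (auto it = objects.begin(); it != objects.end();) {
            if (!(*it)->marked) {
                delete *it;
                it = objects.erase(it);
                ++swept;
            } else {
                ++it;
            }
        }
        return swept;
    }
};
```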
<gflash>600 300 GC.swf</gflash>
=== One pass ===

The mark/sweep algorithm described above decomposes into ClearMarks/Mark/Finalize/Sweep. In our original implementation, ClearMarks, Finalize and Sweep each visited every GC page and every object on that page: three passes! Now we have one pass; marks are cleared during sweep, so a clear-marks pass isn't needed at the start, and Finalize builds up a list of pages that need sweeping so Sweep doesn't need to visit every page. This has been shown to cut the Finalize/Sweep pause (which happens back to back, atomically) in half. Overall this wasn't a huge performance increase, because the majority of our time is spent in the Mark phase.
== Conservative Collection ==

MMgc is a conservative mark/sweep collector. Conservative means that it may not reclaim all of the memory that it is possible to reclaim; it will sometimes make a "conservative" decision and not reclaim memory that might've actually been free.
It is possible that a future version of MMgc might do exact marking. This would be needed for a generational collector.
== Deferred Reference Counting (DRC) ==

MMgc uses Deferred Reference Counting (DRC). DRC is a scheme for getting more immediate reclamation of objects, while still achieving high performance and getting the other benefits of garbage collection.
=== Classic Reference Counting ===

Previous versions of the Flash Player, up to Flash Player 7, used reference counting to track object lifetimes.
Reference counting is a kind of automatic memory management. Reference counting can track relationships between objects, and as long as AddRef and Release are called at the proper times, can reclaim memory from objects that are no longer referenced.
=== Problem: Circular References ===

Reference counting falls down when circular references occur in objects. If object A and object B are reference counted and refer to each other, their reference counts will both be nonzero even if no other objects in the system point to them. Locked in this embrace, they will never be destroyed.
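The failure mode is easy to demonstrate with a minimal hand-rolled reference-counted type (names invented for illustration):

```cpp
#include <cassert>

// Minimal reference-counted object that can point at one peer.
struct RefCounted {
    int refs = 0;
    RefCounted* other = nullptr;

    void AddRef() { ++refs; }
    // Returns true when the count hits zero, i.e. when the object
    // would be destroyed under classic reference counting.
    bool Release() { return --refs == 0; }
};
```

With A and B referring to each other, dropping the last external reference to each still leaves both counts at 1, so neither is ever reclaimed: a leaked cycle.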
However, reference counting is also slow because the reference counts need to be constantly maintained. So, it's attractive to find some form of reference counting that doesn't require maintaining reference counts for every single reference.
=== Enter Deferred Reference Counting ===

In Deferred Reference Counting, a distinction is made between heap and stack references.
We basically ignore the stack and registers. They are considered stack memory.
=== Zero Count Table ===

What happens when an object's reference count goes to zero? If the object were immediately destroyed, that could leave dangling pointers on the stack, since we didn't bump up the object's reference count when stack references were made to it.
If an object is in the ZCT, it is known that there are no heap references to it. So, there can only be stack references to it. MMgc scans the stack to see if there are any stack references to ZCT objects. Any objects in the ZCT that are NOT found on the stack are deleted.
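The ZCT mechanism can be sketched as follows. All names are invented, and the real MMgc scans the actual machine stack conservatively rather than taking an explicit list of stack references.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Toy Zero Count Table: objects whose heap reference count hits zero
// are queued here instead of being destroyed immediately.
struct ZctSketch {
    std::set<void*> table;

    void add(void* obj) { table.insert(obj); }     // heap refcount hit zero
    void remove(void* obj) { table.erase(obj); }   // a heap ref was re-added

    // Reap: anything in the table that is NOT also referenced from the
    // stack is dead. Returns the objects that would be destroyed.
    std::vector<void*> reap(const std::vector<void*>& stackRefs) {
        std::vector<void*> dead;
        for (void* obj : table) {
            if (std::find(stackRefs.begin(), stackRefs.end(), obj) == stackRefs.end())
                dead.push_back(obj);
        }
        for (void* obj : dead) table.erase(obj);
        return dead;
    }
};
```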
== Incremental Collection ==

The Flash Player is frequently used for animations and video that must maintain a certain framerate to play properly. Applications are also getting larger and consuming more memory, with scripting giving way to full-fledged application component models (a la Flex). Unfortunately, the Flash Player suffers a periodic pause (at least every 60 seconds) due to garbage collection, and that pause may be unbounded: the GC pause is proportional to the amount of memory the application is using. One way to avoid this unbounded pause is to break up the work the GC needs to do into "increments".
# How much time to spend marking in each increment
=== Mark consistency ===

A correct collector never deletes a live object (duh). In order to be correct, we must account for a new or unmarked object being stored into an object we've already marked. In implementation terms, this means a new or unmarked object is stored in an object that has already been processed by the marking algorithm and is no longer in the queue. Unless we do something, we will delete this object and leave a dangling pointer to it in its referent.
1 and 2 are pretty well isolated. 1 is the SetSlot method in the AVM- and some assembly code in the AVM+. #2 can be found by examining all non-const methods of GC objects (and making all fields private, something the AVM+ code base does already). #3 is a little harder because there are a good number of GC roots. This is an unfortunate artifact of the existing code base: the AVM+ is relatively clean and its reachability graph consists of basically 2 GC roots (the AvmCore and URLStreams), but the AVM- has a bunch (currently including SecurityCallbackData, MovieClipLoader, CameraInstance, FAPPacket, MicrophoneInstance, CSoundChannel, URLRequest, ResponceObject, URLStream and UrlStreamSecurity). In order to make things easier we could avoid WB's for #3 by marking the root set twice.

The first increment pushes the root set onto the queue, and when the queue is empty we process the root set again. This second root set processing should be very fast, since the majority of objects should already be marked and the root set is usually small (marked objects are ignored and not pushed onto the work queue). This would mean that developers would really only have to take #2 into account when writing new code.
=== Illustration of Write Barriers ===

The following Flash animation demonstrates how a write barrier works.
<gflash>600 300 GC2.swf</gflash>
=== Detecting missing Write Barriers ===

To make sure that we injected write barriers into all the right places, we plan on implementing a debug mode that will search for missing write barriers. The signature of a missing write barrier is a black-to-white pointer that exists right before we sweep; after the sweep the pointer will point to deleted memory. We can also check throughout the incremental mark by making sure any black -> white pointers have been recorded by the write barrier. Furthermore, we can run this check even more frequently than every mark increment, for instance every time our GC memory allocators request a new block from our primary allocator (the way our extremely helpful greedy collection mode works). This would of course be slow, but with good code coverage it should be capable of finding all missing write barriers. Only checking for missing write barriers before every sweep will probably be a small enough performance impact to enable it in DEBUG builds.

The more frequent checks will have to be turned on manually.
=== Write Barrier Implementation ===
There are a couple of options for implementing write barriers. At the finest level of granularity, every time a pointer to a white object is written into a black object, we push the white object onto the work queue (thus making it grey). Another option is to push the black object onto the work queue, so that multiple writes to the same black object cost only one push. This could be a significant speedup if the black object were a large array being populated with many new objects. On the other hand, if the black object is a huge array and only a couple of slots received new objects, we waste time remarking the whole thing.
A popular solution to this is what's called card marking. Here memory is divided into "cards", and when a pointer to a white object is written into a black object, we mark the card containing the slot that the pointer was written to. After all marking is done, we circle back and remark the black portion of any card that was flagged by the write barrier. There are two techniques to optimize this process. One is to save the addresses of all pages that had cards marked (so you don't have to bring every page into memory to check its hand, so to speak). The other is to check each page while doing the normal marking and, if any of its cards were flagged, handle them immediately, since you are already reading and writing that page; the result is that fewer things remain to be marked at the end of the mark cycle. These two optimizations can be combined.
=== Increment Time Slice ===
If we don't maintain the frame rate, the movie will appear to pause; and if we don't mark fast enough, the mutator could get ahead of the collector and allocate memory so fast that the collection never finishes and memory grows unbounded. The ideal solution results in only one mark increment per frame, unless the mutator is allocating memory so fast that we need to mark more aggressively to reach the sweep. So the frequency of incremental marking will be based on two factors: the rate at which we can trace memory and the rate at which the mutator requests more memory. Study of real-world applications will be used to determine how best to combine these two rates.
== GCHeap ==
The GC library has a tiered memory allocation strategy, consisting of 3 parts:
When you want to allocate something, we figure out which size class it is in and ask that allocator for the memory. Each fixed-size allocator maintains a doubly linked list of 4K blocks that it obtains from the <code>GCHeap</code>. These 4K blocks are aligned on 4K boundaries, so we can easily allocate everything on 8-byte boundaries (a necessary consequence of the 32-bit atom design: 3 type bits and 29 pointer bits). We also store the <code>GCAlloc::GCBlock</code> structure at the beginning of each 4K block, so individual allocations don't need a pointer to their block: just zero the lower 12 bits of any GC-allocated pointer to get the <code>GCBlock</code> pointer. The <code>GCBlock</code> contains bitmaps for marking and for indicating whether an item has a destructor that needs to be called (a <code>GCFinalizedObject</code> base class exists, defining a virtual destructor for GC items that need it). Deleted items are stored in a per-block free list, which is used when non-empty; otherwise we take the next free item at the end of the block. If nothing is free and we reach the end, we get another block from the heap.
=== GCHeap's reserve/commit strategy ===
GCHeap reserves 16 MB of address space per heap region. The goal of reserving so much address space is that subsequent expansions of the heap are able to obtain contiguous memory blocks. If we can keep the heap contiguous, that reduces fragmentation and the possibility of many small "Balkanized" heap regions.
GCHeap serves up 4K blocks to the size class allocators, or groups of contiguous 4K blocks for requests from the large allocator. It maintains a free list, and blocks are coalesced with their neighbors when freed. If we use up the 16 MB reserved chunk, we reserve another one, contiguous with the previous chunk if possible.
=== When memory mapping is not available ===
GCHeap can fall back on a malloc/free approach for obtaining memory if a memory-mapping API like VirtualAlloc or mmap is not available. In this case, GCHeap will allocate exactly as much memory as is requested when the heap is expanded, and will not try to reserve additional memory pages to expand into. GCHeap won't attempt to allocate contiguous regions in this case.
We currently use VirtualAlloc on Windows (supported on all flavors of Windows back to 95) and mmap on Mach-O and Linux. On Classic and Carbon, we do not currently use a memory-mapping strategy; those implementations call MPAllocAligned, which can allocate 4096-byte-aligned memory. We could potentially bind to the Mach-O framework dynamically from Carbon, when the user's system is Mac OS X, and call mmap/munmap.