Performance/MemShrink/DMD: Difference between revisions

Line 260:
Setting the <tt>DMD</tt> environment variable to <tt>1</tt> gives default options.  But you can also specify non-default options by setting <tt>DMD</tt> to a whitespace-separated list of <tt>--option=val</tt> entries.


Valid options are as follows.
At the moment, you can provide two options to DMD: <tt>--sample-below=<1..n></tt> and <tt>--mode=<normal|test|stress></tt>.


* <tt>--sample-below=<1..n></tt>  This is the size below which blocks are sampled.  The default is 4093.  It is a prime number, which helps avoid sampling artifacts that can happen with a rounder number like 4096.  Set it to 1 to disable sampling, but note that DMD will run substantially slower if you do so.
=== --sample-below=<1..n> ===


* <tt>--mode=<normal|test|stress></tt> This can be used to invoke "test" or "stress" mode, which are useful if you're hacking on DMD.  The default is normal mode.
By default, DMD ''samples'' blocks with a sample-below size of 4093.  You can change this by invoking Firefox with e.g. <tt>DMD="--sample-below=15"</tt>.


For example, if you invoke DMD like this:
When DMD samples allocations, it ignores some allocations in an attempt to run faster.  The chance that an allocation is recorded depends on its size.


  DMD='--mode=normal --sample-below=1' <rest-of-command>
If the sample-below size is 4093, then when a malloc allocating n >= 4093 bytes occurs, DMD records that allocation precisely.


DMD will run in normal mode with sampling off.
If on the other hand a malloc allocating n < 4093 bytes occurs, DMD may or may not record that allocation at all. 
 
When the allocation occurs, DMD increments a global counter by n.  If the counter's new value is less than 4093, we ignore the allocation.  If the counter's new value is greater than or equal to 4093, we pretend that the current stack trace just allocated a block of size 4093 bytes, and we decrement the counter by 4093.
 
In this way, we record callsites roughly in proportion to how much memory they allocate.  The idea is that if one callsite allocates many small blocks, it will cause the counter to roll over often enough that the callsite will show up in DMD and be blamed for roughly the right amount of memory usage.
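
The counter scheme described above can be sketched in a few lines of Python.  This is only an illustration of the idea, not DMD's actual code: the names <tt>make_sampler</tt> and <tt>record</tt> are hypothetical, and the sketch assumes the counter rolls over as soon as it reaches the sample-below size.

```python
def make_sampler(sample_below=4093):
    """Return record(n), which gives the size DMD-style sampling would
    attribute to an allocation of n bytes (0 if the block is ignored)."""
    counter = 0  # running total of bytes from small (sampled) blocks

    def record(n):
        nonlocal counter
        if n >= sample_below:
            return n  # large blocks are recorded precisely
        counter += n
        if counter >= sample_below:
            # Roll over: blame the current call site for one
            # sample_below-sized block.
            counter -= sample_below
            return sample_below
        return 0  # small block ignored this time

    return record


record = make_sampler()
# One call site making many 16-byte allocations, plus one large block.
sizes = [16] * 100_000 + [8192]
recorded = sum(record(n) for n in sizes)
actual = sum(sizes)
```

In this sketch the recorded total never exceeds the real total, and it falls short by less than one sample-below size (whatever is left in the counter at the end).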
 
(Note that "a malloc allocating n bytes" is not the same as <tt>malloc(n)</tt>!  DMD attributes to a malloc call the full amount of memory reserved by the allocator for that malloc(), which may be greater than n.  For example, if you malloc(1023), the allocator will probably give you a 1024-byte block.)
 
We've found that sampling in this way works well in practice, particularly since most of the pieces of memory we're trying to track down with DMD happen to be large allocations.  Sampling leads to a huge speedup in DMD's performance and a large reduction in its memory usage.
 
If you want DMD to record all allocations precisely, pass <tt>--sample-below=1</tt>.  If instead you want DMD to run faster, pass <tt>--sample-below=K</tt> for K > 4093.
 
==== Choosing your sample-below value ====
 
Although you can pass <tt>--sample-below=K</tt> for any natural number K, we've found that primes seem to work particularly well.  We chose the default as 4093 because it's the largest prime smaller than 4096.
 
There are number-theoretic reasons why prime numbers work well here.  At a high level: We want the probability that we record a malloc allocating n bytes to be n/K, independent of which allocations came before us.
 
Since whether we record a malloc is a deterministic function of n and the global counter C, this means that we want C to have as much entropy as possible.
 
But consider the case when all our allocations are for multiples of 4 bytes (that is, n % 4 == 0 for all allocs).  If K=4096, C is only ever a multiple of 4.  But on the other hand if K is a prime, it can be shown that C can take on all values 0..K-1.
 
More possible values of C means more randomness, which means better sampling.
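
The multiples-of-4 case can be checked with a small simulation.  This is a sketch under simplifying assumptions: every allocation is exactly 4 bytes, the counter rolls over as soon as it reaches K, and the helper name <tt>reachable_counter_values</tt> is made up for illustration.

```python
def reachable_counter_values(k, step, iterations):
    """Collect every value the counter C takes when each allocation
    adds `step` bytes and C rolls over once it reaches k."""
    counter = 0
    seen = {0}
    for _ in range(iterations):
        counter += step
        if counter >= k:
            counter -= k
        seen.add(counter)
    return seen


# Same allocation pattern, two different sample-below sizes.
pow2 = reachable_counter_values(4096, 4, 10_000)
prime = reachable_counter_values(4093, 4, 10_000)

print(len(pow2), len(prime))
```

With K=4096 the counter only ever visits the 1024 multiples of 4, whereas with K=4093 it eventually visits every value in 0..K-1.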
 
=== --mode=<normal|test|stress> ===
 
<tt>--mode=<normal|test|stress></tt> can be used to invoke "test" or "stress" mode, which are useful if you're hacking on DMD.  The default is normal mode.
 
"test" and "stress" modes set their own <tt>--sample-below</tt> values, so you
should never have to specify both <tt>--sample-below</tt> and <tt>--mode</tt>.


=== Setting options on B2G device builds ===


If for some reason you want to run B2G on a device with args other than DMD=1, you'll need to modify the <tt>gonk-misc/b2g.sh</tt> script and then push it to the device.
If you want to run B2G on a device with args other than DMD=1, you'll need to modify the <tt>gonk-misc/b2g.sh</tt> script and then push it to the device.


To push the modified script, do something like