Performance/MemShrink/DMD
DMD (short for "dark matter detector") is a tool that tracks which heap blocks have been reported by memory reporters. It's designed to help us reduce the "heap-unclassified" value in Firefox's about:memory page. It also detects if any heap blocks are reported twice.
Build
Everything other than B2G-device builds
If you're building on Linux or Android, add this line to the mozconfig file of your choice:
ac_add_options --enable-dmd
If you're building on Mac, add these lines:
ac_add_options --enable-debug
ac_add_options --enable-dmd
Note: non-debug DMD builds do not currently work on Mac. See bug 995443.
If you're building on Windows, add these lines:
ac_add_options --enable-dmd
ac_add_options --enable-profiling
Build with that mozconfig. Optimized builds should work fine. DMD enables jemalloc, and does not work in combination with --enable-trace-malloc.
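If you maintain several mozconfig files, the per-platform lines above can be kept in one place. The following is a minimal sketch, not part of the Mozilla tree: the function name and the platform labels ("linux", "android", "mac", "windows") are made up for illustration, and only the option lines themselves come from the instructions above.

```python
# Option lines taken from the build instructions above. The platform
# labels used as keys are invented for this sketch.
DMD_MOZCONFIG_LINES = {
    "linux": ["ac_add_options --enable-dmd"],
    "android": ["ac_add_options --enable-dmd"],
    "mac": ["ac_add_options --enable-debug", "ac_add_options --enable-dmd"],
    "windows": ["ac_add_options --enable-dmd",
                "ac_add_options --enable-profiling"],
}


def append_dmd_options(mozconfig_text, platform):
    """Return mozconfig_text with any missing DMD option lines appended."""
    if platform not in DMD_MOZCONFIG_LINES:
        raise ValueError("unknown platform: %r" % (platform,))
    existing = [line.strip() for line in mozconfig_text.splitlines()]
    # DMD does not work in combination with trace-malloc.
    if "ac_add_options --enable-trace-malloc" in existing:
        raise ValueError("DMD does not work with --enable-trace-malloc")
    missing = [line for line in DMD_MOZCONFIG_LINES[platform]
               if line not in existing]
    result = mozconfig_text
    if result and not result.endswith("\n"):
        result += "\n"
    return result + "".join(line + "\n" for line in missing)
```

The function only adds lines that are not already present, so running it twice on the same text is harmless.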
If you don't want to do the build yourself, you can get the try server to do it for you. For a desktop build, modify the appropriate in-tree mozconfig file before pushing. E.g. for an optimized Linux64 build, modify browser/config/mozconfigs/linux64/common-opt. On Windows you'll also need to add this line to build/mozconfig.common:
MOZ_CRASHREPORTER_UPLOAD_FULL_SYMBOLS=1
(Note that the "Run" instructions below for Windows try builds differ from those for local Windows builds.)
Note: If you build with clang earlier than r171782 on Linux, you may not get any stack traces; see bug 826962. As of Jan 16, 2013, r171782 is not part of any Clang release, so you have to build LLVM/clang from scratch to get it. GCC works fine.
B2G device builds
First, update your B2G checkout with git pull or git fetch && git merge origin/master. ./repo sync is not sufficient! You must git pull to get the latest version of the relevant tools.
For B2G device builds, we don't usually modify the mozconfig (although you can; it's hiding under gonk-misc/default-gecko-config).
Instead, modify your .userconfig and add
export MOZ_DMD=1
(don't forget the export).
You probably need to clobber your objdir (rm -rf objdir-gecko). Then build normally.
Run
Desktop
To run DMD on a desktop build (including Firefox and B2G desktop), you need to precede your usual invocation of Firefox with three environment variable definitions.
On Linux, from a bourne-style shell, do this:
LD_PRELOAD=$OBJDIR/dist/lib/libdmd.so \
LD_LIBRARY_PATH=$OBJDIR/dist/lib/ \
DMD=1 \
<command>
If you are feeling masochistic enough to want to run DMD under gdb, one way is to do this:
$ gdb --args <command>
(gdb) set exec-wrapper env LD_PRELOAD=[path_to_lib]/libdmd.so LD_LIBRARY_PATH=[path_to_lib]/ DMD=1
(gdb) run
On Mac OS X, do this:
DYLD_INSERT_LIBRARIES=$OBJDIR/dist/lib/libdmd.dylib \
LD_LIBRARY_PATH=$OBJDIR/dist/lib/ \
DMD=1 \
<command>
On a local Windows build, do this:
set MOZ_REPLACE_MALLOC_LIB=path\to\dmd.dll
set DMD=1
<command>
On a Windows build done by the try server, follow these instructions instead.
On start-up, you'll see some commentary on stderr, such as:
DMD[20523] $DMD = '1'
DMD[20523] DMD is enabled
The number in brackets is the process ID.
The browser will run a little slower than usual.
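If you launch the browser from a script rather than a shell, the same variables can be set programmatically. This is a minimal sketch under stated assumptions: the function name and the platform labels are invented here, and the variable names and library file names are the ones shown in the commands above.

```python
import os


def dmd_environment(platform, lib_dir, dmd_value="1", base_env=None):
    """Return a copy of the environment with the DMD variables added.

    `platform` is "linux", "mac" or "windows" (labels invented for this
    sketch). `lib_dir` is the directory holding the DMD library, e.g.
    $OBJDIR/dist/lib on Linux and Mac.
    """
    env = dict(os.environ if base_env is None else base_env)
    if platform == "linux":
        lib_dir = lib_dir.rstrip("/")
        env["LD_PRELOAD"] = lib_dir + "/libdmd.so"
        env["LD_LIBRARY_PATH"] = lib_dir + "/"
    elif platform == "mac":
        lib_dir = lib_dir.rstrip("/")
        env["DYLD_INSERT_LIBRARIES"] = lib_dir + "/libdmd.dylib"
        env["LD_LIBRARY_PATH"] = lib_dir + "/"
    elif platform == "windows":
        env["MOZ_REPLACE_MALLOC_LIB"] = lib_dir.rstrip("\\") + "\\dmd.dll"
    else:
        raise ValueError("unknown platform: %r" % (platform,))
    # "1" selects the default DMD options.
    env["DMD"] = dmd_value
    return env
```

The returned dictionary can be passed as the env argument of subprocess.call together with your usual Firefox command line.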
Fennec
To run DMD on Fennec, run the following commands (be sure to replace "org.mozilla.fennec" with the app identifier as appropriate; this will usually be org.mozilla.fennec_$USERNAME for a local build).
adb push $OBJDIR/dist/lib/libdmd.so /sdcard/
adb shell am start -n org.mozilla.fennec/.App --es \
  env0 MOZ_REPLACE_MALLOC_LIB=/sdcard/libdmd.so \
  --es env1 DMD=1
The commentary on Fennec goes to logcat, and looks like this:
I/DMD (27314): $DMD = '1'
I/DMD (27314): DMD is enabled
The number in the parentheses is the process ID.
B2G for devices
If you built B2G with export MOZ_DMD=1, your build will automagically run with DMD enabled. (The b2g.sh script figures out whether to enable DMD by checking for the presence of libdmd.so in /system/b2g.)
If DMD is enabled, you'll see a message in logcat when a process starts up:
I/DMD ( 305): $DMD = '1'
I/DMD ( 305): DMD is enabled
The run-gdb.sh script also knows to start DMD builds with DMD enabled, so you don't need to do anything special.
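When several processes are running, it can be handy to check mechanically which of them started with DMD enabled. The sketch below scans captured stderr or logcat text for the "DMD is enabled" commentary shown in the examples above and returns the process IDs. The function name is hypothetical, and the regular expression only covers the two message layouts that appear on this page.

```python
import re

# Matches both "DMD[20523] DMD is enabled" (stderr on desktop) and
# "I/DMD ( 305): DMD is enabled" (logcat on Fennec and B2G).
ENABLED_LINE = re.compile(r"DMD\s*[\[(]\s*(\d+)[\])]:?\s*DMD is enabled")


def dmd_enabled_pids(log_text):
    """Return the PIDs, in order of appearance, that report DMD as enabled."""
    return [int(pid) for pid in ENABLED_LINE.findall(log_text)]
```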
Analyze
DMD doesn't do anything notable until you ask it to.
Desktop
If you are on a sufficiently recent build, visit about:memory and click the "DMD" button. (The button won't be present in non-DMD builds, and will be grayed out in DMD builds if DMD wasn't enabled at start-up.)
If you are on an older build, create a bookmark with the following location:
javascript:DMDReportAndDump("out.dmd")
and click that instead. Note that this silently fails on certain special pages such as about:memory or an empty tab.
Both actions invoke all the memory reporters and then DMD analyzes the reports, printing this commentary:
DMD: running reporters...
DMD[11420] Dump 1 {
DMD[11420] gathering stack trace records...
DMD[11420] creating and sorting twice-reported stack trace record array...
DMD[11420] creating and sorting unreported stack trace record array...
DMD[11420] printing unreported stack trace record array...
DMD[11420] creating and sorting once-reported stack trace record array...
DMD[11420] printing once-reported stack trace record array...
DMD[11420] }
The output will be written to a file called out.dmd in the current working directory.
On Linux the analysis is fast. On Mac it can take 30+ seconds.
Fennec
Due to bug 823354, you may get empty stack traces on Fennec, rendering DMD's output largely unhelpful. Sorry.
On Fennec you can use the existing memory-report dumping hook to get a DMD report as well, assuming you have a DMD-enabled build. Run the following command:
adb shell am broadcast -a org.mozilla.gecko.MEMORY_DUMP
In logcat, you should see output similar to this:
E/GeckoConsole (27314): nsIMemoryInfoDumper dumped reports to /data/data/org.mozilla.fennec_kats/app_tmp/memory-report-default-27314.json.gz
The directory in that path (it should always be /data/data/$APPID/app_tmp/) is where the memory reports and DMD reports get dumped. You can pull them like so:
adb pull /data/data/org.mozilla.fennec_kats/app_tmp/memory-report-default-27314.json.gz
adb pull /data/data/org.mozilla.fennec_kats/app_tmp/dmd-default-27314.txt.gz
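The app identifier and process ID change from run to run, so the pull commands can be derived from the logcat line instead of typed by hand. This is a rough sketch: the function names are invented, and the file name patterns (memory-report-default-PID.json.gz and dmd-default-PID.txt.gz) are inferred from the single example above, so treat them as an assumption.

```python
import re

# Pattern for the nsIMemoryInfoDumper logcat message shown above.
DUMP_LINE = re.compile(
    r"dumped reports to /data/data/(?P<app_id>[^/]+)/app_tmp/"
    r"memory-report-default-(?P<pid>\d+)\.json\.gz"
)


def fennec_dump_paths(app_id, pid):
    """Return the on-device paths of the memory report and the DMD report."""
    base = "/data/data/%s/app_tmp/" % app_id
    return [
        base + "memory-report-default-%d.json.gz" % pid,
        base + "dmd-default-%d.txt.gz" % pid,
    ]


def adb_pull_commands(logcat_line):
    """Return argument lists for the adb pull commands, or [] if no match."""
    match = DUMP_LINE.search(logcat_line)
    if match is None:
        return []
    paths = fennec_dump_paths(match.group("app_id"), int(match.group("pid")))
    return [["adb", "pull", path] for path in paths]
```

Each returned list can be handed to subprocess.check_call as is.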
B2G
Did you remember to git pull or git fetch && git merge as instructed in the B2G building section? If you skip this step, the tools below may be outdated. ./repo sync is not sufficient.
Run tools/get_about_memory.py. If DMD is enabled on your device, you should see output like the following:
$ ./get_about_memory.py
Got 3/3 files.
Pulled files into about-memory-18.
Got 3 DMD dump(s).
[...]
Done processing DMD files. Have a look in about-memory-18.
get_about_memory.py invokes fix_b2g_stack.py, so you shouldn't need to run it yourself, but it's there in case you need it. It works just like fix-linux-stack.pl on desktop.
See get_about_memory.py --help for more options, but you probably don't need anything other than the defaults.
The output
Pre-processing
Note: You can skip this step if you're on Windows or a B2G device build.
DMD's output file contains a lot of stack traces. As printed, many of the stack trace entries will look like this:
???[/home/njn/moz/mi2/dmdo64/dist/bin/libxul.so +0x1761BCD] 0x7f845186bbcd
To make them more useful, you need to run them through a "stack-fixing" script.
- On Linux, use tools/rb/fix-linux-stack.pl.
- On Mac, use tools/rb/fix_macosx_stack.py.
Both scripts read from stdin and print to stdout. After doing so, these lines should look something more like this:
nsStringBuffer::Alloc(unsigned long) (/home/njn/moz/mi2/xpcom/string/src/nsSubstring.cpp:177) 0x7f845186bbcd
This shows the function, filename, line number, and PC.
Note that fix-linux-stack.pl is very slow, and can take 2+ minutes to process a DMD output file. fix_macosx_stack.py is faster, but can still take 30+ seconds.
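Because stack fixing is slow, it can be worth checking whether a file still contains unfixed entries before running it through a script again. The sketch below recognises the unfixed entry layout shown above (library path, offset, PC) and counts such lines. It is an illustration only: the function names are invented and the pattern is based on the one example on this page.

```python
import re

# Layout of an unfixed entry: ???[<library> +<offset>] <pc>
UNFIXED_FRAME = re.compile(
    r"^\s*\?\?\?\[(?P<library>.+) \+(?P<offset>0x[0-9A-Fa-f]+)\] "
    r"(?P<pc>0x[0-9A-Fa-f]+)\s*$"
)


def parse_unfixed_frame(line):
    """Return library, offset and PC for an unfixed entry, else None."""
    match = UNFIXED_FRAME.match(line)
    if match is None:
        return None
    return {
        "library": match.group("library"),
        "offset": int(match.group("offset"), 16),
        "pc": int(match.group("pc"), 16),
    }


def count_unfixed_frames(text):
    """Count the lines of DMD output that still need stack fixing."""
    return sum(1 for line in text.splitlines()
               if parse_unfixed_frame(line) is not None)
```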
Output Sections
DMD's output is broken into multiple sections.
- "Invocation". This tells you how DMD was invoked, i.e. what options were used.
- "Twice-reported stack trace records". This tells you which heap blocks were reported twice or more. The presence of any such records indicates bugs in one or more memory reporters.
- "Unreported stack trace records". This tells you which heap blocks were not reported, which indicates where additional memory reporters would be most helpful.
- "Once-reported stack trace records": like the "Unreported stack trace records" section, but for blocks reported once.
- "Summary": gives measurements of the total heap, and the unreported/once-reported/twice-reported portions of it.
- "Execution measurements": gives some statistics about DMD's execution, which are mostly of interest to DMD's developers.
The "Twice-reported stack trace records" and "Unreported stack trace records" sections are the most important, because they indicate ways in which the memory reporters can be improved.
Stack trace records
The stack trace record sections are the most important ones. Here's an example stack trace record from the "Unreported stack trace records" section.
Unreported: 3 blocks in stack trace record 209 of 1,891
 36,864 bytes (26,184 requested / 10,680 slop)
 0.03% of the heap (64.55% cumulative); 0.04% of unreported (86.78% cumulative)
 Allocated at
   malloc (/home/njn/moz/mi2/memory/build/replace_malloc.c:151) 0x417170
   PR_Malloc (/home/njn/moz/mi2/nsprpub/pr/src/malloc/prmem.c:435) 0x7f68650f423c
   PL_ArenaAllocate (/home/njn/moz/mi2/nsprpub/lib/ds/plarena.c:200) 0x7f68652463e1
   nsFixedSizeAllocator::Alloc(unsigned long) (/home/njn/moz/mi2/xpcom/ds/nsFixedSizeAllocator.cpp:95) 0x7f6860f528dc
   nsNodeInfo::Create(nsIAtom*, nsIAtom*, int, unsigned short, nsIAtom*, nsNodeInfoManager*) (/home/njn/moz/mi2/content/base/src/nsNodeInfo.cpp:64) 0x7f685f640933
   nsNodeInfoManager::GetNodeInfo(nsIAtom*, nsIAtom*, int, unsigned short, nsIAtom*) (/home/njn/moz/mi2/content/base/src/nsNodeInfoManager.cpp:225) 0x7f685f642d05
   mozilla::dom::Element::SetAttrAndNotify(int, nsIAtom*, nsIAtom*, nsAttrValue const&, nsAttrValue&, unsigned char, bool, bool, bool) (/home/njn/moz/mi2/content/base/src/Element.cpp:1862) 0x7f685f60ad87
   mozilla::dom::Element::SetAttr(int, nsIAtom*, nsIAtom*, nsAString_internal const&, bool) (/home/njn/moz/mi2/content/base/src/Element.cpp:1778) 0x7f685f60a9b3
   nsXMLContentSink::AddAttributes(unsigned short const**, nsIContent*) (/home/njn/moz/mi2/content/xml/document/src/nsXMLContentSink.cpp:1464) 0x7f685fa76c5c
   nsXBLContentSink::AddAttributes(unsigned short const**, nsIContent*) (/home/njn/moz/mi2/content/xbl/src/nsXBLContentSink.cpp:882) 0x7f685fb3ad42
   nsXMLContentSink::HandleStartElement(unsigned short const*, unsigned short const**, unsigned int, int, unsigned int, bool) (/home/njn/moz/mi2/content/xml/document/src/nsXMLContentSink.cpp:1018) 0x7f685fa73db5
   nsXMLContentSink::HandleStartElement(unsigned short const*, unsigned short const**, unsigned int, int, unsigned int) (/home/njn/moz/mi2/content/xml/document/src/nsXMLContentSink.cpp:947) 0x7f685fa7370a
   nsXBLContentSink::HandleStartElement(unsigned short const*, unsigned short const**, unsigned int, int, unsigned int) (/home/njn/moz/mi2/content/xbl/src/nsXBLContentSink.cpp:258) 0x7f685fb37cc0
It tells you that there were 3 heap blocks that were allocated from the program point indicated by the "Allocated at" stack trace, that these blocks took up 36,864 bytes, and that 10,680 of those bytes were "slop" (wasted space caused by the heap allocator rounding up request sizes). It also indicates what percentage of the total heap size and the unreported portion of the heap these blocks represent.
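The relationship between the three sizes is simple arithmetic: the requested bytes plus the slop add up to the total. The sketch below parses the size line of a record and lets you check that, or compute how much of a record is slop. The function name is invented for illustration; an optional leading '~' on a number is treated as marking an approximate value.

```python
import re

# Size line of a record, e.g.
# "36,864 bytes (26,184 requested / 10,680 slop)"
SIZE_LINE = re.compile(
    r"(?P<total>~?[\d,]+) bytes "
    r"\((?P<requested>~?[\d,]+) requested / (?P<slop>~?[\d,]+) slop\)"
)


def _to_int(text):
    # Drop the approximation marker and the thousands separators.
    return int(text.lstrip("~").replace(",", ""))


def parse_size_line(line):
    """Return the byte counts from a record's size line, or None."""
    match = SIZE_LINE.search(line)
    if match is None:
        return None
    fields = [match.group(name) for name in ("total", "requested", "slop")]
    return {
        "bytes": _to_int(fields[0]),
        "requested": _to_int(fields[1]),
        "slop": _to_int(fields[2]),
        "approximate": any(field.startswith("~") for field in fields),
    }
```

For the record above, the parsed values satisfy requested + slop == bytes (26,184 + 10,680 = 36,864).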
Within each section, records are listed from largest to smallest.
Once-reported and twice-reported stack trace records also have stack traces for the report point(s). For example:
Reported at
   mozilla::dmd::Report(void const*) (/home/njn/moz/mi2/memory/replace/dmd/DMD.cpp:1740) 0x7f68652581ca
   CycleCollectorMallocSizeOf(void const*) (/home/njn/moz/mi2/xpcom/base/nsCycleCollector.cpp:3008) 0x7f6860fdfe02
   nsPurpleBuffer::SizeOfExcludingThis(unsigned long (*)(void const*)) const (/home/njn/moz/mi2/xpcom/base/nsCycleCollector.cpp:933) 0x7f6860fdb7af
   nsCycleCollector::SizeOfIncludingThis(unsigned long (*)(void const*), unsigned long*, unsigned long*, unsigned long*, unsigned long*, unsigned long*) const (/home/njn/moz/mi2/xpcom/base/nsCycleCollector.cpp:3029) 0x7f6860fdb6b1
   CycleCollectorMultiReporter::CollectReports(nsIMemoryMultiReporterCallback*, nsISupports*) (/home/njn/moz/mi2/xpcom/base/nsCycleCollector.cpp:3075) 0x7f6860fde432
   nsMemoryInfoDumper::DumpMemoryReportsToFileImpl(nsAString_internal const&) (/home/njn/moz/mi2/xpcom/base/nsMemoryInfoDumper.cpp:626) 0x7f6860fece79
   nsMemoryInfoDumper::DumpMemoryReportsToFile(nsAString_internal const&, bool, bool) (/home/njn/moz/mi2/xpcom/base/nsMemoryInfoDumper.cpp:344) 0x7f6860febaf9
   mozilla::(anonymous namespace)::DumpMemoryReportsRunnable::Run() (/home/njn/moz/mi2/xpcom/base/nsMemoryInfoDumper.cpp:58) 0x7f6860fefe03
You can tell which memory reporter made the report by the name of the MallocSizeOf function near the top of the stack trace. In this case it was the cycle collector's reporter.
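Picking out the MallocSizeOf function by eye gets tedious with many records. The following sketch, with an invented function name, returns the first function in a "Reported at" trace whose name contains "MallocSizeOf". It assumes the frames have already been stack-fixed and are supplied one per list element.

```python
import re

# A function name containing "MallocSizeOf", optionally namespace-qualified,
# followed by its parameter list.
SIZE_OF_FUNCTION = re.compile(r"([\w:]*MallocSizeOf\w*)\s*\(")


def reporter_size_of_function(reported_at_frames):
    """Return the first MallocSizeOf-style function name in the trace."""
    for frame in reported_at_frames:
        match = SIZE_OF_FUNCTION.search(frame)
        if match is not None:
            return match.group(1)
    return None
```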
By default, DMD measures heap blocks above a certain size precisely, but uses sampling to measure blocks below that size. Any measurements that involve sampled blocks (even if combined with non-sampled measurements) are approximate, and this is indicated by a preceding '~'. For example:
Unreported: ~273 blocks in block group 17 of 14,611
 ~1,125,590 bytes (~1,117,936 requested / ~7,654 slop)
 0.07% of the heap (2.58% cumulative); 0.43% of unreported (16.36% cumulative)
The sampling threshold can be adjusted with an option (see below). This will affect the precision of the output and the speed at which Firefox+DMD runs.
Options
Setting the DMD environment variable to 1 gives default options. But you can also specify non-default options by setting DMD to a whitespace-separated list of --option=val entries.
At the moment, you can provide two options to DMD: --sample-below=<1..n> and --mode=<normal|test|stress>.
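As an illustration of the format, the sketch below builds a value for the DMD environment variable from the two options named above. The function name and keyword arguments are made up for this example; the checks only encode the ranges given in the option syntax.

```python
VALID_MODES = ("normal", "test", "stress")


def dmd_option_string(sample_below=None, mode=None):
    """Build a value for the DMD environment variable.

    With no arguments this returns "1", which selects the default options.
    """
    entries = []
    if sample_below is not None:
        if sample_below < 1:
            raise ValueError("--sample-below must be at least 1")
        entries.append("--sample-below=%d" % sample_below)
    if mode is not None:
        if mode not in VALID_MODES:
            raise ValueError("unknown mode: %r" % (mode,))
        entries.append("--mode=%s" % mode)
    # Entries are separated by whitespace.
    return " ".join(entries) if entries else "1"
```

When setting the variable from a shell, remember to quote a value that contains spaces.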
--sample-below=<1..n>
By default, DMD samples blocks with a sample-below size of 4093. I.e. it ignores some small allocations in order to run (much) faster.
If you want DMD to record all allocations precisely, pass --sample-below=1. Otherwise, you should probably leave it unchanged. If you do pick a different value, prime numbers work best.
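To get a feel for why sampled measurements are approximate, here is a toy byte-counter sampling scheme. It is an assumption made for illustration, not a copy of DMD's code: blocks at or above the threshold are recorded exactly, while smaller blocks only add to a running counter, and each time the counter reaches the threshold one sampled block of threshold size is recorded.

```python
def sample_allocations(request_sizes, sample_below=4093):
    """Return (size, sampled) pairs for a toy threshold-sampling scheme.

    Illustrative only; the real tool may work differently.
    """
    recorded = []
    counter = 0  # bytes of small allocations seen since the last sample
    for size in request_sizes:
        if size >= sample_below:
            # Large block: recorded precisely.
            recorded.append((size, False))
        else:
            counter += size
            if counter >= sample_below:
                # Enough small bytes accumulated: record one sampled block
                # that stands in for roughly sample_below bytes.
                counter -= sample_below
                recorded.append((sample_below, True))
    return recorded
```

With a threshold of 1 no block is ever smaller than the threshold, so in this sketch every allocation is recorded exactly, which matches the advice above for precise measurements.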
--mode=<normal|test|stress>
--mode=<normal|test|stress> can be used to invoke "test" or "stress" mode, which are useful if you're hacking on DMD. The default is normal mode.
"test" and "stress" modes set their own --sample-below values, so you should never have to specify both --sample-below and --mode.
To run the tests, specify --mode=test and start Firefox. It will print out some stuff and very quickly exit. Then run the following command from the top of your source directory.
memory/replace/dmd/check_test_output.py . test.dmd
(If you invoked Firefox from a different directory to your source directory, you might need to specify the path to test.dmd.)
This script checks the output produced by the previous step, and will indicate if the test passed or failed. It should work on Linux and Mac, but is unreliable on Windows.
Setting options on B2G device builds
If you want to run B2G on a device with args other than DMD=1, you'll need to modify the gonk-misc/b2g.sh script and then push it to the device.
To push the modified script, do something like
adb shell stop b2g
adb remount
adb push b2g.sh /system/bin
adb shell chmod 0755 /system/bin/b2g.sh
adb shell start b2g
If you want to run B2G on the device under GDB with args other than DMD=1, modify the run-gdb.sh script. You don't need to push anything.
Which heap blocks are reported?
At this stage you might wonder how DMD knows which allocations have been reported and which haven't. DMD only knows about heap blocks that are measured via a function created with one of the following two macros:
NS_MEMORY_REPORTER_MALLOC_SIZEOF_FUN
NS_MEMORY_REPORTER_MALLOC_SIZEOF_ON_ALLOC_FUN
Fortunately, most of the existing memory reporters do this. See Platform/Memory_Reporting for more details about how memory reporters are written.
Troubleshooting DMD
Contact Nick Nethercote ("njn" on IRC) or Nathan Froyd ("froydnj" on IRC).