Firefox OS/Performance/Debugging OOMs
How to debug a B2G OOM crash
B2G runs on severely memory-constrained devices, and it's easy for apps to exhaust the memory available on the system. When that happens, the kernel must kill some process in order to free up memory. When the kernel chooses to kill the foreground process, this manifests as an apparent crash of the app you're using.
This document describes how B2G's multiprocess architecture affects what the phone does when we run out of memory, and how to understand and debug OOM crashes.
Process priorities
B2G uses multiple processes when it runs on a phone. It has one "main process" and potentially many "child processes". Every app runs in its own child process, with one exception: The browser app runs in the main process, while the tabs inside the browser app each run in their own child process.
The process we kill when we run out of memory isn't necessarily the one that "caused" the out-of-memory condition. B2G assigns priorities to each process based on how important it thinks the process is, and when the system runs out of memory, it kills processes strictly in order of priority.
A process's priority is known as its "oom_adj". Smaller oom_adj values correspond to higher priority processes.
Killing the main process kills all child processes and effectively reboots the phone, so we never want to kill the main process. Therefore, the main process runs with oom_adj 0.
Most processes run with oom_adj 2 while they're in the foreground. Processes in the background run with oom_adj between 3 and 6 (inclusive). Exactly what oom_adj a background process gets depends on a number of factors, such as whether it's playing sound, whether it's the homescreen app, and so on.
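If you want to check this yourself, one way (assuming a rooted device and the standard Linux /proc layout) is to read the value directly:

$ adb shell b2g-ps                    # note the pid of the app you're interested in
$ adb shell cat /proc/<pid>/oom_adj   # smaller value = higher priority; the main process reports 0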
Debugging an OOM crash
Suppose you have a reproducible crash that you suspect is caused by the phone running out of memory. The following are steps you can take to understand more about what's going wrong.
Step 1: Verify that it's actually an OOM
First, we need to check whether the crash is actually due to the phone running out of memory. To do this, run adb shell dmesg. If the app is being killed due to OOM, you'll see something like the following line in dmesg:
<4>[06-18 07:40:25.291] [2897: Notes+]send sigkill to 2897 (Notes+), adj 2, size 30625
This line indicates that the phone's low-memory killer killed the Notes+ app (process ID 2897), which had oom_adj 2 when it was killed. The size reported here is in pages, which are 4 KB each; in this case, the Notes+ app was using 30625 * 4 KB ≈ 120 MB of memory.
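If the dmesg buffer is busy, you can filter for the low-memory killer's messages; this assumes they contain "sigkill", as in the line above:

$ adb shell dmesg | grep -i sigkill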
Digression: If it's not an OOM
If you don't see a line like this in the dmesg output, your crash is likely not an OOM. The next step in debugging such a crash is usually to attach gdb to the crashing process and get a backtrace:
$ cd path/to/B2G/checkout
$ adb shell b2g-ps              # note the pid of the app that you're going to crash
$ ./run-gdb.sh attach <pid>
(gdb) continue
# now crash the app
(gdb) bt
Attach this output, along with the output of adb logcat, to a bug.
If your crash is due to OOM, a gdb backtrace is probably not interesting, because an OOM crash is triggered by a signal sent from the kernel, not by bad code that the process executes.
Step 2: Collect memory reports
After you've verified that your crash is actually due to OOM, the next step is to collect a memory report from the phone before the app crashes. A memory report will help us understand where memory is being used.
This step is a bit tricky, because once an app has crashed, there's no way to collect a memory report from that process, and there's no way to trigger one when the kernel kills a process -- by then, it's too late.
To pull a memory report from the phone, first update your B2G tree so that you have the latest version of the memory-report tool; repo sync is not sufficient.
$ cd path/to/B2G/checkout
$ git fetch origin
$ git merge --ff-only origin
Now you can run the tool:
$ tools/get_about_memory.py
But again, this is only helpful if you run this command while the app you care about is alive and using a lot of memory. We have a few options here.
Step 2, option 1: Get a different device
Often the easiest thing to do is to get a device with more RAM. You know from step 1 above how much memory the process used when it crashed, so you can simply wait until the process is using about that much memory, and then take a memory report.
The b2g-info tool lets you see how much memory the different B2G processes are using. You can run this tool in a loop by doing something like the following:
$ adb shell 'while true; do b2g-info; sleep 1; done'
If b2g-info isn't available on your device, you can use b2g-procrank instead.
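For example, the same one-second polling loop works with b2g-procrank:

$ adb shell 'while true; do b2g-procrank; sleep 1; done'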
Step 2, option 2: Fastest finger
If you don't have access to a device with more RAM, you can try to run get_about_memory.py just before the app crashes. Again, you can run b2g-info in a loop to figure out when to run get_about_memory.py.
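If your reflexes aren't up to it, a rough polling script can do the racing for you. The sketch below rests on assumptions you may need to adjust: that b2g-info prints one line per process, that the sixth column is a memory figure in megabytes (check your build's output and fix the field number), and that "Notes+" and the 100 MB threshold are just placeholders for your app and the size you saw in dmesg. Run it from your B2G checkout so it can find get_about_memory.py:

APP="Notes+"   # placeholder: the app you expect to OOM
LIMIT=100      # placeholder: MB, somewhat below the size reported in dmesg
while true; do
  # Which b2g-info column holds the memory figure is an assumption; adjust $6 to match your build.
  MEM=$(adb shell b2g-info | tr -d '\r' | awk -v app="$APP" 'index($0, app) { print int($6); exit }')
  if [ -n "$MEM" ] && [ "$MEM" -ge "$LIMIT" ]; then
    tools/get_about_memory.py
    break
  fi
  sleep 1
done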
Step 2, option 3: Run B2G on your desktop
If worst comes to worst, you can run B2G on your desktop, which probably has much more RAM than your Firefox OS phone. This is tricky because B2G running on a desktop machine differs in some key ways from B2G running on a phone.
In particular, B2G on desktop machines has multiprocess disabled by default. Multiprocess doesn't really work 100% correctly anywhere on desktop, but it mostly works on Linux and Mac. (I'm not sure yet how to enable it.)
It's also not as convenient to take memory reports from a B2G desktop process. On Linux, you can send signal 34 to the main B2G process and we'll write "memory-report-*.gz" files out to /tmp.
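As a rough example on Linux (the binary name is an assumption; depending on your build the process may be called "b2g" or "b2g-bin"):

$ pgrep -l b2g                  # find the pid of the main B2G desktop process
$ kill -34 <pid>                # ask it to write memory reports
$ ls /tmp/memory-report-*.gz    # the reports land here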
One advantage to using B2G desktop builds is that you can use your favorite desktop debugging tools, such as Instruments on MacOS. We've had a lot of success with this in the past.
Instructions for setting up B2G desktop builds can be found here: https://wiki.mozilla.org/Gaia/Hacking#B2G_Desktop