Firefox OS/Performance/Profiling

From MozillaWiki
Revision as of 22:28, 23 April 2013 by JLD (talk | contribs) (Completely replace the very outdated perf(1) section with a summary of the state of bug 831611)

Profiling with the gecko profiler

See these instructions. Patches are in flight to get native stacks in profiles, but that isn't in the default configuration yet.

Profiling with perf(1)

Work is in progress to make the Linux kernel profiler, "perf", useful for debugging on B2G. See bug 831611 for more information; the main obstacle to landing it is obtaining stack traces, which bug 856899 covers in more detail. It currently requires a Linux build host, but see #Experimental MacOS Host Support below.

Quick Start

  1. Add git://github.com/jld/B2G.git as a remote and check out the "profiling" branch from it.
  2. ./config.sh. Don't set BRANCH here; the default is "profiling-v1", which consists of the v1-train manifests with suitable changes. You can check out gecko and/or gaia to different versions afterwards.
  3. Add "export B2G_PROFILING=1" to .userconfig.
  4. Delete "out" and "objdir-gecko" (or whatever your gecko objdir is named), then ./build.sh.
  5. ./flash.sh
  6. Now profile something:
    1. ./run-perf.sh record-sps
    2. Do something of interest on the device.
    3. Hit Enter in the shell window, as the message instructs.
  7. There should have been a line like "Writing profile to perf_20130423_122912.txt". Go to https://people.mozilla.com/~bgirard/cleopatra/ (or a local clone, if you have one) and feed it that file.
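
The steps above can be sketched as a shell function. This is only an illustration, not part of the B2G tree: the remote name "jld", the helper names run and quick_start, the DRY_RUN variable, and passing a device name such as unagi to config.sh are assumptions made for this example. By default it only prints the commands it would run.

```shell
# Sketch of the Quick Start steps. Run from the top of a B2G checkout.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to really execute.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

quick_start() {
    device="$1"                    # device name for config.sh (assumed argument)
    objdir="${2:-objdir-gecko}"    # name of your gecko objdir
    if [ -z "$device" ]; then
        echo "usage: quick_start DEVICE [OBJDIR]" >&2
        return 1
    fi

    # Step 1: add the remote ("jld" is an arbitrary name) and check out "profiling".
    run git remote add jld git://github.com/jld/B2G.git
    run git fetch jld
    run git checkout -b profiling jld/profiling

    # Step 2: BRANCH is deliberately left unset so the default is used.
    run ./config.sh "$device"

    # Step 3: enable profiling in .userconfig.
    run sh -c "echo 'export B2G_PROFILING=1' >> .userconfig"

    # Step 4: remove old build output, then rebuild.
    run rm -rf out "$objdir"
    run ./build.sh

    # Step 5: flash the device.
    run ./flash.sh

    # Step 6: record a profile; press Enter when prompted to stop.
    run ./run-perf.sh record-sps
}
```

Calling "quick_start unagi" prints the command list for review; "DRY_RUN=0 quick_start unagi" would actually run it, which removes the build directories and reflashes the device.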

Experimental MacOS Host Support

As above, but use the "miniperf" branch. This uses Python code to parse the performance event records directly and convert them into Cleopatra/SPS format, instead of running the Linux "perf" command and scraping its output. (It also replaces the perf command run on the device with a small C program that implements enough of "perf record" for our purposes and outputs it in a simplified format; the perf.data file format is not well documented, and the perf command is large and difficult to cross-compile.)

This may also be useful for Linux hosts that don't have the same libraries as the system I built perf on; it has a lot of dependencies, and some of them have compatibility issues between distributions. The current plan is to make miniperf the default for this.

Fine-Tuning

By default, perf samples based on the CPU's cycle counter, adjusting the period to gather approximately 4000 samples/sec. However, it gathers nothing while the CPU is idle, and currently Cleopatra (the Gecko profiler front-end) ignores these idle times: it doesn't display them in the timeline, and its "real interval" is an average over both real inter-sample intervals and idle times.
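
To see how idle time inflates the averaged interval, here is a rough back-of-the-envelope sketch. It assumes a simplified model that is not taken from Cleopatra itself: samples arrive at a fixed rate while the CPU is busy and not at all while it is idle. The function name apparent_interval_us is made up for this example.

```shell
# apparent_interval_us BUSY_RATE BUSY_PERCENT
# Mean spacing between samples, in microseconds, when samples arrive at
# BUSY_RATE per second while the CPU is busy and the CPU is busy for
# BUSY_PERCENT of wall-clock time (simplified model, integer arithmetic).
apparent_interval_us() {
    samples_per_sec=$(( $1 * $2 / 100 ))
    if [ "$samples_per_sec" -le 0 ]; then
        echo "no samples would be gathered" >&2
        return 1
    fi
    echo $(( 1000000 / samples_per_sec ))
}
```

Under this model, "apparent_interval_us 4000 100" gives 250, while "apparent_interval_us 4000 50" gives 500: the same sampling rate looks half as fast once idle periods are averaged in.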

Other timers are available; use the -e flag to select one. In particular, "-e cpu-clock" uses a real-time interval timer, which gathers samples even when the CPU is idle. However, at least on unagi it seems to be restricted to 2500 samples/sec.

Note that on the miniperf branch, cpu-clock is the default; use "-e cycles" to use the cycle counter. Or try something else; run "perf list" on a Linux host, or look at the table at the top of gonk-misc/miniperf/miniperf-record.c on the miniperf branch. The hardware might not support some of them.

To set the sample rate, use the -F flag to specify a target frequency (samples/sec) or -c to give an absolute number of cycles (note that the kernel will adjust the CPU speed in response to demand). The "cycle time" for cpu-clock appears to be in nanoseconds regardless of the physical timer used. Note that, at very high rates (empirically, >10 kHz), the CPU may spend enough time gathering samples to noticeably slow down the application being profiled.
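
As a small worked example of the relationship between the two flags for cpu-clock, the following sketch converts between a target frequency (as given to -F) and a fixed period (as given to -c). It assumes the cpu-clock period really is in nanoseconds, as observed above; the function names are invented for this example, and integer division means results are rounded down.

```shell
# Convert a target sample rate (samples/sec) into a period in nanoseconds.
# Assumes the cpu-clock "cycle time" is in nanoseconds.
rate_to_period_ns() {
    if [ "$1" -le 0 ]; then
        echo "rate must be positive" >&2
        return 1
    fi
    echo $(( 1000000000 / $1 ))
}

# Convert a period in nanoseconds back into a sample rate (samples/sec).
period_ns_to_rate() {
    if [ "$1" -le 0 ]; then
        echo "period must be positive" >&2
        return 1
    fi
    echo $(( 1000000000 / $1 ))
}
```

For instance, "rate_to_period_ns 2500" prints 400000, so with cpu-clock a period of 400000 should correspond roughly to the 2500 samples/sec limit seen on unagi, and "period_ns_to_rate 100000" prints 10000, the rate at which sampling overhead was observed to become noticeable.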

Other Commands

The original run-perf.sh commands were "record" and "report", corresponding to those perf(1) commands, except that "record" is run on the device (and the perf.data file pulled afterwards, along with kallsyms) and "report" constructs a symlink farm to provide symbol information. This may be useful to those who are already familiar with perf, but it's not the most obvious interface for new users.

The next layer is "./run-perf.sh sps", which converts the perf.data file pulled by "./run-perf.sh record" to the format used by the Gecko profiler (with symbols), as long as it was made with -a (all CPUs); and "./run-perf.sh record-sps", which combines "record -a -g" and "sps".

On the miniperf branch, there is currently a "minirecord" subcommand, which runs the miniperf-record command instead of perf record. The "sps" subcommand also recognizes miniperf files, and "record-sps" uses minirecord instead of record.