Firefox OS/Performance/Profiling: Difference between revisions

From MozillaWiki
(Completely replace the very outdated perf(1) section with a summary of the state of bug 831611)
(→‎Profiling with perf(1): Update for the miniperf→profiling merge.)
Revision as of 00:34, 2 May 2013

Profiling with the gecko profiler

See these instructions. Patches are in-flight to get native stacks in profiles, but that's not in default configurations yet.

Profiling with perf(1)

Work is in progress to make the Linux kernel profiler, called "perf", useful for debugging on B2G. See bug 831611 for more information; the main issue for getting it landed is obtaining stack traces, which bug 856899 goes into more detail on.

Quick Start

This should now work on both Linux and Mac build hosts.

  1. Add git://github.com/jld/B2G.git as a remote and check out the "profiling" branch from it.
  2. ./config.sh. Don't set BRANCH here; the default is "profiling-v1", which is the v1-train manifests with suitable changes. You can check out gecko and/or gaia to different versions afterwards.
  3. Add "export B2G_PROFILING=1" to .userconfig
  4. Delete "out" and "objdir-gecko" (or whatever your gecko objdir is named), then ./build.sh.
  5. ./flash.sh
  6. Now profile something:
    1. ./run-perf.sh record-sps
    2. Do something of interest on the device.
    3. Hit Enter in the shell window, as the on-screen message instructs.
  7. There should have been a line like "Writing profile to perf_20130423_122912.txt". Go to https://people.mozilla.com/~bgirard/cleopatra/ (or a local clone, if you have one) and feed it that file.

Fine-Tuning

By default, perf samples based on the CPU's cycle counter, adjusting the period to gather approximately 4000 samples/sec. However, it gathers nothing while the CPU is idle, and currently Cleopatra (the Gecko profiler front-end) ignores these times — it doesn't display them in the timeline, and its "real interval" is an average over both real inter-sample intervals and idle times.
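
To illustrate why an average that includes idle time can be misleading, here is a small Python sketch. It is not Cleopatra's code; the function name and the timestamps are made up for illustration, and it only shows the arithmetic of averaging over both real inter-sample gaps and one idle gap.

```python
from statistics import mean


def mean_interval_ms(timestamps_ms):
    """Average gap between consecutive samples, idle gaps included."""
    gaps = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    return mean(gaps)


# Hypothetical data: five samples 0.25 ms apart (4000 samples/sec),
# a 10 ms idle period with no samples, then five more 0.25 ms apart.
busy1 = [i * 0.25 for i in range(5)]          # 0.0 ms .. 1.0 ms
busy2 = [11.0 + i * 0.25 for i in range(5)]   # 11.0 ms .. 12.0 ms
samples = busy1 + busy2

print(mean_interval_ms(busy1))    # interval while the CPU is busy
print(mean_interval_ms(samples))  # interval inflated by the idle gap
```

With these numbers the busy-only interval is 0.25 ms, while the average over the whole trace is 12/9 ms, about 1.33 ms, even though no sample was ever taken at that spacing.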

Other timers are available; use the -e flag (when running "./run-perf.sh record-sps") to select one. In particular, "-e cpu-clock" uses a real-time interval timer, which gathers samples even when the CPU is idle. However, at least on unagi it seems to be restricted to 2500 samples/sec.

Note that on the miniperf branch, cpu-clock is the default; use "-e cycles" to use the cycle counter. Or try something else; run "perf list" on a Linux host, or look at the table at the top of gonk-misc/miniperf/miniperf-record.c on the miniperf branch. The hardware might not support some of them.

To set the sample rate, use the -F flag to specify a target frequency (samples/sec) or -c to give an absolute number of cycles (note that the kernel will adjust the CPU speed in response to demand). The "cycle time" for cpu-clock appears to be in nanoseconds regardless of the physical timer used. Note that, at very high rates (empirically, >10 kHz), the CPU may spend enough time gathering samples to noticeably slow down the application being profiled.
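
As a rough aid for choosing a -c value with cpu-clock, the following Python sketch converts a target sample rate into a period. It assumes, as observed above, that the cpu-clock period is expressed in nanoseconds; the helper name is hypothetical and not part of run-perf.sh.

```python
NS_PER_SEC = 1_000_000_000


def period_ns_for_frequency(samples_per_sec):
    """Nanoseconds between samples for a fixed sample rate.

    Assumption: the -c value for cpu-clock is in nanoseconds.
    """
    if samples_per_sec <= 0:
        raise ValueError("sample rate must be positive")
    return round(NS_PER_SEC / samples_per_sec)


# 2500 samples/sec is the apparent cpu-clock limit on unagi;
# 10000 samples/sec is where sampling overhead was seen to become noticeable.
for rate in (2500, 4000, 10000):
    print(rate, "samples/sec ->", period_ns_for_frequency(rate), "ns")
```

For example, 2500 samples/sec corresponds to a period of 400000 ns. For the cycle counter the same arithmetic does not apply directly, because the number of cycles per second changes as the kernel adjusts the CPU speed.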

In More Detail

The original run-perf.sh commands were "record" and "report", corresponding to those perf(1) commands, except that "record" is run on the device (and the perf.data file pulled afterwards, along with kallsyms) and "report" constructs a symlink farm to provide symbol information. This may be useful to those who are already familiar with perf, but it's not the most obvious interface for new users.

However: "report" works only on Linux hosts (and not even all of those out of the box, without rebuilding the perf executable, due to library issues), and while "record" will work without "report" it's not very useful — the perf.data format is undocumented, and the code for reading and writing it is not trivial to extract from its Linux dependencies.

As an alternative to this, "./run-perf.sh minirecord" accepts a subset of the perf record options (-e -c -F -m -o, and defaults to -a -g -e cpu-clock) and obtains output in a simpler format. The actual command it runs, "miniperf-record", is built in gonk-misc with the normal Android makefiles.
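
The option subset can be sketched as a small Python helper that builds a "minirecord" command line and rejects anything outside the listed flags. This is only an illustration: the helper name is hypothetical, and it assumes each accepted flag takes exactly one value, as the corresponding perf record options do.

```python
# Flags listed above as accepted by "./run-perf.sh minirecord".
ACCEPTED_FLAGS = {"-e", "-c", "-F", "-m", "-o"}


def minirecord_command(options):
    """Build an argv list from (flag, value) pairs.

    Assumption: every accepted flag takes a single value.
    """
    argv = ["./run-perf.sh", "minirecord"]
    for flag, value in options:
        if flag not in ACCEPTED_FLAGS:
            raise ValueError(f"minirecord does not accept {flag}")
        argv += [flag, str(value)]
    return argv


# Sample on the cycle counter at a target of 1000 samples/sec.
print(" ".join(minirecord_command([("-e", "cycles"), ("-F", 1000)])))
```

The resulting list could be passed to subprocess.run from the B2G checkout; with no options at all, the command falls back to the defaults described above.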

The next layer is "./run-perf.sh sps", which converts the raw profile data to the format used by the Gecko profiler (with symbols). If it was collected with "./run-perf.sh record" then it needs to run perf report, in which case see above about nonportability; for miniperf it reads the file with Python code that should run everywhere.

Finally, "./run-perf.sh record-sps" is just "minirecord" followed by "sps".