Firefox OS/Performance/Profiling: Difference between revisions

From MozillaWiki
(Completely replace the very outdated perf(1) section with a summary of the state of bug 831611)
m (Lakrits moved page FirefoxOS/Performance/Profiling to Firefox OS/Performance/Profiling: The official spelling of "Firefox OS" leaves a space between the two parts of the name. It's easier to find a page if the spelling of its name is consistent...)
 
(13 intermediate revisions by 8 users not shown)
Line 1: Line 1:
= Profiling with the Gecko profiler =
Good at: native stacks (with runtime options) plus JavaScript profiling, low-overhead sampling, familiar to Gecko developers


See [https://developer.mozilla.org/en-US/docs/Performance/Profiling_with_the_Built-in_Profiler#Profiling_Boot_to_Gecko_%28with_a_real_device%29 these instructions]. Patches are in flight to get native stacks in profiles, but they are not in the default configurations yet.


= Profiling with perf(1) =
= Profiling with systrace =
 
Good at: shows process preemption, shows all calls to instrumented functions, familiar to Android developers
Work is in progress to make the Linux kernel profiler, called "perf", useful for debugging on B2G.  See [https://bugzilla.mozilla.org/show_bug.cgi?id=831611 bug 831611] for more information; the main issue for getting it landed is obtaining stack traces, which [https://bugzilla.mozilla.org/show_bug.cgi?id=856899 bug 856899] goes into more detail on.  Also, currently it requires a Linux build host, but see [[#Experimental MacOS Host Support]] below.
 
== Quick Start ==
 
# Add git://github.com/jld/B2G.git as a remote and check out the "profiling" branch from it.
# ./config.sh.  Don't set BRANCH here; the default is "profiling-v1", which is the v1-train manifests with suitable changes.  You can check out gecko and/or gaia to different versions afterwards.
# Put "export B2G_PROFILING=1" in .userconfig.
# Delete "out" and "objdir-gecko" (or whatever your gecko objdir is named), then ./build.sh.
# ./flash.sh
# Now profile something:
## ./run-perf.sh record-sps
## Do something of interest on the device.
## Hit Enter in the shell window, like the message said to.
# There should have been a line like "Writing profile to perf_20130423_122912.txt".  Go to https://people.mozilla.com/~bgirard/cleopatra/ (or a local clone, if you have one) and feed it that file.
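
If the last steps are scripted, the name of the profile has to be recovered from the output of run-perf.sh. The following is a minimal sketch, assuming the output contains a line shaped like the "Writing profile to ..." example above; the helper name find_profile_path is hypothetical and not part of run-perf.sh.

```python
import re

# Pattern modeled on the example line "Writing profile to perf_20130423_122912.txt".
# The exact wording of the real output is an assumption taken from that example.
PROFILE_LINE = re.compile(r"Writing profile to (\S+\.txt)")


def find_profile_path(output):
    """Return the profile file name mentioned in the captured output, or None."""
    match = PROFILE_LINE.search(output)
    return match.group(1) if match else None


if __name__ == "__main__":
    captured = "Recording...\nWriting profile to perf_20130423_122912.txt\n"
    print(find_profile_path(captured))
```

The returned name is the file to feed to Cleopatra.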
 
== Experimental MacOS Host Support ==
 
As above, but use the "miniperf" branch.  This uses Python code to parse the performance event records directly instead of running the Linux "perf" command and scraping its output, to convert them into Cleopatra/SPS format.  (It also replaces the perf command run on the device with a small C program that implements enough of "perf record" for our purposes and outputs it in a simplified format; the perf.data file format is not well documented, and the perf command is large and difficult to cross-compile.)
 
This may also be useful for Linux hosts that don't have the same libraries as the system I built perf on; it has a lot of dependencies, and some of them have compatibility issues between distributions.  The current plan is to make miniperf the default for this.
 
== Fine-Tuning ==
 
By default, perf samples based on the CPU's cycle counter, adjusting the period to gather approximately 4000 samples/sec.  However, it gathers nothing while the CPU is idle, and currently Cleopatra (the Gecko profiler front-end) ignores these times — it doesn't display them in the timeline, and its "real interval" is an average over both real inter-sample intervals and idle times.
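
To illustrate why an average that includes idle gaps can mislead, here is a small sketch with made-up timestamps: samples arrive every 250 microseconds (4000 samples/sec) except for one idle gap of about 100 ms. How Cleopatra actually computes its "real interval" is not shown here; the arithmetic mean is an assumption used only for illustration.

```python
from statistics import mean, median


def inter_sample_intervals(timestamps_us):
    """Differences between consecutive sample timestamps, in microseconds."""
    return [later - earlier
            for earlier, later in zip(timestamps_us, timestamps_us[1:])]


# Hypothetical data: four samples while busy, an idle gap, then two more samples.
timestamps_us = [0, 250, 500, 750, 100_750, 101_000]
intervals = inter_sample_intervals(timestamps_us)

# The mean is dominated by the single idle gap; the median is not.
print("mean interval (us):  ", mean(intervals))
print("median interval (us):", median(intervals))
```

With data like this, the mean is far larger than the interval at which samples were actually taken while the CPU was busy.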
 
Other timers are available; use the -e flag to select one.  In particular, "-e cpu-clock" uses a real-time interval timer, which gathers samples even when the CPU is idle.  However, at least on unagi it seems to be restricted to 2500 samples/sec.
 
Note that on the miniperf branch, cpu-clock is the default; use "-e cycles" to use the cycle counter.  Or try something else; run "perf list" on a Linux host, or look at the table at the top of gonk-misc/miniperf/miniperf-record.c on the miniperf branch.  The hardware might not support some of them.


To set the sample rate, use the -F flag to specify a target frequency (samples/sec) or -c to give an absolute number of cycles (note that the kernel will adjust the CPU speed in response to demand).  The "cycle time" for cpu-clock appears to be in nanoseconds regardless of the physical timer used.  Note that, at very high rates (empirically, >10 kHz), the CPU may spend enough time gathering samples to noticeably slow down the application being profiled.
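
The relationship between a target frequency (-F) and a fixed period (-c) is simple arithmetic, sketched below. The function names are hypothetical, the cpu-clock period is taken to be in nanoseconds as described above, and the cycle-counter case assumes a fixed CPU clock speed, which does not hold when the kernel scales the CPU frequency.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def cpu_clock_period_ns(target_hz):
    """Period in nanoseconds that yields roughly target_hz samples/sec with cpu-clock."""
    if target_hz <= 0:
        raise ValueError("target_hz must be positive")
    return NANOSECONDS_PER_SECOND // target_hz


def cycles_period(target_hz, assumed_cpu_hz):
    """Cycle count per sample for a target rate, assuming a constant clock speed."""
    if target_hz <= 0:
        raise ValueError("target_hz must be positive")
    return assumed_cpu_hz // target_hz


if __name__ == "__main__":
    # 2500 samples/sec is the apparent cpu-clock limit observed on unagi.
    print(cpu_clock_period_ns(2500))
    # 4000 samples/sec on a hypothetical CPU running at a steady 1 GHz.
    print(cycles_period(4000, 1_000_000_000))
```

Because the clock speed varies with demand, a fixed cycle count gives a sample rate that drifts over time, whereas -F asks for the period to be adjusted toward the target rate.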
Bad at: requires a configure option, higher overhead


== Other Commands ==
*Download the Android SDK to get the systrace tool:
**1. [http://developer.android.com/sdk/index.html Download link]
**2. The systrace.py tool is at path-to-android-sdk/tools/systrace


The original run-perf.sh commands were "record" and "report", corresponding to those perf(1) commands, except that "record" is run on the device (and the perf.data file pulled afterwards, along with kallsyms) and "report" constructs a symlink farm to provide symbol information.  This may be useful to those who are already familiar with perf, but it's not the most obvious interface for new users.
*Enable systrace in B2G:
**Build with the '--enable-systrace' configure option, or just uncomment the MOZ_USE_SYSTRACE define in gecko/tools/profiler/GeckoProfilerImpl.h, like this:
<pre>
#define MOZ_USE_SYSTRACE
#ifdef MOZ_USE_SYSTRACE
# define ATRACE_TAG ATRACE_TAG_ALWAYS
// We need HAVE_ANDROID_OS to be defined for Trace.h.
// If its not set we will set it temporary and remove it.
# ifndef HAVE_ANDROID_OS
#  define HAVE_ANDROID_OS
#  define REMOVE_HAVE_ANDROID_OS
# endif
</pre>


The next layer is "./run-perf.sh sps", which converts the perf.data file pulled by "./run-perf.sh record" to the format used by the Gecko profiler (with symbols), as long as it was made with -a (all CPUs); and "./run-perf.sh record-sps", which combines "record -a -g" and "sps".
*How to use systrace:
**[http://developer.android.com/tools/help/systrace.html systrace.py documentation]
**./systrace.py --time=10 -o mynewtrace.html sched


On the miniperf branch, there is currently a "minirecord" subcommand (which runs the miniperf-record command instead of perf record); the "sps" subcommand recognizes miniperf files, and record-sps uses minirecord instead.
Note: Gecko code is tagged as ATRACE_TAG_ALWAYS, so we don't set the category type.
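
For repeated captures, the systrace invocation shown above can be assembled programmatically. This is a minimal sketch that only builds the argument list; the helper name systrace_command is hypothetical, and it uses only the options that appear in the example command (--time, -o, and the "sched" category).

```python
def systrace_command(seconds, output_html, categories=("sched",),
                     script="./systrace.py"):
    """Build the argument list for a systrace capture like the example above."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return [script, "--time={}".format(seconds), "-o", output_html, *categories]


if __name__ == "__main__":
    command = systrace_command(10, "mynewtrace.html")
    print(" ".join(command))
    # To actually run it from the systrace directory of the Android SDK:
    #   import subprocess
    #   subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems if the output file name contains spaces.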

Latest revision as of 13:59, 1 February 2015

Profiling with the Gecko profiler

Good at: native stacks (with runtime options) plus JavaScript profiling, low-overhead sampling, familiar to Gecko developers

See these instructions. Patches are in flight to get native stacks in profiles, but they are not in the default configurations yet.

Profiling with systrace

Good at: shows process preemption, shows all calls to instrumented functions, familiar to Android developers

Bad at: requires a configure option, higher overhead

  • Download the Android SDK to get the systrace tool:
    • 1. download link
    • 2. The systrace.py tool is at path-to-android-sdk/tools/systrace
  • Enable systrace in B2G:
    • Build with the '--enable-systrace' configure option, or just uncomment the MOZ_USE_SYSTRACE define in gecko/tools/profiler/GeckoProfilerImpl.h, like this:
#define MOZ_USE_SYSTRACE
#ifdef MOZ_USE_SYSTRACE
# define ATRACE_TAG ATRACE_TAG_ALWAYS
// We need HAVE_ANDROID_OS to be defined for Trace.h.
// If its not set we will set it temporary and remove it.
# ifndef HAVE_ANDROID_OS
#   define HAVE_ANDROID_OS
#   define REMOVE_HAVE_ANDROID_OS
# endif

Note: Gecko code is tagged as ATRACE_TAG_ALWAYS, so we don't set the category type.