Firefox OS/Performance/Profiling: Difference between revisions

= Profiling with oprofile =
OProfile is a system-wide profiler for Linux systems. For a detailed description of OProfile, refer to the URL below:<br>
http://oprofile.sourceforge.net/news/
 
OProfile consists of three parts: a Linux kernel driver, userspace applications, and the collected profiling samples.<br>
 
== Prepare the Linux Kernel ==
Make sure the features below are enabled in the kernel configuration file, which is normally .config in your Linux kernel directory. You need to recompile the Linux kernel after enabling the OProfile features. 
 
<pre>
CONFIG_PROFILING=y
CONFIG_OPROFILE=y
CONFIG_HAVE_OPROFILE=y
</pre>
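As a rough sketch, the check above can be scripted before rebuilding the kernel. The function name check_oprofile_config is hypothetical, and the sketch only accepts options set to "=y", exactly as listed above.

```shell
# Report which of the required options are missing from a kernel config file.
# Usage: check_oprofile_config path/to/.config
# Returns 0 when all three options are set to "y", 1 otherwise.
check_oprofile_config() {
    config_file="$1"
    missing=0
    for opt in CONFIG_PROFILING CONFIG_OPROFILE CONFIG_HAVE_OPROFILE; do
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "ok:      ${opt}"
        else
            echo "missing: ${opt}"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, run check_oprofile_config .config from the kernel directory and enable any option reported as missing.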
 
== Userspace applications ==
The userspace applications of OProfile include opcontrol and oprofiled. You can find the OProfile source code in glue/gonk/external/oprofile.<br>
 
== Host application ==
Use the host utility opreport to analyze the profiling samples.<br>
You need to install it on your host system.
<pre>
sudo apt-get install oprofile
</pre>
 
== Five Steps to profile your target device ==
To make it easier to use OProfile in the B2G project, several Makefile targets are provided.
<pre>
make op_setup        # start up oprofile
make op_start        # start profiling
make op_status      # check status
make op_stop        # stop profiling
make op_pull        # pull profile data from phone
make op_show        # save profiling result in oprofile/oprofile.log
</pre>
===make op_setup===
Prepares the opsetup script file and pushes it to the target device.<br>
The opsetup script wakes up oprofiled and sets up the trigger event.<br>
A snapshot of opsetup is listed below:<br>
<pre>
opcontrol --setup
opcontrol --vmlinux=/home/vincent/project/B2G_20120217/boot/kernel-android-galaxy-s2/vmlinux --kernel-range=0xc059c000,0xc0c06000 --event=CPU_CYCLES
</pre>
 
===make op_start===
This target runs "adb shell opcontrol --start" to start profiling and collect samples on the target device.<br>
===make op_status===
This target runs "adb shell opcontrol --status" to check the profiling status:<br>
<pre>
Driver directory: /dev/oprofile
Session directory: /data/oprofile
Counter 0:
    name: CPU_CYCLES
    count: 150000
Counter 1 disabled
Counter 2 disabled
Counter 3 disabled
Counter 4 disabled
oprofiled pid: 3074
profiler is running
      5621 samples received
          0 samples lost overflow
</pre>
===make op_stop===
This target runs "adb shell opcontrol --stop" to stop profiling.<br>
===make op_pull===
Pulls the profiling samples from the target device to the host PC and copies the related binary files so that memory addresses can be correlated with symbols.
===make op_show===
Uses opreport to analyze the profiling samples.<br>
Run sudo apt-get install oprofile to install opreport on your host system.
<pre>
CPU: ARM Cortex-A9, speed 0 MHz (estimated)
Counted CPU_CYCLES events (Number of CPU cycles) with a unit mask of 0x00 (No unit mask) count 150000
samples  %        image name              app name                symbol name
5438      9.9701  libmozglue.so            libmozglue.so            __aeabi_idiv
2811      5.1537  libGLESv2_mali.so        libGLESv2_mali.so        /system/lib/egl/libGLESv2_mali.so
2348      4.3049  libc.so                  libc.so                  __aeabi_idiv
2083      3.8190  libxul.so                libxul.so                pixman_composite_over_8888_8_8888_asm_neon
1556      2.8528  libxul.so                libxul.so                pixman_composite_over_8888_8888_asm_neon
1337      2.4513  libxul.so                libxul.so                pixman_scaled_bilinear_scanline_8888_8888_OVER_asm_neon
594      1.0890  libc.so                  libc.so                  timesub
578      1.0597  libxul.so                libxul.so                __aeabi_l2f
547      1.0029  libmozglue.so            libmozglue.so            __aeabi_uidiv
421      0.7719  libc.so                  libc.so                  localsub
383      0.7022  libc.so                  libc.so                  memset
357      0.6545  libxul.so                libxul.so                pixman_composite_over_n_8888_asm_neon
341      0.6252  libxul.so                libxul.so                pixman_composite_over_n_8_8888_asm_neon
308      0.5647  libxul.so                libxul.so                pixman_composite_src_8888_8888_asm_neon
304      0.5574  libm.so                  libm.so                  floor
211      0.3869  libc.so                  libc.so                  __findenv
208      0.3814  libc.so                  libc.so                  pthread_mutex_lock
201      0.3685  libmozglue.so            libmozglue.so            arena_malloc
193      0.3538  libmozglue.so            libmozglue.so            arena_dalloc
180      0.3300  libm.so                  libm.so                  fmod
177      0.3245  libxul.so                libxul.so                nsIFrame::FinishAndStoreOverflow(nsOverflowAreas&, nsSize)
176      0.3227  libc.so                  libc.so                  __system_property_find
174      0.3190  libxul.so                libxul.so                gfx3DMatrix::Transform3D(gfxPoint3D const&) const
171      0.3135  libxul.so                libxul.so                pixman_composite_src_n_8888_asm_neon
162      0.2970  libc.so                  libc.so                  time2sub.clone.2
161      0.2952  libxul.so                libxul.so                PL_DHashTableOperate
</pre>
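Because the report lists one row per symbol, it can help to total the samples per image. The following is a minimal sketch, assuming the column layout shown above (samples, percentage, image name); the function name samples_per_image is hypothetical.

```shell
# Sum opreport samples per image name.
# Reads an opreport listing on stdin and prints "<samples> <image>" lines,
# largest total first. Header lines are skipped because their first two
# fields are not numeric.
samples_per_image() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9.]+$/ { total[$3] += $1 }
         END { for (img in total) print total[img], img }' | sort -rn
}
```

For example, since make op_show saves its result in oprofile/oprofile.log, the totals can be printed with samples_per_image < oprofile/oprofile.log.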
 



Revision as of 02:07, 1 March 2013

Profiling with perf

The perf utility is a performance analysis tool for Linux.

Setup

The profiling data is collected on the target device, and the report is generated on the host.
You need to install the perf tool on the host and create a directory containing the kernel and libraries with symbols.

  • Install perf on the host (Ubuntu)
$ sudo apt-get install linux-tools
$ perf --version
perf version 3.0.17 
  • Create a directory for libraries with symbols
    Here is a B2G makefile helper that creates this directory.
$ make perf-create-symfs

Real time report

On the target device, use perf top to generate and display performance counters in real time.

# perf top -p `pidof b2g`

The output will be like this:

  PerfTop:     388 irqs/sec  kernel:13.1%  exact:  0.0% [1000Hz cycles],  (target_pid: 7852)
-------------------------------------------------------------------------------

             samples  pcnt function                           DSO
             _______ _____ __________________________________ _________________

              403.00 31.8% _downsample_2x2_rgba8888           libGLESv2_mali.so
              119.00  9.4% JaegerStubVeneer                   libxul.so        
               93.00  7.3% _raw_spin_unlock_irqrestore        [kernel.kallsyms]
               59.00  4.7% _m200_texture_deinterleave_16x16_b libMali.so       
               56.00  4.4% memcpy                             libc.so          
               40.00  3.2% finish_task_switch                 [kernel.kallsyms]
               37.00  2.9% vfprintf                           libc.so          
               23.00  1.8% _gles_fb_tex_sub_image_2d          libGLESv2_mali.so
               16.00  1.3% __sfvwrite                         libc.so          
               16.00  1.3% __do_softirq                       [kernel.kallsyms]
               15.00  1.2% __memzero                          [kernel.kallsyms]
               13.00  1.0% getnstimeofday                     [kernel.kallsyms]
               12.00  0.9% _gles_generate_mipmaps_sw_16x16blo libGLESv2_mali.so
               12.00  0.9% snprintf                           libc.so          
               12.00  0.9% __divsi3                           libmozglue.so    
               10.00  0.8% v7_dma_clean_range                 [kernel.kallsyms]

Recording for a period and generating report

Record on the target (press Ctrl-C to stop recording):

# perf record -o /data/local/perf.data -p `pidof b2g`

Generate the report on the host:

$ adb pull /data/local/perf.data .
$ perf report --symfs=/tmp/b2g_symfs_galaxys2 --vmlinux=/vmlinux

The output will be like this:

# Events: 4K cycles
#
# Overhead  Command      Shared Object                                                                                                 
# ........  .......  .................  ...............................................................................................
#
     8.00%      b2g  perf-7852.map      [.] 0x438413fc      
     4.46%      b2g  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
     4.36%      b2g  [unknown]          [.] 0x43843500      
     2.61%      b2g  [kernel.kallsyms]  [k] finish_task_switch
     1.69%      b2g  libxul.so          [.] JaegerStubVeneer
     1.20%      b2g  libxul.so          [.] TypedArrayTemplate<float>::obj_getElement(JSContext*, JSObject*, JSObject*, unsigned int, J
     1.06%      b2g  libxul.so          [.] void js::mjit::stubs::SetElem<0>(js::VMFrame&)
     1.05%      b2g  libxul.so          [.] js::mjit::stubs::GetElem(js::VMFrame&)
     1.01%      b2g  libc.so            [.] pthread_mutex_lock
     1.00%      b2g  libc.so            [.] memcpy
     0.90%      b2g  libxul.so          [.] JSObject::nativeLookup(JSContext*, int)
     0.88%      b2g  [kernel.kallsyms]  [k] sub_preempt_count
     0.86%      b2g  libGLESv2_mali.so  [.] 0xa3a0          
     0.82%      b2g  [kernel.kallsyms]  [k] add_preempt_count
     0.80%      b2g  [kernel.kallsyms]  [k] __do_softirq
     0.79%      b2g  libxul.so          [.] js_IsTypedArray(JSObject*)
     0.78%      b2g  libMali.so         [.] 0x13be8         
     0.67%      b2g  libxul.so          [.] js::GetPropertyHelper(JSContext*, JSObject*, int, unsigned int, JS::Value*)
     0.66%      b2g  libxul.so          [.] js::PropertyTable::search(int, bool)
     0.66%      b2g  libxul.so          [.] js_GetProperty(JSContext*, JSObject*, JSObject*, int, JS::Value*)
     0.65%      b2g  libc.so            [.] pthread_mutex_unlock
     0.59%      b2g  libxul.so          [.] castNativeFromWrapper(JSContext*, JSObject*, unsigned int, nsISupports**, JS::Value*, XPCLa
     0.57%      b2g  libmozglue.so      [.] __udivsi3
     0.53%      b2g  libxul.so          [.] mozilla::gl::GLContextEGL::MakeCurrentImpl(bool)
     0.52%      b2g  libxul.so          [.] XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode)
     0.49%      b2g  libxul.so          [.] js::TypedArray::getTypedArray(JSObject*)
     0.49%      b2g  libxul.so          [.] js::GetPropertyOperation(JSContext*, unsigned char*, JS::Value const&, JS::Value*)
     0.48%      b2g  [kernel.kallsyms]  [k] vector_swi
     0.47%      b2g  [kernel.kallsyms]  [k] get_parent_ip
     0.42%      b2g  libxul.so          [.] DisabledGetElem(js::VMFrame&, js::mjit::ic::GetElementIC*)
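To see at a glance which shared object dominates, the per-symbol overheads can be summed per shared object. This is a minimal sketch, assuming the textual report layout shown above (overhead, command, shared object) and a command name without spaces; the function name overhead_per_dso is hypothetical.

```shell
# Sum perf report overhead per shared object.
# Reads a textual perf report on stdin and prints "<percent>% <object>"
# lines, largest total first. Comment lines starting with "#" are skipped
# because their first field is not a percentage.
overhead_per_dso() {
    awk '$1 ~ /^[0-9.]+%$/ {
             pct = $1
             sub(/%$/, "", pct)      # strip the trailing percent sign
             total[$3] += pct
         }
         END { for (dso in total) printf "%.2f%% %s\n", total[dso], dso }' | sort -rn
}
```

For example, if the report has been saved to a file (here called report.txt, a hypothetical name), run overhead_per_dso < report.txt.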

Recording with callgraph

Use option '-g' to do callgraph recording:

# perf record -g -o /data/local/perf.data -p `pidof b2g`

Note:

  1. To get a correct call graph report, you need to compile the libraries with "-fno-omit-frame-pointer".
  2. On the SGS2 device, perf crashes easily when recording with callgraph; this is an issue yet to be fixed.

System-wide and specific application profiling

Use option '-a' to do system-wide profiling:

# perf record -o /data/local/perf.data -a

Profile a specified command:

# perf record -o /data/local/perf.data /system/b2g/b2g

Use option '-p' to profile an existing process. (Some devices have no pidof, so you need to use ps to find the b2g PID.)

# perf record -o /data/local/perf.data -p `pidof b2g`
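When pidof is missing, the PID can be extracted from ps output instead. The sketch below assumes that ps prints the PID in the second column and the process name or path in the last column, which is an assumption about the device's ps and should be checked against its actual output. The function name pid_from_ps is hypothetical.

```shell
# Print the PID of every process whose name matches $1.
# Reads ps output on stdin. Assumes the PID is the second column and the
# process name (possibly a full path) is the last column.
pid_from_ps() {
    awk -v name="$1" '{
        sub(/\r$/, "")                    # drop a trailing carriage return, if any
        n = split($NF, parts, "/")        # keep only the basename of the last column
        if (parts[n] == name) print $2
    }'
}
```

For example, run adb shell ps | pid_from_ps b2g on the host, then pass the printed PID to perf record -p on the target.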

Makefile helpers for perf

Here are the B2G makefile helpers for generating perf reports on the host.

  • Create a directory for libraries with symbols
$ make perf-create-symfs
  • Remove the directory for libraries with symbols
$ make perf-clean-symfs
  • Real-time system-wide perf report
$ make perf-top
  • Real-time report for the B2G process
$ make perf-top-b2g
  • System-wide summary perf report
$ make perf-report
  • Summary perf report for the B2G process
$ make perf-report-b2g
  • Change recording duration
    The perf-report-* targets automatically record for 10 seconds and then generate the report. You can change the duration with the "RECORD_DURATION" argument.
    Below is an example to record for 30 seconds:
$ make perf-report RECORD_DURATION=30