Firefox OS/Performance/Profiling: Difference between revisions
Revision as of 16:20, 19 April 2012
Profiling with oprofile
OProfile is a system-wide profiler for Linux systems. For a detailed description of OProfile, see the URL below:
http://oprofile.sourceforge.net/news/
OProfile consists of three parts: a Linux kernel driver, userspace applications, and the collected profiling samples.
Prepare the Linux Kernel
Make sure the following features are enabled in the kernel configuration file, which is normally .config in your Linux kernel directory. You need to recompile the Linux kernel after enabling the OProfile features.
CONFIG_PROFILING=y
CONFIG_OPROFILE=y
CONFIG_HAVE_OPROFILE=y
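Before rebuilding, it can help to confirm that a given configuration file really enables these three options. The following is a minimal sketch; check_oprofile_config is a hypothetical helper name and is not part of the B2G tree.

```shell
#!/bin/sh
# Sketch: verify that a kernel .config enables the options OProfile needs.
# check_oprofile_config is a hypothetical helper, not part of the B2G tree.
check_oprofile_config() {
    config_file="$1"
    missing=0
    for opt in CONFIG_PROFILING CONFIG_OPROFILE CONFIG_HAVE_OPROFILE; do
        # -x matches the whole line, so a commented-out
        # "# CONFIG_X is not set" entry does not count as enabled.
        if ! grep -qx "${opt}=y" "$config_file"; then
            echo "missing: ${opt}=y"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_oprofile_config .config` run in the kernel directory prints one line per missing option and returns non-zero if any is missing.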
Userspace applications
The userspace applications of OProfile include opcontrol and oprofiled. You can find the OProfile source code in glue/gonk/external/oprofile.
Host application
Use the host utility opreport to analyze profiling samples.
You need to install it on your host system.
sudo apt-get install oprofile
Five Steps to profile your target device
To make it easier to use OProfile in the B2G project, several Makefile targets have been written.
make op_setup   # start up oprofile
make op_start   # start profiling
make op_status  # check status
make op_stop    # stop profiling
make op_pull    # pull profile data from phone
make op_show    # save profiling result in oprofile/oprofile.log
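The targets above are meant to be run in order. As a rough sketch, a wrapper could chain them and stop at the first failure. The names run_oprofile_session, MAKE_CMD and PROFILE_SECONDS are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Sketch: run the op_* targets in sequence, stopping at the first failure.
# MAKE_CMD and PROFILE_SECONDS are illustrative variables, not B2G settings.
MAKE_CMD="${MAKE_CMD:-make}"
PROFILE_SECONDS="${PROFILE_SECONDS:-10}"

run_oprofile_session() {
    $MAKE_CMD op_setup  || return 1
    $MAKE_CMD op_start  || return 1
    $MAKE_CMD op_status || return 1
    # Let the workload run on the device while samples are collected.
    sleep "$PROFILE_SECONDS"
    $MAKE_CMD op_stop   || return 1
    $MAKE_CMD op_pull   || return 1
    $MAKE_CMD op_show
}
```

Calling `run_oprofile_session` from the B2G directory runs the whole cycle. Setting `MAKE_CMD="echo make"` first prints the commands instead of running them.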
make op_setup
Prepares the opsetup script file and pushes it to the target device.
The opsetup script wakes up oprofiled and sets up the trigger event.
A snapshot of opsetup is listed below:
opcontrol --setup
opcontrol --vmlinux=/home/vincent/project/B2G_20120217/boot/kernel-android-galaxy-s2/vmlinux --kernel-range=0xc059c000,0xc0c06000 --event=CPU_CYCLES
make op_start
We use "adb shell opcontrol --start" to start profiling and collect samples on the target device.
make op_status
We use "adb shell opcontrol --status" to check the profiling status:
Driver directory:     /dev/oprofile
Session directory:    /data/oprofile
Counter 0:
    name:  CPU_CYCLES
    count: 150000
Counter 1 disabled
Counter 2 disabled
Counter 3 disabled
Counter 4 disabled
oprofiled pid:        3074
profiler is running
    5621 samples received
    0 samples lost overflow
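When checking the status repeatedly, the two sample counters are usually the interesting part. The sketch below extracts them from the status text. It assumes the "samples received" and "samples lost" entries each appear on their own line with the number first; parse_opcontrol_status is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read `opcontrol --status` output on stdin and print the counters.
# Assumes each counter is on its own line with the number as the first field.
parse_opcontrol_status() {
    awk '
        /samples received/ { received = $1 }
        /samples lost/     { lost = $1 }
        END { printf "received=%d lost=%d\n", received, lost }
    '
}
```

A possible use is `adb shell opcontrol --status | parse_opcontrol_status`. A non-zero lost count suggests that samples are being dropped.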
make op_stop
We use "adb shell opcontrol --stop" to stop profiling.
make op_pull
Pulls the profiling samples from the target device to the host PC and copies the related binary files so that memory addresses can be correlated with symbols.
make op_show
Uses opreport to analyze the profiling samples.
If opreport is not yet installed on your host system, install it with sudo apt-get install oprofile.
CPU: ARM Cortex-A9, speed 0 MHz (estimated)
Counted CPU_CYCLES events (Number of CPU cycles) with a unit mask of 0x00 (No unit mask) count 150000
samples  %        image name         app name           symbol name
5438     9.9701   libmozglue.so      libmozglue.so      __aeabi_idiv
2811     5.1537   libGLESv2_mali.so  libGLESv2_mali.so  /system/lib/egl/libGLESv2_mali.so
2348     4.3049   libc.so            libc.so            __aeabi_idiv
2083     3.8190   libxul.so          libxul.so          pixman_composite_over_8888_8_8888_asm_neon
1556     2.8528   libxul.so          libxul.so          pixman_composite_over_8888_8888_asm_neon
1337     2.4513   libxul.so          libxul.so          pixman_scaled_bilinear_scanline_8888_8888_OVER_asm_neon
594      1.0890   libc.so            libc.so            timesub
578      1.0597   libxul.so          libxul.so          __aeabi_l2f
547      1.0029   libmozglue.so      libmozglue.so      __aeabi_uidiv
421      0.7719   libc.so            libc.so            localsub
383      0.7022   libc.so            libc.so            memset
357      0.6545   libxul.so          libxul.so          pixman_composite_over_n_8888_asm_neon
341      0.6252   libxul.so          libxul.so          pixman_composite_over_n_8_8888_asm_neon
308      0.5647   libxul.so          libxul.so          pixman_composite_src_8888_8888_asm_neon
304      0.5574   libm.so            libm.so            floor
211      0.3869   libc.so            libc.so            __findenv
208      0.3814   libc.so            libc.so            pthread_mutex_lock
201      0.3685   libmozglue.so      libmozglue.so      arena_malloc
193      0.3538   libmozglue.so      libmozglue.so      arena_dalloc
180      0.3300   libm.so            libm.so            fmod
177      0.3245   libxul.so          libxul.so          nsIFrame::FinishAndStoreOverflow(nsOverflowAreas&, nsSize)
176      0.3227   libc.so            libc.so            __system_property_find
174      0.3190   libxul.so          libxul.so          gfx3DMatrix::Transform3D(gfxPoint3D const&) const
171      0.3135   libxul.so          libxul.so          pixman_composite_src_n_8888_asm_neon
162      0.2970   libc.so            libc.so            time2sub.clone.2
161      0.2952   libxul.so          libxul.so          PL_DHashTableOperate
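A per-symbol listing like this can be long, so a per-library total is sometimes a quicker first look. The sketch below sums the samples and percentage columns by image name. It assumes the column order shown above (samples, %, image name, ...); summarize_by_image is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: sum opreport samples and percentages per image name.
# Reads the report text on stdin; header lines are skipped because
# their first field is not an integer sample count.
summarize_by_image() {
    LC_ALL=C awk '
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9.]+$/ {
            samples[$3] += $1
            percent[$3] += $2
        }
        END {
            for (image in samples)
                printf "%d %.4f %s\n", samples[image], percent[image], image
        }
    ' | LC_ALL=C sort -rn
}
```

For example, `summarize_by_image < oprofile/oprofile.log` prints one line per library, largest sample count first.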
Profiling with perf
The perf utility is a performance analysis tool for Linux.
Setup
The profiling data is collected on the target device, and the report is generated on the host side.
You need to install the perf tool on the host side and create a directory for the kernel and libraries with symbols.
- Install perf on the host side (Ubuntu)
$ sudo apt-get install linux-tools
$ perf --version
perf version 3.0.17
- Create a directory for libraries with symbols
Here's a B2G makefile helper to create this directory.
$ make perf-create-symfs
Real time report
On the target device, use perf top to generate and display performance counters in real time.
# perf top -p `pidof b2g`
The output will be like this:
PerfTop: 388 irqs/sec kernel:13.1% exact: 0.0% [1000Hz cycles], (target_pid: 7852)
-------------------------------------------------------------------------------
 samples  pcnt   function                            DSO
 _______  _____  __________________________________  _________________
  403.00  31.8%  _downsample_2x2_rgba8888            libGLESv2_mali.so
  119.00   9.4%  JaegerStubVeneer                    libxul.so
   93.00   7.3%  _raw_spin_unlock_irqrestore         [kernel.kallsyms]
   59.00   4.7%  _m200_texture_deinterleave_16x16_b  libMali.so
   56.00   4.4%  memcpy                              libc.so
   40.00   3.2%  finish_task_switch                  [kernel.kallsyms]
   37.00   2.9%  vfprintf                            libc.so
   23.00   1.8%  _gles_fb_tex_sub_image_2d           libGLESv2_mali.so
   16.00   1.3%  __sfvwrite                          libc.so
   16.00   1.3%  __do_softirq                        [kernel.kallsyms]
   15.00   1.2%  __memzero                           [kernel.kallsyms]
   13.00   1.0%  getnstimeofday                      [kernel.kallsyms]
   12.00   0.9%  _gles_generate_mipmaps_sw_16x16blo  libGLESv2_mali.so
   12.00   0.9%  snprintf                            libc.so
   12.00   0.9%  __divsi3                            libmozglue.so
   10.00   0.8%  v7_dma_clean_range                  [kernel.kallsyms]
Recording for a period and generating report
Record on the target side (press CTRL-C to stop recording):
# perf record -o /data/local/perf.data -p `pidof b2g`
Generate the report on the host side:
$ adb pull /data/local/perf.data .
$ perf report --symfs=/tmp/b2g_symfs_galaxys2 --vmlinux=/vmlinux
The output will be like this:
# Events: 4K cycles
#
# Overhead  Command  Shared Object      Symbol
# ........  .......  .................  ...............................................................................................
#
     8.00%  b2g      perf-7852.map      [.] 0x438413fc
     4.46%  b2g      [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
     4.36%  b2g      [unknown]          [.] 0x43843500
     2.61%  b2g      [kernel.kallsyms]  [k] finish_task_switch
     1.69%  b2g      libxul.so          [.] JaegerStubVeneer
     1.20%  b2g      libxul.so          [.] TypedArrayTemplate<float>::obj_getElement(JSContext*, JSObject*, JSObject*, unsigned int, J
     1.06%  b2g      libxul.so          [.] void js::mjit::stubs::SetElem<0>(js::VMFrame&)
     1.05%  b2g      libxul.so          [.] js::mjit::stubs::GetElem(js::VMFrame&)
     1.01%  b2g      libc.so            [.] pthread_mutex_lock
     1.00%  b2g      libc.so            [.] memcpy
     0.90%  b2g      libxul.so          [.] JSObject::nativeLookup(JSContext*, int)
     0.88%  b2g      [kernel.kallsyms]  [k] sub_preempt_count
     0.86%  b2g      libGLESv2_mali.so  [.] 0xa3a0
     0.82%  b2g      [kernel.kallsyms]  [k] add_preempt_count
     0.80%  b2g      [kernel.kallsyms]  [k] __do_softirq
     0.79%  b2g      libxul.so          [.] js_IsTypedArray(JSObject*)
     0.78%  b2g      libMali.so         [.] 0x13be8
     0.67%  b2g      libxul.so          [.] js::GetPropertyHelper(JSContext*, JSObject*, int, unsigned int, JS::Value*)
     0.66%  b2g      libxul.so          [.] js::PropertyTable::search(int, bool)
     0.66%  b2g      libxul.so          [.] js_GetProperty(JSContext*, JSObject*, JSObject*, int, JS::Value*)
     0.65%  b2g      libc.so            [.] pthread_mutex_unlock
     0.59%  b2g      libxul.so          [.] castNativeFromWrapper(JSContext*, JSObject*, unsigned int, nsISupports**, JS::Value*, XPCLa
     0.57%  b2g      libmozglue.so      [.] __udivsi3
     0.53%  b2g      libxul.so          [.] mozilla::gl::GLContextEGL::MakeCurrentImpl(bool)
     0.52%  b2g      libxul.so          [.] XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode)
     0.49%  b2g      libxul.so          [.] js::TypedArray::getTypedArray(JSObject*)
     0.49%  b2g      libxul.so          [.] js::GetPropertyOperation(JSContext*, unsigned char*, JS::Value const&, JS::Value*)
     0.48%  b2g      [kernel.kallsyms]  [k] vector_swi
     0.47%  b2g      [kernel.kallsyms]  [k] get_parent_ip
     0.42%  b2g      libxul.so          [.] DisabledGetElem(js::VMFrame&, js::mjit::ic::GetElementIC*)
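To see which shared objects dominate a report like this, the overhead column can be summed per shared object. The following is only a sketch: it assumes the report text has been saved to a file, that the columns appear in the order overhead, command, shared object, and that the command name contains no spaces. overhead_by_dso is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: sum the perf report "Overhead" column per shared object.
# Reads the report text on stdin.
overhead_by_dso() {
    LC_ALL=C awk '
        /^#/ { next }                 # skip header comment lines
        $1 ~ /^[0-9.]+%$/ {
            sub(/%$/, "", $1)         # strip the percent sign before adding
            total[$3] += $1
        }
        END {
            for (dso in total)
                printf "%.2f%% %s\n", total[dso], dso
        }
    ' | LC_ALL=C sort -rn
}
```

With the report saved as report.txt (an illustrative file name), `overhead_by_dso < report.txt` prints one line per shared object, highest overhead first.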
Recording with callgraph
Use option '-g' to do callgraph recording:
# perf record -g -o /data/local/perf.data -p `pidof b2g`
Note:
- To get a correct call graph report, you need to compile the libraries with "-fno-omit-frame-pointer".
- On the SGS2 device, perf crashes easily when recording with call graphs. This is an issue that has yet to be fixed.
System-wide and specific application profiling
Use option '-a' to do system-wide profiling:
# perf record -o /data/local/perf.data -a
Profile a specified command:
# perf record -o /data/local/perf.data /system/b2g/b2g
Use option '-p' to profile an existing process. (Some devices have no pidof, so you need to use ps to find the b2g PID.)
# perf record -o /data/local/perf.data -p `pidof b2g`
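On devices without pidof, the PID can be picked out of the ps listing instead. The sketch below assumes that the PID is in the second column and the process name in the last column, as in the common Android ps layout, and that the process is listed as /system/b2g/b2g. Both are assumptions to verify on your device; pid_from_ps is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read `ps` output on stdin and print the PID of the first process
# whose last column equals the given name.
# Assumes PID is column 2 and the process name is the last column.
pid_from_ps() {
    awk -v name="$1" '
        { sub(/\r$/, "") }            # adb shell output may end lines with CR
        $NF == name { print $2; exit }
    '
}
```

A possible use from the host is `adb shell ps | pid_from_ps /system/b2g/b2g`, passing the printed PID to `perf record -p`.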
Makefile helpers for perf
Here are the B2G makefile helpers for generating perf reports on the host side.
- Create a directory for libraries with symbols
$ make perf-create-symfs
- Remove the directory for libraries with symbols
$ make perf-clean-symfs
- Real-time perf report, system-wide
$ make perf-top
- Real-time perf report for the B2G process
$ make perf-top-b2g
- Summary perf report, system-wide
$ make perf-report
- Summary perf report for the B2G process
$ make perf-report-b2g
- Change recording duration
The perf-report-* targets automatically record for 10 seconds and then generate the report. You can change the duration with the "RECORD_DURATION" argument.
Below is an example to record for 30 seconds:
$ make perf-report RECORD_DURATION=30