|
|
(14 intermediate revisions by 8 users not shown) |
| = Profiling with the gecko profiler = | | = Profiling with the gecko profiler = |
| | |
| | Good at: Native stacks (with runtime options) + JavaScript profiling, low-overhead sampling, familiar to Gecko developers |
|
| |
|
| See [https://developer.mozilla.org/en-US/docs/Performance/Profiling_with_the_Built-in_Profiler#Profiling_Boot_to_Gecko_%28with_a_real_device%29 these instructions]. Patches are in-flight to get native stacks in profiles, but that's not in default configurations yet. | | See [https://developer.mozilla.org/en-US/docs/Performance/Profiling_with_the_Built-in_Profiler#Profiling_Boot_to_Gecko_%28with_a_real_device%29 these instructions]. Patches are in-flight to get native stacks in profiles, but that's not in default configurations yet. |
|
| |
|
| = Profiling with perf = | | = Profiling with systrace = |
| The perf utility is a performance analysis tool for Linux.
| | Good at: Shows process preemption, shows all calls to instrumented functions, familiar to Android developers |
| | |
| == Setup ==
| |
| The profiling data is collected on the target device, and the report is generated on the host.<br>
| |
| You need to install the perf tool on the host and create a directory for the kernel and libraries with symbols.
| |
| | |
| * Install perf on the host (Ubuntu)
| |
| $ sudo apt-get install linux-tools
| |
| $ perf --version
| |
| perf version 3.0.17
| |
| | |
| * Create a directory for libraries with symbols<br>Here's a B2G makefile helper to create this directory.
| |
| $ make perf-create-symfs
| |
| | |
| == Real time report ==
| |
| On the target device, use perf top to collect and display performance counters in real time.
| |
| # perf top -p `pidof b2g`
| |
| The output looks like this:
| |
| PerfTop: 388 irqs/sec kernel:13.1% exact: 0.0% [1000Hz cycles], (target_pid: 7852)
| |
| -------------------------------------------------------------------------------
| |
|
| |
| samples pcnt function DSO
| |
| _______ _____ __________________________________ _________________
| |
|
| |
| 403.00 31.8% _downsample_2x2_rgba8888 libGLESv2_mali.so
| |
| 119.00 9.4% JaegerStubVeneer libxul.so
| |
| 93.00 7.3% _raw_spin_unlock_irqrestore [kernel.kallsyms]
| |
| 59.00 4.7% _m200_texture_deinterleave_16x16_b libMali.so
| |
| 56.00 4.4% memcpy libc.so
| |
| 40.00 3.2% finish_task_switch [kernel.kallsyms]
| |
| 37.00 2.9% vfprintf libc.so
| |
| 23.00 1.8% _gles_fb_tex_sub_image_2d libGLESv2_mali.so
| |
| 16.00 1.3% __sfvwrite libc.so
| |
| 16.00 1.3% __do_softirq [kernel.kallsyms]
| |
| 15.00 1.2% __memzero [kernel.kallsyms]
| |
| 13.00 1.0% getnstimeofday [kernel.kallsyms]
| |
| 12.00 0.9% _gles_generate_mipmaps_sw_16x16blo libGLESv2_mali.so
| |
| 12.00 0.9% snprintf libc.so
| |
| 12.00 0.9% __divsi3 libmozglue.so
| |
| 10.00 0.8% v7_dma_clean_range [kernel.kallsyms]
| |
| | |
| == Recording for a period and generating report ==
| |
| Record on the target device (press Ctrl-C to stop recording):
| |
| # perf record -o /data/local/perf.data -p `pidof b2g`
| |
| | |
| Generate the report on the host:
| |
| $ adb pull /data/local/perf.data .
| |
| $ perf report --symfs=/tmp/b2g_symfs_galaxys2 --vmlinux=/vmlinux
| |
| The output looks like this:
| |
| # Events: 4K cycles
| |
| #
| |
| # Overhead Command Shared Object
| |
| # ........ ....... ................. ...............................................................................................
| |
| #
| |
| 8.00% b2g perf-7852.map [.] 0x438413fc
| |
| 4.46% b2g [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
| |
| 4.36% b2g [unknown] [.] 0x43843500
| |
| 2.61% b2g [kernel.kallsyms] [k] finish_task_switch
| |
| 1.69% b2g libxul.so [.] JaegerStubVeneer
| |
| 1.20% b2g libxul.so [.] TypedArrayTemplate<float>::obj_getElement(JSContext*, JSObject*, JSObject*, unsigned int, J
| |
| 1.06% b2g libxul.so [.] void js::mjit::stubs::SetElem<0>(js::VMFrame&)
| |
| 1.05% b2g libxul.so [.] js::mjit::stubs::GetElem(js::VMFrame&)
| |
| 1.01% b2g libc.so [.] pthread_mutex_lock
| |
| 1.00% b2g libc.so [.] memcpy
| |
| 0.90% b2g libxul.so [.] JSObject::nativeLookup(JSContext*, int)
| |
| 0.88% b2g [kernel.kallsyms] [k] sub_preempt_count
| |
| 0.86% b2g libGLESv2_mali.so [.] 0xa3a0
| |
| 0.82% b2g [kernel.kallsyms] [k] add_preempt_count
| |
| 0.80% b2g [kernel.kallsyms] [k] __do_softirq
| |
| 0.79% b2g libxul.so [.] js_IsTypedArray(JSObject*)
| |
| 0.78% b2g libMali.so [.] 0x13be8
| |
| 0.67% b2g libxul.so [.] js::GetPropertyHelper(JSContext*, JSObject*, int, unsigned int, JS::Value*)
| |
| 0.66% b2g libxul.so [.] js::PropertyTable::search(int, bool)
| |
| 0.66% b2g libxul.so [.] js_GetProperty(JSContext*, JSObject*, JSObject*, int, JS::Value*)
| |
| 0.65% b2g libc.so [.] pthread_mutex_unlock
| |
| 0.59% b2g libxul.so [.] castNativeFromWrapper(JSContext*, JSObject*, unsigned int, nsISupports**, JS::Value*, XPCLa
| |
| 0.57% b2g libmozglue.so [.] __udivsi3
| |
| 0.53% b2g libxul.so [.] mozilla::gl::GLContextEGL::MakeCurrentImpl(bool)
| |
| 0.52% b2g libxul.so [.] XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode)
| |
| 0.49% b2g libxul.so [.] js::TypedArray::getTypedArray(JSObject*)
| |
| 0.49% b2g libxul.so [.] js::GetPropertyOperation(JSContext*, unsigned char*, JS::Value const&, JS::Value*)
| |
| 0.48% b2g [kernel.kallsyms] [k] vector_swi
| |
| 0.47% b2g [kernel.kallsyms] [k] get_parent_ip
| |
| 0.42% b2g libxul.so [.] DisabledGetElem(js::VMFrame&, js::mjit::ic::GetElementIC*)
| |
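A flat report like the one above can be hard to scan when many small symbols belong to the same library. As a rough sketch, the text output of perf report can be summed per shared object on the host. The function name sum_by_dso is hypothetical, and the sketch assumes the column order shown above (overhead, command, shared object) and command names without spaces.

```shell
# Hypothetical helper: sum the Overhead column of `perf report` text output
# per shared object. Reads the report on stdin.
# Assumes: column 1 = overhead (e.g. "1.69%"), column 3 = shared object.
sum_by_dso() {
    awk '!/^#/ && NF >= 3 {
             sub(/%$/, "", $1)        # strip the trailing percent sign
             total[$3] += $1          # accumulate overhead per shared object
         }
         END {
             for (dso in total) printf "%.2f%% %s\n", total[dso], dso
         }' | sort -rn                # largest share first
}
```

It could be used as `perf report --symfs=/tmp/b2g_symfs_galaxys2 --vmlinux=/vmlinux | sum_by_dso`, with the same paths as in the example above.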
| | |
| == Recording with callgraph ==
| |
| | |
| Use the '-g' option to record call graphs:
| |
| # perf record -g -o /data/local/perf.data -p `pidof b2g`
| |
| | |
| Note:
| |
| # To get a correct call graph report, you need to compile libraries with "-fno-omit-frame-pointer".
| |
| # On the SGS2 device, recording with a call graph crashes easily; this is a known issue that still needs to be fixed.
| |
| | |
| == System-wide and specific application profiling ==
| |
| | |
| Use the '-a' option for system-wide profiling:
| |
| # perf record -o /data/local/perf.data -a
| |
|
| |
|
| Profile a specified command:
| | Bad at: Requires configure option, higher overhead |
| # perf record -o /data/local/perf.data /system/b2g/b2g
| |
|
| |
|
| Use the '-p' option to profile an existing process (some devices have no pidof; use ps to find the b2g PID instead):
| | *Download the Android SDK to get the systrace tool: |
| # perf record -o /data/local/perf.data -p `pidof b2g`
| | **[http://developer.android.com/sdk/index.html 1. download link] |
| | **2. the systrace.py tool is at path-to-android-sdk/tools/systrace |
|
| |
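For devices without pidof, the PID lookup for the perf '-p' option can be scripted on the host. The following is a minimal sketch, not part of the B2G tree: the function name find_pid is hypothetical, and it assumes the Android ps layout where the PID is the second column and the process name is the last column.

```shell
# Hypothetical helper: print the PID of the first process named $1.
# Reads `ps` output on stdin.
# Assumes: column 2 = PID, last column = process name (Android toolbox ps).
find_pid() {
    tr -d '\r' |                      # adb shell output may end lines with CR
    awk -v name="$1" '$NF == name { print $2; exit }'
}
```

Assuming the process name is /system/b2g/b2g as in the command above, it could be used as `pid=$(adb shell ps | find_pid /system/b2g/b2g)`, followed by `adb shell perf record -o /data/local/perf.data -p "$pid"`.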
|
| == Makefile helpers for perf ==
| | *Enable systrace in B2G: |
| | **Build with '--enable-systrace' config or just uncomment the MOZ_USE_SYSTRACE define in gecko/tools/profiler/GeckoProfilerImpl.h like: |
| | <pre> |
| | #define MOZ_USE_SYSTRACE |
| | #ifdef MOZ_USE_SYSTRACE |
| | # define ATRACE_TAG ATRACE_TAG_ALWAYS |
| | // We need HAVE_ANDROID_OS to be defined for Trace.h. |
| | // If its not set we will set it temporary and remove it. |
| | # ifndef HAVE_ANDROID_OS |
| | # define HAVE_ANDROID_OS |
| | # define REMOVE_HAVE_ANDROID_OS |
| | # endif |
| | </pre> |
|
| |
|
| Here are B2G makefile helpers to generate perf reports on the host.
| | *How to use systrace: |
| | **[http://developer.android.com/tools/help/systrace.html systrace.py document] |
| | **./systrace.py --time=10 -o mynewtrace.html sched |
|
| |
|
| * Create a directory for libraries with symbols
| | Note: Gecko code is tagged as ATRACE_TAG_ALWAYS, so we don't set the category type. |
| $ make perf-create-symfs
| |
| * Remove the directory for libraries with symbols
| |
| $ make perf-clean-symfs
| |
| * Real-time perf report, system-wide
| |
| $ make perf-top
| |
| * Real-time perf report for the B2G process
| |
| $ make perf-top-b2g
| |
| * Summary perf report, system-wide
| |
| $ make perf-report
| |
| * Summary perf report for the B2G process
| |
| $ make perf-report-b2g
| |
| * Change the recording duration<br>The perf-report-* targets record for 10 seconds and then generate the report. To change the duration, pass the "RECORD_DURATION" argument.<br>The following example records for 30 seconds:
| |
| $ make perf-report RECORD_DURATION=30
| |
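The actual makefile rules are not reproduced on this page. As an illustration only, a timed system-wide recording could be built from the manual steps above by letting perf record run a sleep command for the requested duration. The function name perf_report_cmds and the default symfs path are hypothetical; the sketch prints the commands instead of running them.

```shell
# Hypothetical sketch: print the commands for a timed, system-wide perf
# recording followed by a host-side report. Not the real makefile rule.
perf_report_cmds() {
    duration="${1:-10}"               # seconds; 10 mirrors the documented default
    symfs="${2:-/tmp/b2g_symfs}"      # assumed location of the symbol directory
    data="/data/local/perf.data"      # same path as in the examples above

    # perf stops recording when the profiled command (sleep) exits.
    echo "adb shell perf record -o $data -a sleep $duration"
    echo "adb pull $data ."
    echo "perf report --symfs=$symfs"
}
```

For example, `perf_report_cmds 30` prints the three commands for a 30-second recording; piping the output to `sh` would execute them.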