Performance/Fenix/Performance reviews

Do you want to know if your change impacts Fenix or Focus performance? If so, here are the methods you can use, in order of preference:

  1. Benchmark in CI: not yet available. However, it would be preferred as it's the most consistent method
  2. Benchmark locally: use an automated test to measure the change in duration
  3. Timestamp benchmark: add temporary code and manually measure the change in duration
  4. Profile: use a profiler to measure the change in duration

The trade-offs for each technique are described in their respective sections.

Benchmark locally

A benchmark is an automated test that measures performance, usually the duration from point A to point B. Automated benchmarks have trade-offs similar to automated functionality tests when compared to one-off manual testing: they can continuously catch regressions and they minimize human error. With manual benchmarks in particular, it can be tricky to aggregate each test run into the results consistently. However, automated benchmarks are time-consuming and difficult to write, so sometimes it's better to perform manual tests.
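
As a rough illustration (a minimal Kotlin sketch, not an existing Fenix test), the core of such a benchmark is a loop that repeats the measured operation and records each duration:

  // Run the measured operation many times and record each duration so the
  // results can be aggregated later (e.g. by taking the median).
  fun benchmark(iterations: Int = 25, operation: () -> Unit): List<Double> =
      List(iterations) {
          val start = System.nanoTime()
          operation()
          (System.nanoTime() - start) / 1_000_000.0 // duration in milliseconds
      }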

To benchmark, do the following:

  1. Select a benchmark that measures your change or write a new one yourself
  2. Run the benchmark on the commit before your change
  3. Run the benchmark on the commit after your change
  4. Compare the results: generally, this means comparing the medians (a sketch of this comparison follows below)
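
For illustration, here is a minimal Kotlin sketch of that comparison, assuming each results file has been reduced to one duration in milliseconds per line (the file names are hypothetical; analyze_durations.py in perf-tools, used below, can also summarize results for you):

  import java.io.File

  // Median of a non-empty list of durations.
  fun median(values: List<Double>): Double {
      val sorted = values.sorted()
      val mid = sorted.size / 2
      return if (sorted.size % 2 == 0) (sorted[mid - 1] + sorted[mid]) / 2 else sorted[mid]
  }

  // Parse one duration per line, skipping anything that isn't a number.
  fun readDurations(path: String): List<Double> =
      File(path).readLines().mapNotNull { it.trim().toDoubleOrNull() }

  fun main() {
      val before = median(readDurations("results_before.txt"))
      val after = median(readDurations("results_after.txt"))
      println("median before=%.1f ms, after=%.1f ms, delta=%.1f ms".format(before, after, after - before))
  }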

We currently support the following benchmarks:

Measuring start-up duration

To measure start-up duration, the approach is usually simple:

  1. From the mozilla-mobile/perf-tools repository, use measure_start_up.py.
    The arguments for start-up should include your target (Fenix or Focus).
  2. Determine the start-up path that your code affects; this could be:
    1. cold_main_first_frame: when clicking the app's homescreen icon, this is the duration from process start until the first frame is drawn
    2. cold_view_nav_start: when opening the browser through an outside link (e.g. a link in Gmail), this is the duration from process start until roughly Gecko's Navigation::Start event
  3. After determining the path your changes affect, follow these steps:

Example:

  • Run measure_start_up.py located in perf-tools. Note:
    • The usual iteration count used is 25. Running fewer iterations might affect the results due to noise
    • Make sure the application you're testing is a fresh install. If testing the Main intent (which is where the browser ends up on its homepage), make sure to clear the onboarding flow before testing
 python3 measure_start_up.py -c=25 --product=fenix nightly cold_view_nav_start results.txt

where -c refers to the iteration count. The default of 25 should be good.

  • Once you have gathered your results, you can analyze them using analyze_durations.py in perf-tools.
  python3 analyze_durations.py results.txt


NOTE: For before/after testing to compare changes made to Fenix, repeat these steps on the code before the changes. To do so, you can check out the parent commit (i.e. using git rev-parse ${SHA}^ where ${SHA} is the first commit on the branch where the changes are).

An example of using these steps to review a PR can be found (here).

Testing non-start-up changes

Testing non-start-up changes is a bit different from the steps above since, as of now, the performance team doesn't have tools to test other parts of the browser.

  1. The first step here is to instrument the code to take manual timings (see the first sketch after this list). Comparing timings taken before and after the change can indicate a difference in performance.
  2. Using profiles and markers.
    1. Profiles can be a good visual representation of performance changes. A simple way to find your code and its changes is through the call tree, the flame graph, or the stack chart. NOTE: some code may be missing from the stack, either because ProGuard may inline it or because the code runs for less time than the profiler's sampling interval.
    2. Another useful tool for finding changes in performance is markers. Markers can show the time elapsed between point A and point B, or pinpoint when a certain action happens (see the second sketch after this list).
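
As a sketch of taking such a manual timing (in Kotlin; the helper name and log tag are placeholders, not existing Fenix code):

  import android.os.SystemClock
  import android.util.Log

  // Temporary instrumentation: time a block of code with a monotonic clock and
  // log the duration so it can be read from `adb logcat`.
  fun <T> timed(label: String, block: () -> T): T {
      val start = SystemClock.elapsedRealtimeNanos()
      val result = block()
      val elapsedMs = (SystemClock.elapsedRealtimeNanos() - start) / 1_000_000.0
      Log.d("PerfTiming", "$label took $elapsedMs ms")
      return result
  }

You would then wrap the code you care about, e.g. timed("loadHomepage") { loadHomepage() } (loadHomepage is a placeholder), run the scenario by hand before and after your change, and compare the logged durations. Remember to remove the instrumentation before landing.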

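And a sketch of adding a marker around a span of work, assuming the Profiler interface from mozilla-components (the exact accessor and method signatures may differ in your checkout):

  import mozilla.components.concept.base.profiler.Profiler

  // Wrap a span of work in a profiler marker so it shows up between point A
  // and point B in the captured profile.
  fun <T> withMarker(profiler: Profiler?, name: String, block: () -> T): T {
      val start = profiler?.getProfilerTime()
      val result = block()
      profiler?.let { it.addMarker(name, start, it.getProfilerTime(), null) }
      return result
  }

In Fenix, a Profiler instance is typically reachable through the browser engine's components; check the current codebase for the exact accessor.
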
Timestamp benchmark

TODO

Profile

TODO