Performance/Fenix/Performance reviews
Do you want to know if your change impacts Fenix or Focus performance? If so, here are the methods you can use, in order of preference:
- Benchmark: use an automated test to measure the change in duration
- Timestamp benchmark: add temporary code and manually test to measure the change in duration
- Profile: use a profile to measure the change in duration
The trade-offs for each technique are described in their respective sections.
Benchmark
A benchmark is an automated test that measures performance, usually the duration from point A to point B. Automated benchmarks have similar trade-offs to automated functionality tests when compared to one-off manual testing: they can continuously catch regressions and minimize human error. For manual benchmarks in particular, it can be tricky to be consistent about how we aggregate each test run into the results. However, automated benchmarks are time-consuming and difficult to write, so sometimes it's better to perform manual tests.
TODO
Testing start-up code
To test start-up code, the approach is usually simple:
- From the mozilla-mobile/perf-tools repository, use measure_start_up.py. The arguments for start-up should include your target (Fenix or Focus).
- Determine the start-up path that your code affects; this could be:
  - cold_main_first_frame: when clicking the app's homescreen icon, this is the duration from process start until the first frame is drawn
  - cold_view_nav_start: when opening the browser through an outside link (e.g. a link in Gmail), this is the duration from process start until roughly Gecko's Navigation::Start event
- After determining the path your changes affect, these are the steps that you should follow:
Example:
- Run measure_start_up.py located in perf-tools. Note:
  - The usual iteration count used is 25. Running fewer iterations might affect the results due to noise.
  - Make sure the application you're testing is a fresh install. If testing the Main intent (which is where the browser ends up on its homepage), make sure to clear the onboarding process before testing.
python3 measure_start_up.py -c=25 --product=fenix nightly cold_view_nav_start results.txt
where --count (-c in the command above) refers to the iteration count.
- Once you have gathered your results, you can analyze them using analyze_durations.py in perf-tools.
python3 analyze_durations.py results.txt
NOTE: To compare the performance before and after your changes to Fenix, repeat these steps for the code without the changes. To do so, you can check out the parent commit (i.e. using git rev-parse ${SHA}^ where ${SHA} is the first commit on the branch where the changes are).
An example of using these steps to review a PR can be found here.
Testing non start-up changes
Testing non start-up changes is a bit different from the steps above, since the performance team doesn't currently have tools to test other parts of the browser.
- The first step here would be to instrument the code to take manual timings. Getting timings before and after the changes can indicate whether they affect performance (the Timestamp benchmark section below sketches this approach).
- Using profiles and markers.
  - Profiles can be a good visual representation of performance changes. A simple way to find your code and its changes is through the call tree, the flame graph, or the stack chart. NOTE: some code may be missing from the stacks because ProGuard may inline it, or because the profiler's sampling interval is longer than the time the code takes to run.
  - Another useful tool for finding changes in performance is markers. Markers are good for showing the time elapsed between point A and point B, or for pinpointing when a certain action happens. A sketch of adding a marker follows this list.
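As an illustration, here is a minimal Kotlin sketch of an interval marker. It assumes the android-components Profiler interface (mozilla.components.concept.base.profiler.Profiler), which Fenix exposes through components.core.engine.profiler; the helper name and the wrapped call are made up for this example, so check the current API before relying on it.

 import mozilla.components.concept.base.profiler.Profiler

 // Hypothetical helper: wraps a block of code in a marker spanning point A to point B.
 fun <T> withMarker(profiler: Profiler?, name: String, block: () -> T): T {
     // Point A: capture the profiler's own clock; this is null when no profile is being recorded.
     val start = profiler?.getProfilerTime()
     val result = block()
     // Point B: the marker shows up in the Firefox Profiler timeline, spanning start to now.
     profiler?.addMarker(name, start)
     return result
 }

For example, withMarker(components.core.engine.profiler, "restoreTabs") { restoreTabs() } (where restoreTabs() stands in for the code you care about) would make that work visible as a single marker in a captured profile.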
Timestamp benchmark
TODO
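The intro above describes this technique as adding temporary code and manually testing to measure the change in duration. Here is a minimal sketch of that idea using plain Android APIs (SystemClock and Log); the log tag and the measured call are placeholders:

 import android.os.SystemClock
 import android.util.Log

 // Temporary instrumentation only: remove it before landing the change.
 fun measureOnce() {
     // Point A: elapsedRealtime() is a monotonic clock in milliseconds.
     val start = SystemClock.elapsedRealtime()

     doTheThingYouAreMeasuring() // placeholder for the code path under review

     // Point B: log the duration; filter logcat for "PerfReview" and aggregate
     // the values yourself (e.g. take the median of ~25 manual runs).
     Log.i("PerfReview", "duration_ms=${SystemClock.elapsedRealtime() - start}")
 }

 private fun doTheThingYouAreMeasuring() {
     // Placeholder so the sketch compiles; replace it with the real code.
 }

Run the scenario by hand several times on the build without your change and again on the build with it, then compare the logged durations.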
Profile
TODO