Performance/Fenix/Performance reviews
Do you want to know if your change impacts Fenix or Focus performance? If so, here are the methods you can use, in order of preference:
- Benchmark: use an automated test to measure the change in duration
- Timestamp benchmark: add temporary code and manually test to measure the change in duration
- Profile: use a profile to measure the change in duration
The trade-offs for each technique are described in their respective sections.
Benchmark
A benchmark is an automated test that measures performance, usually the duration from point A to point B. Automated benchmarks have similar trade-offs to automated functionality tests when compared to one-off manual testing: they can continuously catch regressions and minimize human error. For manual benchmarks in particular, it can be tricky to be consistent about how we aggregate each test run into the results. However, automated benchmarks are time-consuming and difficult to write, so sometimes it's better to perform manual tests.
TODO
Testing start-up code
To test start-up code, the approach is usually simple:
- From the mozilla-mobile/perf-tools repository, use measure_start_up.py. The arguments for start-up should include your target (Fenix or Focus).
- Determine the start-up path that your code affects; this could be:
  - cold_main_first_frame: when clicking the app's homescreen icon, this is the duration from process start until the first frame is drawn
  - cold_view_nav_start: when opening the browser through an outside link (e.g. a link in Gmail), this is the duration from process start until roughly Gecko's Navigation::Start event
- After determining the path your changes affect, these are the steps that you should follow:
Example:
- Run measure_start_up.py located in perf-tools. Note:
  - The usual iteration count used is 25. Running fewer iterations might affect the results due to noise.
  - Make sure the application you're testing is a fresh install. If testing the Main intent (which is where the browser ends up on its homepage), make sure to clear the onboarding process before testing.
python3 measure_start_up.py -c=25 --product=fenix nightly cold_view_nav_start results.txt
where --count (-c in the command above) refers to the iteration count.
- Once you have gathered your results, you can analyze them using analyze_durations.py in perf-tools.
python3 analyze_durations.py results.txt
NOTE: To compare the performance before and after your changes to Fenix, repeat these steps for the code without the changes. To do so, you can check out the parent commit (i.e. using git rev-parse ${SHA}^ where ${SHA} is the first commit on the branch where the changes are).
An example of using these steps to review a PR can be found here.
Testing non start-up changes
Testing non start-up changes is a bit different from the steps above, since the performance team doesn't currently have tools to test other parts of the browser.
- The first step here would be to instrument the code to take manual timings. Getting timings before and after the changes can indicate whether they affect performance (the Timestamp benchmark section below sketches this approach).
- Using profiles and markers.
  - Profiles can be a good visual representation of performance changes. A simple way to find your code and its changes is through the call tree, the flame graph, or the stack chart. NOTE: some code may be missing from the stacks because ProGuard may inline it, or because the profiler's sampling interval is longer than the time the code takes to run.
  - Another useful tool for finding changes in performance is markers. Markers are good for showing the time elapsed between point A and point B, or for pinpointing when a certain action happens. A sketch of adding a marker follows this list.
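As an illustration, here is a minimal Kotlin sketch of an interval marker. It assumes the android-components Profiler interface (mozilla.components.concept.base.profiler.Profiler), which Fenix exposes through components.core.engine.profiler; the helper name and the wrapped call are made up for this example, so check the current API before relying on it.

 import mozilla.components.concept.base.profiler.Profiler

 // Hypothetical helper: wraps a block of code in a marker spanning point A to point B.
 fun <T> withMarker(profiler: Profiler?, name: String, block: () -> T): T {
     // Point A: capture the profiler's own clock; this is null when no profile is being recorded.
     val start = profiler?.getProfilerTime()
     val result = block()
     // Point B: the marker shows up in the Firefox Profiler timeline, spanning start to now.
     profiler?.addMarker(name, start)
     return result
 }

For example, withMarker(components.core.engine.profiler, "restoreTabs") { restoreTabs() } (where restoreTabs() stands in for the code you care about) would make that work visible as a single marker in a captured profile.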
Timestamp benchmark
TODO
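The intro above describes this technique as adding temporary code and manually testing to measure the change in duration. Here is a minimal sketch of that idea using plain Android APIs (SystemClock and Log); the log tag and the measured call are placeholders:

 import android.os.SystemClock
 import android.util.Log

 // Temporary instrumentation only: remove it before landing the change.
 fun measureOnce() {
     // Point A: elapsedRealtime() is a monotonic clock in milliseconds.
     val start = SystemClock.elapsedRealtime()

     doTheThingYouAreMeasuring() // placeholder for the code path under review

     // Point B: log the duration; filter logcat for "PerfReview" and aggregate
     // the values yourself (e.g. take the median of ~25 manual runs).
     Log.i("PerfReview", "duration_ms=${SystemClock.elapsedRealtime() - start}")
 }

 private fun doTheThingYouAreMeasuring() {
     // Placeholder so the sketch compiles; replace it with the real code.
 }

Run the scenario by hand several times on the build without your change and again on the build with it, then compare the logged durations.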
Profile
TODO