Performance/Fenix

== Performance testing ==
Performance tests can aim to prevent regressions, to measure absolute performance as experienced by users, or to measure performance against a baseline (e.g. comparing fenix to fennec). It is difficult to write tests that serve all of these goals at once, so we tend to focus on preventing regressions.
=== List of tests running in fenix ===
The perftest team is working to dynamically generate the list of tests that run on the fenix application. Some progress can be seen in [https://sql.telemetry.mozilla.org/queries/77734/source this query]. Until then, we manually list the tests below.
As of Feb. 23, 2021, we run at least the following performance tests on fenix:
* Page load duration: see the query above for a list of sites (sometimes run in automation, sometimes run manually; todo: details)
* Media playback tests (todo: details; in the query above, they are prefixed with ytp)
* Start up duration (see [[#Terminology|Terminology]] for start up type definitions)
** COLD VIEW tests on mach perftest. These run on unreleased Nightly builds once per merge to fenix master so we can identify the commit that caused a regression
** COLD MAIN & VIEW tests on FNPRMS. These run nightly on production Nightly builds
* Speedometer: JS responsiveness tests (todo: details)
* Tier 3 Unity WebGL tests (todo: details)
Other tests that run on desktop cover other parts of the platform.


== Offense vs. defense ==

TODO: merge this into sections below and clean up those sections: not sure how useful it is

Performance can be thought of in terms of "Offense" – the changes you make to actively improve performance – and "Defense" – the systems you have in place to prevent performance regressions (this offense/defense idea comes from this blog post).

== Defense: discouraging use of expensive APIs ==

In some cases, we want to discourage folks from using expensive APIs such as runBlocking. As a first draft solution, we propose a multi-step check:

# Compile-time check throughout the codebase: write a code-owned test asserting the number of references to the API.
## Question: given the lint rule, should we just count the number of @Suppress annotations for this?
## Question: would it help if this was an annotation processor on our lint rule and we looked for @Suppress?
## Add a lint rule to discourage use of the API. This overlaps with the compile-time check, however:
### We can't rely on the compile-time check alone because, at best, it only runs before the git push – it won't appear in the IDE – so the feedback loop is too long for devs
### We can't rely on the lint rule alone because it can be suppressed and we won't notice
# Run-time check on critical paths: wrap the API and increment a counter each time it is called. For each critical path (e.g. start up, page load), write a code-owned test asserting the number of calls to the API (see the sketch after this list).
## Question: is this too "perfect is the enemy of the good"?
## If you're doing this on a built-in API, you'll need to ban use of the old API, e.g. with a ktlint rule, since it's harder to suppress
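
To make the run-time check in step 2 concrete, here is a minimal Kotlin sketch, assuming a wrapper named runBlockingIncrement, a shared counter, and a JUnit test; the names, the harness, and the expected count are illustrative rather than fenix's actual implementation.

<syntaxhighlight lang="kotlin">
import java.util.concurrent.atomic.AtomicInteger
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.EmptyCoroutineContext
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.runBlocking
import org.junit.Assert.assertEquals
import org.junit.Test

// Shared counter incremented every time the wrapped API is called.
val runBlockingCounter = AtomicInteger(0)

// Call sites use this wrapper instead of runBlocking directly so calls can be counted.
// The unwrapped API would then be banned separately (e.g. with a ktlint rule, as above).
fun <T> runBlockingIncrement(
    context: CoroutineContext = EmptyCoroutineContext,
    action: suspend CoroutineScope.() -> T
): T {
    runBlockingCounter.incrementAndGet()
    return runBlocking(context, action)
}

// Code-owned test: a change to the count forces a conversation with the perf team.
class CriticalPathRunBlockingTest {
    @Test
    fun `GIVEN the start up critical path THEN the runBlocking call count is unchanged`() {
        val expectedCallCount = 2 // hypothetical known count

        // ...drive the app through the critical path (e.g. cold start) here...

        assertEquals(expectedCallCount, runBlockingCounter.get())
    }
}
</syntaxhighlight>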

== App start up ==

=== Defense ===

The FE perf team has the following measures in place to prevent regressions:

* Show long-term start up trends with Nightly performance tests (note: these are not granular enough to practically identify & fix regressions)
* Prevent main thread IO by:
** Crashing on main thread IO in debug builds using StrictMode (code; a sketch follows this list)
** Preventing StrictMode suppressions by running tests that assert the current known suppression count. We are Code Owners for the tests, so we will have a discussion if the count changes (code)
* Code added to the start up path should be reviewed by the performance team:
** We are Code Owners for a few files
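
A minimal sketch of the two StrictMode measures above, assuming the policy is installed early in Application.onCreate and that suppressions funnel through a single helper; the real fenix policy, helper, and counts differ in detail.

<syntaxhighlight lang="kotlin">
import android.app.Application
import android.os.StrictMode
import java.util.concurrent.atomic.AtomicInteger

// Counter that a code-owned test can assert against: adding a suppression changes the count.
val strictModeSuppressionCount = AtomicInteger(0)

// All suppressions go through this helper rather than calling StrictMode directly.
fun <T> allowDiskReadsTemporarily(block: () -> T): T {
    strictModeSuppressionCount.incrementAndGet()
    val oldPolicy = StrictMode.allowThreadDiskReads() // returns the policy that was in effect
    return try {
        block()
    } finally {
        StrictMode.setThreadPolicy(oldPolicy)
    }
}

class SketchApplication : Application() {
    override fun onCreate() {
        super.onCreate()
        // BuildConfig.DEBUG is the module's generated flag: only crash in debug builds.
        if (BuildConfig.DEBUG) {
            StrictMode.setThreadPolicy(
                StrictMode.ThreadPolicy.Builder()
                    .detectDiskReads()
                    .detectDiskWrites()
                    .detectNetwork()
                    .penaltyDeath() // crash on main thread IO so regressions are caught during development
                    .build()
            )
        }
    }
}
</syntaxhighlight>

A code-owned test can then assert strictModeSuppressionCount (or the number of call sites of the helper) so that any new suppression triggers a discussion.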

=== WIP ===

We're working on adding:

* Regression testing per master merge (issue)
* Prevent main thread IO with:
** Static analysis to prevent runBlocking, which can circumvent StrictMode (issue; a rough stand-in is sketched after this list)
* Code added to the start up path should be reviewed by the performance team:
** We're investigating other files that can be separated so we can be code owners for the start up parts (issue)
* Minimize component initialization with:
** Avoiding unnecessary initialization (issue)
* Prevent unnecessarily expensive UI with:
** NestedConstraintLayout static analysis (issue)
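
Until the lint rule lands, one crude stand-in for the static check is a code-owned JVM test that scans the source tree and fails when the number of runBlocking call sites changes; the source path and expected count below are assumptions for illustration, not fenix's actual setup.

<syntaxhighlight lang="kotlin">
import java.io.File
import org.junit.Assert.assertEquals
import org.junit.Test

class RunBlockingCallSiteTest {
    @Test
    fun `the number of runBlocking call sites is known and reviewed`() {
        val expectedCallSites = 5 // hypothetical known count; update only with perf team review
        val sourceRoot = File("app/src/main/java") // hypothetical module layout

        // Count textual call sites of runBlocking across Kotlin source files.
        val actualCallSites = sourceRoot.walkTopDown()
            .filter { it.isFile && it.extension == "kt" }
            .sumOf { file -> Regex("""\brunBlocking\s*\(""").findAll(file.readText()).count() }

        assertEquals(expectedCallSites, actualCallSites)
    }
}
</syntaxhighlight>

Unlike a lint rule, this only gives feedback on test runs rather than in the IDE, which is why the lint rule remains the goal.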

=== Offense ===

TODO: improve this section, if useful (let us know if it is)

We're keeping a list of the biggest known performance improvements we can make. Also, we have a startup improvement plan.

== Terminology ==

=== COLD/WARM/HOT start up ===

Our definitions are similar to, but not identical to, the Google Play definitions.

* COLD = starting up "from scratch": the process and HomeActivity need to be created
* WARM = the process is already created but HomeActivity needs to be created (or recreated)
* HOT = basically just foregrounding the app: the process and HomeActivity are already created

=== MAIN/VIEW start up ===

These are named after the actions passed inside the Intents received by the app, such as ACTION_MAIN (a classification sketch follows this list):

* MAIN = a start up where the app icon was clicked. If there are no existing tabs, the homescreen will be shown. If there are existing tabs, the last selected one will be restored
* VIEW = a start up where a link was clicked. In the default case, a new tab will be opened and the URL will be loaded
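
As an illustration only (this is not fenix's actual intent-routing code), the start up type can be classified from the launching Intent's action:

<syntaxhighlight lang="kotlin">
import android.content.Intent

enum class StartUpPath { MAIN, VIEW, OTHER }

// Classify a start up as MAIN or VIEW from the Intent that launched the app.
// Simplified sketch: real routing also handles other actions and re-delivered intents.
fun startUpPathFor(intent: Intent?): StartUpPath = when (intent?.action) {
    Intent.ACTION_MAIN -> StartUpPath.MAIN // app icon tap: homescreen or restore the selected tab
    Intent.ACTION_VIEW -> StartUpPath.VIEW // link tap: open a new tab and load intent.data
    else -> StartUpPath.OTHER
}
</syntaxhighlight>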