Performance/Fenix
Critical flows
Critical flows are areas of the app where a performance regression might cause significantly worse user retention. For example, if pages took twice as long to load, we might expect some users to stop using the app.
When analyzing a critical flow for performance issues, it's essential to know which code is relevant. The endpoints tell you when the flow starts and stops; when using the profiler, it's common to constrain the timeline to these endpoints. The bottleneck is the resource (e.g. CPU, disk) that is maxed out and preventing the device from reaching the flow's final endpoint sooner (e.g. removing expensive computation may not improve perf if the bottleneck is the disk).
Our critical flows, their endpoints, and their bottlenecks are listed below.
Start up
There are a few different types of start up: see Terminology for clarifications.
- COLD MAIN (to homescreen) start
  - Endpoints: process start and visual completeness (i.e. the homescreen is fully drawn)
    - Neither endpoint is available in the profiler. Surrogates are `Application.onCreate` or `*Activity.onCreate` to when the first frame is drawn (the last two have profiler markers)
  - Bottleneck: we believe it's the main thread
  - Misc: a possibly important event for perceived performance is when the first frame is drawn (see the sketch after this list)
- WARM MAIN (to homescreen) start
  - Endpoints: `IntentReceiverActivity.onCreate` and visual completeness
  - Bottleneck: see COLD MAIN
  - Misc: see COLD MAIN
- COLD VIEW start
  - Endpoints: process start and `GeckoSession.load`
    - The latter endpoint is available as a profiler marker
  - Bottleneck: unknown
  - Misc: a possibly important event for perceived performance is when the first frame is drawn
- WARM VIEW start
  - Endpoints: `IntentReceiverActivity.onCreate` and `GeckoSession.load`
  - Bottleneck: see COLD VIEW
  - Misc: see COLD VIEW
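Several of the flows above treat "when the first frame is drawn" as a surrogate endpoint. As a rough illustration, here is a minimal sketch of a common Android pattern for approximating that moment from an Activity (`logFirstFrameDrawn` is a hypothetical name; this is not Fenix's actual profiler-marker implementation):

```kotlin
import android.app.Activity
import android.os.Handler
import android.os.Looper
import android.os.SystemClock
import android.util.Log
import android.view.ViewTreeObserver

// Hypothetical helper: waits for the first pre-draw pass, then posts a message
// that runs once that frame has actually been drawn.
fun Activity.logFirstFrameDrawn() {
    val decorView = window.decorView
    decorView.viewTreeObserver.addOnPreDrawListener(object : ViewTreeObserver.OnPreDrawListener {
        override fun onPreDraw(): Boolean {
            decorView.viewTreeObserver.removeOnPreDrawListener(this)
            // The posted message executes after the current frame finishes drawing.
            Handler(Looper.getMainLooper()).post {
                Log.i("Perf", "first frame drawn at ${SystemClock.elapsedRealtime()} ms since boot")
            }
            return true // allow the draw to proceed as usual
        }
    })
}
```

Calling this from `onCreate` of the first Activity yields a timestamp that can be compared against process start to approximate the COLD MAIN surrogate endpoints.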
In addition to these types of start up, there are many states/configurations a client can start up with: for example, they can have FxA set up or not set up, they may have 1000s of bookmarks or 0, etc. We haven't yet found any of these to have a significant impact on performance but, to be honest, we haven't investigated deeply yet.
Page load
TODO
Search experience
TODO
Performance testing
Performance tests can have a goal of preventing regressions, measuring absolute performance as experienced by users, or measuring performance against a baseline (e.g. comparing fenix to fennec). It can be difficult to write tests that satisfy all of these goals, so we tend to focus on preventing regressions.
List of tests running in fenix
The perftest team is working to dynamically generate the list of tests that run on the fenix application. Some progress can be seen in this query and this treeherder page. Until then, we manually list the tests below.
As of Feb. 23, 2021, we run at least the following performance tests on fenix:
- Page load duration: see the query above for a list of sites (sometimes run in automation, sometimes run manually; TODO: details)
- Media playback tests (TODO: details; in the query above, they are prefixed with ytp)
- Start up duration (see Terminology for start up type definitions)
  - COLD VIEW tests on mach perftest. These run on each merge to fenix master, on unreleased Nightly builds, so we can identify the commit that caused a regression
  - COLD MAIN & VIEW tests on FNPRMS. These run nightly on production Nightly builds. This is being transitioned out in favor of mach perftest.
- Speedometer: JS responsiveness tests (TODO: details)
- Tier 3 Unity WebGL tests (TODO: details)
There are other tests that run on desktop that will cover other parts of the platform. We also have other methodologies to check for excessive resource use, including lint rules and UI tests that measure things such as the number of calls to expensive APIs like `runBlocking`.
Notable gaps in our test coverage include:
- Duration testing for front-end UI flows such as the search experience
- Testing on non-Nightly builds (does this apply outside of start up?)
Offense vs. defense
TODO: merge this into sections below and clean up those sections: not sure how useful it is
Performance can be thought of in terms of "Offense" – the changes that you make to actively improve performance – and "Defense" – the systems you have in place to prevent performance regressions (this offense/defense idea comes from this blog post).
Defense: discouraging use of expensive APIs
In some cases, we want to discourage folks from using expensive APIs such as `runBlocking`. As a first draft solution, we propose a multi-step check (the compile-time and run-time checks are sketched after this list):
- Compile-time check throughout the codebase: write a code-owned test asserting the number of references to the API.
  - Question: given the lint rule, should we just count the number of `@Suppress` annotations for this?
  - Question: would it help if this was an annotation processor on our lint rule and we looked for `@Suppress`?
- Add a lint rule to discourage use of the API. This overlaps with the compile-time check, however:
  - We can't just use the compile-time check because in the best case it'll only run before the git push – it won't appear in the IDE – and the feedback loop will be too long for devs
  - We can't just use the lint rule because it can be suppressed and we won't notice
- Run-time check on critical paths: wrap the API and increment a counter each time it is called. For each critical path (e.g. start up, page load), write a code-owned test asserting the number of calls to the API.
  - Question: is this too "perfect is the enemy of the good?"
  - If you're doing this on a built-in API, you'll need to ban use of the old API, e.g. with a ktlint rule, since it's harder to suppress
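As a minimal sketch of the compile-time and run-time checks above, assuming hypothetical names (`runBlockingIncrement`, `RunBlockingCounter`, and the baseline constant are illustrative, not necessarily what Fenix actually ships):

```kotlin
import java.io.File
import java.util.concurrent.atomic.AtomicLong
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.EmptyCoroutineContext
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.runBlocking
import org.junit.Assert.assertEquals
import org.junit.Test

// Run-time check: a wrapper that app code calls instead of kotlinx.coroutines.runBlocking,
// counting every call so that critical-path tests can assert on the total.
object RunBlockingCounter {
    val count = AtomicLong(0)
}

fun <T> runBlockingIncrement(
    context: CoroutineContext = EmptyCoroutineContext,
    action: suspend CoroutineScope.() -> T
): T {
    RunBlockingCounter.count.incrementAndGet()
    return runBlocking(context, action)
}

// Compile-time check: a code-owned test that fails whenever the number of
// references changes, forcing a perf review before the baseline is updated.
class RunBlockingReferenceTest {
    @Test
    fun `runBlocking reference count should only change after perf review`() {
        val references = File("app/src/main").walkTopDown()
            .filter { it.extension == "kt" }
            .sumOf { Regex("\\brunBlocking").findAll(it.readText()).count() }
        assertEquals(EXPECTED_REFERENCES, references)
    }

    private companion object {
        // Hypothetical baseline; bump it only with performance-team sign-off.
        const val EXPECTED_REFERENCES = 7
    }
}
```

A code-owned UI test that drives a critical flow (e.g. start up) could then assert `RunBlockingCounter.count.get()` against a per-flow baseline, catching calls that reach the critical path even when the static reference count is unchanged.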
App start up
Defense
The FE perf team has the following measures in place to prevent regressions:
- Show long term start up trends with Nightly performance tests (note: these are not granular enough to practically identify & fix regressions)
- Prevent main thread IO by:
  - `StrictMode`, which detects main thread disk access at run time (see the sketch after this list)
- Code added to the start up path should be reviewed by the performance team:
  - We are Code Owners for a few files
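For context on the main thread IO item, here is a minimal sketch of the kind of `StrictMode` policy an Android app can enable in debug builds (illustrative; not necessarily Fenix's exact configuration):

```kotlin
import android.app.Application
import android.os.StrictMode

class SampleApplication : Application() {
    override fun onCreate() {
        super.onCreate()
        // BuildConfig is the module's generated build config; only enforce in
        // debug builds since the penalties are aimed at developers, not users.
        if (BuildConfig.DEBUG) {
            StrictMode.setThreadPolicy(
                StrictMode.ThreadPolicy.Builder()
                    .detectDiskReads()
                    .detectDiskWrites()
                    .penaltyLog() // or penaltyDeath() to fail hard during start up
                    .build()
            )
        }
    }
}
```

Note that `runBlocking` doesn't touch the disk itself, which is why it can circumvent `StrictMode` – hence the static analysis item in the WIP list below.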
WIP
We're working on adding:
- Regression testing per master merge (issue)
- Prevent main thread IO with:
  - Static analysis to prevent `runBlocking`, which can circumvent `StrictMode` (issue)
- Code added to the start up path should be reviewed by the performance team:
  - We're investigating other files that can be separated so we can be code owners for the start up parts (issue)
- Minimize component initialization with:
  - Avoiding unnecessary initialization (issue; see the `lazy` sketch after this list)
- Prevent unnecessarily expensive UI with:
  - NestedConstraintLayout static analysis (issue)
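One common way to avoid unnecessary initialization is to construct components lazily, so anything not touched on the start up path is never built during start up. A minimal sketch (the component names are hypothetical):

```kotlin
import android.content.Context

// Hypothetical expensive component: imagine disk reads and parsing in its constructor.
class SearchEngineProvider(private val context: Context)

class Components(private val context: Context) {
    // Constructed on first access rather than eagerly at start up.
    val searchEngineProvider: SearchEngineProvider by lazy {
        SearchEngineProvider(context)
    }
}
```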
Offense
TODO: improve this section, if useful (let us know if it is)
We're keeping a list of the biggest known performance improvements we can make. Also, we have a startup improvement plan.
Terminology
COLD/WARM/HOT start up
Our definitions are similar to, but not identical to, the Google Play definitions.
- COLD = starting up "from scratch": the process and HomeActivity need to be created
- WARM = the process is already created but HomeActivity needs to be created (or recreated)
- HOT = basically just foregrounding the app: the process and HomeActivity are already created
MAIN/VIEW start up
These are named after the actions passed inside the `Intent`s received by the app, such as `ACTION_MAIN` (a small sketch follows the list):
- MAIN = a start up where the app icon was clicked. If there are no existing tabs, the homescreen will be shown. If there are existing tabs, the last selected one will be restored
- VIEW = a start up where a link was clicked. In the default case, a new tab will be opened and the URL will be loaded
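As a rough illustration of how these actions reach the app (a sketch; `startUpTypeOf` is hypothetical and Fenix's real intent handling is more involved):

```kotlin
import android.content.Intent

// Maps the launching Intent's action to the start up types defined above.
fun startUpTypeOf(intent: Intent): String = when (intent.action) {
    Intent.ACTION_MAIN -> "MAIN" // app icon tapped: homescreen or restored tab
    Intent.ACTION_VIEW -> "VIEW" // link clicked: load the URL in a new tab
    else -> "OTHER"
}
```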