Performance/Fenix
This page describes some basics of Fenix performance. For an in-depth look at some specific topics, see:
* [[Performance/Fenix/Best Practices|Best Practices]] for tips to write performant code
* [[Performance/Fenix/Profilers and Tools|Profilers and Tools]] for comparison of profiling and benchmarking tools
 
* [[Performance/Fenix/Performance reviews|Performance reviews]] for how to know if your change impacts performance
== Critical flows ==
Any area of the app where a performance regression could cause users to retain significantly less is considered a '''critical flow'''. For example, if pages took twice as long to load, we might expect some users to stop using the app.
 
When analyzing a critical flow for performance issues, it's essential to know what code is relevant. The '''endpoints''' tell you when the flow starts and stops. When using the profiler, it's common to constrain the timeline to these endpoints. The '''bottleneck''' is the resource (e.g. CPU, disk) that is maxed out and preventing the device from reaching the flow's final endpoint sooner (e.g. removing expensive computation may not improve performance if the bottleneck is the disk).
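The endpoint idea can be sketched in code: given an ordered list of (marker, timestamp) events, a flow's duration is the time between its two endpoints. This is a minimal illustration, not our tooling; the marker names and timestamps below are made up.

```python
# Hypothetical sketch: compute a critical flow's duration from an ordered
# list of (marker_name, timestamp_ms) profiler events.

def flow_duration(events, start_marker, end_marker):
    """Return ms between the first start_marker and the first end_marker
    that follows it, or None if either endpoint is missing."""
    start = None
    for name, ts in events:
        if start is None and name == start_marker:
            start = ts
        elif start is not None and name == end_marker:
            return ts - start
    return None

# Illustrative events only; real profiles have many more markers.
events = [
    ("Application.onCreate", 120.0),
    ("HomeActivity.onCreate", 310.0),
    ("firstFrameDrawn", 980.0),
]
print(flow_duration(events, "Application.onCreate", "firstFrameDrawn"))  # 860.0
```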
 
Our critical flows, their endpoints, and their bottlenecks are listed below.
 
=== Start up ===
There are a few different types of start up: see [[#Terminology|Terminology]] for clarifications.
 
* COLD MAIN (to homescreen) start
**Endpoints: process start and visual completeness (i.e. the homescreen is fully drawn)
***Neither endpoint is available in the profiler. Surrogates are <code>Application.onCreate</code> or <code>*Activity.onCreate</code> to the first frame is drawn (the last two have profiler markers)
**Bottleneck: we believe it's the main thread
**Misc: a possibly important event for perceived performance is when the first frame is drawn
 
* WARM MAIN (to homescreen) start
**Endpoints: <code>MigrationDecisionActivity.onCreate</code> (Beta & Release builds) or <code>HomeActivity.onCreate</code> (Nightly & debug builds) and visual completeness.
**Bottleneck: see COLD MAIN
**Misc: see COLD MAIN
 
* COLD VIEW start
**Endpoints: process start and <code>GeckoSession.load</code>
***The latter endpoint is available as a profiler marker
**Bottleneck: unknown
**Misc: a possibly important event for perceived performance is when the first frame is drawn
 
* WARM VIEW start
**Endpoints: <code>IntentReceiverActivity.onCreate</code> and <code>GeckoSession.load</code>
**Bottleneck: see COLD VIEW
**Misc: see COLD VIEW
 
In addition to these types of start up, there are many states/configurations a client can start up with: for example, they can have FxA set up or not set up, they may have 1000s or 0 bookmarks, etc. We haven't yet found any to have a significant impact on performance but, to be honest, we haven't investigated deeply.
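For a rough manual proxy of COLD MAIN start, <code>adb shell am start -W</code> reports timing fields such as <code>TotalTime</code>. A hedged sketch of parsing that output follows; the sample text is illustrative, and <code>TotalTime</code> is not the same thing as visual completeness.

```python
# Sketch (not Fenix tooling): parse the timing lines that
# `adb shell am start -W <component>` prints, e.g. "TotalTime: 540".
import re

def parse_am_start(output):
    """Extract timing fields (in ms) from `am start -W` output."""
    times = {}
    for key in ("ThisTime", "TotalTime", "WaitTime"):
        m = re.search(rf"^{key}:\s*(\d+)", output, re.MULTILINE)
        if m:
            times[key] = int(m.group(1))
    return times

# Illustrative output; real values vary per device and run.
sample = """Status: ok
Activity: org.mozilla.fenix/.HomeActivity
ThisTime: 512
TotalTime: 540
WaitTime: 560
"""
print(parse_am_start(sample))  # {'ThisTime': 512, 'TotalTime': 540, 'WaitTime': 560}
```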
 
=== Page load ===
TODO
 
=== Search experience ===
TODO


== Performance testing ==
Performance tests can have a goal of preventing regressions, measuring absolute performance as experienced by users, or measuring performance vs. a baseline (e.g. comparing fenix to fennec). It can be difficult to write tests that manage all of these. We tend to focus on preventing regressions.


=== Dashboards ===
We have a series of dashboards that we review during our biweekly performance sync meeting. The dashboards may be unstable and '''may be difficult to interpret so be cautious when drawing conclusions from the results.''' A shortcoming is that we only run these tests on Nightly builds. Here are the current dashboards:
* [https://earthangel-b40313e5.influxcloud.net/d/DfK1IhzGz/fenix-startup-testing-per-device?orgId=1&refresh=1d Start up duration]: this represents COLD MAIN (app icon launch) to <code>reportFullyDrawn</code>/visual completeness and COLD VIEW (app link) to page load complete (i.e. this includes a variable-duration network call) across a range of devices. We're trying to replace this with more stable tests.
* [https://earthangel-b40313e5.influxcloud.net/d/uYAfY3eGk/fenix-page-load-speedindex-geomean?orgId=1 Page load duration]: we're iterating on the presentation to make this more useful. More complex visualizations are available [https://earthangel-b40313e5.influxcloud.net/d/uYAfY3eGk/fenix-page-load-speedindex-geomean?orgId=1&search=open&folder=current in the grafana folder], such as [https://earthangel-b40313e5.influxcloud.net/d/ZmV33sqMk/fenix-page-load-pixel-2-vizrange-combined?orgId=1 the tests for Pixel 2]
* App size: via Google Play
 
=== Unmonitored tests running in fenix ===
In addition to the tests we actively look at above, there are other tests that run in mozilla-central on fenix or GeckoView example. '''We're not sure who looks at these.''' The perftest team is working to dynamically generate the list of tests that run. Some progress can be seen in [https://sql.telemetry.mozilla.org/queries/77734/source this query] and [https://treeherder.mozilla.org/perfherder/tests this treeherder page]. Until then, we manually list the tests below.


As of Feb. 23, 2021, we run at least the following performance tests on fenix:
As of Feb. 23, 2021, we run at least the following performance tests on fenix:
* Additional page load duration tests: see the query above for a list of sites (sometimes run in automation, sometimes run manually; todo: details)
* media playback tests (TODO: details; in the query above, they are prefixed with ytp)
* Start up duration (see [[#Terminology|Terminology]] for start up type definitions)
** COLD VIEW tests on mach perftest. Runs per master merge to fenix on unreleased Nightly builds so we can identify the commit that caused a regression
** COLD MAIN & VIEW tests on FNPRMS. Runs Nightly on production Nightly builds. This is being transitioned out in favor of mach perftest.
* Speedometer: JS responsiveness tests (todo: details)
* tier 3 unity webGL tests (todo: details)


There are other tests that run on desktop that will cover other parts of the platform.
 
Notable gaps in our test coverage include:
* Duration testing for front-end UI flows such as the search experience
* Testing on non-Nightly builds (does this apply outside of start up?)
 
== Offense vs. defense ==
TODO: merge this into sections below and clean up those sections: not sure how useful it is
 
Performance can be thought of in terms of "Offense" – the changes that you make to actively improve performance – and "Defense" – the systems you have in place to prevent performance regressions (this offense/defense idea is from [https://medium.com/@ricomariani/dos-and-don-ts-for-performance-teams-7f52c41b5355?source=rss-e6e91dab0708------2 this blog post]).


== Preventing regressions automatically ==
We use the following measures:
* Crash on main thread IO in debug builds using <code>StrictMode</code> ([https://github.com/mozilla-mobile/fenix/blob/13f33049122e0f06c026632812dee405360c53b0/app/src/main/java/org/mozilla/fenix/StrictModeManager.kt#L63-L69 code])
* Use [https://searchfox.org/mozilla-mobile/rev/3af703be7790ff00f78d15465f3b8bb0fde0dccc/fenix/app/src/androidTest/java/org/mozilla/fenix/perf/StartupExcessiveResourceUseTest.kt#103 our StartupExcessiveResourceUseTest], for which we are Code Owners, to:
** Avoid StrictMode suppressions
** Avoid <code>runBlocking</code> calls
** Avoid additional component initialization
** Avoid increasing the view hierarchy depth
** Avoid having ConstraintLayout as a RecyclerView child
** Avoid increasing the number of inflations
* Use lint to avoid multiple ConstraintLayouts in the same file ([https://searchfox.org/mozilla-mobile/rev/3af703be7790ff00f78d15465f3b8bb0fde0dccc/fenix/mozilla-lint-rules/src/main/java/org/mozilla/fenix/lintrules/perf/ConstraintLayoutPerfDetector.kt code])
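The counting approach behind StartupExcessiveResourceUseTest can be sketched: assert that the number of occurrences of an expensive pattern matches a known baseline, so any change to the count forces a conversation with the code owners. The pattern and baseline below are illustrative, not Fenix's real values.

```python
# Hypothetical sketch of a baseline-count test: if someone adds or removes a
# runBlocking call, the count changes and the assertion (owned by the perf
# team via Code Owners) fails, prompting a review.

def count_occurrences(source, pattern):
    """Count literal occurrences of an expensive API in source text."""
    return source.count(pattern)

# Illustrative source; a real test would scan checked-in files.
source = """
fun load() = runBlocking { fetch() }
fun sync() = runBlocking { push() }
"""
EXPECTED_RUNBLOCKING_COUNT = 2  # update only after a perf review
assert count_occurrences(source, "runBlocking") == EXPECTED_RUNBLOCKING_COUNT
```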


== How to measure what users experience ==
When analyzing performance, it's critical to measure the app as users experience it. This section describes how to do that and avoid pitfalls. Note: our automated measurement tools, such as the [https://github.com/mozilla-mobile/perf-tools/blob/main/measure_start_up.py <code>measure_start_up.py</code> script], will always use our most up-to-date techniques while this page may get outdated. Prefer to use automated systems if practical and read the source if you have questions!


When measuring performance manually, you might follow a pattern like the following (see the footnotes for explanations):


* Configure your device and build. Use:
** a low-end device<sup>1</sup> (a Samsung Galaxy A51 is preferred)
** a <code>debuggable=false</code> build such as Nightly<sup>2</sup>
** a build with any compile-time options that are enabled in the production app (e.g. Sentry, Nimbus, Adjust)<sup>3</sup>
* Warm-up run:
**Start the app, especially after an installation<sup>4</sup>
**Wait at least 60 seconds<sup>5</sup>. To be extra safe, wait 2 minutes.
**Set the state of the app as you want to test it (e.g. clear onboarding)
**Force-stop the app (to make sure you're measuring at least the 2nd run after installation)
* Measure or test:
**Start the app and measure what you want to measure
**If you force-stop the app, wait a few seconds before starting the app to let the device settle
**If you're testing code that waits for gecko initialization (e.g. page loads) and need to force-stop the app before measuring, make sure to 1) load a page and 2) wait 15 seconds before force-stopping the app<sup>6</sup>


Footnotes:
* 1: high-end devices may be fast enough to hide performance problems. For context, a Pixel 2 is still relatively high-end in our user base
* 2: <code>debuggable=true</code> builds (e.g. Debug builds) have performance characteristics that don't represent what users experience. See https://www.youtube.com/watch?v=ZffMCJdA5Qc&feature=youtu.be&t=625 for details
* 3: if these SDKs are disabled, you may miss performance issues introduced by them, or their absence will change the timing of our operations, possibly hiding performance issues. Note: the performance team would prefer all SDKs to be enabled by default so developers can build, without errors, an APK similar to production APKs
* 4: we've observed the first run after installation is always slower than subsequent runs for an unknown reason
* 5: on first run, we populate certain caches, e.g. we'll fetch Pocket data and start a Gecko cache. 60 seconds will address most of these
* 6: the ScriptPreloader generates a new cache on each app start up. If you don't let the cache fill (i.e. by loading a page and waiting until it caches ([https://searchfox.org/mozilla-central/rev/fc4d4a8d01b0e50d20c238acbb1739ccab317ebc/js/xpconnect/loader/ScriptPreloader.cpp#769 source])), the cache will be empty and you won't measure page load as most users experience it
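The warm-up and measure pattern above can be scripted. The following is a hedged sketch, not Fenix's real harness (that is <code>measure_start_up.py</code>): the package and activity names are parameters, the <code>.HomeActivity</code> value used in the demo is illustrative, and the waits follow footnotes 4 through 6.

```python
# Hypothetical harness sketch: one warm-up run followed by one measured run,
# expressed as the ordered shell commands a script might execute.

def measurement_plan(package, activity, warmup_wait_s=60, settle_wait_s=5):
    """Return shell commands for a warm-up + measure cycle."""
    return [
        f"adb shell monkey -p {package} 1",             # warm-up launch (first run after install is slower)
        f"sleep {warmup_wait_s}",                       # let caches populate (footnote 5)
        f"adb shell am force-stop {package}",           # ensure the measured run is at least the 2nd run
        f"sleep {settle_wait_s}",                       # let the device settle after force-stop
        f"adb shell am start -W {package}/{activity}",  # measured start; reports TotalTime
    ]

for cmd in measurement_plan("org.mozilla.fenix", ".HomeActivity"):
    print(cmd)
```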


=={{anchor|Glossary}}Terminology==