Performance/Fenix

Performance can be thought of in terms of "Offense" – the changes you make to actively improve performance – and "Defense" – the systems you have in place to prevent performance regressions (this offense/defense framing comes from [https://medium.com/@ricomariani/dos-and-don-ts-for-performance-teams-7f52c41b5355?source=rss-e6e91dab0708------2 this blog post]).
This page describes some basics of Fenix performance. For an in-depth look at some specific topics, see:
* [[Performance/Fenix/Best Practices|Best Practices]] for tips to write performant code
* [[Performance/Fenix/Profilers and Tools|Profilers and Tools]] for comparison of profiling and benchmarking tools
* [[Performance/Fenix/Performance reviews|Performance reviews]] for how to know if your change impacts performance


== Performance testing ==
=== Defense ===
Performance tests can have a goal of preventing regressions, measuring absolute performance as experienced by users, or measuring performance against a baseline (e.g. comparing fenix to fennec). It can be difficult to write tests that satisfy all of these goals, so we tend to focus on preventing regressions.
The FE perf team has the following measures in place to prevent regressions:
* Show long term start up trends with Nightly performance tests (note: these are not granular enough to practically identify & fix regressions)
* Prevent main thread IO with:
** <code>StrictMode</code> crashing in debug builds on disk or network IO on the main thread
* Code added to the start up path should be reviewed by the performance team:
** We are Code Owners for a few files


==== WIP ====
We're working on adding:
* Regression testing per master merge ([https://github.com/mozilla-mobile/perf-frontend-issues/issues/162 issue])
* Prevent main thread IO with:
** Tests to prevent suppression of <code>StrictMode</code> without discussion ([https://github.com/mozilla-mobile/fenix/issues/13959 issue])
** Static analysis to prevent <code>runBlocking</code>, which can circumvent <code>StrictMode</code> ([https://github.com/mozilla-mobile/fenix/issues/15278 issue])
* Code added to the start-up path should be reviewed by the performance team:
** We're investigating other files that can be separated so we can be code owners for the start-up parts ([https://github.com/mozilla-mobile/fenix/issues/15274 issue])
* Minimize component initialization with:
** Avoid unnecessary initialization ([https://github.com/mozilla-mobile/fenix/issues/15279 issue])
* Prevent unnecessarily expensive UI with:
** NestedConstraintLayout static analysis ([https://github.com/mozilla-mobile/fenix/issues/15280 issue])

=== Dashboards ===
We have a series of dashboards that we review during our biweekly performance sync meeting. The dashboards may be unstable and '''may be difficult to interpret, so be cautious when drawing conclusions from the results.''' A shortcoming is that we only run these tests on Nightly builds. Here are the current dashboards:
* [https://earthangel-b40313e5.influxcloud.net/d/DfK1IhzGz/fenix-startup-testing-per-device?orgId=1&refresh=1d Start up duration]: this represents COLD MAIN (app icon launch) to <code>reportFullyDrawn</code>/visual completeness and COLD VIEW (app link) to page load complete (i.e. this includes a variable-duration network call) across a range of devices. We're trying to replace this with more stable tests
* [https://earthangel-b40313e5.influxcloud.net/d/uYAfY3eGk/fenix-page-load-speedindex-geomean?orgId=1 Page load duration]: we're iterating on the presentation to make this more useful. More complex visualizations are available [https://earthangel-b40313e5.influxcloud.net/d/uYAfY3eGk/fenix-page-load-speedindex-geomean?orgId=1&search=open&folder=current in the grafana folder], such as [https://earthangel-b40313e5.influxcloud.net/d/ZmV33sqMk/fenix-page-load-pixel-2-vizrange-combined?orgId=1 the tests for Pixel 2]
* App size: via Google Play


=== Offense ===
We're keeping a list of the biggest known performance improvements we can make. Also, we have a startup improvement plan.

=== Unmonitored tests running in fenix ===
TODO: improve this section, if useful (let us know if it is)

In addition to the tests we actively look at above, there are other tests that run in mozilla-central on fenix or the GeckoView example app. '''We're not sure who looks at these.''' The perftest team is working to dynamically generate the list of tests that run. Some progress can be seen in [https://sql.telemetry.mozilla.org/queries/77734/source this query] and [https://treeherder.mozilla.org/perfherder/tests this treeherder page]. Until then, we manually list the tests below.

As of Feb. 23, 2021, we run at least the following performance tests on fenix:
* Additional page load duration tests: see the query above for a list of sites (sometimes run in automation, sometimes run manually; todo: details)
* Media playback tests (TODO: details; in the query above, they are prefixed with ytp)
* Start up duration via <code>mach perftest</code>
* Speedometer: JS responsiveness tests (todo: details)
* Tier 3 Unity WebGL tests (todo: details)
 
There are other tests that run on desktop that will cover other parts of the platform.
 
== Preventing regressions automatically ==
We use the following measures:
* Crash on main thread IO in debug builds using <code>StrictMode</code> ([https://github.com/mozilla-mobile/fenix/blob/13f33049122e0f06c026632812dee405360c53b0/app/src/main/java/org/mozilla/fenix/StrictModeManager.kt#L63-L69 code])
* Use [https://searchfox.org/mozilla-mobile/rev/3af703be7790ff00f78d15465f3b8bb0fde0dccc/fenix/app/src/androidTest/java/org/mozilla/fenix/perf/StartupExcessiveResourceUseTest.kt#103 our StartupExcessiveResourceUseTest], for which we are Code Owners, to:
** Avoid StrictMode suppressions
** Avoid <code>runBlocking</code> calls
** Avoid additional component initialization
** Avoid increasing the view hierarchy depth
** Avoid having ConstraintLayout as a RecyclerView child
** Avoid increasing the number of inflations
* Use lint to avoid multiple ConstraintLayouts in the same file ([https://searchfox.org/mozilla-mobile/rev/3af703be7790ff00f78d15465f3b8bb0fde0dccc/fenix/mozilla-lint-rules/src/main/java/org/mozilla/fenix/lintrules/perf/ConstraintLayoutPerfDetector.kt code])
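
To illustrate the static-analysis idea behind the <code>runBlocking</code> and <code>StrictMode</code>-suppression checks, a minimal source scanner might look like the sketch below. This is a hypothetical Python script for illustration only, not the project's actual lint rule (that is implemented as an Android lint detector in Kotlin); the pattern list is an assumption about what counts as a hazard.

```python
import re
from typing import List, Tuple

# Patterns treated as potential start-up performance hazards. The method
# names are illustrative guesses at what a real check might flag.
HAZARD_PATTERNS = {
    "runBlocking": re.compile(r"\brunBlocking\b"),
    "StrictMode suppression": re.compile(r"resetAfter|permitDiskReads|permitDiskWrites"),
}

def find_hazards(kotlin_source: str) -> List[Tuple[int, str]]:
    """Return (line_number, hazard_name) pairs for suspicious lines,
    skipping single-line comments."""
    hits = []
    for lineno, line in enumerate(kotlin_source.splitlines(), start=1):
        if line.strip().startswith("//"):
            continue
        for name, pattern in HAZARD_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

A check like this could run in CI and fail the build when new hits appear in start-up path files, which is the same "fail fast on review-worthy changes" design the lint rule above uses.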
 
== How to measure what users experience ==
When analyzing performance, it's critical to measure the app as users experience it. This section describes how to do that and avoid pitfalls. Note: our automated measurement tools, such as the [https://github.com/mozilla-mobile/perf-tools/blob/main/measure_start_up.py <code>measure_start_up.py</code> script], will always use our most up-to-date techniques while this page may get outdated. Prefer to use automated systems if practical and read the source if you have questions!
 
When measuring performance manually, you might follow a pattern like the following (see the footnotes for explanations):
 
* Configure your device and build. Use:
** a low-end device<sup>1</sup> (a Samsung Galaxy A51 is preferred)
** a <code>debuggable=false</code> build such as Nightly<sup>2</sup>
** enable any compile-time options that are enabled in the production app (e.g. Sentry, Nimbus, Adjust, etc.)<sup>3</sup>
* Warm-up run:
**Start the app, especially after an installation<sup>4</sup>
**Wait at least 60 seconds<sup>5</sup>. To be extra safe, wait 2 minutes.
**Set the state of the app as you want to test it (e.g. clear onboarding)
**Force-stop the app (to make sure you're measuring at least the 2nd run after installation)
* Measure or test:
**Start the app and measure what you want to measure
**If you force-stop the app, wait a few seconds before starting the app to let the device settle
**If you're testing code that waits for gecko initialization (e.g. page loads) and need to force-stop the app before measuring, make sure to 1) load a page and 2) wait 15 seconds before force-stopping the app<sup>6</sup>
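
The manual flow above can be scripted with <code>adb</code>. The sketch below is a hypothetical illustration: the package and activity names are assumptions, and <code>am start -W</code>'s <code>TotalTime</code> measures to first frame rather than the visual-completeness metric our dashboards use. Prefer the maintained <code>measure_start_up.py</code> script for real measurements.

```python
import re
import subprocess
import time
from typing import List

PACKAGE = "org.mozilla.fenix"  # assumed package name; varies by build variant

def parse_total_time(am_output: str) -> int:
    """Extract the TotalTime value (milliseconds) printed by `am start -W`."""
    match = re.search(r"TotalTime:\s*(\d+)", am_output)
    if match is None:
        raise ValueError("no TotalTime in am output")
    return int(match.group(1))

def measure_cold_main(runs: int = 3) -> List[int]:
    """Warm up once, then measure COLD MAIN start ups, force-stopping between runs."""
    # Warm-up run: first run after installation is always slower (footnote 4).
    subprocess.run(["adb", "shell", "monkey", "-p", PACKAGE, "1"], check=True)
    time.sleep(60)  # let first-run caches (e.g. Pocket, Gecko) settle (footnote 5)
    durations = []
    for _ in range(runs):
        subprocess.run(["adb", "shell", "am", "force-stop", PACKAGE], check=True)
        time.sleep(5)  # let the device settle after force-stop
        out = subprocess.run(
            ["adb", "shell", "am", "start", "-W",
             "-a", "android.intent.action.MAIN",
             f"{PACKAGE}/.HomeActivity"],  # assumed launch activity
            check=True, capture_output=True, text=True,
        ).stdout
        durations.append(parse_total_time(out))
    return durations
```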
 
Footnotes:
* 1: high-end devices may be fast enough to hide performance problems. For context, a Pixel 2 is still relatively high-end in our user base
* 2: <code>debuggable=true</code> builds (e.g. Debug builds) have performance characteristics that don't represent what users experience. See https://www.youtube.com/watch?v=ZffMCJdA5Qc&feature=youtu.be&t=625 for details
* 3: if these SDKs are disabled, you may miss performance issues they introduce, or their absence may change the timing of our operations and hide performance issues. Note: the performance team would prefer all SDKs to be enabled by default so developers can build an APK similar to production APKs without errors
* 4: we've observed the first run after installation is always slower than subsequent runs for an unknown reason
* 5: on first run, we populate certain caches, e.g. we'll fetch Pocket data and start a Gecko cache. 60 seconds will address most of these
* 6: the ScriptPreloader will generate a new cache on each app start up. If you don't let the cache fill (i.e. by loading a page and waiting until it caches; [https://searchfox.org/mozilla-central/rev/fc4d4a8d01b0e50d20c238acbb1739ccab317ebc/js/xpconnect/loader/ScriptPreloader.cpp#769 source]), the cache will be empty and page loads won't behave as most users experience them
 
== <span id="Glossary"></span>Glossary ==
=== Start up "type" ===
This is an aggregation of all of the variables that make up a start up, described more fully below. Currently, these variables are:
* state
* path
 
For example, a type of start up could be described as <code>cold_main</code>.
 
=== Start up "state": COLD/WARM/HOT ===
"State" refers to how cached the application is, which will impact how quickly it starts up.
 
[https://developer.android.com/topic/performance/vitals/launch-time#internals Google Play provides a set of definitions] and our definitions are similar, but not identical, to them:
* COLD = starting up "from scratch": the process and HomeActivity need to be created
* WARM = the process is already created but HomeActivity needs to be created (or recreated)
* HOT = basically just foregrounding the app: the process and HomeActivity are already created
 
=== Start up "path": MAIN/VIEW ===
"Path" refers to the code path taken for this start up. We name these after the <code>action</code> inside the <code>Intent</code>s received by the app [https://developer.android.com/reference/android/content/Intent#ACTION_MAIN such as <code>ACTION_MAIN</code>] that tell the app what to do:
* MAIN = a start up where the app icon was clicked. If there are no existing tabs, the homescreen will be shown. If there are existing tabs, the last selected one will be restored
* VIEW = a start up where a link was clicked. In the default case, a new tab will be opened and the URL will be loaded
 
Caveat: if an <code>Intent</code> is invalid, we may end up on a different screen (and thus taking a different code path) than the one specified by the <code>Intent</code>. For example, an invalid VIEW <code>Intent</code> may instead be treated as a MAIN <code>Intent</code>.
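
For manual testing, the two paths can be triggered from the shell with <code>Intent</code>s carrying the corresponding actions. The sketch below builds the <code>adb</code> commands; the package and activity names are illustrative assumptions, not authoritative routing details.

```python
from typing import List, Optional

PACKAGE = "org.mozilla.fenix"  # assumed package name; adjust per build variant

def start_up_command(path: str, url: Optional[str] = None) -> List[str]:
    """Build an adb command for a COLD start of the given start up "path".

    MAIN mimics tapping the app icon; VIEW mimics clicking a link
    (and therefore needs a URL to load).
    """
    if path == "MAIN":
        return ["adb", "shell", "am", "start", "-W",
                "-a", "android.intent.action.MAIN",
                "-c", "android.intent.category.LAUNCHER",
                f"{PACKAGE}/.HomeActivity"]  # assumed launcher activity
    if path == "VIEW":
        if url is None:
            raise ValueError("VIEW start up needs a URL to load")
        return ["adb", "shell", "am", "start", "-W",
                "-a", "android.intent.action.VIEW",
                "-d", url,
                f"{PACKAGE}/.IntentReceiverActivity"]  # assumed VIEW entry point
    raise ValueError(f"unknown start up path: {path}")
```

Remember to <code>am force-stop</code> the app first so the start is actually COLD rather than WARM or HOT.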